IBM Sterling products unofficial blog

Database, IBM Sterling B2B Integrator, troubleshooting

Housekeeping jobs in Sterling B2B Integrator explained

B2B Integrator delete old records from correlation_set

Sterling B2B Integrator Database tables are maintained by several housekeeping jobs. In this blog, I will explain how these system processes work. I will also present some best practices and important KPI to avoid the Database unnecessary growth.

BP Index Service :

Called also the BP business process moving service, is responsible for indexing new BP archive information and calculates their archive_date.

Every time a BP is run, it will create two records in the ARCHIVE_INFO table with a NULL archive_date. This important housekeeping process will update the archive_date and the removal_method (purge or archive) of completed BPs.

Facts about the BP Index process:

  • Schedule name is ‘Schedule_IndexBusinessProcessService’.
  • Runs every 10 minutes by default.
  • Calculates the Lifespan and Removal Method for the BP based on the Workflow Definition.
  • Updates the records in ARCHIVE_INFO, setting the ARCHIVE_DATE equal to the Lifespan, and the ARCHIVE_FLAG equal to the Removal Method from WF_INST_S.
  • Lifespan is the date after which the process becomes eligible for the next service
  • Removal Method is either 0 (Archive/backup) or 1,2 (Purge)
  • If a BP fails Index, it is remarked with an ARCHIVE_FLAG of -5.
  • Index works only on BPs that are Completed or Terminated.
  • Mailbox documents will have an initial 10 years archive_date (lifespan).
  • Index updates archive_info records with 10 years lifespan when the lifespan is reset following a mailbox delete.

BP Index best practices :

  • BP Index  count should be always low (below 50000 records on average systems).
  • Adjust the schedule frequency and the max to process parameter in the Index BP. For example: run the schedule every 5 minutes with a doubled batch size of 20k on busy systems.
  • Investigate failing to index BPs (archive_flag -5).
  • Monitor archive.log for Index errors. You can activate DEBUG logging for a short period to measure the performance of Index operations.
  • Don’t ignore failing Index email alerts.

Index failures side effects:

  • High Index backlog affects the BP recovery performance. BP recovery is responsible for recovering BPs and it will either terminate, resume, restart failing or hanging BPs.
  • The Database BP transactional records in various tables (more than 50 tables) will NEVER be deleted if they are not marked by the Index (Risk of Database growth).

KPI : BP index count

Select count(*) from ARCHIVE_INFO where ARCHIVE_FLAG in (-1,-2,-5)

Linkage Purge

The Linkage Purge cleans the WORKFLOW_LINKAGE table. This table contains the parent-child  BP information .

The schedule runs once every night and need to run more frequently on loaded systems with an adjusted max BP if needed.

To adjust the number of records this housekeeping process deletes each run, change the Linkage_Purge BP source to include:

<assign to="max_business_processes">180000</assign>

KPI : workflow_linkage table count

Select count(*) from WORKFLOW_LINKAGE

If the count is too high, read this post: why is my WORKFLOW_LINKAGE table growing?

Purge Service

The Purge service is responsible for deleting expired Database records from transactional tables (more than 50 tables). It deletes also documents from the File System. The Purge schedule runs every 10 minutes by default.

The Purge process read the ARCHIVE_INFO table, and looks for Workflow IDs that have an ARCHIVE_FLAG of either 1 or 2, and an ARCHIVE_DATE of less than the current system date. It uses the WORKFLOW_ID to deletes all eligible BP records from various tables and FS.

By default it reads 2000 WF_IDs each time until all the eligible records are deleted.

Purge best practices:

  • Adjust your BPs lifespan and add a number of hours to your number of days retention period (default is usually a number of days). This little-known trick will help you avoid doing big Database deletes during your peak time. Adding a number of hours for example: 12 hours will make the records created at 7 am eligible for purge after the lifespan number of days not at 7 am (Peak period) but at 7 pm (7+12=19).
  • Use SQL queries to understand your Database and more particularly Purge related KPIs.
  • Split you Documents on Disk (dod) by day/hour/minute to make the FS deletes faster.
  • Use the High Performance Purge (called also External Purge) read the following section.

SQL for purge counts :

select ARCHIVE_FLAG , count(*) from ARCHIVE_INFO group by ARCHIVE_FLAG

SELECT ARCHIVE_FLAG,GROUP_ID, MIN(ARCHIVE_DATE) , MAX(ARCHIVE_DATE)   FROM ARCHIVE_INFO where ARCHIVE_FLAG >=0 GROUP BY ARCHIVE_FLAG, GROUP_ID

External Purge

Called also high performance Purge, it runs in a separate JVM, Multithreaded and can run when B2B Integrator is down and the database is available.

External Purge is not enabled by default and should run on one cluster node only. You need to disable the default Purge service when running the external purge.

Use control_extpurge.sh start / stop to start/stop the Purge external process.

To monitor check Install/logs/PURGE and the lock manager in B2B Integrator will contain locks called : Hpp.Purge and PURGE_SERVICE.

AssociateBPsToDocs

WE saw earlier that the Purge process will delete records based on their WF_ID. How about records with WF_ID of 0 and -1. We call these unassociated documents.

The Schedule_AssociateBPsToDocs is responsible for associating expired and deleted documents/records with WF_ID of 0 and -1 and mark them to be eligible for Purge. this is a very important little-known house keeping process.

It looks through the DOCUMENT table for eligible records with workflow_id 0 OR -1 that have a reset Document lifespan (such as deleted mailbox messages, OR documents without workflow_id or data_flow records etc). It use the complex getUnassociatedDocsWithDocLifespan query to find unassociated documents. Adter that, the process will updates their WF_ID from 0 or -1 to the associateBPsToDocs workflow_id (associate them) so the Purge will delete them later after some time.

Let me explain this point : for example, if the Schedule_AssociateBPsToDocs is set with a WF_ID of ‘12345’, then these unassociated Documents with a WF_ID of ‘0’,‘-1’ will have their WF ID updated to ‘12345’ . Then the Purge will read the archive_info table and find this WF_ID 12345 with an archive_date which is the lifespan of the AssociateBPsToDocs BP.

When this archive_date is due, the Purge will delete the flagged records in the following tables: DOCUMENT, DOCUMENT_LIFESPAN, DOCUMENT_EXTENSION, TRANS_DATA, CORRELATION_SET.

The 10 years default document lifespan

We cannot talk about the Associate process without talking about the famous B2B Integrator 10 years lifespan!

When a Mailbox message is received, it is stored in that Mailbox until either the lifespan expires or the Extractable Count reaches 0.

To prevent BPs and the associated Documents for these Mailbox messages from being Purged, the lifespan is set to 10 years plus the normal BP lifespan by the Index process.

Once the Mailbox message expires or if it’s Extractable count reaches 0, the system then marks the process to be re-Indexed.

The Index process then takes the BPs that are marked to be re-Indexed and then calculates the proper Removal Method and Lifespan, and updates the records in ARCHIVE_INFO with this information.

The Associate process will act when mailbox documents are deleted using a mailbox delete job.

AssociateBPsToDocs best practices

  • Debug/trace in archive.log
  • Reduce the default 2 days waiting time on busy systems.

jdbcService.docWithWFID_1WaitTime

  • Reduce the associate lifespan from default lifespan to a few hours on busy systems.
  • Adjust max2process in the AssociateBPsToDocs BP:

<assign to="max2process">10000</assign>

KPI for AssociateBPsToDocs

SQL taken from jdbcService.query.getUnassociatedDocsWithDocLifespan.oracle :

SELECT count(distinct D.DOC_ID) FROM DOCUMENT D, DOCUMENT_LIFESPAN DL WHERE D.DOC_ID = DL.DOC_ID(+) and not exists (select * from document_lifespan dl2 where dl2.doc_id = d.doc_id and (dl2.workflow_id > 0 or (D.CREATE_TIME + (DL2.LIFE_SPAN / 1440)) >= SYSDATE)) AND (D.WORKFLOW_ID = -1 OR D.WORKFLOW_ID = -2 OR D.WORKFLOW_ID = 0) and ((D.CREATE_TIME + (DL.LIFE_SPAN / 1440)) < SYSDATE OR dl.DOC_ID IS NULL) and D.CREATE_TIME < SYSDATE -2 AND NOT EXISTS (SELECT * FROM WORKFLOW_CONTEXT WC WHERE WC.DOC_ID=D.DOC_ID)

Backup Service (Archive)

The Schedule_BackupService runs every morning at 2:00 AM by default. It archives data that has an ARCHIVE_FLAG of 0 in ARCHIVE_INFO and uses the records in WF_INST_S to calculate the eligible data for the backup. Once Archived to ARC_DATA directory by default, the ARCHIVE_FLAG is changed from 0 to 1 or 2  to indicate the BP is now ready for Purge.

If your BPs’ removal method is set to Backup or Archive, then its related records will NOT be Purged from the Database and FS until they are archived successfully.

Archive debug and errors are written to archive.log

Finally, understanding house keeping jobs in B2B Integrator and its Database important tables with the help of SQL queries will make managing your Application Database much easier.

Leave a Reply

We’ve detected that you’re using an ad blocker. Please disable it to support our website.