Enterprise Batch Processing with Jakarta Batch – Part 2

Jakarta EE

Continuing from where the last blog post left off, let’s delve deeper into configuring the chunk in Jakarta Batch. As we’ve seen, a chunk represents a set of items to be processed as a batch. Now we will explore how to control this process, manage potential errors, and ensure efficient execution.

Configuring the Chunk: Size Matters

One of the critical configurations of a chunk is its size. The chunk size determines how many items the batch job reads and processes before passing them to the writer. The right chunk size can significantly affect the performance of your batch job. Because each chunk is committed in its own transaction, a size that is too small means many small commits and a lot of per-chunk overhead. If it’s too large, memory constraints or transaction timeouts could become a problem.

The following XML snippet illustrates how you might specify a chunk size in your job XML:

<chunk checkpoint-policy="item" item-count="100">
   <reader ref="myItemReader" />
   <processor ref="myItemProcessor" />
   <writer ref="myItemWriter" />
</chunk>

In this example, item-count="100" specifies that the job reads and processes 100 items before invoking the writer. There is no universally ideal chunk size; the right value has to be found by measuring against your own workload.
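To make the effect of item-count concrete, here is a minimal plain-Java sketch of the read, process, write loop. It is a simplified model and not the Jakarta Batch runtime: the class name ChunkLoopSketch and its run method are made up for illustration, and transactions and checkpoints are left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

// Simplified model of a chunk-oriented step (hypothetical, not the real runtime).
final class ChunkLoopSketch {

    // Reads items one at a time, processes each, and hands the buffered
    // results to the writer once itemCount items have accumulated.
    // Returns the number of writer invocations.
    static <I, O> int run(Iterator<I> reader,
                          Function<I, O> processor,
                          Consumer<List<O>> writer,
                          int itemCount) {
        List<O> buffer = new ArrayList<>(itemCount);
        int writes = 0;
        while (reader.hasNext()) {
            buffer.add(processor.apply(reader.next()));
            if (buffer.size() == itemCount) {
                writer.accept(new ArrayList<>(buffer)); // one write per full chunk
                buffer.clear();
                writes++;
            }
        }
        if (!buffer.isEmpty()) { // final, partially filled chunk
            writer.accept(new ArrayList<>(buffer));
            writes++;
        }
        return writes;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int n = 1; n <= 250; n++) {
            input.add(n);
        }
        Function<Integer, Integer> doubler = x -> x * 2;
        Consumer<List<Integer>> writer =
                chunk -> System.out.println("writing " + chunk.size() + " items");
        int writes = run(input.iterator(), doubler, writer, 100);
        System.out.println("writer calls: " + writes); // 250 items at item-count 100 -> 3
    }
}
```

With 250 items and a chunk size of 100 the writer is called three times; a chunk size of 10 would mean 25 calls, which is the overhead trade-off described above.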

Error Handling in Chunks

Error handling is another crucial aspect of chunk configuration. In batch processing, it’s not uncommon to encounter a situation where a particular item fails to process due to a data issue or a transient system error. Jakarta Batch provides mechanisms to handle such errors gracefully.

You can specify a skippable-exception-classes element in the chunk to define which exceptions should cause the problematic item to be skipped rather than fail the job:

<chunk>
   <reader ref="myItemReader" />
   <processor ref="myItemProcessor" />
   <writer ref="myItemWriter" />
   <skippable-exception-classes>
       <include class="jakarta.persistence.NoResultException"/>
   </skippable-exception-classes>
</chunk>

In this setup, if a NoResultException is thrown, the item will be skipped, and the job will continue processing the next item.
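The skip behaviour can be pictured with a small plain-Java sketch. This is only a model of the idea, not how the runtime is implemented: SkipSketch, BadRecordException and parse are hypothetical names, and the skipLimit parameter mirrors the role of the chunk’s optional skip-limit attribute.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Simplified model of skippable exceptions (hypothetical, not the real runtime).
final class SkipSketch {

    // Stand-in for an exception class listed under skippable-exception-classes.
    static final class BadRecordException extends RuntimeException {
        BadRecordException(String message) {
            super(message);
        }
    }

    // Holds the successfully processed items and the number of skipped ones.
    static final class Result<O> {
        final List<O> processed;
        final int skipped;

        Result(List<O> processed, int skipped) {
            this.processed = processed;
            this.skipped = skipped;
        }
    }

    // Processes every item; an item that throws BadRecordException is skipped
    // until skipLimit skips have been used, after which the exception propagates.
    static <I, O> Result<O> processAll(List<I> items, Function<I, O> processor, int skipLimit) {
        List<O> processed = new ArrayList<>();
        int skipped = 0;
        for (I item : items) {
            try {
                processed.add(processor.apply(item));
            } catch (BadRecordException e) {
                if (skipped == skipLimit) {
                    throw e; // limit exhausted: the step fails
                }
                skipped++;
            }
        }
        return new Result<>(processed, skipped);
    }

    // Example processor: parses a number, reporting bad input as a skippable error.
    static Integer parse(String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            throw new BadRecordException("not a number: " + raw);
        }
    }

    public static void main(String[] args) {
        Result<Integer> result = processAll(List.of("1", "x", "3"), SkipSketch::parse, 5);
        System.out.println(result.processed + ", skipped: " + result.skipped);
    }
}
```

Only the listed exception type is skipped; any other exception still propagates and fails the step, which matches the intent of naming the skippable classes explicitly.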

Retrying After Failures

Sometimes a failure is caused not by the item itself but by a temporary issue such as a network outage. Jakarta Batch lets you retry such items:

<chunk>
   <reader ref="myItemReader" />
   <processor ref="myItemProcessor" />
   <writer ref="myItemWriter" />
   <retryable-exception-classes>
       <include class="java.net.SocketTimeoutException"/>
   </retryable-exception-classes>
</chunk>

Here, if a SocketTimeoutException occurs, the runtime retries the item instead of failing the step straight away. The chunk’s optional retry-limit attribute caps how many retries are allowed before the step fails.
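As a rough illustration of retrying with a limit, the following plain-Java sketch retries an action when it throws SocketTimeoutException. RetrySketch and callWithRetry are hypothetical names, and the sketch deliberately ignores the chunk rollback that the batch runtime may perform around a retry, so treat it as a model of the idea only.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

// Simplified model of retryable exceptions (hypothetical, not the real runtime).
final class RetrySketch {

    // Invokes the action and retries it when it throws SocketTimeoutException.
    // After retryLimit retries the exception is rethrown to the caller.
    static <T> T callWithRetry(Callable<T> action, int retryLimit) throws Exception {
        int retries = 0;
        while (true) {
            try {
                return action.call();
            } catch (SocketTimeoutException e) {
                if (retries == retryLimit) {
                    throw e; // retries exhausted: give up
                }
                retries++;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated remote call that times out twice before succeeding.
        Callable<String> flaky = () -> {
            if (++calls[0] < 3) {
                throw new SocketTimeoutException("simulated timeout");
            }
            return "ok";
        };
        System.out.println(callWithRetry(flaky, 2) + " after " + calls[0] + " calls");
    }
}
```

Exceptions other than SocketTimeoutException are not caught here and propagate immediately, just as only the listed retryable classes trigger a retry in the job XML.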

Checkpointing for Consistency

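Checkpointing is a strategy to ensure that a job can recover from a failure without having to start over from the beginning. By default (checkpoint-policy="item"), a checkpoint is taken after each chunk, that is, after every `item-count` items. If your business logic requires it, you can instead use a custom checkpoint policy. In that case the `item-count` attribute no longer applies, and the chunk names a checkpoint algorithm that decides when to checkpoint:

<chunk checkpoint-policy="custom">
   <reader ref="myItemReader" />
   <writer ref="myItemWriter" />
   <!-- myCheckpointAlgorithm is a placeholder name for your own CheckpointAlgorithm implementation -->
   <checkpoint-algorithm ref="myCheckpointAlgorithm" />
</chunk>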

This level of control can be crucial when dealing with large datasets where restarting a job from the beginning would be very costly in terms of time and resources.
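What a custom policy might decide can be sketched in plain Java. In a real job this logic would live in a class implementing jakarta.batch.api.chunk.CheckpointAlgorithm, whose isReadyToCheckpoint() method the runtime consults after each item. The class below is a stand-alone model with hypothetical names that takes the current time as a parameter so it can run without the batch API: it checkpoints after a maximum number of items or a maximum elapsed time, whichever comes first.

```java
// Stand-alone model of a custom checkpoint decision (hypothetical, plain Java).
final class CheckpointPolicySketch {
    private final int maxItems;
    private final long maxMillis;
    private int itemsSinceCheckpoint;
    private long lastCheckpointMillis;

    CheckpointPolicySketch(int maxItems, long maxMillis, long startMillis) {
        this.maxItems = maxItems;
        this.maxMillis = maxMillis;
        this.lastCheckpointMillis = startMillis;
    }

    // Called once per processed item; returns true when a checkpoint is due,
    // either because enough items accumulated or because enough time passed.
    boolean isReadyToCheckpoint(long nowMillis) {
        itemsSinceCheckpoint++;
        return itemsSinceCheckpoint >= maxItems
                || nowMillis - lastCheckpointMillis >= maxMillis;
    }

    // Called after the checkpoint was taken; resets both counters.
    void endCheckpoint(long nowMillis) {
        itemsSinceCheckpoint = 0;
        lastCheckpointMillis = nowMillis;
    }

    public static void main(String[] args) {
        CheckpointPolicySketch policy = new CheckpointPolicySketch(3, 1000, 0);
        long clock = 0;
        for (int item = 1; item <= 7; item++) {
            clock += 100; // pretend each item takes 100 ms
            if (policy.isReadyToCheckpoint(clock)) {
                System.out.println("checkpoint after item " + item);
                policy.endCheckpoint(clock);
            }
        }
    }
}
```

A time-based bound like this is one reason to prefer a custom policy: it keeps the amount of work lost on failure predictable even when individual items vary widely in processing time.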

Optimizing Performance

Lastly, consider the transactional behavior and the impact on performance. Using a persistent step-scoped or job-scoped data repository can minimise transaction times and optimise the performance of your batch job.

For instance, employing an in-memory database for intermediate processing steps can drastically reduce the I/O time, making the chunk processing much faster.

Summary

This blog post has taken a closer look at how to configure a chunk in Jakarta Batch. We’ve covered the importance of chunk size, error handling, retry logic, checkpointing, and performance optimization. Each of these aspects plays a vital role in creating an efficient, robust, and fault-tolerant batch job.

In the next instalment (coming next week!) we will discuss tasks (batchlets), an alternative to chunks, and when to use each within your Jakarta Batch jobs. We’ll also explore ways to monitor and manage the life cycle of a batch job for optimal operation. Stay tuned to take your Jakarta Batch skills to the next level!

Comments (2)

  1. Jan Nilsson

    It would be interesting to read about database transactions in JPA in combination with chunk size, checkpoints etc. Is there a best practice?

    1. Luqman Saeed

      Yes, you are right. I have written migration batch jobs that imported north of 2 million records. For such a dataset, we ultimately set the chunk size to 5 after a lot of testing to find the right balance between processing and DB connections. In the end, it is always an “it depends” situation, and only actual measuring can determine the right parameters.
