A failing item blocks processing of further items when using a multi-threaded step [BATCH-1832] #1757
Labels
in: infrastructure
related-to: multi-threading
status: waiting-for-triage
type: bug
Kamal Govindraj opened BATCH-1832 and commented
We have a job configured that reads items from an SQS queue and processes them. We have set up a task executor with a throttle limit to allow multiple items to be processed in parallel. In a specific scenario, the job stops picking up further items even though there are items in the queue and the number of threads in use is below the throttle limit.
The following is the relevant snippet of configuration
On further investigation, we found that this problem occurs only when processing of one of the items fails with an exception. The main thread then blocks until processing of another item finishes. In our case, processing a single item can take more than 30 minutes, and for that duration all the other threads are idle.
Stack trace:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:485)
org.springframework.batch.repeat.support.ResultHolderResultQueue.take(ResultHolderResultQueue.java:134)
org.springframework.batch.repeat.support.ResultHolderResultQueue.take(ResultHolderResultQueue.java:33)
org.springframework.batch.repeat.support.TaskExecutorRepeatTemplate.getNextResult(TaskExecutorRepeatTemplate.java:143)
org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:214)
org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143)
org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:250)
org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:135)
I have investigated it further - the problem is in either ResultHolderResultQueue.isContinuable or TaskExecutorRepeatTemplate.ExecutingRunnable.run. The ResultHolderResultQueue doesn't work correctly when you put a result with an error into it. The following test case replicates the issue
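Independently of the attached test case, the failure mode can be illustrated with a small self-contained model. The sketch below does not use the actual Spring Batch classes: `ErrorResultSketch`, `Status`, `Holder` and `run` are hypothetical stand-ins. It assumes that a failed item leaves its result unset (null) and that the continuability check treats a null result as "not continuable".

```java
import java.util.concurrent.Callable;

public class ErrorResultSketch {

    /** Stand-in for a repeat status; not the Spring Batch type. */
    public enum Status { CONTINUABLE, FINISHED }

    /** Stand-in for a result holder: carries either a status or an error. */
    public static final class Holder {
        public final Status status;   // null when the work threw
        public final Throwable error; // null when the work succeeded

        Holder(Status status, Throwable error) {
            this.status = status;
            this.error = error;
        }
    }

    /** Runs the work; on failure only the error is recorded (assumed behaviour). */
    public static Holder run(Callable<Status> work) {
        try {
            return new Holder(work.call(), null);
        } catch (Exception e) {
            return new Holder(null, e);
        }
    }

    /** Assumed shape of the check: a null status counts as "not continuable". */
    public static boolean isContinuable(Holder h) {
        return h.status != null && h.status == Status.CONTINUABLE;
    }

    public static void main(String[] args) {
        Holder ok = run(() -> Status.CONTINUABLE);
        Holder failed = run(() -> { throw new IllegalStateException("boom"); });
        System.out.println("ok continuable:     " + isContinuable(ok));
        System.out.println("failed continuable: " + isContinuable(failed));
    }
}
```

In this model a failed item looks the same to the check as a finished one. That would be consistent with the main thread taking the waiting path seen in the stack trace instead of returning the failed result immediately.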
The bug can be fixed by modifying the ResultHolderResultQueue.isContinuable as follows
or by setting the result in the catch block of TaskExecutorRepeatTemplate.ExecutingRunnable.run method
I have attached a patch with the test case and two fixes - only one is required. I think TaskExecutorRepeatTemplate is the correct place to fix it, with an additional check in ResultHolderResultQueue to disallow a null result.
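As a rough sketch of the second approach (set a result in the catch block, and reject null results at the queue), the model below uses hypothetical stand-in types rather than the Spring Batch classes. Which status the attached patch assigns on failure is not shown in this report; `CONTINUABLE` is an assumption made here so that the failed holder is handed back without waiting. `CatchBlockFixSketch`, `run` and `requireResult` are illustrative names only.

```java
import java.util.concurrent.Callable;

public class CatchBlockFixSketch {

    /** Stand-in for a repeat status; not the Spring Batch type. */
    public enum Status { CONTINUABLE, FINISHED }

    /** Stand-in for a result holder with a status and an optional error. */
    public static final class Holder {
        public Status status;
        public Throwable error;
    }

    /** Runs the work and always leaves a non-null status, even on failure. */
    public static Holder run(Callable<Status> work) {
        Holder h = new Holder();
        try {
            h.status = work.call();
        } catch (Exception e) {
            h.error = e;
            // Assumption: the status chosen by the real patch is not known here.
            h.status = Status.CONTINUABLE;
        }
        return h;
    }

    /** Guard a queue could apply on put: a holder without a status is rejected. */
    public static Holder requireResult(Holder h) {
        if (h.status == null) {
            throw new IllegalArgumentException("Result must not be null");
        }
        return h;
    }

    /** With the guard in place, the check no longer needs a null test. */
    public static boolean isContinuable(Holder h) {
        return requireResult(h).status == Status.CONTINUABLE;
    }
}
```

With this arrangement the error still travels with the holder, so the caller can inspect `error` and rethrow, while the guard turns any remaining null result into an immediate failure instead of a silent wait.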
Affects: 2.1.8
Attachments:
2 votes, 2 watchers