Exceptions during chunk commit are silently ignored #1189

spring-projects-issues · 2015-08-07T08:38:32Z

Erwin Vervaet opened BATCH-2415 and commented

A TaskletStep uses various helpers to loop over all chunks to be processed and process each of them in a
separate transaction while managing things such as skippable and retryable exceptions.
In pseudo code this whole system can be summarized as follows:

while (more chunks) { // RepeatTemplate usage in TaskletStep.doExecute
	try { // RepeatTemplate.executeInternal

		try { // StepContextRepeatCallback.doInIteration
			chunkContext = queue.poll()

			{ // TaskletStep: StepContextRepeatCallback.doInChunkContext
				try { // TransactionTemplate.execute
					{ // TaskletStep: ChunkTransactionCallback.doInTransaction
						{ // ChunkOrientedTasklet.execute
							chunk = chunkProvider.provide()
							chunkProcessor.process(chunk) [A]
							chunkContext.setComplete()
							return RepeatStatus.continueIf(!chunk.isEnd())
						}
					}
				} catch (Exception e) {
					rollback()
				}
				commit() [B]
			}

		} finally {
			if (!chunkContext.isComplete()) {
				queue.add(chunkContext)
			}
		}

	} catch (Exception e) {
		{ // SimpleRetryExceptionHandler.handleException
			if (exception is fatal) {
				throw e;
			}
		}
	}
}

This code exhibits the following type of behavior:

If no exceptions occur during chunk processing, the chunk is marked as complete and the transaction
commits. Since the chunk its complete it is not pushed back on the queue and the repeat loop ends normally.
A non-fatal (i.e. retryable) exception that is thrown while processing the chunk (point [A])
will cause the chunk not to be marked complete and the transaction to rollback. Since the chunk
is not complete its pushed back onto the queue of chunks to process. Furthermore, since the
exception is not fatal it will be ignored by the RepeatTemplate and a new iteration will start, actually doing the retry.

So far so good. However, we're experiencing the following case:

No exceptions occur during chunk processing. Consequently the chunk is marked as complete.
Now an exception is thrown during transaction commit (point [B]). Since the chunk is already marked as
complete the chunk is not pushed back on the queue and no retry is attempted. The commit failure
caused correct chunk transaction rollback. However, since the exception is not deemed to be fatal
it is silently ignored by the RepeatTemplate! End result: the batch completes normally but certain chunks were silently skipped!

The exception we get is (see also BATCH-2403):

2015-08-06 13:53:04.440 DEBUG o.s.b.c.s.i.SimpleRetryExceptionHandler - Handled non-fatal exception
javax.persistence.PersistenceException: org.hibernate.exception.LockAcquisitionException: could not execute statement
	at org.hibernate.ejb.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1387) ~[hibernate-entitymanager-4.2.12.Final.jar:4.2.12.Final]
	at org.hibernate.ejb.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1310) ~[hibernate-entitymanager-4.2.12.Final.jar:4.2.12.Final]
	at org.hibernate.ejb.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1316) ~[hibernate-entitymanager-4.2.12.Final.jar:4.2.12.Final]
	at org.hibernate.ejb.AbstractEntityManagerImpl$CallbackExceptionMapperImpl.mapManagedFlushFailure(AbstractEntityManagerImpl.java:1510) ~[hibernate-entitymanager-4.2.12.Final.jar:4.2.12.Final]
	at org.hibernate.engine.transaction.synchronization.internal.SynchronizationCallbackCoordinatorNonTrackingImpl.beforeCompletion(SynchronizationCallbackCoordinatorNonTrackingImpl.java:110) ~[hibernate-core-4.2.12.Final.jar:4.2.12.Final]
	at org.hibernate.engine.transaction.synchronization.internal.RegisteredSynchronization.beforeCompletion(RegisteredSynchronization.java:53) ~[hibernate-core-4.2.12.Final.jar:4.2.12.Final]
	at bitronix.tm.BitronixTransaction.fireBeforeCompletionEvent(BitronixTransaction.java:478) ~[btm-2.1.2.jar:2.1.2]
	at bitronix.tm.BitronixTransaction.commit(BitronixTransaction.java:193) ~[btm-2.1.2.jar:2.1.2]
	at bitronix.tm.BitronixTransactionManager.commit(BitronixTransactionManager.java:120) ~[btm-2.1.2.jar:2.1.2]
	at org.springframework.transaction.jta.JtaTransactionManager.doCommit(JtaTransactionManager.java:1012) ~[spring-tx-3.2.9.RELEASE.jar:3.2.9.RELEASE]
	at org.springframework.transaction.support.AbstractPlatformTransactionManager.processCommit(AbstractPlatformTransactionManager.java:755) ~[spring-tx-3.2.9.RELEASE.jar:3.2.9.RELEASE]
	at org.springframework.transaction.support.AbstractPlatformTransactionManager.commit(AbstractPlatformTransactionManager.java:724) ~[spring-tx-3.2.9.RELEASE.jar:3.2.9.RELEASE]
	at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:148) ~[spring-tx-3.2.9.RELEASE.jar:3.2.9.RELEASE]
	at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:267) ~[spring-batch-core-2.2.7.RELEASE.jar:na]
	at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:77) ~[spring-batch-core-2.2.7.RELEASE.jar:na]
	at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:368) [spring-batch-infrastructure-2.2.7.RELEASE.jar:na]
...
Caused by: org.hibernate.exception.LockAcquisitionException: could not execute statement
	at org.hibernate.dialect.MySQLDialect$1.convert(MySQLDialect.java:412) ~[hibernate-core-4.2.12.Final.jar:4.2.12.Final]
...
Caused by: com.mysql.jdbc.exceptions.jdbc4.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[na:1.8.0_05]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[na:1.8.0_05]
...

I fail to see how that can be desired behavior. Or am I just missing something here?

Affects: 2.2.7, 3.0.4

Attachments:

debug.log (399.30 kB)
step_succeeds_when_commit_fails.patch (4.33 kB)

4 votes, 9 watchers

The text was updated successfully, but these errors were encountered:

spring-projects-issues · 2015-08-07T08:48:41Z

Michael Minella commented

Does this issue persist in the 3.0.x branch? 2.2.x is no longer maintained.
How is this issue different from BATCH-2403 which you point out?

spring-projects-issues · 2015-08-07T09:58:26Z

Erwin Vervaet commented

Michael Minella,

We're still on 2.2.7 so I haven't verified that yet. I'll upgrade and test this next week.
BATCH-2403 is ultimately about the noRetry and noSkip methods on FaultTolerantStepBuilder. This ticket is about a logic path that can lead to a situation where Spring Batch rolls back a chunk and silently ignores the exception that caused the rollback, ultimately marking the batch as having completed successfully while some chunks were actually skipped.

spring-projects-issues · 2015-08-08T06:22:55Z

Albert Strasser commented

Hi Erwin,

You might be correct that we are looking at 2 different issues here. My assumption was that just checking if an exception was fatal (last catch block of your example) seemed to me as a wrong implementation of skipping (which should default to false for any exception, not only for fatal ones). But you could be right that at this point we are already beyond the concept of skipping and therefore this would mean that we look at a major bug in handling the status of a chunk.

I can assure that this issue does persist in the 3.0.x branch as we are using 3.0.4 and experience exactly the same behaviour.

spring-projects-issues · 2015-08-10T09:55:31Z

Erwin Vervaet commented

I can now also confirm that the problem is reproducible with Spring Batch 3.0.4.
Attached you can find DEBUG level Spring Batch logging for an occurrence of the problem.

The logging is from two 50-item batches (documents_cuba_RLBTST17.zip and documents_cuba_RLBTST16.zip) that run concurrently. The system first picks up the documents_cuba_RLBTST17.zip batch and about 10 seconds later documents_cuba_RLBTST16.zip is also picked up.

Both batches complete successfully.

documents_cuba_RLBTST17.zip incorrectly skipped 2 items. The STEP_EXECUTION table contains the following (no skips):

STATUS	COMMIT_COUNT	READ_COUNT	WRITE_COUNT	ROLLBACK_COUNT
COMPLETED	49	50	48	2

documents_cuba_RLBTST16.zip incorrectly skipped 12 items. The STEP_EXECUTION table contains the following (no skips):

STATUS	COMMIT_COUNT	READ_COUNT	WRITE_COUNT	ROLLBACK_COUNT
COMPLETED	39	50	38	12

spring-projects-issues · 2015-08-13T07:31:43Z

Erwin Vervaet commented

As a temporary work-around, we're now forcing a transaction flush (using EntityManager.flush()) from inside our ItemProcessor. This way Spring Batch will correctly retry or fail the batch if retries are exhausted.

Obviously this is not a real fix since a transaction commit might still fail for other reasons.

spring-projects-issues · 2015-09-03T14:00:12Z

Erwin Vervaet commented

Michael Minella,
Can you given any feedback related to this issue?
Given that if confirmed this is a critical bug I would say this has some priority.

spring-projects-issues · 2015-10-26T18:07:45Z

Andrei Amariei commented

Also affected by this issue.
I attached [^step_succeeds_when_commit_fails.patch] that adds a failing test to spring-batch-samples module, maybe it helps moving it forward.

spring-projects-issues · 2015-10-26T18:56:08Z

Erwin Vervaet commented

Thanks for that Andrei Amariei!

spring-projects-issues · 2015-10-27T10:38:51Z

Andrei Amariei commented

One concern I have with flushing is that it can add side effects to an unsuccessful write phase.

For example:

given inputs are entities and hibernate's Session.saveOrUpdate(item) is used (as in HibernateItemWriter), flushing will increase the version field before throwing any exception.
therefore after rollback, when attempting to process the items individually, writing will fail with StaleObjectStateException.
the same can also happen in commit phase, so maybe a way to skip all items in the chunk (instead of retrying) in case of write/commit failure would help.

About the attached test, maybe it's wrong to expect the step to fail (even if all the items are skipped), and it would be better to instead have a field StepExecution.commitSkipCount and a method SkipListener.onSkipInCommit() (the method can be placed in a new interface to keep backward compatibility). Similar for retry rules.

spring-projects-issues · 2015-11-06T21:54:30Z

Erwin Vervaet commented

Andrei Amariei, you're right, the flush is just a hack. In our application it effectively fixes the problem: we've not seen any new occurrences in our production system. But I fully agree that it's not a real solution that applies generally.

spring-projects-issues · 2015-12-07T08:45:09Z

Erwin Vervaet commented

Michael Minella, still no input on this?

spring-projects-issues · 2016-01-29T13:55:12Z

Erwin Vervaet commented

still no input on this?
I guess not...

fmbenhassine · 2023-09-21T12:56:56Z

Failing test with 5.0.3 (to add in ChunkOrientedStepIntegrationTests):

@Test
public void faultTolerantStepShouldFailWhenCommitFails() throws Exception {
	StepBuilder stepBuilder = new StepBuilder("step", jobRepository);
	FaultTolerantStepBuilder<String, String> faultTolerantStepBuilder = new FaultTolerantStepBuilder<>(stepBuilder);
	faultTolerantStepBuilder.transactionManager(transactionManager);
	faultTolerantStepBuilder.reader(getReader(new String[] { "a", "b", "c" }));
	faultTolerantStepBuilder.writer(data -> TransactionSynchronizationManager
			.registerSynchronization(new TransactionSynchronizationAdapter() {
				@Override
				public void beforeCommit(boolean readOnly) {
					throw new RuntimeException("Simulate commit failure");
				}
			}));
	step = faultTolerantStepBuilder.build();

	JobParameters jobParameters = new JobParameters();
	JobExecution jobExecution = jobRepository.createJobExecution(job.getName(), jobParameters);
	StepExecution stepExecution = new StepExecution(step.getName(), jobExecution);

	jobRepository.add(stepExecution);
	step.execute(stepExecution);

	Assert.assertEquals(BatchStatus.FAILED, stepExecution.getStatus());
}

Will be reviewed in #3950 .

renannprado · 2024-07-31T04:13:04Z

Is there a workaround for this issue?

tanjooxiang · 2024-08-29T09:15:10Z

Hello. facing the same issue. is there any workaround currently?

spring-projects-issues added type: bug status: waiting-for-triage Issues that we did not analyse yet in: core labels Dec 16, 2019

fmbenhassine mentioned this issue Jan 14, 2021

Fault tolerant steps may ignore chunks silently #3828

Closed

fmbenhassine added the related-to: transaction-management label Jan 26, 2021

fmbenhassine changed the title ~~Exceptions during chunk commit are silently ignored [BATCH-2415]~~ Exceptions during chunk commit are silently ignored Mar 15, 2021

fmbenhassine mentioned this issue Mar 23, 2021

BATCH-1767: fix optimistic locking exception in multi-threaded step #591

Closed

fmbenhassine mentioned this issue Jun 25, 2021

Revisit the chunk-oriented processing model implementation #3950

Open

3 tasks

git9999999 mentioned this issue May 16, 2023

Improve Documentation if an ItemWriter is used with a Database connection #4378

Closed

fmbenhassine removed the status: waiting-for-triage Issues that we did not analyse yet label Sep 21, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Exceptions during chunk commit are silently ignored #1189

Exceptions during chunk commit are silently ignored #1189

spring-projects-issues commented Aug 7, 2015

spring-projects-issues commented Aug 7, 2015

spring-projects-issues commented Aug 7, 2015

spring-projects-issues commented Aug 8, 2015

spring-projects-issues commented Aug 10, 2015

spring-projects-issues commented Aug 13, 2015

spring-projects-issues commented Sep 3, 2015

spring-projects-issues commented Oct 26, 2015

spring-projects-issues commented Oct 26, 2015

spring-projects-issues commented Oct 27, 2015

spring-projects-issues commented Nov 6, 2015

spring-projects-issues commented Dec 7, 2015

spring-projects-issues commented Jan 29, 2016

fmbenhassine commented Sep 21, 2023

renannprado commented Jul 31, 2024

tanjooxiang commented Aug 29, 2024

Exceptions during chunk commit are silently ignored #1189

Exceptions during chunk commit are silently ignored #1189

Comments

spring-projects-issues commented Aug 7, 2015

spring-projects-issues commented Aug 7, 2015

spring-projects-issues commented Aug 7, 2015

spring-projects-issues commented Aug 8, 2015

spring-projects-issues commented Aug 10, 2015

spring-projects-issues commented Aug 13, 2015

spring-projects-issues commented Sep 3, 2015

spring-projects-issues commented Oct 26, 2015

spring-projects-issues commented Oct 26, 2015

spring-projects-issues commented Oct 27, 2015

spring-projects-issues commented Nov 6, 2015

spring-projects-issues commented Dec 7, 2015

spring-projects-issues commented Jan 29, 2016

fmbenhassine commented Sep 21, 2023

renannprado commented Jul 31, 2024

tanjooxiang commented Aug 29, 2024