Spring Batch Partition for Scaling & Parallel Processing

For Scaling & Parallel Processing, Spring Batch provides several solutions: Multi-threaded Step, Parallel Steps, Remote Chunking of a Step, and Partitioning a Step. In this tutorial, JavaSampleApproach introduces Partitioning a Step with a sample project.

Related articles:
Spring Batch Job with Parallel Steps
How to use Spring Batch Late Binding – Step Scope & Job Scope


I. SPRING BATCH PARTITION

Spring Batch provides a solution for partitioning a Step execution, either remotely or, with simple configuration, for local processing.

How does it work?

[Figure: partitioning overview]

The Job on the left-hand side is executed sequentially. The Master step is a partitioning step that has several Slave steps. Slave steps can be remote services or local threads.

To configure a partitioning step, Spring Batch provides the PartitionHandler component and the Partitioner interface.

1. PartitionHandler

The PartitionHandler component knows about the remoting or grid fabric (RMI remoting, EJB remoting, … or local threads) and the grid size. It can send StepExecution requests to the remote Steps in various formats, such as a DTO.

How to configure?
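
As a minimal sketch of a local-thread setup, the Java configuration below wires a TaskExecutorPartitionHandler to a thread pool. It is not the tutorial's own configuration (the project uses batchjob.xml). The class name PartitionHandlerConfig, the bean name slaveStep, and the pool and grid sizes of 5 are assumptions for illustration, and the code needs Spring Batch on the classpath.

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.partition.PartitionHandler;
import org.springframework.batch.core.partition.support.TaskExecutorPartitionHandler;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.task.TaskExecutor;
import org.springframework.scheduling.concurrent.ThreadPoolTaskExecutor;

// Hypothetical configuration class; names and sizes are assumptions.
@Configuration
public class PartitionHandlerConfig {

    // Thread pool that runs the slave step executions locally.
    @Bean
    public TaskExecutor taskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(5);
        executor.setMaxPoolSize(5);
        return executor;
    }

    // Handler that asks the Partitioner for gridSize partitions and
    // runs the given slave step once per partition on the executor.
    @Bean
    public PartitionHandler partitionHandler(Step slaveStep, TaskExecutor taskExecutor) {
        TaskExecutorPartitionHandler handler = new TaskExecutorPartitionHandler();
        handler.setStep(slaveStep);            // assumed bean: the step each partition executes
        handler.setTaskExecutor(taskExecutor);
        handler.setGridSize(5);                // assumed: one partition per CSV file
        return handler;
    }
}
```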

The gridSize defines the number of step executions to create, so we should take the size of the TaskExecutor’s thread pool into account when choosing it.
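
To make the relation between grid size and pool size concrete, here is a plain-JDK sketch (no Spring Batch involved). The class GridSizeDemo and its method names are made up for illustration. It submits one task per partition to a fixed pool and reports how many threads actually did the work. Similar queuing can be expected when a bounded pool backs the PartitionHandler, although the exact behavior depends on the TaskExecutor in use.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-JDK illustration: the grid size sets the amount of work,
// the pool size caps how much of it runs at the same time.
class GridSizeDemo {

    /** Outcome of one run: partitions finished and distinct worker threads seen. */
    static final class Result {
        final int completedPartitions;
        final int threadsUsed;

        Result(int completedPartitions, int threadsUsed) {
            this.completedPartitions = completedPartitions;
            this.threadsUsed = threadsUsed;
        }
    }

    static Result run(int gridSize, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger completed = new AtomicInteger();
        Set<String> threadNames = ConcurrentHashMap.newKeySet();

        // One task per partition; tasks beyond poolSize wait in the queue.
        for (int i = 0; i < gridSize; i++) {
            pool.execute(() -> {
                threadNames.add(Thread.currentThread().getName());
                completed.incrementAndGet();
            });
        }

        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("partitions did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new Result(completed.get(), threadNames.size());
    }

    public static void main(String[] args) {
        Result r = run(5, 2);
        System.out.println(r.completedPartitions + " partitions ran on "
                + r.threadsUsed + " thread(s)");
    }
}
```

All partitions complete even when the pool is smaller than the grid size; they simply take turns on the available threads.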

2. Partitioner

The Partitioner interface is used to build execution contexts as input parameters for step executions.

The returned Map associates a unique name for each step execution with its ExecutionContext.
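
A Partitioner for the CSV scenario of this tutorial could look like the sketch below. It is not the project's actual class: the name SamplePartitioner, the context key "filename", and the partition names are assumptions, while the customer-data-N.csv pattern follows the file names used later in the tutorial. It needs Spring Batch on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

import org.springframework.batch.core.partition.support.Partitioner;
import org.springframework.batch.item.ExecutionContext;

// Hypothetical Partitioner: one ExecutionContext per CSV file.
public class SamplePartitioner implements Partitioner {

    @Override
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> partitions = new HashMap<>(gridSize);
        for (int i = 1; i <= gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            // Assumed key "filename": the slave step reads it at runtime.
            context.putString("filename", "customer-data-" + i + ".csv");
            // The map key must be unique per step execution.
            partitions.put("partition" + i, context);
        }
        return partitions;
    }
}
```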

3. How to Bind the Input Map to Steps

The StepScope feature of Spring Batch lets us late-bind data from the PartitionHandler to each step at runtime.
See more: How to use Spring Batch Late Binding – Step Scope & Job Scope
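
As a hedged sketch of late binding, the step-scoped reader bean below receives a value from the step's ExecutionContext when the step starts. The tutorial itself uses XML configuration and its own Reader class, so treat this as a Java-config alternative: the class name ReaderConfig, the context key "filename", and the choice of a FlatFileItemReader that returns raw lines are assumptions. It needs Spring Batch on the classpath.

```java
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

// Hypothetical configuration class illustrating @StepScope late binding.
@Configuration
public class ReaderConfig {

    // @StepScope delays bean creation until a step execution exists,
    // so the SpEL expression can read that execution's context.
    @Bean
    @StepScope
    public FlatFileItemReader<String> reader(
            @Value("#{stepExecutionContext['filename']}") String filename) {
        FlatFileItemReader<String> reader = new FlatFileItemReader<>();
        reader.setResource(new ClassPathResource(filename)); // assumed: files are on the classpath
        reader.setLineMapper(new PassThroughLineMapper());   // each item is one raw line
        return reader;
    }
}
```

Because each slave step execution has its own ExecutionContext, each one gets a reader bound to a different file.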

II. PRACTICE

In this tutorial, we create a Batch Job that has a single partition step with 5 slave steps, which insert data from 5 CSV files into a MySQL database.

[Figure: Spring Batch Partition - overview]

Technologies

– Java 1.8
– Maven 3.3.9
– Spring Tool Suite – Version 3.8.1.RELEASE
– Spring Boot: 1.5.1.RELEASE
– MySQL Database 1.4

Steps to do

– Create Spring Boot project
– Create a simple model
– Create DAO class
– Create Batch Job Step
– Create Batch Job Partitioner
– Configure Partitional Batch Job
– Create JobLaunchController
– Create 5 csv files
– Run & Check results

1. Create Spring Boot project

Create a Spring Boot project with needed dependencies:
– spring-boot-starter-batch
– spring-boot-starter-web
– mysql-connector-java

2. Create a simple model

3. Create DAO class

4. Create Batch Job Step

– Create Reader.java:

– Create Writer.java:

– Create Processor.java:

5. Create Batch Job Partitioner

6. Configure Partitional Batch Job

– Create a batchjob.xml configuration file:

– Open application.properties file, configure DataSource info:

– In the main class, enable batch job processing:
@EnableBatchProcessing
@ImportResource("classpath:batchjob.xml")

7. Create JobLaunchController

8. Create 5 csv files

Create 5 CSV files {customer-data-1.csv, customer-data-2.csv, customer-data-3.csv, customer-data-4.csv, customer-data-5.csv} containing Customer info:

9. Run & Check results

– Build & run the project in Spring Boot App mode.
– Create a database table with SQL:

– Then make a launch request: localhost:8080/runjob
– Result:

[Figure: Spring Batch Partition - result]

III. SOURCE CODE

SpringBatchPartitioning


4 thoughts on “Spring Batch Partition for Scaling & Parallel Processing”

  1. If we have 100 customer-data-*.csv files and if we use grid-size = 100 then there will be 100 threads which is not logical. Hence what is the way to solve this issue i.e. process many files with limited number of threads.

    1. Hi Arpit Garg,

      The number of threads depends on your infrastructure, so you can configure it. The heart of the tutorial is how to use PartitionHandler and how to understand the Master-Slave steps.
      There are several ways to solve your problem:
      – Use 10 threads to process 100 files, each thread processes 10 files.
      – Use 5 threads to process 100 files, each thread handles 20 files.
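
The grouping suggested in the reply can be sketched in plain Java, under the assumption that each partition is handed a batch of file names instead of a single file. The class FileGrouping and its split method are hypothetical helpers, not part of the tutorial project or of Spring Batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups many files into a limited number of batches,
// so that each partition (and thread) handles several files.
class FileGrouping {

    // Splits the list into at most `groups` consecutive batches,
    // each holding at most ceil(files.size() / groups) files.
    static List<List<String>> split(List<String> files, int groups) {
        if (groups <= 0) {
            throw new IllegalArgumentException("groups must be positive");
        }
        List<List<String>> result = new ArrayList<>();
        int batchSize = (files.size() + groups - 1) / groups; // ceiling division
        for (int start = 0; start < files.size(); start += batchSize) {
            int end = Math.min(start + batchSize, files.size());
            result.add(new ArrayList<>(files.subList(start, end)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            files.add("customer-data-" + i + ".csv");
        }
        List<List<String>> batches = split(files, 10);
        // prints "10 batches, first has 10 files"
        System.out.println(batches.size() + " batches, first has "
                + batches.get(0).size() + " files");
    }
}
```

A Partitioner could then store each batch in one ExecutionContext, with the grid size set to the number of batches rather than the number of files.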
