You are here
Home > java >

Spring Batch Example CSV to MySQL: JPA Batch Insert

Spring Boot Batch Example CSV to MySQL Using JPASometimes in real time projects, we need to transfer data from one location to another. If the volume of data is small, we can achieve this by applying any traditional approach. On the other hand, if there is a huge amount of data, we can make use of the Spring Batch API to make the transfer of data faster and performant. In this article ‘Spring Batch Example CSV to MySQL Using JPA’, we are going to transfer the data from CSV file to MySQL database using Spring Boot Batch.

We have already discussed theoretical concepts of Spring Batch in a separate article ‘Spring Batch Tutorial‘. Now, its time to implement our example ‘Spring Batch Example CSV to MySQL Using JPA’.

What will you be expecting from this Article?

After going through all the details in this article, you will be able to answer the following:

1) Why is this example so important to implement?

2) What are the technologies/software used to develop this example?

3) How to read records from a CSV file using FlatFileItemReader?

4) How to transform/process records using ItemProcessor?

5) How to write records into database using ItemWriter and Spring Data JPA?

6) How to use JobListener interface provided by Spring Batch?

7) How to launch a Spring Batch job using Runner?

8) How to test and verify that the records are inserted into the database successfully?

9) How to troubleshoot problems faced during the development of the example?

10) How to minimize the number of lines of code using Lambda Expressions?

11) Last but not the least, How to develop ‘Spring Batch Example CSV to MySQL Using Spring Boot & JPA’?

Why is this example so important?

In real time projects it is very common that the client provides you some data to insert in the application. Most of the times you will be provided with a CSV file filled with the huge amount of data. Now, if you have to insert data in the application, it means you need to insert the same in the database. Even sometimes in some applications it is a continuous real time process to insert data from CSV file to the database, may be using a scheduler.

Why to use Spring Batch in this Example?

Spring Batch supports scalable and parallel processing of data, which is important when processing large amount of data. This ensures improved performance and powerful processing, particularly when migrating large amounts of data from a CSV file to a MySQL database. Spring Batch offers built-in support for reading and writing data in chunks, which is beneficial for processing large datasets efficiently. It offers item readers and writers that can be configured to read data from a CSV file and write it to a MySQL database. Spring Batch is particularly designed for batch processing that makes it a normal choice for handling large volumes of data.

Software/Technologies Used in the Example

Sometimes some version conflicts with other version. Hence, listing down the combinations that are proven to be working with each other. Below is the proven combination of software that are used to develop these examples.

1) CSV (Microsoft® Excel® 2019)
2) Spring Boot 2.6.0
3) JDK 1.8 or later
4) Lombok 1.18.22
5) Spring Data JPA 2.6.0
6) spring-boot-starter-batch 2.6.0
7) Spring Framework 5.3.13
8) Maven 3.8.1
9) IDE – STS 4.7.1.RELEASE (Starters: ‘Spring Batch’, ‘MySQL Driver’, ‘Spring Data JPA’, Lombok)

Spring Batch Example CSV to MySQL Using JPA

In order to implement the Spring Boot Batch Example CSV to MySQL Using JPA using Spring Batch concept, follow the step by step tutorial given as below.

Details of the Use case 

Let’s assume that we have a CSV file filled with huge volume of data. On the other hand, we have a MySQL database. We need to read the data from CSV, perform some data transformation, and then transfer the data to MySQL. Furthermore, for the database communication we will use Spring Data JPA. Our CSV file will contain data about Invoices such as invoice name, invoice number, invoice amount, discount, location etc. In order to implement data transformation, we will calculate the final amount on the basis of discount applied to the amount. Let’s start implementing our use case Spring Boot Batch Example CSV to MySQL Using JPA step by step.

Step #1: Create a Spring Boot Project

Here, we will use STS(Spring Tool Suite) to create our Spring Boot Project. If you are new to Spring Boot, visit the internal link to create a sample project in spring boot. While creating a project in STS, add starters ‘Spring Batch’, ‘MySQL Driver’, ‘Spring Data JPA’, and ‘Lombok’. However, we will use Lombok in order to generate boilerplate code automatically such as getters, setters, constructors, toString() etc. If you are not using Lombok, you can even write the required code manually.

Step #2: Place your CSV file

Generally, there are three places where we can place our CSV file as shown below. We call this location as Resource Location.

1) In Project Class Path: src/main/resources (ClasspathResource)

2) In local directory: D://mydata (FileSystemResource)

3) In Remote location: https://xyz.com/files/… (UrlResource)

In our case, we will use the first location ‘Project Class Path’ which is under ‘src/main/resources’ folder in our STS. If you already have a CSV file, you can place it here. Otherwise, create a file with extension .csv at the same location and enter some data into it. It’s “invoices.csv” for our case. Below is the sample of “invoices.csv”.

SpringBatch_CSV_Sample

Step #3: Modify application.properties file

Add below properties in your application.properties file.

#Server
server.port=8080
#Database Connection
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.url=jdbc:mysql://localhost:3306/batchdb
spring.datasource.username=root
spring.datasource.password=devs

#----------ORM Details-------------------
#To display SQL At console
spring.jpa.show-sql=true
#To Create tables
spring.jpa.hibernate.ddl-auto=create
#To Generate SQL queries
spring.jpa.database-platform=org.hibernate.dialect.MySQL8Dialect

#----------Spring Batch Properties----------
# By default it's true which means all the Spring batches will start executing automatically
spring.batch.job.enabled=false
# Tables for metadata created by Spring Boot (Always, Embedded, Never)
spring.batch.jdbc.initialize-schema=ALWAYS

As shown above, we are  using Spring Data JPA Concept to save data into MySQL. All properties are already commented.

Step #4: Create an entity class as Invoice.java

As aforementioned, we will have invoice related data in our CSV, we need to create a model/entity class invoice.java accordingly as shown below. Here, we have used Lombok to generate getters, setters, constructors etc.

import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import lombok.AllArgsConstructor;
import lombok.Data;
import lombok.NoArgsConstructor;

@Entity
@Data
@NoArgsConstructor   
@AllArgsConstructor
public class Invoice {

   @Id
   @GeneratedValue
   private Long id;
   private String name;
   private String number;
   private Double amount;
   private Double discount;
   private Double finalAmount;
   private String location;
}

Step #5: Create an interface for repository as InvoiceRepository.java

In order to save data into database, create a repository interface InvoiceRepository.java which will extend JpaRepository interface as shown below.

import org.springframework.data.jpa.repository.JpaRepository;
Import com.dev.springboot.entity.Invoice;

public interface InvoiceRepository extends JpaRepository<Invoice, Long>{

}

Step #6: Create an optional listener class as InvoiceListener.java

Let’s create a class InvoiceListenet.java which will implement JobExecutionListener. It has two methods beforeJob() and afterJob() overwritten from the interface JobExecutionListener as shown below. This is an optional class to create, if you want to execute some logic before and after the job execution.

import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;

public class InvoiceListener implements JobExecutionListener{

    @Override
    public void beforeJob(JobExecution jobExecution) {

       System.out.println("Job started at: "+ jobExecution.getStartTime());
       System.out.println("Status of the Job: "+jobExecution.getStatus());
    }

    @Override
    public void afterJob(JobExecution jobExecution) {

       System.out.println("Job Ended at: "+ jobExecution.getEndTime());
       System.out.println("Status of the Job: "+jobExecution.getStatus());
    }

}

Step #7: Create a configuration class as BatchConfig.java

This is a configuration class where we create multiple objects that are required to perform batch processing. This is an important class for our Spring Boot Batch Example CSV to MySQL Using JPA. These objects are listed below:

1) ItemReader

2) ItemWriter

3) ItemProcessor

4) JobListener (Optional)

5) Step

6) Job

As shown above the JobListener is an optional object. All other 5 objects are necessary to implement the Spring batch concept. Furthermore, we have used 3 autowiring in order to support the creation of objects: InvoiceRepository to create ItemWriter, StepBuilderFactory to create Step, and JobBuilderFactory to create Job respectively.

Note: Don’t forget to apply 2 annotations in this class: @Configuration and @EnableBatchProcessing

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobExecutionListener;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.launch.support.RunIdIncrementer;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemWriter;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import com.dev.springboot.entity.Invoice;
import com.dev.springboot.listener.InvoiceListener;
import com.dev.springboot.repo.InvoiceRepository;

@Configuration
@EnableBatchProcessing
public class BatchConfig {

    //Reader class Object
    @Bean
    public FlatFileItemReader<Invoice> reader() {

       FlatFileItemReader<Invoice> reader= new FlatFileItemReader<>();
       reader.setResource(new ClassPathResource("/invoices.csv"));
       // Reader. setResource(new FileSystemResource("D:/mydata/invoices.csv"));
       // reader.setResource(new UrlResource("https://xyz.com/files/invoices.csv"));
       // reader.setLinesToSkip(1);

       reader.setLineMapper(new DefaultLineMapper<>() {{
           setLineTokenizer(new DelimitedLineTokenizer() {{
              setDelimiter(DELIMITER_COMMA);
              setNames("name","number","amount","discount","location");
           }});

           setFieldSetMapper(new BeanWrapperFieldSetMapper<>() {{
              setTargetType(Invoice.class);
           }});
       }});

       reader.setRecordSeparatorPolicy(new BlankLineRecordSeparatorPolicy());

       return reader;
    }

    //Autowire InvoiceRepository
    @Autowired
    InvoiceRepository repository;

    //Writer class Object
    @Bean
    public ItemWriter<Invoice> writer(){
       // return new InvoiceItemWriter(); // Using lambda expression code instead of a separate implementation
       return invoices -> {
         System.out.println("Saving Invoice Records: " +invoices);
         repository.saveAll(invoices);
       };
    }

    //Processor class Object
    @Bean
    public ItemProcessor<Invoice, Invoice> processor(){
      // return new InvoiceProcessor(); // Using lambda expression code instead of a separate implementation
      return invoice -> {
         Double discount = invoice.getAmount()*(invoice.getDiscount()/100.0);
         Double finalAmount= invoice.getAmount()-discount;
         invoice.setFinalAmount(finalAmount);
         return invoice;
      };
    }

    //Listener class Object
    @Bean
    public JobExecutionListener listener() {
       return new InvoiceListener();
    }

    //Autowire StepBuilderFactory
    @Autowired
    private StepBuilderFactory sbf;

    //Step Object
    @Bean
    public Step stepA() {
       return sbf.get("stepA")
               .<Invoice,Invoice>chunk(2)
               .reader(reader())
               .processor(processor())
               .writer(writer())
               .build() 
       ;
    }

    //Autowire JobBuilderFactory
    @Autowired
    private JobBuilderFactory jbf;

    //Job Object
    @Bean
    public Job jobA(){
       return jbf.get("jobA")
              .incrementer(new RunIdIncrementer())
              .listener(listener())
              .start(stepA())
           // .next(stepB()) 
           // .next(stepC())
              .build()
       ;
    }

}

In the below step, we have created a helper class BlankLineRecordSeparatorPolicy,java in order to handle blank lines in the CSV while reading the items by the FlatFileItemReader.

Step #7A: Create a helper class as BlankLineRecordSeparatorPolicy.java

import org.springframework.batch.item.file.separator.SimpleRecordSeparatorPolicy;

public class BlankLineRecordSeparatorPolicy extends SimpleRecordSeparatorPolicy {

    @Override
    public boolean isEndOfRecord(final String line) {
        return line.trim().length() != 0 && super.isEndOfRecord(line);
    }

    @Override
    public String postProcess(final String record) {
       if (record == null || record.trim().length() == 0) {
         return null;
       }
       return super.postProcess(record);
    }
}

Step #8: Create a Job Launcher as JobRunner.java

import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParameters;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.stereotype.Component;

@Component
public class JobRunner implements CommandLineRunner {

    @Autowired
    private JobLauncher jobLauncher;

    @Autowired   
    private Job jobA;

    @Override    
    public void run(String... args) throws Exception {

    JobParameters jobParameters =
          new JobParametersBuilder()
            .addLong("time", System.currentTimeMillis())
            .toJobParameters();

    jobLauncher.run(jobA, jobParameters);
    System.out.println("JOB Execution completed!");
    }
}

Here, we have created JobRunner class in order to launch the job. We can also use the Scheduler while doing real time implementation. Runner is used to execute a Spring Boot Application only once.

How to Test the Spring Batch implemented Application?

In order to test the Spring Boot Batch Example CSV to MySQL Using JPA, follow the steps as given below:

1) Start your Spring Boot Application and make sure that it is getting started without any exception/error.

2) You should see all the values of print statements in the console such as ‘Job started at’, ‘Status of the Job’, ‘Saving Invoice Records’, ‘Job Ended at’, ‘Status of the Job’ etc. Make sure that the value of “Status of the Job” should be ‘STARTED’ for the first time and ‘COMPLETED’ for the second time.

3) Check the records in the MySQL database whether they are uploaded properly or not.

Console Output

The console output will be something like below. We have 7 records in the SCV file. Since we have declared chunk size 2, records are being written to the database in 4 batches. First 3 batches are having 2 records which is equal to the chunk size. The 4th and last batch has the remaining only 1 record.

Job started at: Fri Jul 01 21:01:55 IST 2022
Status of the Job: STARTED
2022-07-01 21:01:55.799  INFO 10336 --- [ main] o.s.batch.core.job.SimpleStepHandler  : Executing step: [stepA]
Saving Invoice Records: [Invoice(id=null, name=INV001, number=CSQL49SR, amount=496.75, discount=25.0, finalAmount=372.5625, location=Chicago), Invoice(id=null, name=INV002, number=QRQL96MT, amount=498.25, discount=15.0, finalAmount=423.5125, location=Houston)]
Hibernate: select next_val as id_val from hibernate_sequence for update
Hibernate: update hibernate_sequence set next_val= ? where next_val=?
Hibernate: select next_val as id_val from hibernate_sequence for update
Hibernate: update hibernate_sequence set next_val= ? where next_val=?
Hibernate: insert into invoice (amount, discount, final_amount, location, name, number, id) values (?, ?, ?, ?, ?, ?, ?)
Hibernate: insert into invoice (amount, discount, final_amount, location, name, number, id) values (?, ?, ?, ?, ?, ?, ?)
Saving Invoice Records: [Invoice(id=null, name=INV003, number=QLMP84CS, amount=459.0, discount=22.0, finalAmount=358.02, location=Dallas), Invoice(id=null, name=INV004, number=SPMN49DZ, amount=475.5, discount=12.0, finalAmount=418.44, location=New York)]
Hibernate: select next_val as id_val from hibernate_sequence for update
Hibernate: update hibernate_sequence set next_val= ? where next_val=?
Hibernate: select next_val as id_val from hibernate_sequence for update
Hibernate: update hibernate_sequence set next_val= ? where next_val=?
Hibernate: insert into invoice (amount, discount, final_amount, location, name, number, id) values (?, ?, ?, ?, ?, ?, ?)
Hibernate: insert into invoice (amount, discount, final_amount, location, name, number, id) values (?, ?, ?, ?, ?, ?, ?)
Saving Invoice Records: [Invoice(id=null, name=INV005, number=QLMP84CS, amount=495.0, discount=18.0, finalAmount=405.9, location=Paris), Invoice(id=null, name=INV006, number=SPMN49DZ, amount=434.5, discount=12.0, finalAmount=382.36, location=Germany)]
Hibernate: select next_val as id_val from hibernate_sequence for update
Hibernate: update hibernate_sequence set next_val= ? where next_val=?
Hibernate: select next_val as id_val from hibernate_sequence for update
Hibernate: update hibernate_sequence set next_val= ? where next_val=?
Hibernate: insert into invoice (amount, discount, final_amount, location, name, number, id) values (?, ?, ?, ?, ?, ?, ?)
Hibernate: insert into invoice (amount, discount, final_amount, location, name, number, id) values (?, ?, ?, ?, ?, ?, ?)
Saving Invoice Records: [Invoice(id=null, name=INV007, number=QLMP84CS, amount=439.24, discount=13.0, finalAmount=382.1388, location=United Kingdom)]
Hibernate: select next_val as id_val from hibernate_sequence for update
Hibernate: update hibernate_sequence set next_val= ? where next_val=?
Hibernate: insert into invoice (amount, discount, final_amount, location, name, number, id) values (?, ?, ?, ?, ?, ?, ?)
2022-07-01 21:01:56.003  INFO 10336 --- [ main] o.s.batch.core.step.AbstractStep  : Step: [stepA] executed in 204ms
Job Ended at: Fri Jul 01 21:01:56 IST 2022
Status of the Job: COMPLETED

Common Exceptions/Errors During development

You may face various exceptions while developing this Spring Batch Example CSV to MySQL Using JPA. Some of them are listed below:

Parsing Exception 

org.springframework.batch.item.file.FlatFileParseException: Parsing error at line: 8 in resource=[class path resource [invoices.csv]], input=[]

Caused by: org.springframework.batch.item.file.transform.IncorrectTokenCountException: Incorrect number of tokens found in record: expected 5 actual 0

Cause of Exception

This exception occurred while reading 8th line(8th record) in the CSV. Since we have only 7 records in the CSV, 8th record is blank. The ‘Caused By’ statement indicates that it was expected to be 5 columns, but there are 0 columns at line # 8.

Solution

In order to resolve this issue we need to create a record separator policy. In our example it is ‘BlankLineRecordSeparatorPolicy.java’ which extends ‘SimpleRecordSeparatorPolicy’ provided by Spring Batch API.

Bad SQL Grammer Exception 

Caused by: org.springframework.jdbc.BadSqlGrammarException: PreparedStatementCallback; bad SQL grammar [SELECT JOB_INSTANCE_ID, JOB_NAME from BATCH_JOB_INSTANCE where JOB_NAME = ? and JOB_KEY = ?]; nested exception is java.sql.SQLSyntaxErrorException: Table ‘batchdb.batch_job_instance’ doesn’t exist

Cause of Exception

By default Spring framework creates database tables of metadata for the Job Repository when the application starts. The spring boot batch is disabled to create the tables by default. The spring boot application should be enabled to create the tables to create and use the batch tables in the database.

Solution

In order to resolve this issue, we should add  “spring.batch.initialize-schema” in the application.properties to create the batch tables in the database. The property should be set to “ALWAYS” to create the batch tables in the database. This property allows the spring boot batch to perform a database table check. Spring boot batch will use the tables if they exist. If the tables do not exist, it will create them for you. Therefore, add below entry into your application.properties file.

spring.batch.initialize-schema=ALWAYS

You may further refer official link to Spring Batch documentation.

One thought on “Spring Batch Example CSV to MySQL: JPA Batch Insert

  1. Hello Sir, that’s an amazing tutorial. I kindly request you to do for flat file like a file with “.txt” extension.

    Thanks in advanced

Leave a Reply


Top