Getting Started With Spring Batch 2.0
Running The Football Sample Application
Spring Batch includes several sample batch applications. A good starting point is the (American) football sample app. I'm going to assume that you have your project already set up in your IDE. If you're using Eclipse, I recommend installing the latest version of Spring IDE, including the core plug-in and the Batch extension. That will allow you to visualize the bean dependencies in your Spring bean configuration files.
The football sample app is a three-step job. The first step loads player data from a text file into a database table called players. The second step does the same with game data, placing the result in a table called games. Finally, the third step generates player summary stats from the players and games tables and writes them into a third database table called player_summary.
You might find it useful to glance at the player and game data files, just to see what's up. The data files are inside src/main/resources/data/footballjob/input.
Let's run it. We can run the job by running the JUnit test
org.springframework.batch.sample.FootballJobFunctionalTests
in the src/test/java folder. Go ahead and try that now. The JUnit tests should pass.
By default the test uses an in-memory HSQLDB database. While this makes for a fast test, it's not so useful for trying to see what the job is actually doing. So instead let's run the batch job against a persistent database. I'm using MySQL though you can use whatever you like. Here's what we need to do.
Step 1. Create a database; e.g. CREATE DATABASE spring_batch_samples.
Step 2. Inside src/main/resources you'll see various batch-xxx.properties files. Open the one corresponding to your RDBMS of choice and modify the properties as necessary. Make sure the value of batch.jdbc.url matches the database name you chose in step 1.
Step 3. When running the tests, we need to override the default RDBMS specified in src/main/resources/data-source-context.xml. If you're lazy, you can just find the environment bean in that file and change the defaultValue property's value from hsql to mysql or sqlserver or whatever. (The options correspond to the batch-xxx.properties files we mentioned above.) The right way to do it, though, is to set the
org.springframework.batch.support.SystemPropertyInitializer.ENVIRONMENT
system property. There are different ways to do that. If you're using Eclipse, go to the Run > Run Configurations dialog, and in the run configuration for FootballJobFunctionalTests go to the Arguments tab. Then add the following to the VM arguments:
-Dorg.springframework.batch.support.SystemPropertyInitializer.ENVIRONMENT=mysql
(Enter that as a single line, with no spaces or line breaks inside it.)
Step 4. Just to make this batch job more interesting (i.e., to make it much bigger), open up the src/main/resources/jobs/footballJob.xml application context file and look for the footballProperties bean. Change its properties from
<beans:value>
games.file.name=games-small.csv
player.file.name=player-small1.csv
job.commit.interval=2
</beans:value>
to
<beans:value>
games.file.name=games.csv
player.file.name=player.csv
job.commit.interval=100
</beans:value>
Step 5. Run FootballJobFunctionalTests again. It will run for a while depending on how fast your computer is. Mine is pretty slow but the job still finishes in a couple of minutes.
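If the environment switch from step 3 feels a bit magical, here's a minimal Java sketch of the idea: read a system property, fall back to hsql when it's missing, and use the result to pick a batch-xxx.properties file. To be clear, this is my own simplified model, not Spring Batch's actual code; the class and method names (EnvironmentSelector, resolveEnvironment, propertiesFileFor) are made up for illustration.

```java
import java.util.Properties;

// Simplified model of the environment switch from step 3.
// Hypothetical helper for illustration; this is NOT Spring Batch's own code.
final class EnvironmentSelector {

    // The system property named in step 3.
    static final String ENV_KEY =
            "org.springframework.batch.support.SystemPropertyInitializer.ENVIRONMENT";

    // Fallback used when the property is absent (the sample defaults to hsql).
    static final String DEFAULT_ENV = "hsql";

    // Returns the environment name from the given properties, or the default.
    static String resolveEnvironment(Properties props) {
        String value = props.getProperty(ENV_KEY);
        return (value == null || value.trim().isEmpty()) ? DEFAULT_ENV : value.trim();
    }

    // Maps an environment name to the matching batch-xxx.properties file name.
    static String propertiesFileFor(String environment) {
        return "batch-" + environment + ".properties";
    }

    public static void main(String[] args) {
        // System.getProperties() reflects any -D flags passed to the JVM.
        String env = resolveEnvironment(System.getProperties());
        System.out.println(env + " -> " + propertiesFileFor(env));
    }
}
```

Run with the VM argument from step 3 set to mysql, the sketch picks batch-mysql.properties; with no flag at all, it falls back to the hsql default.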
Assuming everything runs as it should, step 5 creates several tables in your database. Here's what it looks like in MySQL:
mysql> show tables;
+--------------------------------+
| Tables_in_spring_batch_samples |
+--------------------------------+
| batch_job_execution |
| batch_job_execution_context |
| batch_job_execution_seq |
| batch_job_instance |
| batch_job_params |
| batch_job_seq |
| batch_staging |
| batch_staging_seq |
| batch_step_execution |
| batch_step_execution_context |
| batch_step_execution_seq |
| customer |
| customer_seq |
| error_log |
| games |
| player_summary |
| players |
| trade |
| trade_seq |
+--------------------------------+
19 rows in set (0.00 sec)
Spring Batch uses the batch_xxx tables to manage job execution. These are part of Spring Batch itself, not part of the samples, so the SQL scripts that generate them are inside org.springframework.batch.core-2.0.0.RC2.jar. The other tables are sample business tables, defined in the src/main/resources/business-schema-xxx.sql scripts. As you can see, there are some extra tables here (these support some of the other sample apps), but the only business tables we care about are players, games and player_summary.
There's a lot of data in the tables. Here are the row counts:
mysql> select count(*) from players;
+----------+
| count(*) |
+----------+
| 4320 |
+----------+
1 row in set (0.00 sec)
mysql> select count(*) from games;
+----------+
| count(*) |
+----------+
| 56377 |
+----------+
1 row in set (0.06 sec)
mysql> select count(*) from player_summary;
+----------+
| count(*) |
+----------+
| 5931 |
+----------+
1 row in set (0.01 sec)
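To get a feel for what job.commit.interval from step 4 means at this scale, here's a rough Java sketch. It assumes the commit interval is the number of items handled per chunk (and hence per transaction), that each row in players and games came from exactly one input record, and that nothing was skipped. ChunkMath and chunksFor are hypothetical names, and this is back-of-the-envelope arithmetic, not a reading of what the framework recorded.

```java
// Back-of-the-envelope chunk arithmetic.
// A simplified model for illustration, not Spring Batch internals.
final class ChunkMath {

    // Number of chunks needed to cover itemCount items at the given commit
    // interval (ceiling division; the last chunk may be partial).
    static long chunksFor(long itemCount, int commitInterval) {
        if (itemCount < 0 || commitInterval <= 0) {
            throw new IllegalArgumentException(
                    "itemCount must be >= 0 and commitInterval must be > 0");
        }
        return (itemCount + commitInterval - 1) / commitInterval;
    }

    public static void main(String[] args) {
        long players = 4320;   // row count reported for the players table
        long games = 56377;    // row count reported for the games table

        // Compare the original interval (2) with the one set in step 4 (100).
        for (int interval : new int[] {2, 100}) {
            System.out.println("interval " + interval
                    + ": players=" + chunksFor(players, interval)
                    + ", games=" + chunksFor(games, interval));
        }
    }
}
```

With an interval of 100, that works out to 44 chunks for players and 564 for games; at the original interval of 2, the same data would need 2,160 and 28,189 chunks. The commit counts Spring Batch itself records may differ slightly from this idealized figure.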
If you want to check out some of the data itself without having to pull down the entire dataset, you can use the following queries:
select * from players limit 10;
select * from games limit 10;
select * from player_summary limit 10;
Just for kicks, you might find it entertaining to investigate the batch_xxx tables too. For instance:
mysql> select * from batch_job_execution;
+------------------+---------+-----------------+---------------------+
| JOB_EXECUTION_ID | VERSION | JOB_INSTANCE_ID | CREATE_TIME |
+------------------+---------+-----------------+---------------------+
| 1 | 2 | 1 | 2009-03-22 20:31:40 |
+------------------+---------+-----------------+---------------------+
+---------------------+---------------------+-----------+-----------+
| START_TIME | END_TIME | STATUS | EXIT_CODE |
+---------------------+---------------------+-----------+-----------+
| 2009-03-22 20:31:40 | 2009-03-22 20:33:44 | COMPLETED | COMPLETED |
+---------------------+---------------------+-----------+-----------+
+--------------+---------------------+
| EXIT_MESSAGE | LAST_UPDATED |
+--------------+---------------------+
| | 2009-03-22 20:33:44 |
+--------------+---------------------+
1 row in set (0.00 sec)
This will give you some visibility into how Spring Batch keeps track of job executions, but we're not going to worry about that here. (Consult the Spring Batch 2.0 reference manual for more information on that.)
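As a small example of what you can read off that row, the sketch below computes the elapsed time of the job from the START_TIME and END_TIME values shown above. It's plain Java using java.time (so it needs Java 8 or later, which is newer than what this article was written against), and JobDuration is a hypothetical helper, not part of Spring Batch.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: elapsed time between two timestamps written in the
// "yyyy-MM-dd HH:mm:ss" style that the mysql client prints.
final class JobDuration {

    static final DateTimeFormatter MYSQL_FORMAT =
            DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss");

    // Parses both timestamps and returns the time elapsed between them.
    static Duration between(String start, String end) {
        LocalDateTime startTime = LocalDateTime.parse(start, MYSQL_FORMAT);
        LocalDateTime endTime = LocalDateTime.parse(end, MYSQL_FORMAT);
        return Duration.between(startTime, endTime);
    }

    public static void main(String[] args) {
        // START_TIME and END_TIME copied from the batch_job_execution row above.
        Duration elapsed = between("2009-03-22 20:31:40", "2009-03-22 20:33:44");
        System.out.println(elapsed.getSeconds() + " seconds");
    }
}
```

For the row above this comes to 124 seconds, which matches the couple of minutes the job took on my machine.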
It's time to take a closer look at what's going on behind the scenes.
Comments
hksduhksdu replied on Wed, 2009/04/01 - 12:38am
While the Spring Framework team has done a great job, I would rather go with BPM. Basically, you can manage your BPM with a scheduler or as a sequence of events. Besides, it is easy to make changes to business rules. Doesn't sticking with batch mean going back to a Java version of COBOL+JCL?
I've played with NetBeans' BPEL plugin and I think it's awesome! I think I can spend more time on data schema validation and requirement verification than on pure coding until the end of the world for batch processing.
Just my 2 cents though,
:)