Can Spring Batch work with Amazon Redshift?

I'm trying to get Spring Batch (4.0.1.RELEASE) working with Amazon Redshift. I got past the first major problem, Redshift's lack of support for sequences, here.

However, now I've run into this when I try to run a job:

10:57:07.122 ERROR [http-nio-8080-exec-4 ] [JobLaunchingService] [] Could not start job [demoJob]
org.springframework.dao.InvalidDataAccessApiUsageException: PreparedStatementCallback; SQL [INSERT INTO BATCH_JOB_EXECUTION_CONTEXT (SHORT_CONTEXT, SERIALIZED_CONTEXT, JOB_EXECUTION_ID) VALUES(?, ?, ?)]; [Amazon][JDBC](10220) Driver does not support this optional feature.; nested exception is java.sql.SQLFeatureNotSupportedException: [Amazon][JDBC](10220) Driver does not support this optional feature.

This is with the Redshift 1.2.16.1027 JDBC Driver.

Is it even possible to use Redshift as the batch database? Any suggestions on how to get around this?

I'm not sure about your use case, or whether you are constrained to using Spring Batch specifically. But if the JDBC driver itself says it doesn't support the feature, then I believe there is no way around it. As a recommended approach and best practice, Redshift should be loaded with the COPY command rather than INSERT statements, and calling the COPY command through plain JDBC could be a good idea.

You could take a look at one of my earlier answers; I'm copy/pasting it here to make it handy.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

public class RedShiftJDBC {
    public static void main(String[] args) {
        try {
            // Make sure the matching Redshift JDBC driver jar is on the classpath.
            Class.forName("com.amazon.redshift.jdbc42.Driver");

            Properties props = new Properties();
            props.setProperty("user", "username***");
            props.setProperty("password", "password****");

            System.out.println("Connecting to database...");
            // If you use the PostgreSQL JDBC driver instead, the URL prefix is jdbc:postgresql://.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:redshift://********url-to-redshift.redshift.amazonaws.com:5439/example-database", props);
                 Statement statement = conn.createStatement()) {

                System.out.println("Connection made!");

                // Disable auto-commit so the COPY runs in an explicit transaction.
                conn.setAutoCommit(false);

                // COPY bulk-loads the CSV straight from S3, skipping the header row.
                String command = "COPY my_table FROM 's3://path/to/csv/example.csv'"
                        + " CREDENTIALS 'aws_access_key_id=******;aws_secret_access_key=********'"
                        + " CSV DELIMITER ',' IGNOREHEADER 1";

                System.out.println("Executing...");
                statement.executeUpdate(command);

                // You must commit if you really want the data copied.
                conn.commit();
                System.out.println("That's all: a COPY using simple JDBC.");
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }
}
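
As a side note worth checking against the Redshift documentation: COPY also accepts an IAM_ROLE parameter (e.g. IAM_ROLE 'arn:aws:iam::...:role/...'), so you can avoid embedding access keys in the CREDENTIALS string.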

I hope this gives you some ideas. If you have a specific question, add a comment and I'll refocus the answer.

In order to make this work, I had to define a separate MySQL database for the Spring Batch "control" tables. That is the default (@Primary) database in the Batch application. The ItemWriters are fed a different DataSource, the one pointed at Redshift.

So now I've got a DataSource for the Batch tables, one for my source db, and one for the target db. That seems to work, but since I'm only using the standard DataSourceTransactionManager, it's not at all clear to me where the transactional boundaries are, or whether both databases get rolled back the same way if a step fails. But I am NOT going to use XA!
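
For reference, here is roughly what that wiring can look like with Spring Batch 4 Java config. This is a minimal sketch under my assumptions: the class and bean names, URLs, and credentials are all placeholders, and DriverManagerDataSource stands in for a real connection pool.

import javax.sql.DataSource;

import org.springframework.batch.core.configuration.annotation.BatchConfigurer;
import org.springframework.batch.core.configuration.annotation.DefaultBatchConfigurer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Primary;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

@Configuration
public class BatchDataSourceConfig {

    // MySQL holds the BATCH_* control tables. Marked @Primary so Spring Batch
    // picks it for the JobRepository by default.
    @Bean
    @Primary
    public DataSource batchDataSource() {
        DriverManagerDataSource ds = new DriverManagerDataSource();
        ds.setDriverClassName("com.mysql.cj.jdbc.Driver");
        ds.setUrl("jdbc:mysql://localhost:3306/batch_meta"); // placeholder URL
        ds.setUsername("batch_user");                        // placeholder
        ds.setPassword("****");
        return ds;
    }

    // Redshift is a plain business DataSource, handed only to the ItemWriters.
    @Bean
    public DataSource redshiftDataSource() {
        DriverManagerDataSource ds = new DriverManagerDataSource();
        ds.setDriverClassName("com.amazon.redshift.jdbc42.Driver");
        ds.setUrl("jdbc:redshift://example.redshift.amazonaws.com:5439/example-database"); // placeholder
        ds.setUsername("redshift_user");                     // placeholder
        ds.setPassword("****");
        return ds;
    }

    // Make the JobRepository's DataSource explicit rather than relying on
    // @Primary resolution alone.
    @Bean
    public BatchConfigurer batchConfigurer() {
        return new DefaultBatchConfigurer(batchDataSource());
    }
}

The ItemWriters then take redshiftDataSource explicitly (for example via JdbcBatchItemWriterBuilder's dataSource(...) method), so only the control tables ever touch MySQL. For a real deployment you'd swap DriverManagerDataSource for a pooled DataSource, since it opens a new connection on every call.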

Amazon Redshift is not a supported database for Spring Batch. The supported databases are listed here: https://github.com/spring-projects/spring-batch/tree/master/spring-batch-core/src/main/resources/org/springframework/batch/core.

Comments
  • What is a "batch database"? Anyway, I guess not; Redshift <> Postgres. There are plenty of ETL platforms compatible with Redshift, why not use one of those?
  • The "batch" database is where Spring Batch stores its control tables. I'm using it because I already have a dozen jobs written in Spring Batch; I just want to change from a MySQL target database to Redshift.
  • If you are generating and submitting INSERTs, you may run into performance issues. You may need to consider staging the data to S3 and then issuing a COPY command on Redshift.