How to integrate Google Cloud SQL with Google BigQuery

I am designing a solution in which Google Cloud SQL will store all data from the regular functioning of the app (essentially OLTP data). The data is expected to grow quite large over time. It is relational in nature, which is why we chose Cloud SQL over Cloud Datastore.

This data needs to be fed into BigQuery for analytics, ideally in near real time, although realistically some lag is expected. I am trying to design a solution that keeps this lag to a minimum.

My question has three parts:

  1. Should I store the data in Cloud SQL and then move it to BigQuery, or should I change the basic design and store the data in BigQuery from the start? Is BigQuery suitable for regular, low-latency OLTP workloads? (I don't think so; is my assumption correct?)

  2. What is the recommended practice for loading Cloud SQL data into BigQuery so that the integration works in near real time?

  3. Is Cloud Dataflow a good option? If I connect Cloud SQL to Cloud Dataflow and then to BigQuery, will it work? Or is there a better way to achieve this (as asked in question 2)?

BigQuery supports Cloud SQL federated queries, which let you query a Cloud SQL database directly from BigQuery. To keep a BigQuery table in sync with a Cloud SQL table, you can write a simple script that runs the following query every hour.

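-- Append rows changed since the last sync to the BigQuery table.
INSERT INTO
   demo.customers (column1)
SELECT
   column1
FROM
   EXTERNAL_QUERY(
      "project.us.connection",
      -- The inner query runs on Cloud SQL (MySQL).
      "SELECT column1 FROM mysql_table WHERE timestamp > '${timestamp}';");

Remember to replace ${timestamp} with a timestamp literal for the current time minus one hour before running the query.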
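
The hourly script could fill in the timestamp along these lines. This is only a minimal sketch using the Python standard library: the connection ID, table names, and column name are the placeholders from the query above, and build_sync_query is a hypothetical helper, not part of any Google library. Submitting the resulting string to BigQuery (for example with a client library or a scheduled query) is left out.

```python
from datetime import datetime, timedelta, timezone

# Placeholder connection ID copied from the example query; replace with your own.
CONNECTION_ID = "project.us.connection"


def build_sync_query(now: datetime, lookback: timedelta = timedelta(hours=1)) -> str:
    """Return the federated INSERT statement for rows newer than now - lookback."""
    # Format the cutoff as a MySQL-style datetime literal.
    cutoff = (now - lookback).strftime("%Y-%m-%d %H:%M:%S")
    # Inner query is executed by Cloud SQL (MySQL), not by BigQuery.
    inner = f"SELECT column1 FROM mysql_table WHERE timestamp > '{cutoff}';"
    return (
        "INSERT INTO demo.customers (column1)\n"
        "SELECT column1\n"
        f'FROM EXTERNAL_QUERY("{CONNECTION_ID}", "{inner}");'
    )


if __name__ == "__main__":
    # Print the statement for the current hour; run it with your BigQuery tooling.
    print(build_sync_query(datetime.now(timezone.utc)))
```

Note that a fixed one-hour window can skip or duplicate rows if a run is delayed or repeated, so storing the last synced timestamp and passing it in is probably safer than relying on the clock.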

Another method would be to split the write path so that the application writes to both Cloud SQL and Cloud Pub/Sub, and then have a Dataflow job read from Pub/Sub and stream into BigQuery. This works well when the target schema of your BigQuery tables differs materially from the source schema, which is common when denormalizing relational data.
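
The split write described above might look roughly like this on the application side. It is a sketch under assumptions: write_to_sql and publish are hypothetical callables standing in for your Cloud SQL insert and your Pub/Sub publish call, and the JSON payload shape is made up for illustration.

```python
import json
from typing import Any, Callable, Dict


def write_and_publish(
    row: Dict[str, Any],
    write_to_sql: Callable[[Dict[str, Any]], None],
    publish: Callable[[bytes], None],
) -> bytes:
    """Write a row to the transactional store, then publish it as an event."""
    # Transactional write first, so an event is never published for a failed write.
    write_to_sql(row)
    # Serialize deterministically; the streaming job would parse this JSON.
    payload = json.dumps(row, sort_keys=True).encode("utf-8")
    publish(payload)
    return payload


if __name__ == "__main__":
    # In-memory stand-ins for the database and the topic.
    sql_rows, messages = [], []
    write_and_publish({"id": 1, "total": 9.5}, sql_rows.append, messages.append)
    print(sql_rows, messages)
```

The two writes are not atomic: if the publish fails after the database write succeeds, the event is lost unless you retry or reconcile later.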

The upside is that you can reduce overall latency to, say, a few seconds. The main downside is that if your transactional data changes frequently, you will have to create a versioning scheme to track changes.
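
One possible versioning scheme, sketched under assumptions: each change event carries the row's key and a monotonically increasing version number, and readers keep only the highest version per key. The field names id and version are hypothetical, and in BigQuery the same idea would more likely be expressed as a query over the appended events than as Python code.

```python
from typing import Any, Dict, Iterable


def latest_versions(events: Iterable[Dict[str, Any]]) -> Dict[Any, Dict[str, Any]]:
    """Collapse a stream of change events to the newest version of each row."""
    latest: Dict[Any, Dict[str, Any]] = {}
    for event in events:
        current = latest.get(event["id"])
        # Keep the event only if it is newer than what we have seen for this key.
        if current is None or event["version"] > current["version"]:
            latest[event["id"]] = event
    return latest


if __name__ == "__main__":
    events = [
        {"id": 1, "version": 2, "status": "shipped"},
        {"id": 1, "version": 1, "status": "created"},  # arrives late
        {"id": 2, "version": 1, "status": "created"},
    ]
    print(latest_versions(events))
```

Because the comparison uses the version number rather than arrival order, events that arrive late or out of order do not overwrite newer state.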

Comments
  • Thanks Felipe! It's really helpful for me.