Copy table from one dataset to another in Google BigQuery


I intend to copy a set of tables from one dataset to another within the same project. I execute the code in an IPython notebook.

I get the list of table names to be copied into the variable "value" using the code below:

import re
import gcp.bigquery as bq   # assumed Datalab import for the bq module used below

dataset = bq.DataSet('test:TestDataset')

for x in dataset.tables():
    if re.match('table1(.*)', x.name.table_id):
        value = 'test:TestDataset.' + x.name.table_id

Then I tried using the "bq cp" command to copy a table from one dataset to another, but I cannot execute the bq command in the notebook.

!bq cp $value proj1:test1.table1_20162020

Note:

I checked whether the bigquery command has a copy command associated with it, but could not find one.

Any help would be appreciated!!

If you are using the BigQuery API with Python, you can run a copy job:

https://cloud.google.com/bigquery/docs/tables#copyingtable

Copying the Python example from the docs:

# pprint, time and HttpError are used below; these imports are assumed to come
# from the surrounding docs sample (HttpError from the Google API client library).
import pprint
import time
from googleapiclient.errors import HttpError

def copyTable(service):
  try:
    sourceProjectId = raw_input("What is your source project? ")
    sourceDatasetId = raw_input("What is your source dataset? ")
    sourceTableId = raw_input("What is your source table? ")

    targetProjectId = raw_input("What is your target project? ")
    targetDatasetId = raw_input("What is your target dataset? ")
    targetTableId = raw_input("What is your target table? ")

    jobCollection = service.jobs()
    jobData = {
      "projectId": sourceProjectId,
      "configuration": {
          "copy": {
              "sourceTable": {
                  "projectId": sourceProjectId,
                  "datasetId": sourceDatasetId,
                  "tableId": sourceTableId,
              },
              "destinationTable": {
                  "projectId": targetProjectId,
                  "datasetId": targetDatasetId,
                  "tableId": targetTableId,
              },
              "createDisposition": "CREATE_IF_NEEDED",
              "writeDisposition": "WRITE_TRUNCATE"
          }
      }
    }

    insertResponse = jobCollection.insert(projectId=targetProjectId, body=jobData).execute()

    # Poll for status until the job is done, with a short pause between calls.
    while True:
      status = jobCollection.get(projectId=targetProjectId,
                                 jobId=insertResponse['jobReference']['jobId']).execute()
      if 'DONE' == status['status']['state']:
          break
      print 'Waiting for the copy to complete...'
      time.sleep(10)

    if 'errors' in status['status']:
      print 'Error copying table: ', pprint.pprint(status)
      return

    print 'Copied the table: ', pprint.pprint(status)

    # Now query and print out the generated results table.
    queryTableData(service, targetProjectId, targetDatasetId, targetTableId)

  except HttpError as err:
    print 'Error in copyTable: ', pprint.pprint(err.resp)

The bq cp command does basically the same thing internally (you could call that function too, depending on which bq you are importing).
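If you are on the newer google-cloud-bigquery client rather than the raw API client, a minimal sketch of the same copy job, filtered to the table-name prefix the question needs, could look like this (the project, dataset and prefix values are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

source_dataset = 'test.TestDataset'   # placeholder "project.dataset" IDs
dest_dataset = 'proj1.test1'
prefix = 'table1'

# Copy every table whose ID starts with the prefix, one copy job per table.
for table in client.list_tables(source_dataset):
    if table.table_id.startswith(prefix):
        job = client.copy_table(
            '{}.{}'.format(source_dataset, table.table_id),
            '{}.{}'.format(dest_dataset, table.table_id))
        job.result()   # wait for the job; raises if the copy failed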

If your dataset has a lot of tables, copying or cloning them one by one is time consuming. Assuming you want to copy most of the tables, you can first copy the entire BigQuery dataset and then delete the tables you don't want. The copy-dataset UI is similar to the copy-table UI: just click the "Copy dataset" button on the source dataset and specify the destination dataset in the pop-up form. You can copy a dataset to another project or to another region.
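A minimal sketch of that copy-everything-then-prune step with the Python client, assuming the google-cloud-bigquery package; the dataset ID and keep-prefix below are placeholders:

from google.cloud import bigquery

client = bigquery.Client()

copied_dataset = 'proj1.test1'   # placeholder: the dataset produced by the dataset copy
keep_prefix = 'table1'           # tables you actually want to keep

# After copying the whole dataset, drop the tables you did not want.
for table in client.list_tables(copied_dataset):
    if not table.table_id.startswith(keep_prefix):
        client.delete_table('{}.{}'.format(copied_dataset, table.table_id))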

I have created the following script to copy all the tables from one dataset to another, with a couple of validations.

from google.cloud import bigquery

client = bigquery.Client()

projectFrom = 'source_project_id'
datasetFrom = 'source_dataset'

projectTo = 'destination_project_id'
datasetTo = 'destination_dataset'

# Create dataset references using the google.cloud.bigquery client
dataset_from = client.dataset(dataset_id=datasetFrom, project=projectFrom)
dataset_to = client.dataset(dataset_id=datasetTo, project=projectTo)

for source_table_ref in client.list_dataset_tables(dataset=dataset_from):
    # Destination table reference
    destination_table_ref = dataset_to.table(source_table_ref.table_id)

    job = client.copy_table(
      source_table_ref,
      destination_table_ref)

    job.result()
    assert job.state == 'DONE'

    dest_table = client.get_table(destination_table_ref)
    source_table = client.get_table(source_table_ref)

    assert dest_table.num_rows > 0 # validation 1  
    assert dest_table.num_rows == source_table.num_rows # validation 2

    print ("Source - table: {} row count {}".format(source_table.table_id,source_table.num_rows ))
    print ("Destination - table: {} row count {}".format(dest_table.table_id, dest_table.num_rows))

The usual way to copy BigQuery datasets and tables from the command line is bq cp:

bq cp source_project:source_dataset.source_table \
    dest_dataset.dest_table

A copy-dataset feature is now available in the BigQuery Data Transfer Service. Select the transfer service in the BigQuery web console, fill in the source and destination details, and run it on demand or schedule it at a specified time interval.

Or simply run the following bq command to achieve this:

bq mk --transfer_config --project_id=[PROJECT_ID] --data_source=[DATA_SOURCE] --target_dataset=[DATASET] --display_name=[NAME] --params='[PARAMETERS]'
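For reference, a minimal Python sketch of the same transfer-config approach, assuming the google-cloud-bigquery-datatransfer package and the "cross_region_copy" data source used for dataset copies; all project and dataset IDs are placeholders:

from google.cloud import bigquery_datatransfer

transfer_client = bigquery_datatransfer.DataTransferServiceClient()

transfer_config = bigquery_datatransfer.TransferConfig(
    destination_dataset_id='destination_dataset',   # placeholder
    display_name='Copy TestDataset',
    data_source_id='cross_region_copy',             # dataset-copy data source
    params={
        'source_project_id': 'source_project_id',   # placeholders
        'source_dataset_id': 'source_dataset',
    },
)

# Create the transfer config in the destination project and trigger/schedule it.
transfer_config = transfer_client.create_transfer_config(
    parent=transfer_client.common_project_path('destination_project_id'),
    transfer_config=transfer_config,
)
print('Created transfer config: {}'.format(transfer_config.name))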


I think this will help you:

# Older google-cloud-bigquery API: copy_table() takes an explicit job_id and
# jobs are started with begin(); self.bigquery_client is presumably a
# bigquery.Client held on the enclosing class.
tables = source_dataset.list_tables()
for table in tables:
    job_id = str(uuid.uuid4())
    dest_table = dest_dataset.table(table.name)
    source_table = source_dataset.table(table.name)
    if not dest_table.exists():
        job = self.bigquery_client.copy_table(job_id, dest_table, source_table)
        job.create_disposition = (
            google.cloud.bigquery.job.CreateDisposition.CREATE_IF_NEEDED)
        job.begin()
        job.result()

Also note the table-copy limitations: table names must be unique per dataset, the Cloud Console and the classic BigQuery web UI copy only one table at a time, and the destination dataset must reside in the same location as the table being copied.
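A quick way to check the same-location requirement up front with the Python client (a sketch; the dataset IDs are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# Copy jobs require source and destination datasets to be in the same location.
source_location = client.get_dataset('source_project.source_dataset').location
dest_location = client.get_dataset('dest_project.dest_dataset').location
assert source_location == dest_location, 'datasets are in different locations'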

I am not sure why it is not working for you, since it works perfectly for me.

projectFrom = 'project1'
datasetFrom = 'dataset1'
tableSearchString = 'test1'

projectTo = 'project2'
datasetTo = 'dataset2'

tables = bq.DataSet(projectFrom + ':' + datasetFrom).tables()

for table in tables:
  if tableSearchString in table.name.table_id:

    tableFrom = projectFrom + ':' + datasetFrom + '.' + table.name.table_id
    tableTo = projectTo + ':' + datasetTo + '.' + table.name.table_id

    !bq cp $tableFrom $tableTo

Try this in your notebook; it works well for me. Just wondering, what error does your script return?


Comments
  • Thanks Felipe. But in my scenario, I have to copy multiple tables that start with the same name but have different timestamps at the end. That was the reason I looped through the table list to get the tables that start with 'table1'.
  • Seems like something easy to automate with Python and the provided code?
  • How does the bq cp command work between datasets? I only get an error message claiming the destination dataset doesn't exist, which is clearly wrong (in the console I can see both the destination dataset and the incorrect error).