Google Cloud Dataflow, BigQueryIO and NullPointerException on TableRow.get

I'm new to GC Dataflow and didn't find a relevant answer here. Apologies if I should have found this already answered.

I'm trying to create a simple pipeline using the v2.0 SDK and am having trouble reading data into my PCollection using BigQueryIO. I am using the .fromQuery method; I have tested the query in the BigQuery interface and it works fine. The initial PCollection seems to get created without any issues, but when I then set up a simple ParDo function to convert the values from the TableRow into a PCollection<String>, I get a NullPointerException on the line of code that does the .get on the TableRow object.

Here is my code. (I'm probably missing something simple. I'm a total newbie at Pipeline programming. Any input would be most appreciated.)

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ClientAutocompletePipeline {
    private static final Logger LOG = LoggerFactory.getLogger(ClientAutocompletePipeline.class);


    public static void main(String[] args) {
        //  create the pipeline  
        Pipeline p = Pipeline.create(
                PipelineOptionsFactory.fromArgs(args).withValidation().create());

        // A step to read in the product names from a BigQuery table
        p.apply(BigQueryIO.read().fromQuery("SELECT name FROM [beaming-team-169321:Products.raw_product_data]"))

        .apply("ExtractProductNames", ParDo.of(new DoFn<TableRow, String>() {
            @ProcessElement
            public void processElement(ProcessContext c) {
                // Grab a row from the BigQuery Results
                TableRow row = c.element();

                // Get the value of the "name" column from the table row.
                //NOTE: This is the line that is giving me the NullPointerException 
                String productName = row.get("name").toString();

                // Make sure it isn't empty
                if (!productName.isEmpty()) {
                    c.output(productName);
                }
            }
        }));

        p.run();
    }
}

The query definitely works in the BigQuery UI and the column called "name" is returned when I test the query. Why am I getting a NullPointerException on this line:

String productName = row.get("name").toString();

Any ideas?

This is a common problem when working with BigQuery and Dataflow (most likely the field is indeed null). If you are ok with using Scala, you could take a look at Scio (which is a Scala DSL for Dataflow) and its BigQuery IO.


Just make your code null safe. Replace this:

String productName = row.get("name").toString();

With something like this:

String productName = String.valueOf(row.get("name"));
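The difference between the two calls can be demonstrated outside of Dataflow entirely. This sketch uses a plain HashMap in place of a TableRow (a hypothetical stand-in; TableRow is itself map-backed, so .get behaves the same way):

```java
import java.util.HashMap;
import java.util.Map;

public class NullSafeGet {
    public static void main(String[] args) {
        // Plain Map standing in for a TableRow returned by BigQueryIO.
        Map<String, Object> row = new HashMap<>();
        row.put("name", null); // simulate a NULL value in the "name" column

        // row.get("name").toString() would throw a NullPointerException here;
        // String.valueOf(Object) returns the string "null" for a null input.
        String productName = String.valueOf(row.get("name"));
        System.out.println(productName); // prints "null"
    }
}
```

One caveat: String.valueOf(null) produces the literal string "null", which is not empty, so the isEmpty() check in the original DoFn would still pass those rows through. If you want to skip null rows, you may prefer an explicit row.get("name") == null check instead.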


I think I'm late to this, but you can check if(row.containsKey("column-name")), which tells you whether the field is present at all. What happens in BigQuery is that, while reading data, if a column value is NULL, that field is not included in the corresponding TableRow at all. Hence the error you're getting. You can also check if(null == row.get("column-name")) to guard against a null value directly.
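As a sketch of that guard (again using a plain Map in place of a TableRow, since TableRow is map-backed; the "description" column name is just an example):

```java
import java.util.HashMap;
import java.util.Map;

public class MissingFieldCheck {
    public static void main(String[] args) {
        // Plain Map standing in for a TableRow: when a BigQuery column is
        // NULL, the corresponding key may simply be absent from the row.
        Map<String, Object> row = new HashMap<>();
        row.put("name", "widget-a");
        // No "description" key at all, as if that column were NULL.

        if (row.containsKey("description") && row.get("description") != null) {
            System.out.println(row.get("description").toString());
        } else {
            System.out.println("description is missing or null; skipping row");
        }
    }
}
```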


Comments
  • Are all of the values in the name column guaranteed not to be null?
  • If you run SELECT name FROM [beaming-team-169321:Products.raw_product_data] WHERE name IS NULL in BigQuery, you'll see that there are null values. So you need to take this into account in your pipeline.
  • Well, now that you mention it, that does make perfect sense. I guess I was under the incorrect impression that something was causing them all to be null because I had some error in my code, but I guess that might not have been the case. Thanks for replying!
  • Thanks for taking the time to respond! I don't have any experience with Scala, but I'll definitely check it out when I have some time. Thanks!