NullPointerException when outputting TableRow with null value

I am trying to build a TableRow object to eventually be written to a BigQuery table, but I get a NullPointerException if I include a null value in the row. This is the full stacktrace:

Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.NullPointerException
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:349)
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:319)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:210)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:66)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
    at dataflowsandbox.StarterPipeline.runTest(StarterPipeline.java:224)
    at dataflowsandbox.StarterPipeline.main(StarterPipeline.java:83)
Caused by: java.lang.NullPointerException
    at com.google.api.client.util.ArrayMap$Entry.hashCode(ArrayMap.java:419)
    at java.util.AbstractMap.hashCode(AbstractMap.java:530)
    at java.util.Arrays.hashCode(Arrays.java:4146)
    at java.util.Objects.hash(Objects.java:128)
    at org.apache.beam.sdk.util.WindowedValue$ValueInGlobalWindow.hashCode(WindowedValue.java:245)
    at java.util.HashMap.hash(HashMap.java:339)
    at java.util.HashMap.get(HashMap.java:557)
    at org.apache.beam.repackaged.beam_runners_direct_java.com.google.common.collect.AbstractMapBasedMultimap.put(AbstractMapBasedMultimap.java:191)
    at org.apache.beam.repackaged.beam_runners_direct_java.com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:130)
    at org.apache.beam.repackaged.beam_runners_direct_java.com.google.common.collect.HashMultimap.put(HashMultimap.java:48)
    at org.apache.beam.runners.direct.ImmutabilityCheckingBundleFactory$ImmutabilityEnforcingBundle.add(ImmutabilityCheckingBundleFactory.java:111)
    at org.apache.beam.runners.direct.ParDoEvaluator$BundleOutputManager.output(ParDoEvaluator.java:242)
    at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner.outputWindowedValue(SimpleDoFnRunner.java:219)
    at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner.access$700(SimpleDoFnRunner.java:69)
    at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:517)
    at org.apache.beam.repackaged.beam_runners_direct_java.runners.core.SimpleDoFnRunner$DoFnProcessContext.output(SimpleDoFnRunner.java:505)
    at dataflowsandbox.StarterPipeline$6.procesElement(StarterPipeline.java:202)

Process finished with exit code 1

This is the code that triggers the NullPointerException:

  Pipeline p = Pipeline.create( options );

  p.apply( "kicker", Create.of( "Kick!" ) )
  .apply( "Read values", ParDo.of( new DoFn<String, TableRow>() {
     @ProcessElement
     public void procesElement( ProcessContext c ) {

        TableRow row = new TableRow();

        row.set( "ev_id",       "2323423423" );
        row.set( "customer_id", "111111"     );
        row.set( "org_id",      null         ); // Without this line, no NPE
        c.output( row );
     } }) )
     .apply( BigQueryIO.writeTableRows()
        .to( DATA_TABLE_OUT )
        .withCreateDisposition( CREATE_NEVER )
        .withWriteDisposition( WRITE_APPEND ) );

  PipelineResult result = p.run();

My actual code is a little more complicated; I should be able to catch the null value and just not set it in the row, but maybe I don't understand something about TableRows.
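
For example, I assume I could guard each nullable field and only set it when a value is present, along these lines (untested sketch; getOrgId() is a hypothetical stand-in for my real, possibly-null lookup):

String orgId = getOrgId(); // hypothetical; may return null

TableRow row = new TableRow();
row.set( "ev_id",       "2323423423" );
row.set( "customer_id", "111111"     );
if ( orgId != null ) {
   row.set( "org_id", orgId ); // only set the key when a value exists
}
c.output( row );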

You can, for example, provide the table schema and just omit setting the value of the field.

The table schema, where org_id is NULLABLE:

List<TableFieldSchema> fields = new ArrayList<>();
fields.add(new TableFieldSchema().setName("ev_id").setType("STRING"));
fields.add(new TableFieldSchema().setName("customer_id").setType("STRING"));
fields.add(new TableFieldSchema().setName("org_id").setType("STRING").setMode("NULLABLE"));
TableSchema schema = new TableSchema().setFields(fields);

Simply do not set any value for that field (comment out that line):

row.set( "ev_id",       "2323423423" );
row.set( "customer_id", "111111"     );
// row.set( "org_id",     None         ); // Without this line, no NPE
c.output( row );  

Pass the table schema in the write step:

.apply( BigQueryIO.writeTableRows()
   .to( DATA_TABLE_OUT )
   .withSchema(schema)
   .withCreateDisposition( CREATE_NEVER )
   .withWriteDisposition( WRITE_APPEND ) );

A NULL value will be written to BigQuery: because org_id is declared NULLABLE in the schema, leaving the key out of the TableRow results in a NULL in that column.

If you are using the DirectRunner, pass the pipeline option --enforceImmutability=false; it worked for me. The Dataflow runner handles this case, but the DirectRunner throws the NPE when null is passed to tableRow.set(). Turning off the DirectRunner's immutability-enforcement check with --enforceImmutability=false makes the error go away.

Ref: https://issues.apache.org/jira/browse/BEAM-1714
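
The same switch can be set programmatically when building the pipeline options (a minimal sketch; DirectOptions lives in org.apache.beam.runners.direct, and I am assuming the options are created with PipelineOptionsFactory as usual):

PipelineOptions options = PipelineOptionsFactory.fromArgs( args ).create();
// Turn off the DirectRunner's immutability-enforcement check so that a
// TableRow containing a null no longer trips the hashCode() NPE.
options.as( DirectOptions.class ).setEnforceImmutability( false );

Pipeline p = Pipeline.create( options );

Note that this only disables the local DirectRunner check; per the issue above, the Dataflow runner already handles this case.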

Put a temporary value or an empty string in place of the null. As far as I can tell, TableRows don't accept null values.
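
A sketch of that workaround (getOrgId() and the "UNKNOWN" placeholder are hypothetical; substitute whatever your pipeline and downstream queries expect):

String orgId = getOrgId(); // hypothetical; may return null
// Replace the null with a placeholder so tableRow.set() never sees null.
row.set( "org_id", orgId != null ? orgId : "UNKNOWN" );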

Comments
  • Yeah, this is what I ended up doing, but accepted due to the detailed response!