How to flatten a lat/long array in BigQuery to produce a linestring?

I have a nested table structure, like this:

[
  {
    "startTime": "2017-09-02 09:08:00:000",
    "endTime": "2017-09-02 09:09:00:000",
    "startTimeMillis": "1504343280000",
    "endTimeMillis": "1504343340000",
    "uuid": "1748750880",
    "country": "CI",
    "city": "Punta Arenas",
    "x": "-70.906904",
    "y": "-53.133514"
  },
  {
    "startTime": "2017-09-02 09:08:00:000",
    "endTime": "2017-09-02 09:09:00:000",
    "startTimeMillis": "1504343280000",
    "endTimeMillis": "1504343340000",
    "uuid": "1748750880",
    "country": "CI",
    "city": "Punta Arenas",
    "x": "-70.907353",
    "y": "-53.133253"
  },
  {
    "startTime": "2017-09-02 09:08:00:000",
    "endTime": "2017-09-02 09:09:00:000",
    "startTimeMillis": "1504343280000",
    "endTimeMillis": "1504343340000",
    "uuid": "1748750880",
    "country": "CI",
    "city": "Punta Arenas",
    "x": "-70.90771",
    "y": "-53.133041"
  },
  {
    "startTime": "2017-09-02 09:08:00:000",
    "endTime": "2017-09-02 09:09:00:000",
    "startTimeMillis": "1504343280000",
    "endTimeMillis": "1504343340000",
    "uuid": "1748750880",
    "country": "CI",
    "city": "Punta Arenas",
    "x": "-70.908979",
    "y": "-53.132287"
  }
]

The resulting table looks something like this:

Row|startTime|endTime|startTimeMillis|endTimeMillis|uuid|country|city|x|y
1|2017-09-02 09:08:00:000|2017-09-02 09:09:00:000|1504343280000|1504343340000|1748750880|CI|Punta Arenas|-70.906904|-53.133514
2|2017-09-02 09:08:00:000|2017-09-02 09:09:00:000|1504343280000|1504343340000|1748750880|CI|Punta Arenas|-70.907353|-53.133253
3|2017-09-02 09:08:00:000|2017-09-02 09:09:00:000|1504343280000|1504343340000|1748750880|CI|Punta Arenas|-70.90771|-53.133041
4|2017-09-02 09:08:00:000|2017-09-02 09:09:00:000|1504343280000|1504343340000|1748750880|CI|Punta Arenas|-70.908979|-53.132287

I'd like to concatenate the repeated fields x and y to produce a GIS linestring in a single row, like this:

Row|startTime|endTime|startTimeMillis|endTimeMillis|uuid|country|city|linestring
1|2017-09-02 09:08:00:000|2017-09-02 09:09:00:000|1504343280000|1504343340000|1748750880|CI|Punta Arenas|LINESTRING(-70.906904 -53.133514, -70.907353 -53.133253, -70.90771 -53.133041, -70.908979 -53.132287)

How can I do this? The original x and y values are floats.

Thanks in advance!

Below is for BigQuery Standard SQL

#standardSQL
WITH `yourTable` AS (
  SELECT '2017-09-02 09:08:00:000' AS startTime, '2017-09-02 09:09:00:000' AS endTime, 1504343280000 AS startTimeMillis, 1504343340000 AS endTimeMillis, 1748750880 AS uuid, 'CI' AS country, 'Punta Arenas' AS city, -70.906904 AS x, -53.133514 AS y UNION ALL 
  SELECT '2017-09-02 09:08:00:000', '2017-09-02 09:09:00:000', 1504343280000, 1504343340000, 1748750880, 'CI', 'Punta Arenas', -70.907353, -53.133253 UNION ALL 
  SELECT '2017-09-02 09:08:00:000', '2017-09-02 09:09:00:000', 1504343280000, 1504343340000, 1748750880, 'CI', 'Punta Arenas', -70.90771, -53.133041 UNION ALL 
  SELECT '2017-09-02 09:08:00:000', '2017-09-02 09:09:00:000', 1504343280000, 1504343340000, 1748750880, 'CI', 'Punta Arenas', -70.908979, -53.132287 
)
SELECT startTime, endTime, startTimeMillis, endTimeMillis, uuid, country, city,
STRING_AGG(CONCAT(CAST(x AS STRING), ' ', CAST(y AS STRING)), ',') AS linestring
FROM `yourTable`
GROUP BY startTime, endTime, startTimeMillis, endTimeMillis, uuid, country, city  
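
The query above returns only the comma-separated coordinate pairs. If the literal LINESTRING(...) text shown in the question is wanted, one possible variation (a sketch, not part of the original answer) wraps the aggregate in CONCAT. The inline table here is cut down to two rows and three columns purely for illustration:

```sql
#standardSQL
-- Sketch only: two inline sample rows stand in for the real table.
WITH `yourTable` AS (
  SELECT 1748750880 AS uuid, -70.906904 AS x, -53.133514 AS y UNION ALL
  SELECT 1748750880, -70.907353, -53.133253
)
SELECT
  uuid,
  -- Wrap the aggregated "x y" pairs in the WKT keyword and parentheses.
  CONCAT(
    'LINESTRING(',
    STRING_AGG(CONCAT(CAST(x AS STRING), ' ', CAST(y AS STRING)), ', '),
    ')'
  ) AS linestring
FROM `yourTable`
GROUP BY uuid
```

The ', ' delimiter matches the spacing used in the question's expected output; the remaining grouping columns can be added back to both the SELECT list and the GROUP BY.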

You could use the ARRAY_AGG function available in Standard SQL, something like:

#standardSQL
WITH data AS(
  SELECT "2017-09-02 09:08:00:000" AS startTime, "2017-09-02 09:09:00:000" endTime, "1504343280000" AS startTimeMillis, "1504343340000" endTimeMillis, "1748750880" AS uuid, "CI" AS country, "Punta Arenas" AS city, "-70.906904" AS x, "-53.133514" AS y UNION ALL
  SELECT "2017-09-02 09:08:00:000", "2017-09-02 09:09:00:000", "1504343280000", "1504343340000", "1748750880", "CI", "Punta Arenas", "-70.907353", "-53.133253" UNION ALL
  SELECT "2017-09-02 09:08:00:000", "2017-09-02 09:09:00:000", "1504343280000", "1504343340000", "1748750880", "CI", "Punta Arenas", "-70.90771", "-53.133041" UNION ALL
  SELECT "2017-09-02 09:08:00:000", "2017-09-02 09:09:00:000", "1504343280000", "1504343340000", "1748750880", "CI", "Punta Arenas", "-70.908979", "-53.132287"
)

SELECT
  startTime,
  endTime,
  startTimeMillis,
  endTimeMillis,
  uuid,
  country,
  city,
  ARRAY_AGG(STRUCT(x, y)) AS LINESTRING
FROM data
GROUP BY
  startTime,
  endTime,
  startTimeMillis,
  endTimeMillis,
  uuid,
  country,
  city

The result is one row per group, with the LINESTRING column holding an ARRAY of the x and y values. Notice that x and y have been wrapped together in a STRUCT, which allows you to access each field by its respective name.
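
As a rough illustration of accessing those fields by name, the sketch below unnests such an array and reads x and y from each element. The `grouped` CTE and the `pt` alias are hypothetical names, and the two-element array literal merely stands in for the output of the ARRAY_AGG query:

```sql
#standardSQL
-- Sketch only: `grouped` mimics one output row of the ARRAY_AGG query.
WITH grouped AS (
  SELECT
    '1748750880' AS uuid,
    [STRUCT('-70.906904' AS x, '-53.133514' AS y),
     STRUCT('-70.907353' AS x, '-53.133253' AS y)] AS linestring
)
SELECT
  uuid,
  pt.x,  -- STRUCT fields are addressed by name
  pt.y
FROM grouped, UNNEST(linestring) AS pt
```

A single element can also be read without unnesting, for example linestring[OFFSET(0)].x for the first point.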

Thank you all!

I'm using Mikhail Berlyant's solution!

SELECT
  w.startTime, w.endTime, w.startTimeMillis, w.endTimeMillis,
  jams_u.uuid, jams_u.country, jams_u.city, jams_u.street, 
  jams_u.roadType, jams_u.turnType, 
  jams_u.type, jams_u.length, jams_u.speed, jams_u.level, jams_u.delay,
  jams_u.startNode, jams_u.endNode, jams_u.pubMillis,
  TIMESTAMP_MILLIS(jams_u.pubMillis) as pubdatetime_utc,
  STRING_AGG(CONCAT(CAST(line_u.x AS STRING),' ',CAST(line_u.y AS STRING))) linestring_4326
FROM
  a_import.table w,
  UNNEST(jams) jams_u,
  UNNEST(line) line_u
GROUP BY
  w.startTime, w.endTime, w.startTimeMillis, w.endTimeMillis,
  jams_u.uuid, jams_u.country, jams_u.city, jams_u.street, 
  jams_u.roadType, jams_u.turnType, 
  jams_u.type, jams_u.length, jams_u.speed, jams_u.level, jams_u.delay,
  jams_u.startNode, jams_u.endNode, jams_u.pubMillis,
  pubdatetime_utc

One concern with the proposed solutions that rely on GROUP BY alone: without an ORDER BY inside the aggregate function, the order of the elements within each group is undefined. So the points in the linestring can come out in arbitrary order, which is probably not what you want. Unfortunately, small inline datasets tend to give stable results, so the problem may only show up once you run on real data.

To solve this, you need to decide which attributes define the group and which define the order. For example, if uuid identifies a linestring and the start timestamp defines the order (the timestamps would need to differ, unlike in your sample), your query could group by uuid and order by timestamp.
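
For the string-concatenation approach, that ordering can be expressed with an ORDER BY clause inside STRING_AGG. The following is a minimal sketch; the distinct startTimeMillis values are made up for illustration, and the rows are deliberately listed out of order:

```sql
#standardSQL
-- Sketch only: startTimeMillis values are hypothetical and made distinct
-- so that they can define the order of the points.
WITH `yourTable` AS (
  SELECT 1748750880 AS uuid, 1504343280002 AS startTimeMillis, -70.907353 AS x, -53.133253 AS y UNION ALL
  SELECT 1748750880, 1504343280001, -70.906904, -53.133514
)
SELECT
  uuid,
  STRING_AGG(
    CONCAT(CAST(x AS STRING), ' ', CAST(y AS STRING)), ', '
    ORDER BY startTimeMillis  -- fixes the order of points within each group
  ) AS linestring
FROM `yourTable`
GROUP BY uuid
```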

I also prefer to use the new geospatial functions to construct the WKT linestring, rather than string concatenation, which gives:

#standardSQL
WITH `yourTable` AS (
  SELECT * FROM UNNEST([
    STRUCT('2017-09-02 09:08:00:000' AS startTime, '2017-09-02 09:09:00:000' AS endTime, 1504343280002 AS startTimeMillis, 1504343340000 AS endTimeMillis, 
           1748750880 AS uuid, 'CI' AS country, 'Punta Arenas' AS city, -70.906904 AS x, -53.133514 AS y),
    STRUCT('2017-09-02 09:08:00:000', '2017-09-02 09:09:00:000', 1504343280001, 1504343340000, 1748750880, 'CI', 'Punta Arenas', -70.907353, -53.133253), 
    STRUCT('2017-09-02 09:08:00:000', '2017-09-02 09:09:00:000', 1504343280004, 1504343340000, 1748750880, 'CI', 'Punta Arenas', -70.90771, -53.133041),
    STRUCT('2017-09-02 09:08:00:000', '2017-09-02 09:09:00:000', 1504343280003, 1504343340000, 1748750880, 'CI', 'Punta Arenas', -70.908979, -53.132287)]) 
)
SELECT uuid, MIN(startTime) startTime, MAX(endTime) endTime,
       ANY_VALUE(country) country, ANY_VALUE(city) city,
       ST_MakeLine(ARRAY_AGG(ST_GeogPoint(x, y) 
                             ORDER BY startTime, startTimeMillis)) line
FROM `yourTable`
GROUP BY uuid
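
ST_MakeLine returns a GEOGRAPHY value. If a plain WKT string is needed, for example to export it to another system, one option is to wrap the result in ST_ASTEXT. This is a sketch on a reduced two-row table, with the `line_wkt` alias chosen only for illustration:

```sql
#standardSQL
-- Sketch only: reduced inline table with distinct, made-up startTimeMillis values.
WITH `yourTable` AS (
  SELECT 1748750880 AS uuid, 1504343280001 AS startTimeMillis, -70.906904 AS x, -53.133514 AS y UNION ALL
  SELECT 1748750880, 1504343280002, -70.907353, -53.133253
)
SELECT
  uuid,
  -- ST_GEOGPOINT takes (longitude, latitude); ST_ASTEXT converts the GEOGRAPHY to WKT text.
  ST_ASTEXT(
    ST_MAKELINE(ARRAY_AGG(ST_GEOGPOINT(x, y) ORDER BY startTimeMillis))
  ) AS line_wkt
FROM `yourTable`
GROUP BY uuid
```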

Comments
  • You say "I have a nested table structure, like this:", but what you show as a schema and example is NOT a nested structure! Can you clarify what exactly you have?
  • if answer helped you - also consider voting it up! :o) Vote up answers that are helpful. ... You can check about what to do when someone answers your question - stackoverflow.com/help/someone-answers. Following these simple rules you increase your own reputation score and at the same time you keep us motivated to answer your questions :o
  • Thank you! I found a different solution, but this one is fine!