Snowflake: How to create a view on a JSON file in an external S3 stage

I am trying to create a view on a JSON file that sits in an external S3 stage in Snowflake. The JSON file is structured as follows:

[{ 'key1':val1, 'key2':val2 }, { 'key1':val1, 'key2':val2 }]

I want to query a JSON file with the structure above. How can I do this?

Note: I am able to create a view on a JSON file with the structure below by using the FLATTEN function:

{"Data": { 'key1':val1, 'key2':val2 }, { 'key1':val1, 'key2':val2 }}

But my JSON structure is now different: it has no "Data" node at the top level.


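Use STRIP_OUTER_ARRAY = TRUE in the file format so that each object in the top-level array lands in its own row. This also gets you past size limitations, because the whole array is no longer read as a single value.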

For example, assuming @my_stage/data.json points to a file containing:

[{ "key1":"val1", "key2":"val2" }, { "key1":"val3", "key2":"val4" }]

We can create this file format and view:

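-- JSON file format that splits the top-level array into one row per element
create or replace file format my_format type = json strip_outer_array = true;

-- View over the staged file; $1 is the parsed JSON object for each row
create or replace view v_my_view as
select $1 data
from @my_stage/data.json
(file_format => 'my_format') a;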

With these results:

select data:key1, data:key2 from v_my_view;

DATA:KEY1 | DATA:KEY2
----------+----------
"val1"    | "val2"   
"val3"    | "val4"   

Or flatten further:

-- One output row per key/value pair of each JSON object
create or replace view v_my_view_flat as
select b.key, b.path, b.index, b.value
from @my_stage/data.json
(file_format => 'my_format') a, lateral flatten(input => a.$1) b;

To get this output:

KEY  | PATH | INDEX | VALUE 
-----+------+-------+-------
key1 | key1 |  NULL | "val1"
key2 | key2 |  NULL | "val2"
key1 | key1 |  NULL | "val3"
key2 | key2 |  NULL | "val4"
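
If the goal is a view with one ordinary column per JSON key, the keys can also be projected directly in the view definition. The following is a minimal sketch, assuming the same @my_stage/data.json file and my_format file format as above; the view name v_my_view_cols and the ::string casts are illustrative assumptions, so adjust the types to your data.

```sql
-- Sketch only: v_my_view_cols is a hypothetical name, and the ::string casts
-- assume both keys hold text values.
create or replace view v_my_view_cols as
select
    a.$1:key1::string as key1,   -- extract key1 from each JSON object
    a.$1:key2::string as key2    -- extract key2 from each JSON object
from @my_stage/data.json
(file_format => 'my_format') a;

select key1, key2 from v_my_view_cols;
```

Casting in the view keeps the quoting seen in the raw VARIANT output above out of downstream queries.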

Querying Data in Staged Files (Snowflake Documentation): Snowflake supports using standard SQL to query data files located in an internal (i.e. Snowflake) stage or a named external (Amazon S3, Google Cloud Storage, or Microsoft Azure) stage. This can be useful for inspecting or viewing the contents of staged files, particularly before loading or after unloading data.
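
As a rough illustration of inspecting a staged file before building a view on it, the sketch below lists the files in the stage and previews a few parsed rows together with Snowflake's file metadata columns. It assumes the stage @my_stage and the file format my_format used elsewhere on this page; the column aliases are made up for the example.

```sql
-- List the files visible through the stage.
list @my_stage;

-- Preview a few parsed rows along with the source file name and row number.
-- Aliases file_name, row_number_in_file and data are illustrative only.
select
    metadata$filename        as file_name,
    metadata$file_row_number as row_number_in_file,
    a.$1                     as data
from @my_stage/data.json
(file_format => 'my_format') a
limit 10;
```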


Maybe you want the obvious:

WITH MY_TABLE AS
  (SELECT PARSE_JSON('[{ ''key1'':1, ''key2'':2 }, { ''key1'':3, ''key2'':4 }]') obj)
SELECT VALUE:key1 key1, VALUE:key2 key2 FROM MY_TABLE
CROSS JOIN LATERAL FLATTEN(INPUT => obj);

or maybe you want something completely different. Hard to tell...

@kkakkireni External tables are essentially a physical implementation of the link that I sent to you. Either way works fine, but I just want to make sure you understand that a Snowflake stage is not "staging a file into Snowflake". A stage is just a pointer to an S3 bucket.


A couple of comments on the approach to processing JSON in Snowflake, regardless of the specific examples and querying needs:

In general, the most popular way Snowflake customers process JSON is to ingest it into a VARIANT column in a Snowflake table and query that column. Ingesting JSON into VARIANT is easy and follows the same approach as relational data (specify the file format, then use COPY INTO).

This general approach (a best practice) performs well and supports querying the VARIANT data with all standard SQL operations (joins, GROUP BY, filtering), just like relational data.
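
The ingest-then-query approach might look roughly like this. It is a sketch under assumptions: the table name my_json_table is hypothetical, and the stage @my_stage/data.json and file format my_format (JSON with strip_outer_array = true) are taken from the earlier answer.

```sql
-- Hypothetical target table with a single VARIANT column.
create or replace table my_json_table (data variant);

-- Load the staged file; with strip_outer_array = true in my_format,
-- each object in the top-level array becomes its own row.
copy into my_json_table
from @my_stage/data.json
file_format = (format_name = 'my_format');

-- Query the VARIANT column with ordinary SQL (filtering shown here).
select
    data:key1::string as key1,
    data:key2::string as key2
from my_json_table
where data:key1::string = 'val1';
```

Unlike a view over the stage, this copies the data into Snowflake, so later queries no longer read the S3 file.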
