Create AWS Athena view programmatically

The question "Can you create views in Amazon Athena?" outlines how to create a view using the user interface.

I'd like to create an AWS Athena view programmatically, ideally using Terraform.

I followed the steps outlined here: https://ujjwalbhardwaj.me/post/create-virtual-views-with-aws-glue-and-query-them-using-athena; however, the view quickly goes stale, and queries against it fail with:

...._view' is stale; it must be re-created.

The terraform code looks like this:

resource "aws_glue_catalog_table" "adobe_session_view" {

  database_name = "${var.database_name}"
  name = "session_view"

  table_type = "VIRTUAL_VIEW"
  view_original_text = "/* Presto View: ${base64encode(data.template_file.query_file.rendered)} */"
  view_expanded_text = "/* Presto View */"

  parameters = {
    presto_view = "true"
    comment = "Presto View"
  }

  storage_descriptor {
    ser_de_info {
      name = "ParquetHiveSerDe"
      serialization_library = "org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe"
    }

    columns {
      name = "first_column"
      type = "string"
    }
    columns {
      name = "second_column"
      type = "int"
    }
    ...
    columns {
      name = "nth_column"
      type = "string"
    }
  }
}

An alternative I'd be happy to use is the AWS CLI; however, aws athena [option] provides no option for this.

I've tried:

  • create-named-query, which I have not been able to get working for a statement such as CREATE OR REPLACE VIEW. This doesn't seem to be the intended use case for the command, since it saves a query for later rather than running it.
  • start-query-execution, which asks for an output location. That suggests it is meant for querying data and writing out the results, as opposed to making stateful changes or creating objects. It also seems to be paired with stop-query-execution.

As you suggested, it is definitely possible to create an Athena view programmatically via the AWS CLI using the start-query-execution command. As you pointed out, this does require you to provide an S3 location for the results, even though you won't need to check the file (Athena will put an empty .txt file in the location for some reason).

Here is an example:

$ aws athena start-query-execution --query-string "create view my_view as select * from my_table" --result-configuration "OutputLocation=s3://my-bucket/tmp" --query-execution-context "Database=my_database"

{
    "QueryExecutionId": "1744ed2b-e111-4a91-80ea-bcb1eb1c9c25"
}

You can avoid having the client specify a bucket by creating a workgroup and setting the location there.
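A minimal sketch of that workgroup setup, assuming the AWS CLI already has credentials and a region configured. The workgroup name, bucket path and the helper names create_results_workgroup and run_in_workgroup are hypothetical placeholders; create-work-group, start-query-execution and the global --query/--output options are standard AWS CLI features. The SQL is passed as a single quoted argument so the shell does not split it into words.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, workgroup name and bucket are hypothetical placeholders.

# Create a workgroup whose query result location is configured once, server-side.
create_results_workgroup() {
  local name="$1" output_location="$2"
  aws athena create-work-group \
    --name "$name" \
    --configuration "ResultConfiguration={OutputLocation=${output_location}}"
}

# Run a statement in that workgroup; no --result-configuration is needed.
# The SQL is quoted so it reaches the CLI as one argument.
run_in_workgroup() {
  local sql="$1" database="$2" workgroup="$3"
  aws athena start-query-execution \
    --query-string "$sql" \
    --query-execution-context "Database=${database}" \
    --work-group "$workgroup" \
    --query 'QueryExecutionId' \
    --output text
}
```

For example, run create_results_workgroup my_workgroup s3://my-bucket/tmp/ once; afterwards, run_in_workgroup "CREATE OR REPLACE VIEW my_view AS SELECT * FROM my_table" my_database my_workgroup prints the new QueryExecutionId.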

You can check whether your view creation was successful by using the get-query-execution command.

$ aws athena get-query-execution --query-execution-id 1744ed2b-e111-4a91-80ea-bcb1eb1c9c25
{
    "QueryExecution": {
        "QueryExecutionId": "1744ed2b-e111-4a91-80ea-bcb1eb1c9c25",
        "Query": "create view my_view as select * from my_table",
        "StatementType": "DDL",
        "ResultConfiguration": {
            "OutputLocation": "s3://my-bucket/tmp/1744ed2b-e111-4a91-80ea-bcb1eb1c9c25.txt"
        },
        "Status": {
            "State": "SUCCEEDED",
            "SubmissionDateTime": 1558744806.679,
            "CompletionDateTime": 1558744807.312
        },
        "Statistics": {
            "EngineExecutionTimeInMillis": 548,
            "DataScannedInBytes": 0
        },
        "WorkGroup": "primary"
    }
}
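start-query-execution returns as soon as the statement is accepted, so a script that needs the view to exist before continuing has to poll get-query-execution. A minimal sketch, assuming configured AWS CLI credentials; the function names and the ATHENA_POLL_SECONDS / ATHENA_MAX_POLLS variables are hypothetical. It treats SUCCEEDED as success and FAILED or CANCELLED as failure, and keeps polling while the state is QUEUED or RUNNING.

```shell
#!/usr/bin/env bash
# Sketch: submit a statement to Athena, then wait for it to reach a final state.
# Assumes AWS CLI credentials and a default region are configured.

# Print the QueryExecutionId of the submitted statement.
athena_submit() {
  local sql="$1" database="$2" output_location="$3"
  aws athena start-query-execution \
    --query-string "$sql" \
    --query-execution-context "Database=${database}" \
    --result-configuration "OutputLocation=${output_location}" \
    --query 'QueryExecutionId' \
    --output text
}

# Poll get-query-execution. Returns 0 on SUCCEEDED; returns 1 on FAILED/CANCELLED,
# on a CLI error, or after ATHENA_MAX_POLLS checks without a final state.
athena_wait() {
  local id="$1" state polls=0
  while [ "$polls" -lt "${ATHENA_MAX_POLLS:-60}" ]; do
    state=$(aws athena get-query-execution \
      --query-execution-id "$id" \
      --query 'QueryExecution.Status.State' \
      --output text) || return 1
    case "$state" in
      SUCCEEDED) return 0 ;;
      FAILED|CANCELLED)
        echo "query $id finished in state $state" >&2
        return 1 ;;
    esac
    polls=$((polls + 1))
    sleep "${ATHENA_POLL_SECONDS:-2}"
  done
  echo "query $id still not finished after $polls checks" >&2
  return 1
}
```

For example: id=$(athena_submit "CREATE OR REPLACE VIEW my_view AS SELECT * FROM my_table" my_database s3://my-bucket/tmp/) && athena_wait "$id" && echo "view created". When a statement fails, get-query-execution also reports the reason in QueryExecution.Status.StateChangeReason.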

Here are the above examples updated to Terraform 0.12+ syntax, with the view queries read from the filesystem:

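resource "null_resource" "athena_views" {
  for_each = {
    for filename in fileset("${path.module}/athenaviews/", "**"):
           replace(filename,"/","_") => file("${path.module}/athenaviews/${filename}")
  }

  # Destroy-time provisioners may only reference self, count.index and each.key,
  # so copy everything they need into triggers. Including the query also makes
  # Terraform re-create the view when a query file changes.
  triggers = {
    query    = each.value
    database = var.athena_database
    output   = "s3://${aws_s3_bucket.my-bucket.bucket}"
  }

  provisioner "local-exec" {
    # Passing the SQL through the environment keeps the shell from splitting it into words.
    environment = {
      QUERY = "CREATE OR REPLACE VIEW ${each.key} AS ${each.value}"
    }
    command = <<EOF
    aws athena start-query-execution \
      --output json \
      --query-string "$QUERY" \
      --query-execution-context "Database=${self.triggers.database}" \
      --result-configuration "OutputLocation=${self.triggers.output}"
EOF
  }

  provisioner "local-exec" {
    when = destroy
    environment = {
      QUERY = "DROP VIEW IF EXISTS ${each.key}"
    }
    command = <<EOF
    aws athena start-query-execution \
      --output json \
      --query-string "$QUERY" \
      --query-execution-context "Database=${self.triggers.database}" \
      --result-configuration "OutputLocation=${self.triggers.output}"
EOF
  }
}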

Note also the destroy-time provisioner (the one with when set to destroy), which ensures the views are dropped when your stack is torn down.

Place text files containing a SELECT query in a directory below your module path (athenaviews/ in this example), and Terraform will pick them up and create views. The views are named subfolder_filename, and they are dropped if the files are removed.
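The same one-file-per-view convention can be sketched as a plain shell loop, which may be handy for trying the query files before wiring them into Terraform. This is an illustration under assumptions: every file holds a bare SELECT statement, the view name is the file's relative path with "/" replaced by "_" (mirroring the replace() call in the Terraform code), AWS CLI credentials are configured, and view_name_for / create_views_from_dir are hypothetical names. It only submits the statements; it does not wait for them to finish.

```shell
#!/usr/bin/env bash
# Sketch: create one Athena view per query file, using the same naming rule.
# Assumptions: each file holds a bare SELECT; AWS CLI credentials are configured.

# Relative path -> view name, e.g. sales/daily -> sales_daily
view_name_for() {
  printf '%s' "$1" | tr '/' '_'
}

# Usage: create_views_from_dir DIR DATABASE S3_OUTPUT_LOCATION
create_views_from_dir() {
  local dir="${1%/}" database="$2" output_location="$3" file rel
  find "$dir" -type f | sort | while IFS= read -r file; do
    rel=${file#"$dir"/}
    # Stop at the first submission the CLI rejects.
    aws athena start-query-execution \
      --query-string "CREATE OR REPLACE VIEW $(view_name_for "$rel") AS $(cat "$file")" \
      --query-execution-context "Database=${database}" \
      --result-configuration "OutputLocation=${output_location}" \
      --query 'QueryExecutionId' \
      --output text || return 1
  done
}
```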

To add to the answers by JD D and Theo: building on their solutions, we figured out how to invoke the AWS CLI from Terraform as follows:

resource "null_resource" "athena_view" {

  provisioner "local-exec" {
    command = <<EOF
aws sts assume-role \
  --output json \
  --region my_region \
  --role-arn arn:aws:iam::${var.account_number}:role/my_role \
  --role-session-name create_my_view > /tmp/credentials.json

export AWS_SESSION_TOKEN=$(jq -r '.Credentials.SessionToken' /tmp/credentials.json)
export AWS_ACCESS_KEY_ID=$(jq -r '.Credentials.AccessKeyId' /tmp/credentials.json)
export AWS_SECRET_ACCESS_KEY=$(jq -r '.Credentials.SecretAccessKey' /tmp/credentials.json)

aws athena start-query-execution \
  --output json \
  --region my_region \
  --query-string "CREATE OR REPLACE VIEW my_view AS SELECT * FROM my_table" \
  --query-execution-context "Database=${var.database_name}" \
  --result-configuration "OutputLocation=s3://${aws_s3_bucket.my-bucket.bucket}"
EOF
  }
}

We use null_resource ... to run provisioners that aren't directly associated with a specific resource.

The result of aws sts assume-role is written as JSON to /tmp/credentials.json.

jq is used to parse the necessary fields out of the output of aws sts assume-role.

aws athena start-query-execution then runs as the assumed role, using the credentials exported in those environment variables.

Instead of --result-configuration "OutputLocation=s3://....", you can specify --work-group. Note that this is a separate flag on start-query-execution, not part of the --result-configuration string.
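As an alternative to the temporary file and jq, the AWS CLI's own --query option can pull the three credential fields out of the assume-role response directly, which also avoids leaving credentials on disk. A minimal sketch; assume_role_env is a hypothetical helper name and the role ARN is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: assume a role and export its temporary credentials into the current shell.
# assume_role_env is a hypothetical helper; the role ARN below is a placeholder.

# Usage: assume_role_env ROLE_ARN SESSION_NAME
assume_role_env() {
  local role_arn="$1" session_name="$2" creds
  # --output text prints the three selected fields tab-separated on one line.
  creds=$(aws sts assume-role \
    --role-arn "$role_arn" \
    --role-session-name "$session_name" \
    --query 'Credentials.[AccessKeyId,SecretAccessKey,SessionToken]' \
    --output text) || return 1
  read -r AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN <<EOF
$creds
EOF
  export AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN
}
```

For example, assume_role_env arn:aws:iam::123456789012:role/my_role create_my_view could replace the sts, jq and export lines at the top of the command, after which aws athena start-query-execution runs as that role.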

I went with the solution by Theo and it worked using AWS CloudFormation templates.

I just wanted to add a little hint that can save you hours of debugging. I am not writing this as a comment because I don't have the rights to comment yet. Feel free to copy and paste this into the comment section of Theo's answer.

Addition to Theo's answer: in the base64-encoded JSON, the type "string" is not valid when defining the column attributes! Always write "varchar" there.

Edit: Also, "int" must be declared as "integer"!
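The type rules above can be captured in a small helper that builds the view JSON before it is encoded. This is a sketch under assumptions: the JSON layout (originalSql, catalog, schema, columns) follows the Presto view format described in Theo's answer, the catalog is assumed to be awsdatacatalog, only a few scalar types are translated (float to real is the one addition beyond the two rules above), and presto_type / presto_view_json are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: build the JSON that gets base64-encoded into a Presto view definition.
# Assumed layout: {"originalSql","catalog","schema","columns":[{"name","type"}]}.

# Translate a Glue/Hive column type into the name the Presto JSON expects.
# Only simple scalars are mapped; complex types (struct<...> -> row(...),
# array<...>, map<...>) are passed through unchanged and must be fixed by hand.
presto_type() {
  case "$1" in
    string) echo varchar ;;
    int) echo integer ;;
    float) echo real ;;
    *) echo "$1" ;;
  esac
}

# Usage: presto_view_json SQL SCHEMA name:type [name:type ...]
# List columns in the same order as in the Glue table definition.
presto_view_json() {
  local sql="$1" schema="$2" cols="" sep="" pair
  shift 2
  # Minimal JSON escaping: newlines become spaces, then \ and " are escaped.
  sql=$(printf '%s' "$sql" | tr '\n' ' ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  for pair in "$@"; do
    cols="${cols}${sep}{\"name\":\"${pair%%:*}\",\"type\":\"$(presto_type "${pair#*:}")\"}"
    sep=","
  done
  printf '{"originalSql":"%s","catalog":"awsdatacatalog","schema":"%s","columns":[%s]}' \
    "$sql" "$schema" "$cols"
}
```

Because newlines are flattened to spaces, SQL that relies on -- line comments would need them removed first.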

Comments
  • Thank you JD! This is perfect! Worked like a charm, great documentation :D
  • With the Query result location set on my work group I was able to replace --result-configuration with --work-group.
  • To add to this: struct column data types in Athena need to be mapped to row in the Presto definition JSON e.g. type = "struct<col1:string>" in the Terraform/Glue definition maps to "type": "row(col1 varchar)" in the Presto view definition.
  • @NathanGriffiths right you are, I had written struct instead of row, now fixed in my answer.
  • Just a couple of additional notes from my learnings when implementing this answer, which will hopefully help others: columns in all 3 representations of the table must be in the same order (stale view otherwise). Columns must be cast in the originalSql to match the types denoted in the Presto columns (stale view otherwise). I also misread the answer and thought Presto would add the prefix and base64-encode my JSON for me, but that's not the case: originalText = addPrefixSuffix(base64(JSON.stringify(exampleObjectabove)))
  • Thank you @Joshua Samuel Great addition! I believe we've added a good bit of documentation in this area.
  • I like this approach, but : Error: Error running command 'aws athena start-query-execution --query-string "CREATE OR REPLACE VIEW Query1 AS SELECT ac, region FROM meta.getresources" --output json --query-execution-context "Database=meta_resources" --result-configuration "OutputLocation=s3://query-log" ': exit status 255. Output: usage: aws [options] <command> <subcommand> [<subcommand> ...] [parameters] aws help Unknown options: REPLACE, VIEW, Query1, AS, SELECT, ac,, region, FROM, meta.getresources", OR ... however, if I copy the SQL from CMD output, it runs in my SQL Client
  • @SimonB you have to wrap the --query-string parameter value in quotes, e.g. --query-string 'CREATE OR REPLACE VIEW...', but it is even better to have the AWS CLI load the source files instead of loading them in Terraform: --query-string file://${each.value}
  • @MaciejMajewski Yes, I had it wrapped; I have tried both double and single quotes, same error. I also loaded the entire 'Create' statement from a file. What version are you on? Terraform v0.12.20
  • @SimonB I am on Terraform v0.12.21. Hard to say, with file:// it works well for us
  • No problem! Glad Theo's very detailed answer helped!
  • I've fixed my answer so that it says varchar in the appropriate place.
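Since the prefix, suffix and base64 encoding have to be added by hand, a pair of small helpers can produce a value for view_original_text from a JSON definition and unwrap an existing one. That makes it easier to compare a stale view's stored definition with what you meant to write. A minimal sketch; the function names are hypothetical and it relies only on the base64 utility.

```shell
#!/usr/bin/env bash
# Sketch: wrap a view JSON document in the "/* Presto View: ... */" envelope
# used for view_original_text, and unwrap an existing one for inspection.

presto_view_text() {
  # Join base64 output onto one line (GNU base64 wraps at 76 columns by default).
  printf '/* Presto View: %s */' "$(printf '%s' "$1" | base64 | tr -d '\n')"
}

decode_presto_view() {
  local b64
  b64=${1#'/* Presto View: '}
  b64=${b64%' */'}
  printf '%s' "$b64" | base64 --decode
}
```

To inspect a view that reports as stale, its stored text can be fetched with aws glue get-table --database-name my_database --name my_view --query 'Table.ViewOriginalText' --output text and passed to decode_presto_view.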