How to structure views in BigQuery for efficient access management

In BigQuery you grant users and roles (or authorized views) access at the dataset level, not at the view or table level. The challenge I want to address is how to manage access control in BigQuery when I have hundreds of tables and views, and many different roles/departments that need access both to views shared across all departments and to views meant only for a particular role or department.

Example: let's say I have a source dataset with source tables A through D and three views for each table, each exposing different fields based on data sensitivity (1 through 3). I also have three roles (Blue, Green, Red). If I could manage access at the table level, it would look like this:

View: roles

  • A1: Blue, Red
  • A2: Red
  • A3: Red
  • B1: Blue, Green, Red
  • B2: Green, Red
  • B3: Red
  • C1: Green, Red
  • C2: Green, Red
  • C3: Red
  • D1: Red
  • D2: Red
  • D3: Red

Given these requirements, I can't create datasets based only on sensitivity (1-3) or source (A-D) and manage access on that basis. The only solution I can see that meets them is generating a dataset per role. This could be done manually when the roles and views are few, but with 10+ roles and 50+ views it becomes more challenging.

The only solution I can come up with is a CI/CD setup (Cloud Build) with one or more files defining datasets (i.e. roles), dependencies and DDL statements. A script would iterate through the files, generate the views and give them access to the source (as authorized views). Example file:

{"roles":["crm_analyst", "admin", "customer_service_agent"],
"ddl":"CREATE VIEW `myproject.'{role}'.newview` AS SELECT column_1, column_2, column_3 FROM myproject.mydataset.myview",
"dependencies":"myproject.mydataset.myview"}

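A minimal sketch of how such a script might expand one definition file into per-role statements. It assumes the placeholder is written as a plain {role} (without the quotes shown above), and expand_ddl is a hypothetical helper name. It only builds the statements; running them and authorizing the views against the source dataset is left out.

```python
import json

# Hypothetical definition file content, modelled on the example above.
DEFINITION = """
{"roles": ["crm_analyst", "admin", "customer_service_agent"],
 "ddl": "CREATE VIEW `myproject.{role}.newview` AS SELECT column_1, column_2, column_3 FROM myproject.mydataset.myview",
 "dependencies": "myproject.mydataset.myview"}
"""


def expand_ddl(definition: dict) -> dict:
    """Return one DDL statement per role by filling in the {role} placeholder."""
    # str.replace is used instead of str.format so that any other braces
    # in the SQL are left untouched.
    return {
        role: definition["ddl"].replace("{role}", role)
        for role in definition["roles"]
    }


statements = expand_ddl(json.loads(DEFINITION))
for role, ddl in statements.items():
    print(role, "->", ddl)
```
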
How do other companies solve this? Large banks that have migrated to BigQuery must have many departments and datasets of differing sensitivity.

I ended up writing a Python script that reads view definitions from JSON files, then generates datasets and views and grants the correct access rights. The solution is a bit rough and could benefit from dependency mapping (for when a view queries another view); currently it iterates over the views until all of them are generated or until no more can be generated (broken dependencies). The script generates two datasets per group, one with READER access (suffix '_ro') and one with WRITER access (suffix '_rw'), so that views generated by the data team can't be modified while the group still gets a sandbox. The group should be an e-mail group, and the dataset names are taken from the local part of the e-mail address. The script is executed by Google Cloud Build and triggered by a push to our GitHub repo.
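
The dependency mapping mentioned above could be sketched with a topological sort, so that a view is only created after the views it queries. This is not what the script currently does; the view names and the creation_order helper are hypothetical, and graphlib requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter


def creation_order(dependencies):
    """Order views so that every view comes after the views it queries.

    dependencies maps a view name to the set of view names it depends on.
    Raises graphlib.CycleError if the views depend on each other in a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


# Hypothetical dependency map: both group views query the shared view.
deps = {
    "analysts_ro.view_test": {"shared_views.test_view"},
    "developers_ro.view_test": {"shared_views.test_view"},
    "shared_views.test_view": set(),
}

order = creation_order(deps)
print(order)  # the shared view is listed before the views that query it
```

A circular dependency surfaces as a CycleError up front instead of as views that silently never get generated.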

Example view definition (path: views/view_test.json)

{
    "groups":["developers@datahem.org", "analysts@datahem.org"],
    "sql":"SELECT * FROM `{project}.shared_views.test_view`"
}

Generates the following datasets (access) and views:

analysts_ro (analysts@datahem.org:READER):
- view_test

analysts_rw (analysts@datahem.org:WRITER):
(empty)

developers_ro (developers@datahem.org:READER):
- view_test

developers_rw (developers@datahem.org:WRITER):
(empty)

shared_views (analysts_ro.view_test:None, developers_ro.view_test:None):
- test_view
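
As a rough illustration of the naming convention above, the following sketch derives the per-group datasets from a view definition. It only plans names and roles and does not call BigQuery; plan_datasets and the shape of the returned dictionary are assumptions made for this example, not the actual script.

```python
from pathlib import PurePosixPath


def plan_datasets(path, definition):
    """Map dataset name -> access entry and views, for one view definition."""
    view_name = PurePosixPath(path).stem  # views/view_test.json -> view_test
    plan = {}
    for group in definition["groups"]:
        local_part = group.split("@", 1)[0]  # analysts@datahem.org -> analysts
        # Read-only dataset holding the views generated by the data team.
        plan[f"{local_part}_ro"] = {"access": (group, "READER"), "views": [view_name]}
        # Writable sandbox dataset for the group, initially empty.
        plan[f"{local_part}_rw"] = {"access": (group, "WRITER"), "views": []}
    return plan


definition = {
    "groups": ["developers@datahem.org", "analysts@datahem.org"],
    "sql": "SELECT * FROM `{project}.shared_views.test_view`",
}
plan = plan_datasets("views/view_test.json", definition)
for dataset, details in sorted(plan.items()):
    print(dataset, details)
```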

I made the Python script available on GitHub as open source, as part of datahem; feel free to clone, improve and use it for your own purposes.

Another option would be to set up row-level access and put all views in the same dataset.

Mock up an access_control table (user_name, user_groups) for example purposes:

SELECT 'userA@datahem.org' as user_name, ['developer','analyst'] as user_groups
UNION ALL
SELECT 'userB@datahem.org' as user_name, ['developer'] as user_groups

Then create a view with row-level access control by adding a static column holding an array of allowed groups, and join it with the access_control "table" so that rows are returned only when at least one of the current user's groups matches the allowed_groups:

SELECT c.* EXCEPT(allowed_groups) FROM (
  SELECT OrderReference, Date, ['developer', 'analyst'] AS allowed_groups 
  FROM `project.dataset.orders`) as c
INNER JOIN (
  SELECT user_name, user_group 
  FROM  `project.access.access_control`, UNNEST(user_groups) as user_group 
  WHERE SESSION_USER() = user_name) g
ON g.user_group IN UNNEST(c.allowed_groups)
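
The matching rule in the query can be mimicked in plain Python to reason about who sees what. This is only a model of the join under the mocked-up data above, with hypothetical helper names. If I read the INNER JOIN correctly, it matches once per shared group, so a user who belongs to two allowed groups may get each row twice.

```python
# Hypothetical stand-in for the access_control table mocked up above.
ACCESS_CONTROL = {
    "userA@datahem.org": ["developer", "analyst"],
    "userB@datahem.org": ["developer"],
}


def matching_groups(user_name, allowed_groups):
    """Groups of the user that also appear in the row's allowed_groups."""
    return [g for g in ACCESS_CONTROL.get(user_name, []) if g in allowed_groups]


def can_see_row(user_name, allowed_groups):
    """A row is visible when at least one of the user's groups is allowed."""
    return bool(matching_groups(user_name, allowed_groups))


allowed = ["developer", "analyst"]
for user in ("userA@datahem.org", "userB@datahem.org", "userC@datahem.org"):
    print(user, can_see_row(user, allowed), matching_groups(user, allowed))
```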

It is a nice solution; however, it exposes all views to a user even if the user doesn't have access to them. The user will also be able to run queries against a view they don't have access to (generating cost) but won't get any results back. From a usability perspective (only showing the views a user has access to), we chose the solution marked above (datasets per group).

Comments
  • Wondering what are good solutions outside of BigQuery to model this. Maybe that could drive a feature request
  • @felipehoffa I guess most companies put the ACL in the application layer that connects to BigQuery, i.e. BI tools. But I want users to be able to connect to BigQuery with whatever tool they prefer (Tableau, Data Studio, Colab, etc.) and still be certain that they can only access data they have permission to. That's the reason for this question.
  • Thanks Nathan, you confirm my thoughts about the structure of datasets, tables and views. But how do you build those? Manually or automated in a CI/CD setup? I’m also thinking about setting up a sandbox dataset for each role/group to let users create and experiment with views in that one.
  • You can use the BigQuery API or the client libraries to automate this.