https://github.com/tomgorb/row-access-policies
Implementing row-level security in BigQuery to restrict data visibility for individual users.
- Host: GitHub
- URL: https://github.com/tomgorb/row-access-policies
- Owner: tomgorb
- Created: 2024-07-31T15:27:12.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-08-07T11:57:09.000Z (almost 2 years ago)
- Last Synced: 2025-02-26T12:23:58.857Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 809 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
# (Proof of Concept) Row Access Policies
> Implementing row-level security in BigQuery to restrict data visibility for individual users.
## CODE
### Environment Variables
```shell
source env.sh
```
Sourcing the file sets the variables in your current shell. Executing it as a script instead would run it in a separate non-interactive shell, and the exports would be lost when that shell exits.
Content of *env.sh*:
- ```export GCP_PROJECT_ID='myProject'```
- ```export GCP_PROJECT_ID_VIEW='myProjectForViews'```
- ```export GCP_SA=$(cat myServiceAccount.json)```
- ```export GCP_USER='firstname.lastname@gmail.com'```
- ```export POC_PWD='myPassword'```
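On the Python side, these variables can be read back from the environment. The sketch below is only an illustration, not the repository's actual code: `Settings` and `load_settings` are hypothetical names, and it assumes that `GCP_SA` holds the JSON content of the key file, as the `$(cat ...)` export suggests.

```python
import json
import os
from dataclasses import dataclass

# Names of the variables exported by env.sh.
REQUIRED = ("GCP_PROJECT_ID", "GCP_PROJECT_ID_VIEW", "GCP_SA", "GCP_USER", "POC_PWD")


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the values exported by env.sh."""
    project_id: str
    project_id_view: str
    service_account_info: dict
    user: str
    password: str


def load_settings(env=None) -> Settings:
    """Read and validate the variables (defaults to os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing variables (did you `source env.sh`?): {missing}")
    return Settings(
        project_id=env["GCP_PROJECT_ID"],
        project_id_view=env["GCP_PROJECT_ID_VIEW"],
        # GCP_SA is assumed to contain the key file's JSON text, not a path.
        service_account_info=json.loads(env["GCP_SA"]),
        user=env["GCP_USER"],
        password=env["POC_PWD"],
    )
```

The parsed `service_account_info` dictionary is the shape expected by `google.oauth2.service_account.Credentials.from_service_account_info`, should you want to build credentials from it.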
#### **GCP_SA**
GCP Service Account Role(s):
- **GCP_PROJECT_ID**
  - BigQuery Admin
- **GCP_PROJECT_ID_VIEW**
  - BigQuery Data Editor
  - Security Admin
#### **GCP_USER**
> Email addresses and domains must be associated with an active **Google Account**, **Google Workspace account**, or **Cloud Identity account**.
### DATA
In the example data, there is a field **identifier** (a numerical string) which will be used for partitioning.
***main.py*** loads the data and prepares it:
- copy the data with integer-range partitioning (max 10,000 partitions) on **index**, computed by *farm fingerprinting* **identifier**;
- split the data into **n** parts (*params.yaml*) to work around the limit of 100 row access policies per table;
- add a default row access policy on 0=1 so that, by default, no one can see the data.
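The three preparation steps above can be sketched as BigQuery SQL built from Python. This is a minimal sketch under stated assumptions, not what *main.py* actually runs: the function names, the `default_deny` policy name, the use of `MOD` to derive **index** from the fingerprint, and the rule that assigns rows to a part are all hypothetical. BigQuery's DDL lists the `GRANT TO` clause as optional, so the default policy below omits it.

```python
def partitioned_copy_sql(source: str, target: str, n_partitions: int = 10_000) -> str:
    """Copy `source` to `target`, partitioned on a hashed integer `index`."""
    if not 1 <= n_partitions <= 10_000:
        raise ValueError("BigQuery allows at most 10,000 partitions per table")
    return (
        f"CREATE OR REPLACE TABLE `{target}`\n"
        f"PARTITION BY RANGE_BUCKET(`index`, GENERATE_ARRAY(0, {n_partitions}, 1)) AS\n"
        # MOD keeps the sign of the fingerprint, so ABS is applied afterwards.
        f"SELECT *, ABS(MOD(FARM_FINGERPRINT(identifier), {n_partitions})) AS `index`\n"
        f"FROM `{source}`"
    )


def split_sql(source: str, target_prefix: str, n_parts: int) -> list:
    """One statement per part; rows go to part `index % n_parts` (assumed rule)."""
    if n_parts < 1:
        raise ValueError("n_parts must be at least 1")
    return [
        f"CREATE OR REPLACE TABLE `{target_prefix}_{part}` AS\n"
        f"SELECT * FROM `{source}` WHERE MOD(`index`, {n_parts}) = {part}"
        for part in range(n_parts)
    ]


def default_deny_policy_sql(table: str, policy_name: str = "default_deny") -> str:
    """Row access policy whose filter is never true, so no row is visible."""
    return (
        f"CREATE OR REPLACE ROW ACCESS POLICY {policy_name}\n"
        f"ON `{table}`\n"
        f"FILTER USING (0 = 1)"
    )
```

Each returned string could then be submitted with `google.cloud.bigquery.Client.query`. Keeping SQL generation separate from execution makes the statements easy to inspect before they touch any table.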
### Web App
***app.py*** is a Streamlit application to:
- grant, revoke and check access for a specific user (email);
- check IAM policies;
- (de)authorize the view.

The code uses a BigQuery *user-defined function*, **atoz**, available [here](https://github.com/tomgorb/gcp-terraform-examples/blob/main/bigquery_udf/sql/atoz.sql).
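Granting and revoking access for one user comes down to creating and dropping a row access policy for that user. The sketch below shows one way to build those statements. It is an assumption-laden illustration rather than the code in *app.py*: the helper names, the `rap_` naming scheme and the filter expression passed in are all hypothetical.

```python
import re

# Deliberately strict pattern so that an email cannot break out of the SQL string.
_EMAIL = re.compile(r"^[A-Za-z0-9._+-]+@[A-Za-z0-9.-]+$")


def policy_name_for(email: str) -> str:
    """Derive a policy identifier from an email (hypothetical naming scheme)."""
    if not _EMAIL.match(email):
        raise ValueError(f"Unexpected email format: {email!r}")
    return "rap_" + re.sub(r"[^a-z0-9]", "_", email.lower())


def grant_sql(table: str, email: str, filter_expression: str) -> str:
    """Statement giving `email` access to rows matching `filter_expression`."""
    return (
        f"CREATE OR REPLACE ROW ACCESS POLICY {policy_name_for(email)}\n"
        f"ON `{table}`\n"
        f'GRANT TO ("user:{email}")\n'
        f"FILTER USING ({filter_expression})"
    )


def revoke_sql(table: str, email: str) -> str:
    """Statement removing the policy created for `email`, if it exists."""
    return f"DROP ROW ACCESS POLICY IF EXISTS {policy_name_for(email)} ON `{table}`"
```

For example, `grant_sql("myProject.myDataset.data_part_0", "firstname.lastname@gmail.com", "`index` = 42")` yields a statement that lets that user see only the rows of one partition; the table name and filter here are made up for the example.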
### Dashboard
Create a dashboard in **Looker Studio** using the view *insights* as source.
**DO NOT FORGET** to change Data Credentials to ```Viewer```. Otherwise your (```Owner```) credentials will be used (potential data leak).
Nevertheless, if **you** are **GCP_USER**, you should in principle not see any data, since the default policy on 0=1 applies (a kind of built-in safeguard).
Share the report as follows:
```
Unlisted
Anyone on the internet with the link can view Viewer
```
- After granting access, you should see the effect almost immediately.
- After revoking access, wait a couple of minutes or until the data is refreshed.

