Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/unytics/bigfunctions
Supercharge BigQuery with BigFunctions
https://github.com/unytics/bigfunctions
bigquery data data-analytics data-engineering data-visualization data-warehouse
Last synced: 6 days ago
JSON representation
Supercharge BigQuery with BigFunctions
- Host: GitHub
- URL: https://github.com/unytics/bigfunctions
- Owner: unytics
- License: mit
- Created: 2022-08-24T07:41:54.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-01-09T13:20:06.000Z (13 days ago)
- Last Synced: 2025-01-09T15:10:10.972Z (13 days ago)
- Topics: bigquery, data, data-analytics, data-engineering, data-visualization, data-warehouse
- Language: Python
- Homepage: https://unytics.io/bigfunctions/
- Size: 12.3 MB
- Stars: 654
- Watchers: 7
- Forks: 63
- Open Issues: 28
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
README
Supercharge BigQuery
with BigFunctions
Upgrade your data impact
with 100+ ready-to-use BigQuery Functions
(+ build a catalog of functions)
---
## 🔍️ 1. What is BigFunctions?
BigFunctions is:
✅ **a framework** to build a **governed catalog of powerful BigQuery functions** at YOUR company.
✅ **100+ open-source functions to supercharge BigQuery** that you can call directly (no install) or redeploy in YOUR catalog.
## 💡 2. Why BigFunctions?
**As a data-analyst**
You'll have new powers! *(such as loading data from any source or activating your data through reverse ETL)*.**As an analytics-engineer**
You'll feel at home with BigFunctions style which imitates the one of dbt *(with a yaml standard and a CLI)*.
You'll love the idea of getting more things done through SQL.**As a data-engineer**
You'll easily build software-engineering best practices through unit testing, cicd, pull request validation, continuous deployment, etc.
You will love avoiding reinventing the wheel by using functions already developed by the community.**As a central data-team player in a large company**
You'll be proud of providing a governed catalog of curated functions to your 10000+ employees with mutualized and maintainable effort.**As a security champion**
You will enjoy the ability to validate the code of functions before deployment thanks to your git validation workflow, CI Testing, binary authorization, etc.**As an open-source lover**
You'll be able to contribute so that a problem solved for you is solved for everyone.
## 👀 3. Call public BigFunctions without install from your GCP project
All BigFunctions represented by a 'yaml' file in *bigfunctions* folder of the GitHub repo are automatically deployed in public datasets so that you can call them directly without install from your BigQuery project.
Give it a try! Execute this SQL query from your GCP Project 👀:
```sql
select bigfunctions.eu.faker("name", "it_IT")
```Explore all available bigfunctions **here**.
## 🚀 4. Deploy BigFunctions in your GCP project
You can also deploy any bigfunction in your project! To deploy *my_bigfunction* defined in *bigfunctions/my_bigfunction.yaml* file, simply call:
``` sh
bigfun deploy my_bigfunction
```Details about `bigfun` command line are given below.
## 💥 5. `bigfun` CLI
`bigfun` CLI (command-line-interface) facilitates BigFunctions development, test, deployment, documentation and monitoring.
### 5.1 Install `bigfun` 🛠️
``` sh
pip install bigfunctions
```### 5.2 Use `bigfun` 🔥
``` sh
$ bigfun --help
Usage: bigfun [OPTIONS] COMMAND [ARGS]...Options:
--help Show this message and exit.Commands:
deploy Deploy BIGFUNCTION
docs Generate, serve and publish documentation
get Download BIGFUNCTION yaml file from unytics/bigfunctions...
test Test BIGFUNCTION
```### 5.3 Create you first function 👷
Functions are defined as yaml files under `bigfunctions` folder. To create your first function locally, the easiest is to download an existing yaml file of unytics/bigfunctions Github repo.
For instance to download `is_email_valid.yaml` into bigfunctions folder, do:
```sh
bigfun get is_email_valid
```You can then update the file to suit your needs.
### 5.4 Deploy you first function 👨💻
> 1. Make sure the `gcloud` command is [installed on your computer](https://cloud.google.com/sdk/docs/install)
> 2. Activate the application-default account with `gcloud auth application-default login`. A browser window should open, and you should be prompted to log into your Google account. Once you've done that, `bigfun` will use your oauth'd credentials to connect to BigQuery through BigQuery python client!
> 3. Get or create a `DATASET` where you have permission to edit data and where the function will be deployed.
> 4. The `DATASET` must belong to a `PROJECT` in which you have permission to run BigQuery queries.You now can deploy the function `is_email_valid` defined in `bigfunctions/is_email_valid.yaml` yaml file by running:
```sh
bigfun deploy is_email_valid
```> The first time you run this command it will ask for `PROJECT` and `DATASET`.
>
> Your inputs will be written to `config.yaml` file in current directory so that you won't be asked again (unless you delete the entries in `config.yaml`). You can also override this config at deploy time: `bigfun deploy is_email_valid --project=PROJECT --dataset=DATASET`.Test it with 👀:
```sql
select PROJECT.DATASET.is_email_valid('[email protected]')
```
### 5.5 Deploy you first javascript function which depends on *npm packages* 👽
*To deploy a **javascript** function* which depends on **npm packages** there are additional requirements *in addition to the ones above*.
> 1. You will need to install each *npm package* on your machine and bundle it into one file. For that, you need to [install *nodejs*](https://nodejs.org/en/download/).
> 2. The bundled js file will be uploaded into a cloud storage bucket in which you must have write access. The bucket name must be provided in `config.yaml` file in a variable named `bucket_js_dependencies`. Users of your functions must have read access to the bucket.You now can deploy the function `render_template` defined in `bigfunctions/render_template.yaml` yaml file by running:
```sh
bigfun deploy render_template
```Test it with 👀:
```sql
select PROJECT.DATASET.render_template('Hello {{ user }}', json '{"user": "James"}')
```
### 5.6 Deploy you first *remote* function ⚡️
*To deploy a **remote** function* (e.g. python function), there are additional requirements *in addition to the ones of **Deploy you first function** section*.
> 1. A *Cloud Run* service will be deployed to host the code ([as seen here](https://cloud.google.com/bigquery/docs/reference/standard-sql/remote-functions)). So you must have [permissions to deploy a *Cloud Run* service](https://cloud.google.com/run/docs/deploying-source-code#permissions_required_to_deploy) in your project `PROJECT`.
> 2. `gcloud` CLI will be used directly to deploy the service (using `gcloud run deploy`). Then, make sure you are logged in with `gcloud` by calling: `gcloud auth login`. A browser window should also open, and you should be prompted to log into your Google account. WARNING: you read correctly: you have to authenticate twice. Once for bigquery python client (to deploy any function including remote as seen above.) and once now to use `gcloud` (to deploy a *Cloud Run* service).
> 3. A *BigQuery Remote Connection* will be created to link BigQuery with the *Cloud Run* service. You then should have permissions to create a remote connection. *[BigQuery Connection Admin](https://cloud.google.com/bigquery/docs/access-control#bigquery.connectionAdmin)* or *[BigQuery Admin](https://cloud.google.com/bigquery/docs/access-control#bigquery.admin)* roles have these permissions.
> 4. A service account will be automatically created by Google along with the *BigQuery Remote Connection*. BigQuery will use this service account of the remote connection to invoke the *Cloud Run* service. You then must have the permission to authorize this service account to invoke the *Cloud Run* service. This permission is provided in the role *[roles/run.admin](https://cloud.google.com/run/docs/reference/iam/roles)*You now can deploy the function `faker` defined in `bigfunctions/faker.yaml` yaml file by running:
```sh
bigfun deploy faker
```Test it with 👀:
```sql
select PROJECT.DATASET.faker("name", "it_IT")
```
## ❓ 6. FAQ
How to correctly highlight
sql
,python
andjavascript
code in yaml files?In yaml files multiline string are by default highlighted as strings. That makes reading
code
field hard to read (with all code in the same string color). To correctly highlight the code regarding its python / javascript / sql syntax, you can install YAML Embedded Languages VSCode extension.
## 👋 7. Contribute
BigFunctions is fully open-source. Any contribution is more than welcome 🤗!
- Add a ⭐ on the repo to show your support
- [Join our Slack](https://join.slack.com/t/unytics/shared_invite/zt-1gbv491mu-cs03EJbQ1fsHdQMcFN7E1Q) and talk with us
- Suggest a new function [here](https://github.com/unytics/bigfunctions/issues/new?assignees=&labels=new-bigfunction&projects=&template=0_new_bigfunction.yaml&title=%5Bnew%5D%3A+%60function_name%28argument1%2C+argument2%29%60)
- Raise an issue [there](https://github.com/unytics/bigfunctions/issues/new/choose)
- Open a Pull-Request! (See [contributing instructions](https://github.com/unytics/bigfunctions/blob/main/CONTRIBUTING.md)).
**Contributors**