https://github.com/nyzl/cj-data
a data pipeline built in Python and run using Google Cloud Run
- Host: GitHub
- URL: https://github.com/nyzl/cj-data
- Owner: Nyzl
- License: gpl-3.0
- Created: 2019-11-07T09:58:31.000Z (almost 6 years ago)
- Default Branch: master
- Last Pushed: 2022-12-08T06:50:30.000Z (almost 3 years ago)
- Last Synced: 2024-02-01T16:26:23.415Z (almost 2 years ago)
- Topics: data-pipeline, flask, google-cloud, python
- Language: Python
- Homepage:
- Size: 131 KB
- Stars: 4
- Watchers: 2
- Forks: 2
- Open Issues: 11
- Metadata Files:
  - Readme: README.md
  - Contributing: docs/CONTRIBUTING.md
  - License: LICENSE
  - Code of conduct: docs/CODE_OF_CONDUCT.md
README
# Content prioritisation data pipeline
This is a Python project that collects data from various sources and sends it to BigQuery.
In short, a mini data pipeline.
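As a rough illustration of the kind of work the pipeline does (a sketch, not code taken from this repository), a collector might fetch some rows and stream them into a BigQuery table with the `google-cloud-bigquery` client. The project, dataset, table and row shapes below are made up for the example.

```python
# Illustrative sketch only - not taken from this repository.
# Assumes the google-cloud-bigquery package is installed and
# GOOGLE_APPLICATION_CREDENTIALS points at a valid keyfile.
from google.cloud import bigquery


def send_rows(project: str, dataset: str, table: str, rows: list[dict]) -> None:
    """Stream a list of dicts into an existing BigQuery table."""
    client = bigquery.Client(project=project)
    errors = client.insert_rows_json(f"{project}.{dataset}.{table}", rows)
    if errors:
        raise RuntimeError(f"BigQuery insert failed: {errors}")


# Example usage with made-up names:
# send_rows("my-gcp-project", "content_data", "page_views",
#           [{"page": "/benefits", "views": 123}])
```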
## Getting Started
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See [deployment](#deployment-to-google-cloud-run) for notes on how to deploy the project.
Please read the following if you plan on contributing to the project:
[Code of conduct for this project](docs/CODE_OF_CONDUCT.md)
and
[Contribution guidelines for this project](docs/CONTRIBUTING.md)
### Prerequisites
You will need a Google Cloud account, [Google Cloud SDK](https://cloud.google.com/sdk) and [Docker](https://www.docker.com/).
Make sure you have `gcloud` installed and run `gcloud auth configure-docker` so Docker can push images to Google's container registries.
## Environments
### Installing locally
To use a local development environment you will have to download a new service account keyfile that has read permission on Google Cloud Storage.
You will also have to set the environment variable `GOOGLE_APPLICATION_CREDENTIALS` to the location of that keyfile, e.g. `export GOOGLE_APPLICATION_CREDENTIALS=/path/to/file.json`
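Once that variable is set, the Google Cloud client libraries pick the keyfile up automatically. A quick way to check the credentials work is to list a few objects from a bucket the service account can read; this is a sketch assuming the `google-cloud-storage` package is installed, and `my-data-bucket` is a placeholder rather than a bucket from this project.

```python
# Credential smoke test - a sketch, not part of this repository.
# storage.Client() reads GOOGLE_APPLICATION_CREDENTIALS automatically.
from google.cloud import storage

client = storage.Client()
for blob in client.list_blobs("my-data-bucket", max_results=5):
    print(blob.name)
```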
### Hosted environment on Google Cloud Run
A Dockerfile is used to define the hosted environment on Google Cloud Run.
The Dockerfile details all the required environment variables (a sketch of how they might be read follows this list):

- `gcp_project` - the Google Cloud project
- `bq_dataset` - the BigQuery dataset to send data to
- `advisernet_ga` - used with `ga_data.py` to get GA data for Advisernet
- `public_ga` - used with `ga_data.py` to get GA data for the Public site
- `all_ga` - used with `ga_data.py` to get GA data for all sites
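As a minimal sketch of how these settings might be read at startup (this is plain `os.environ` usage, not necessarily how the project does it):

```python
# Sketch of loading the required settings - not the project's actual code.
import os

REQUIRED = ["gcp_project", "bq_dataset", "advisernet_ga", "public_ga", "all_ga"]


def load_settings() -> dict:
    """Read the required environment variables, failing fast if any are missing."""
    missing = [name for name in REQUIRED if name not in os.environ]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {name: os.environ[name] for name in REQUIRED}
```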
The contents of folders `creds` and `store` will not be committed to git or included in the Docker image. The intention is that `creds` can be used to locally store credential files and `store` can be used as a local store for data files.
## Deployment to Google Cloud Run
Deployment is handled via the Makefile:

- `make build` - builds the image on [Google Container Registry](https://cloud.google.com/container-registry)
- `make deploy` - deploys the image to [Google Cloud Run](https://cloud.google.com/run)
- `make dev-build` - builds a development image on Google Container Registry
- `make dev-deploy` - deploys the development image and overrides the BQ dataset environment variable so that data is written to test tables rather than the production tables
## The code
This section will explain how it all works, but it's yet to be written.
## Authors
**Ian Ansell** - *Initial work* - [Nyzl](https://github.com/Nyzl)
See also the list of [contributors](https://github.com/nyzl/cj-data/contributors) who participated in this project.
## License
This project is licensed under the GNU GPL-3.0 license - see the [LICENSE](LICENSE) file for details.
## Acknowledgments
[Alec Johnson](https://github.com/MrAlecJohnson) for helping with the alpha of this codebase and for being a general sounding board throughout the development.
[Daniel Nissenbaum](https://github.com/danielnissenbaum) for help getting the code and documentation into something approaching maintainable