{"id":31805266,"url":"https://github.com/nyzl/cj-data","last_synced_at":"2025-10-11T02:47:49.857Z","repository":{"id":40976382,"uuid":"220204822","full_name":"Nyzl/cj-data","owner":"Nyzl","description":"a data pipeline built in Python and run using Google Cloud Run ","archived":false,"fork":false,"pushed_at":"2022-12-08T06:50:30.000Z","size":134,"stargazers_count":4,"open_issues_count":11,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-02-01T16:26:23.415Z","etag":null,"topics":["data-pipeline","flask","google-cloud","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Nyzl.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"docs/CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"docs/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-11-07T09:58:31.000Z","updated_at":"2023-03-31T11:14:32.000Z","dependencies_parsed_at":"2023-01-24T17:30:34.569Z","dependency_job_id":null,"html_url":"https://github.com/Nyzl/cj-data","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Nyzl/cj-data","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nyzl%2Fcj-data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nyzl%2Fcj-data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nyzl%2Fcj-data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nyzl%2Fcj-data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Nyzl","download_url":"https://codeload.github.com/Nyzl/cj-data/tar.gz/refs/heads/master","sbom_url":"htt
ps://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nyzl%2Fcj-data/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279005957,"owners_count":26084004,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-11T02:00:06.511Z","response_time":55,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-pipeline","flask","google-cloud","python"],"created_at":"2025-10-11T02:47:46.754Z","updated_at":"2025-10-11T02:47:49.841Z","avatar_url":"https://github.com/Nyzl.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Content prioritisation data pipeline\n\nThis is a Python project that collects data from various sources and sends it to BigQuery.\nIn short, a mini data pipeline.\n\n## Getting Started\n\nThese instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See [deployment](#deployment-to-google-cloud-run) for notes on how to deploy the project. 
\n\nIf you plan to contribute to the project, please read:\n[Code of conduct for this project](docs/CODE_OF_CONDUCT.md)\nand\n[Contribution guidelines for this project](docs/CONTRIBUTING.md)\n\n### Prerequisites\n\nYou will need a Google Cloud account, [Google Cloud SDK](https://cloud.google.com/sdk) and [Docker](https://www.docker.com/).\nMake sure you have gcloud installed, then run `gcloud auth configure-docker`.\n\n## Environments\n\n### Installing locally\n\nTo use a local development environment you will have to download a new service account keyfile that has read permission to Google Cloud Storage.\nYou will also have to set the environment variable `GOOGLE_APPLICATION_CREDENTIALS` to the location of that keyfile,\ne.g. `export GOOGLE_APPLICATION_CREDENTIALS=/path/to/file.json`\n\n### Hosted environment on Google Cloud Run\n\nA Dockerfile is used to define the hosted environment on Google Cloud Run.\n\nThe Dockerfile details all the required environment variables:\n\n`gcp_project` this is the Google Cloud project\n\n`bq_dataset` this is the dataset to send data to\n\n`advisernet_ga` this is used with `ga_data.py` to get GA data for Advisernet\n\n`public_ga` this is used with `ga_data.py` to get GA data for the Public site\n\n`all_ga` this is used with `ga_data.py` to get GA data for all sites\n\n\nThe contents of folders `creds` and `store` will not be committed to git or included in the Docker image. 
The intention is that `creds` can be used to store credential files locally and `store` can be used as a local store for data files.\n\n\n## Deployment to Google Cloud Run\n\nDeployment is handled via the Makefile:\n\n`make build` - Builds the image on [Google Container Registry](https://cloud.google.com/container-registry)\n\n`make deploy` - Deploys the image on [Google Cloud Run](https://cloud.google.com/run)\n\n`make dev-build` - Builds a development image on Google Container Registry\n\n`make dev-deploy` - Deploys the development image and overrides the environment variable for the BQ dataset so that data is written to test tables rather than the production tables\n\n\n\n## The code\n\nThis section will explain how it all works, but it is yet to be written.\n\n\n## Authors\n\n**Ian Ansell** - *Initial work* - [Nyzl](https://github.com/Nyzl)\n\nSee also the list of [contributors](https://github.com/your/project/contributors) who participated in this project.\n\n## License\n\nThis project is licensed under the GNU General Public License v3.0 - see the [LICENSE](LICENSE) file for details.\n\n## Acknowledgments\n\n[Alec Johnson](https://github.com/MrAlecJohnson) for helping with the alpha of this codebase and for being a general sounding board throughout the development.\n[Daniel Nissenbaum](https://github.com/danielnissenbaum) for help getting the code and documentation into something approaching maintainable.\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnyzl%2Fcj-data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnyzl%2Fcj-data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnyzl%2Fcj-data/lists"}