{"id":20093116,"url":"https://github.com/teaxyz/chai","last_synced_at":"2025-05-16T18:06:48.386Z","repository":{"id":259970706,"uuid":"864256775","full_name":"teaxyz/chai","owner":"teaxyz","description":"tea’s package dataset","archived":false,"fork":false,"pushed_at":"2025-05-14T17:54:30.000Z","size":800,"stargazers_count":191,"open_issues_count":10,"forks_count":101,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-05-14T18:51:50.115Z","etag":null,"topics":["data","packages"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/teaxyz.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-09-27T19:44:45.000Z","updated_at":"2025-05-10T13:37:40.000Z","dependencies_parsed_at":"2025-01-19T20:08:08.697Z","dependency_job_id":"722f83a1-0f61-4043-bc0e-4c5b340beafc","html_url":"https://github.com/teaxyz/chai","commit_stats":null,"previous_names":["teaxyz/chai"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teaxyz%2Fchai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teaxyz%2Fchai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teaxyz%2Fchai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/teaxyz%2Fchai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/teaxyz","download_url":"https://codeload.github.com/teaxyz/chai/tar.gz/refs/heads/main","host":{"n
ame":"GitHub","url":"https://github.com","kind":"github","repositories_count":254582905,"owners_count":22095518,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data","packages"],"created_at":"2024-11-13T16:45:55.913Z","updated_at":"2025-05-16T18:06:48.380Z","avatar_url":"https://github.com/teaxyz.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# CHAI\n\nCHAI is an attempt at an open-source data pipeline for package managers. The\ngoal is to have a pipeline that can use the data from any package manager and\nprovide a normalized data source for a wide range of use cases.\n\n## Getting Started\n\nUse [Docker](https://docker.com):\n\n1. Install Docker\n2. Clone the chai repository (https://docs.github.com/en/repositories/creating-and-managing-repositories/cloning-a-repository)\n3. Using a terminal, navigate to the cloned repository directory\n4. Run `docker compose build` to create the latest Docker images\n5. Then, run `docker compose up` to launch.\n\n\u003e [!NOTE]\n\u003e\n\u003e This will run CHAI for all package managers. For example, crates by\n\u003e itself takes over an hour and consumes \u003e5GB of storage.\n\u003e\n\u003e Currently, we support only two package managers:\n\u003e\n\u003e - crates\n\u003e - Homebrew\n\u003e\n\u003e You can run a single package manager by running\n\u003e `docker compose up -e ... \u003cpackage_manager\u003e`\n\u003e\n\u003e We are planning on supporting `NPM`, `PyPI`, and `rubygems` next.\n\n### Arguments\n\nSpecify these, e.g. 
`FOO=bar docker compose up`:\n\n- `FREQUENCY`: Sets how often (in hours) the pipeline should run.\n- `TEST`: Runs the loader in test mode when set to true, skipping certain data insertions.\n- `FETCH`: Fetches new data from the source when set to true.\n- `NO_CACHE`: When set to true, deletes temporary files after processing.\n\n\u003e [!NOTE]\n\u003e The flag `NO_CACHE` does not mean that files will not get downloaded to your local\n\u003e storage (specifically, the ./data directory). It only means that we'll\n\u003e delete these temporary files from ./data once we're done processing them.\n\nThese arguments are all configurable in the `docker-compose.yml` file.\n\n### Docker Services Overview\n\n1. `db`: [PostgreSQL] database for the reduced package data\n2. `alembic`: handles migrations\n3. `package_managers`: fetches and writes data for each package manager\n4. `api`: a simple REST API for reading from the db\n\n### Hard Reset\n\nStuff happens. Start over:\n\n`rm -rf ./data`: removes all the data the fetcher has written.\n\n\u003c!-- this is handled now that alembic/psycopg2 are in pkgx --\u003e\n\u003c!--\n## Alembic Alternatives\n\n- sqlx command line tool to manage migrations, alongside models for sqlx in rust\n- vapor's migrations are written in swift\n--\u003e\n\n## Goals\n\nOur goal is to build a data schema that looks like this:\n\n![db/CHAI_ERD.png](db/CHAI_ERD.png)\n\nYou can read more about specific data models in the db [readme](db/README.md).\n\nOur specific application extracts the dependency graph to understand which\npieces of the open-source graph are critical. 
We also built a simple example that displays\n[sbom-metadata](examples/sbom-meta) for your repository.\n\nThere are many other potential use cases for this data:\n\n- License compatibility checker\n- Developer publications\n- Package popularity\n- Dependency analysis vulnerability tool (requires translating semver)\n\n\u003e [!TIP]\n\u003e Help us add the above to the examples folder.\n\n## FAQs / Common Issues\n\n1. The database url is `postgresql://postgres:s3cr3t@localhost:5435/chai`, and\n   is used as `CHAI_DATABASE_URL` in the environment. `psql $CHAI_DATABASE_URL`\n   will connect you to the database.\n\n## Deployment\n\n```sh\nexport CHAI_DATABASE_URL=postgresql://\u003cuser\u003e:\u003cpw\u003e@host.docker.internal:\u003cport\u003e/chai\nexport PGPASSWORD=\u003cpw\u003e\ndocker compose up alembic\n```\n\n## Tasks\n\nThese are tasks that can be run using [xcfile.dev]. If you use [`pkgx`], typing\n`dev` loads the environment. Alternatively, run them manually.\n\n### reset\n\n```sh\nrm -rf db/data data .venv\n```\n\n### build\n\n```sh\ndocker compose build\n```\n\n### start\n\nRequires: build\n\n```sh\ndocker compose up -d\n```\n\n### test\n\nEnv: TEST=true\nEnv: DEBUG=true\n\n```sh\ndocker compose up\n```\n\n### full-test\n\nRequires: build\nEnv: TEST=true\nEnv: DEBUG=true\n\n```sh\ndocker compose up\n```\n\n### stop\n\n```sh\ndocker compose down\n```\n\n### logs\n\n```sh\ndocker compose logs\n```\n\n### db-start\n\nRuns migrations and starts up the database\n\n```sh\ndocker compose build --no-cache db alembic\ndocker compose up alembic -d\n```\n\n### db-reset\n\nRequires: stop\n\n```sh\nrm -rf db/data\n```\n\n### db-generate-migration\n\nInputs: MIGRATION_NAME\nEnv: CHAI_DATABASE_URL=postgresql://postgres:s3cr3t@localhost:5435/chai\n\n```sh\ncd alembic\nalembic revision --autogenerate -m \"$MIGRATION_NAME\"\n```\n\n### db-upgrade\n\nEnv: CHAI_DATABASE_URL=postgresql://postgres:s3cr3t@localhost:5435/chai\n\n```sh\ncd alembic\nalembic upgrade head\n```\n\n### 
db-downgrade\n\nInputs: STEP\nEnv: CHAI_DATABASE_URL=postgresql://postgres:s3cr3t@localhost:5435/chai\n\n```sh\ncd alembic\nalembic downgrade -$STEP\n```\n\n### db\n\n```sh\npsql \"postgresql://postgres:s3cr3t@localhost:5435/chai\"\n```\n\n### db-list-packages\n\n```sh\npsql \"postgresql://postgres:s3cr3t@localhost:5435/chai\" -c \"SELECT count(id) FROM packages;\"\n```\n\n### db-list-history\n\n```sh\npsql \"postgresql://postgres:s3cr3t@localhost:5435/chai\" -c \"SELECT * FROM load_history;\"\n```\n\n### restart-api\n\nRefreshes table knowledge from the db.\n\n```sh\ndocker compose restart api\n```\n\n### remove-orphans\n\n```sh\ndocker compose down --remove-orphans\n```\n\n### run-pipeline\n\nInputs: SERVICE\nRequires: build\nEnv: CHAI_DATABASE_URL=postgresql://postgres:s3cr3t@localhost:5435/chai\n\n```sh\ndocker compose up $SERVICE\n```\n\n[PostgreSQL]: https://www.postgresql.org\n[`pkgx`]: https://pkgx.sh\n[xcfile.dev]: https://xcfile.dev\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteaxyz%2Fchai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fteaxyz%2Fchai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fteaxyz%2Fchai/lists"}