{"id":24087961,"url":"https://github.com/tinybirdco/data-quality-checks","last_synced_at":"2026-03-02T09:03:27.065Z","repository":{"id":180437566,"uuid":"650734440","full_name":"tinybirdco/data-quality-checks","owner":"tinybirdco","description":"A Tinybird project to test 5 criteria of data quality on the NYC Taxi Dataset","archived":false,"fork":false,"pushed_at":"2023-06-07T18:03:26.000Z","size":4,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-09-17T01:39:28.709Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tinybirdco.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-06-07T17:35:42.000Z","updated_at":"2023-07-11T13:30:00.000Z","dependencies_parsed_at":"2023-08-20T23:01:27.157Z","dependency_job_id":null,"html_url":"https://github.com/tinybirdco/data-quality-checks","commit_stats":null,"previous_names":["tinybirdco/data-quality-checks"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tinybirdco/data-quality-checks","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Fdata-quality-checks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Fdata-quality-checks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Fdata-quality-checks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Fdata-qua
lity-checks/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tinybirdco","download_url":"https://codeload.github.com/tinybirdco/data-quality-checks/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tinybirdco%2Fdata-quality-checks/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29996286,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-02T01:47:34.672Z","status":"online","status_checked_at":"2026-03-02T02:00:07.342Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-10T03:56:40.580Z","updated_at":"2026-03-02T09:03:27.049Z","avatar_url":"https://github.com/tinybirdco.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# Implementing data quality checks with Tinybird\n\nThis repository includes a simple data project to test for 5 criteria of data quality on the [NYC Taxi dataset](https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page) and publish the results of the tests as an API endpoint that you can use in CI/CD pipelines for data QA.\n\n## Clone the repo\nFirst, clone the repo into a directory of your choice with:\n\n```bash\ngit clone https://github.com/tinybirdco/data-quality-checks.git\ncd data-quality-checks\n```\n\n### Set up your .gitignore\nTinybird stores your auth details in a `.tinyb` file. 
Add this file to your `.gitignore` if you intend to push your work to GitHub.\n\n## Setting up Tinybird\nIf you want to follow along with the examples, [sign up](https://www.tinybird.co/signup) for a free Tinybird account and create a Workspace.\n\nOnce you've done that, copy your user admin token from the Tinybird UI. It's the token labeled `Use it to authenticate with the CLI.`\n\n### Set up your local environment\nThe easiest way to install the Tinybird CLI is with pip inside a Python virtual environment. Run the following commands:\n\n```bash\npython3 -m venv .venv\nsource .venv/bin/activate\npip install tinybird-cli\n```\n\nThen set your token as an environment variable for authentication:\n\n```bash\nexport TB_TOKEN=\u003cpaste your token here\u003e\n```\n\nAuthenticate to your Tinybird Workspace with:\n\n```bash\ntb auth\n```\n\n## Pushing your project to the server\nNow you can push the data project files from the repo to the Tinybird server.\n\n### Create the Data Source\nStart by pushing the Data Source with:\n\n```bash\ntb push datasources/yellow_tripdata.datasource\n```\n\nThis creates an empty Data Source on the server with the correct schema.\n\n### Adding data to the Data Source\nNext, add data to the Data Source with:\n\n```bash\ntb datasource append yellow_tripdata https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_2023-03.parquet\n```\n\n**Note**: Some rows might be sent to quarantine. As we’ll see, this can be caused by data quality issues in the source data. To learn more about data quarantine in Tinybird, [see the docs](https://www.tinybird.co/docs/concepts/data-sources.html#the-quarantine-data-source). For now, you can ignore the quarantined rows and keep following along.\n\n### Push the Pipe\nThe `/endpoints` folder contains a `.pipe` file. 
You can push it to the Tinybird server with:\n\n```bash\ntb push endpoints/yellow_trip_data_qa_measurements.pipe\n```\n\nBy default, when you push with the CLI, Tinybird publishes the last node of the Pipe as an API endpoint.\n\n### Generate a read token for the Pipe\nTo create a token that can read the API, navigate to the Tinybird UI: [https://ui.tinybird.co](https://ui.tinybird.co) for EU or [https://ui.us-east.tinybird.co](https://ui.us-east.tinybird.co) for US-East.\n\nClick \"Auth Tokens\", then \"Add a new token\", then \"Add Pipe scope\". Choose `yellow_trip_data_qa_measurements` and give it the `Read` scope.\n\nCopy this token.\n\n## Call your API\n\nYou can now call your API and get a JSON result with the following cURL command:\n\n```bash\ncurl --compressed -H 'Authorization: Bearer \u003cTOKEN\u003e' https://api.tinybird.co/v0/pipes/yellow_trip_data_qa_measurements.json\n```\n\n**Note**: For Workspaces in US-East, use `https://api.us-east.tinybird.co...`\n\nBack in the Tinybird UI, the endpoint's API page shows additional sample usage.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftinybirdco%2Fdata-quality-checks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftinybirdco%2Fdata-quality-checks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftinybirdco%2Fdata-quality-checks/lists"}