{"id":13586494,"url":"https://github.com/ooni/pipeline","last_synced_at":"2026-03-05T23:02:50.424Z","repository":{"id":31852693,"uuid":"35419970","full_name":"ooni/pipeline","owner":"ooni","description":"OONI data processing pipeline","archived":false,"fork":false,"pushed_at":"2023-08-17T13:59:09.000Z","size":3073,"stargazers_count":40,"open_issues_count":8,"forks_count":14,"subscribers_count":17,"default_branch":"master","last_synced_at":"2026-01-14T23:40:09.893Z","etag":null,"topics":["big-data","data-pipeline","open-data"],"latest_commit_sha":null,"homepage":"https://ooni.org/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ooni.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2015-05-11T11:33:08.000Z","updated_at":"2022-09-09T04:36:46.000Z","dependencies_parsed_at":"2023-02-16T10:31:24.915Z","dependency_job_id":"f83f033a-259f-49c7-8166-e8f907123008","html_url":"https://github.com/ooni/pipeline","commit_stats":{"total_commits":1191,"total_committers":17,"mean_commits":70.05882352941177,"dds":0.6817800167926112,"last_synced_commit":"0cefc86df79c38b81ed156fac79b0cb27f9e8561"},"previous_names":["thetorproject/ooni-pipeline"],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/ooni/pipeline","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ooni%2Fpipeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ooni%2Fpipeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ooni%2Fpipeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/o
oni%2Fpipeline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ooni","download_url":"https://codeload.github.com/ooni/pipeline/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ooni%2Fpipeline/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30154287,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-05T22:39:40.138Z","status":"ssl_error","status_checked_at":"2026-03-05T22:39:24.771Z","response_time":93,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["big-data","data-pipeline","open-data"],"created_at":"2024-08-01T15:05:36.461Z","updated_at":"2026-03-05T23:02:50.392Z","avatar_url":"https://github.com/ooni.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"# OONI backend\n\nWelcome. 
This document describes the architecture of the main components of the\nOONI infrastructure.\n\nThe documentation is meant for core contributors, external contributors and researchers\nwho want to extract data or reuse software components in their own projects.\n\nThis file is [rendered here](https://ooni.github.io/pipeline/README.html)\n\nYou can also explore the [documentation tree](https://ooni.github.io/pipeline/)\n\n## Table of contents\n\n[TOC]\n\n## Architecture\n\nThe backend infrastructure provides multiple functions:\n\n* Provide APIs for data consumers\n* Instruct probes on what measurements to perform\n* Receive measurements from probes, process them and store them in the database and on S3\n\n## Data flow\n\nThis diagram represents the main flow of measurement data:\n\n\nblockdiag {\n Probes [color = \"#ffeeee\"]; \n Explorer [color = \"#eeeeff\"]; \n \"S3 jsonl\" [shape = ellipse];\n \"S3 postcan\" [shape = ellipse];\n \"DB jsonl tbl\" [shape = ellipse];\n \"DB fastpath tbl\" [shape = ellipse];\n \"disk queue\" [shape = ellipse];\n\n Probes -\u003e \"API: Probe services\" -\u003e \"Fastpath\" -\u003e \"DB fastpath tbl\" -\u003e \"API: Measurements\" -\u003e \"Explorer\";\n \"API: Probe services\" -\u003e \"disk queue\" -\u003e \"API: uploader\" -\u003e \"S3 jsonl\" -\u003e \"API: Measurements\";\n \"API: uploader\" -\u003e \"S3 postcan\";\n \"API: uploader\" -\u003e \"DB jsonl tbl\";\n \"DB jsonl tbl\" -\u003e \"API: Measurements\"\n}\n\n\nEach measurement is processed individually in real time.\n\n\n## Components: API\n\nThe API entry points are documented at [apidocs](https://api.ooni.io/apidocs/)\n\n### Measurements\n\nProvides access to measurements to end users, both directly and through Explorer.\n\nMounted under /api/v1/measurement/\n\nThe API is versioned. 
Access is rate-limited based on source IP address and access tokens\ndue to the computational cost of running heavy queries on the database.\n\n[Sources](https://github.com/ooni/api/blob/master/newapi/ooniapi/probe_services.py)\n\n### Probe services\n\nServes lists of collectors and test helpers to the probes and receives measurements from them.\n\nMounted under /api/v1/\n\n[Sources](https://github.com/ooni/api/blob/master/newapi/ooniapi/probe_services.py)\n\n### Private entry points\n\nNot for public consumption. Mounted under `/api/_` and used exclusively by Explorer.\n\n[Sources](https://github.com/ooni/api/blob/master/newapi/ooniapi/private.py)\n\n## Fastpath\n\n[Documentation](af/fastpath/fastpath/core.html)\n\n## Database\n\n## Operations\n\n### Build, deploy, rollback\n\nHost deployments are done with the [sysadmin repo](https://github.com/ooni/sysadmin)\n\nFor component updates, a deployment pipeline is used:\n\nLook at the [Status dashboard](https://github.com/ooni/backend/wiki/Backend) (be aware of badge image caching)\n\nUse the deploy tool:\n\n```bash\n# Update all badges:\ndep refresh_badges\n\n# Show status\ndep\n\n# Deploy/rollback a given version on the \"test\" stage\ndeploy ooni-api test 0.6~pr194-147\n\n# Deploy latest build on the first stage\ndeploy ooni-api\n\n# Deploy latest build on a given stage\ndeploy ooni-api prod\n\n```\n\n### Adding new tests\n\nUpdate [database_upgrade_schema](https://github.com/ooni/pipeline/blob/master/af/fastpath/database_upgrade_schema.py)\n\n```\nALTER TYPE ootest ADD VALUE '\u003ctest_name\u003e';\n```\n\nUpdate [fastpath](https://github.com/ooni/pipeline/blob/master/af/fastpath/fastpath/core.py)\nby adding a new test to the `score_measurement` function and adding relevant\nintegration tests.\n\nCreate a [Pull Request](https://github.com/ooni/pipeline/compare)\n\nRun fastpath manually from S3 on the testing stage; see [rerun fastpath manually](#rerun-fastpath-manually)\n\nUpdate the 
[api](https://github.com/ooni/api/blob/master/newapi/ooniapi/measurements.py#L491)\n\n### Adding new fingerprints\n\nTODO\n\n### API runbook\n\nMonitor the [API](https://mon.ooni.nu/grafana/d/CkdDBscGz/ams-pg-api?orgId=1) and \n[fastpath](https://mon.ooni.nu/grafana/d/75nnWVpMz/fastpath-ams-pg?orgId=1) dashboards.\n\nFollow Nginx or API logs with:\n```bash\nsudo journalctl -f -u nginx --no-hostname\n# The API logs contain SQL queries, exceptions etc\nsudo journalctl -f --identifier gunicorn3 --no-hostname\n```\n\n### Fastpath runbook\n\n#### Manual deployment\n\n```bash\nssh \u003chost\u003e\nsudo apt-get update\napt-cache show fastpath | grep Ver | head -n5\nsudo apt-get install fastpath\n```\n\n#### Restart\n`sudo systemctl restart fastpath`\n\n#### Rerun fastpath manually\n\nRun as fastpath user:\n\n```bash\nssh \u003chost\u003e\nsudo sudo -u fastpath /bin/bash\ncd\n```\n\n```bash\nfastpath --help\n# rerun without overwriting files on disk nor writing to database:\nfastpath --start-day 2016-05-13 --end-day 2016-05-14 --stdout --no-write-msmt --no-write-to-db\n# rerun without overwriting files on disk:\nfastpath --start-day 2016-05-13 --end-day 2016-05-14 --stdout --no-write-msmt\n# rerun and overwrite:\nfastpath --start-day 2016-05-13 --end-day 2016-05-14 --stdout --update\n```\n\nThe fastpath will pull cans from S3.\nThe daemon (doing real-time processing) can keep running in the meantime.\n\n[Progress chart](https://mon.ooni.nu/prometheus/new/graph?g0.expr=netdata_statsd_gauge_fastpath_s3feeder_s3_download_percentage_value_average%7Bdimension%3D%22gauge%22%7D\u0026g0.tab=0\u0026g0.stacked=1\u0026g0.range_input=2h\u0026g1.expr=netdata_statsd_gauge_fastpath_load_s3_reports_remaining_files_value_average%7Bdimension%3D%22gauge%22%7D\u0026g1.tab=0\u0026g1.stacked=1\u0026g1.range_input=1h)\n#### Log monitoring\n\n```bash\nsudo journalctl -f -u fastpath\n```\n\n#### Monitoring 
dashboard\n\n[https://mon.ooni.nu/grafana/d/75nnWVpMz/fastpath-ams-pg?orgId=1\u0026refresh=5m\u0026from=now-7d\u0026to=now](https://mon.ooni.nu/grafana/d/75nnWVpMz/fastpath-ams-pg?orgId=1\u0026refresh=5m\u0026from=now-7d\u0026to=now)\n\n### Analysis runbook\n\nThe Analysis tool runs a number of systemd timers to monitor the slow query summary and more.\nSee https://github.com/ooni/pipeline/blob/master/af/analysis/analysis/analysis.py\n\n#### Manual deployment\n\n```\nssh \u003chost\u003e\nsudo apt-get update\napt-cache show analysis | grep Ver | head -n5\nsudo apt-get install analysis=\u003cversion\u003e\n```\n\n#### Run manually\n```\nsudo systemctl restart ooni-update-counters.service\n```\n\n#### Log monitoring\n\n```\nsudo journalctl -f --identifier analysis\n```\n\n#### Monitoring dashboard\n\n[https://mon.ooni.nu/grafana/d/75nnWVpMz/fastpath-ams-pg?orgId=1\u0026refresh=5m\u0026from=now-7d\u0026to=now](https://mon.ooni.nu/grafana/d/75nnWVpMz/fastpath-ams-pg?orgId=1\u0026refresh=5m\u0026from=now-7d\u0026to=now)\n\n### Deploy new host\n\nDeploy host from https://cloud.digitalocean.com/projects/\n\nCreate DNS \"A\" record `\u003cname\u003e.ooni.org` at https://ap.www.namecheap.com/\n\nOn the sysadmin repo, ansible directory, add the host to the inventory\n\nRun the deploy with the root SSH user\n```\n./play deploy-\u003cfoo\u003e.yml -l \u003cname\u003e.ooni.org --diff -u root\n```\n\nUpdate prometheus\n```\n./play deploy-prometheus.yml -t prometheus-conf --diff\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fooni%2Fpipeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fooni%2Fpipeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fooni%2Fpipeline/lists"}