{"id":19440478,"url":"https://github.com/devinit/iati-covid19-first-prototype","last_synced_at":"2025-04-24T22:33:42.746Z","repository":{"id":37837647,"uuid":"259336056","full_name":"devinit/iati-covid19-first-prototype","owner":"devinit","description":"Extracting COVID-19 data from D-Portal and reprocesses it nightly (not used for prod visual)","archived":false,"fork":false,"pushed_at":"2023-01-09T16:11:23.000Z","size":32040905,"stargazers_count":3,"open_issues_count":0,"forks_count":3,"subscribers_count":11,"default_branch":"main","last_synced_at":"2024-04-17T03:08:11.327Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://github.com/OCHA-DAP/covid19-data/tree/gh-pages","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/devinit.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-27T13:45:19.000Z","updated_at":"2022-07-27T15:49:25.000Z","dependencies_parsed_at":"2023-02-08T12:45:56.597Z","dependency_job_id":null,"html_url":"https://github.com/devinit/iati-covid19-first-prototype","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devinit%2Fiati-covid19-first-prototype","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devinit%2Fiati-covid19-first-prototype/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devinit%2Fiati-covid19-first-prototype/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devinit%2Fiati-covid19-first-prototype/manifests","owner_url":"https://
repos.ecosyste.ms/api/v1/hosts/GitHub/owners/devinit","download_url":"https://codeload.github.com/devinit/iati-covid19-first-prototype/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223971670,"owners_count":17234103,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-10T15:29:28.421Z","updated_at":"2024-11-10T15:29:29.047Z","avatar_url":"https://github.com/devinit.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# COVID-19 Data\n\n## Note that this data is not used for the prod visual\n\nThe scraper and data for the prod visual can be found here: https://github.com/OCHA-DAP/hdx-scraper-iati-viz\n\nThis scraper extracts data from D-Portal nightly and reprocesses it:\n\n* selects certain fields and exports them in a clean JSON format\n* converts financial data to USD\n\nThe scripts in this repository automatically generate fresh data every day (using GitHub Actions), which can be seen in (and downloaded from) [the gh-pages branch](https://github.com/OCHA-DAP/covid19-data/tree/gh-pages).\n\nFor more detail on how the data was processed, see the [data notes](https://github.com/OCHA-DAP/covid19-data/blob/master/DATA-NOTES.md).\n\n\n### Installing\n\n```\ngit clone git@github.com:OCHA-DAP/covid19-data.git\nvirtualenv ./pyenv\nsource ./pyenv/bin/activate\npip install -r requirements.txt\n```\n\n### Running\n\nDownload and reprocess data using the following script. 
Add `--help` to see optional arguments.\n\n```\npython run.py\n```\n\n#### Running with cached rates (saves downloading a new file)\n\n```\npython run.py --cached-rates\n```\n\n#### Running and deploying to gh-pages\n\n```\npython run.py --deploy\n```\n\n### Overview\n\nThe code in this repository runs at 1500 UTC every day, using GitHub Actions. Files are pushed to the `gh-pages` branch and made available through GitHub Pages. The data is then visualised using software stored in the [OCHA-DAP/viz-covid19-visualisation](https://github.com/OCHA-DAP/viz-covid19-visualisation) repository, which is also served from GitHub Pages.\n\n#### Data sources\n\nData is downloaded from a few places:\n\n* IATI data: D-Portal\n* FTS data: UNOCHA FTS\n* Codelists: CodeforIATI\n* Exchange Rates: CodeforIATI\n\nThese downloads are now reasonably stable, though there are a few things to be aware of:\n\n* **IATI data**: D-Portal fairly frequently fails to respond with relevant data. This appears to be more reliable now that we request fewer activities at once and run at 1500 UTC rather than early in the morning (when D-Portal is itself collecting and updating source data). One option could be to consider switching to the new IATI Datastore (though see discussion below).\n* **FTS data**: FTS now seems to be fairly stable, though the FTS API is occasionally unavailable.\n* **Codelists**: these endpoints are very stable now as flat files are hosted on GitHub Pages. These files are generally much faster to download than the official IATI codelists, and they are also often more up to date.\n* **Exchange rates**: this file is also now very stable, again as a single compiled flat file is hosted on GitHub Pages; previously this data was hosted only on morph.io, which has had a lot of stability issues recently. 
There don't appear to be any significant problems here any more.\n\n#### Process\n\nThe basic process is as follows:\n\n* `run.py`:\n  * either download or load in a list of exchange rates\n  * download data from D-Portal (`get_activities_from_urls()`)\n  * filter out activities that have certain problems (`activities_filter()`)\n  * filter out activities that don't conform to the IATI COVID-19 Publishing Guidance\n  * extract relevant data from each activity (`process_activity()`)\n  * write XML data for all activities (`write_xml_files()`)\n    * up to 3000 activities per file, labelled `activities-N.xml`, where N is the page number\n  * write XML data for each reporting organisation\n  * write out the list of sectors and countries that are used in the data (so that in the user interface we don't display countries or sectors with no activities)\n  * download and process FTS data\n  * run `traceability.py` (see below)\n  * remove `activities.xml` (it is used by `traceability.py`, but it is a very large file and exceeds GitHub usage limits)\n* `traceability.py`:\n  * read in list of exchange rates\n  * download `TransactionType` codelist\n  * read in the activities XML (from `activities.xml`)\n  * identify which activities contain explicit COVID-19 transactions\n  * extract relevant data from each transaction (`make_transaction()`)\n  * export transactions to Excel\n  * disaggregate transactions by sector and country (`make_sector_country_transactions_data()`)\n  * export disaggregated data to JSON and Excel\n  * make grouped traceability data for Sankey diagram\n  * export grouped traceability data to JSON and Excel\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevinit%2Fiati-covid19-first-prototype","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevinit%2Fiati-covid19-first-prototype","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevinit%2Fiati-covid19-first-prototype/lists"}