{"id":20486217,"url":"https://github.com/undp-data/dsc-sdg-scraper","last_synced_at":"2025-07-27T08:41:06.359Z","repository":{"id":259094939,"uuid":"863468573","full_name":"UNDP-Data/dsc-sdg-scraper","owner":"UNDP-Data","description":"A collection of web scrapers to harvest SDG-labelled publications.","archived":false,"fork":false,"pushed_at":"2024-10-21T18:36:12.000Z","size":65,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-03-05T16:39:47.814Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/UNDP-Data.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-26T10:51:58.000Z","updated_at":"2024-10-21T18:36:16.000Z","dependencies_parsed_at":"2024-10-23T02:26:16.959Z","dependency_job_id":null,"html_url":"https://github.com/UNDP-Data/dsc-sdg-scraper","commit_stats":null,"previous_names":["undp-data/dsc-sdg-scraper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/UNDP-Data/dsc-sdg-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UNDP-Data%2Fdsc-sdg-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UNDP-Data%2Fdsc-sdg-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UNDP-Data%2Fdsc-sdg-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UNDP-Data%2Fdsc-sdg-scraper/manifests","owner_url":"ht
tps://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/UNDP-Data","download_url":"https://codeload.github.com/UNDP-Data/dsc-sdg-scraper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/UNDP-Data%2Fdsc-sdg-scraper/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267331331,"owners_count":24070170,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-27T02:00:11.917Z","response_time":82,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-15T16:35:45.373Z","updated_at":"2025-07-27T08:41:06.330Z","avatar_url":"https://github.com/UNDP-Data.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# dsc-sdg-scraper\n\n[![Python 3.11](https://img.shields.io/badge/python-3.11-blue.svg)](https://www.python.org/downloads/release/python-3110/)\n[![License](https://img.shields.io/github/license/undp-data/st-undp)](https://github.com/undp-data/dsc-sdg-scraper/blob/main/LICENSE)\n[![Black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)\n[![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat\u0026labelColor=ef8336)](https://pycqa.github.io/isort/)\n[![Conventional 
Commits](https://img.shields.io/badge/Conventional%20Commits-1.0.0-%23FE5196?logo=conventionalcommits\u0026logoColor=white)](https://conventionalcommits.org)\n\nA collection of web scrapers to harvest SDG-labelled publications. The project is written\nin Python using an async interface of [`httpx`](https://www.python-httpx.org) and exposed\nto users via a simple command line interface (CLI) built in [`click`](https://click.palletsprojects.com/en/8.1.x/).\n\n## Table of Contents\n\n- [Getting Started](#getting-started)\n- [Usage](#usage)\n- [License](#license)\n- [Contributing](#contributing)\n\n## Getting Started\n\nThese instructions will help you set up the project locally. The project has been developed and tested with Python `3.11`.\nTo set up a local environment:\n\n1. Clone the repository:\n\n```shell\ngit clone https://github.com/undp-data/dsc-sdg-scraper.git\n```\n\n2. Navigate to the project directory:\n\n```shell\ncd dsc-sdg-scraper\n```\n\n3. Create a virtual environment:\n\n```shell\npython -m venv venv\nsource venv/bin/activate  # On Windows use `venv\\Scripts\\activate`\n```\n\n4. Install dependencies:\n\n```shell\npip install -r requirements.txt\n```\n\n5. Explore the CLI:\n\n```shell\npython -m sdg_scraper\n```\n\n## Usage\n\nThe CLI enables you to run scrapers for any of the supported sources.\n\nTo list available sources, run:\n\n```shell\npython -m sdg_scraper list\n```\n\nTo scrape a specific source, run:\n\n```shell\npython -m sdg_scraper run \u003csource\u003e\n```\n\nBy default, the programme will scrape resources from the first two pages of the source (pages 0-1) and save the\nfiles and metadata to the current directory. 
To customise this behaviour, use the command line options:\n\n```shell\npython -m sdg_scraper run \u003csource\u003e --pages 1 10 -f data\n# or\npython -m sdg_scraper run \u003csource\u003e -p 1 10 -f data\n```\n\nUse CLI help for more details:\n\n```shell\npython -m sdg_scraper --help\n```\n\n## License\n\nThis project's codebase is licensed under the BSD 3-Clause License. Data collected by the scraper may\nbe licensed under different terms or be subject to copyright. It is your responsibility to ensure that\nany processing of data with the help of the scraper is responsible, ethical and legal.\n\n## Contributing\n\nAll contributions must follow [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/).\nThe codebase is formatted with `black` and `isort`. Use the provided [Makefile](./Makefile) for these\nroutine operations.\n\n1. Clone or fork the repository.\n2. Create a new branch (`git checkout -b feature-branch`).\n3. Make your changes. Include tests for new features.\n4. Run tests (`make test`).\n5. Ensure your code is properly formatted (`make format`).\n6. Commit your changes (`git commit -m 'Feat: add some feature'`).\n7. Push to the branch (`git push origin feature-branch`).\n8. Open a pull request.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fundp-data%2Fdsc-sdg-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fundp-data%2Fdsc-sdg-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fundp-data%2Fdsc-sdg-scraper/lists"}