{"id":50576032,"url":"https://github.com/ashu-tosh-kumar/stack-scraper","last_synced_at":"2026-06-04T22:01:54.669Z","repository":{"id":158656715,"uuid":"632749350","full_name":"ashu-tosh-kumar/stack-scraper","owner":"ashu-tosh-kumar","description":"Sample repository showcasing how to write impeccably clean code that will save your sanity. In correspondence to my Medium article: https://medium.com/gitconnected/how-to-write-impeccably-clean-code-that-will-save-your-sanity-7d0ea59d285c","archived":false,"fork":false,"pushed_at":"2025-07-29T10:15:24.000Z","size":375,"stargazers_count":1,"open_issues_count":2,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-07-29T12:42:35.676Z","etag":null,"topics":["best-practices","clean-code","coding","howto","programming","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ashu-tosh-kumar.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null},"funding":{"github":["ashu-tosh-kumar"],"custom":["https://paypal.me/ashutkumar","https://www.buymeacoffee.com/ashutoshkumar"]}},"created_at":"2023-04-26T03:47:43.000Z","updated_at":"2025-07-28T04:59:14.000Z","dependencies_parsed_at":null,"dependency_job_id":"d9af3b57-7c1a-4c46-8d67-9b048a2f7c7d","html_url":"https://github.com/ashu-tosh-kumar/stack-scraper","commit_stats":null,"previous_names":[],"tags_count":1,"template":true,"template_full_name":null,"purl":"pkg:github/ashu-tosh-kumar/stack-scraper","repository_url":"https://repos.ecosyste.ms
/api/v1/hosts/GitHub/repositories/ashu-tosh-kumar%2Fstack-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashu-tosh-kumar%2Fstack-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashu-tosh-kumar%2Fstack-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashu-tosh-kumar%2Fstack-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ashu-tosh-kumar","download_url":"https://codeload.github.com/ashu-tosh-kumar/stack-scraper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashu-tosh-kumar%2Fstack-scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33921352,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-04T02:00:06.755Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["best-practices","clean-code","coding","howto","programming","python"],"created_at":"2026-06-04T22:01:53.737Z","updated_at":"2026-06-04T22:01:54.664Z","avatar_url":"https://github.com/ashu-tosh-kumar.png","language":"Python","funding_links":["https://github.com/sponsors/ashu-tosh-kumar","https://paypal.me/ashutkumar","https://www.buymeacoffee.com/ashutoshkumar"],"categories":[],"sub_categories":[],"readme":"# Stack Scraper\n\nTable of 
Contents\n\n- [Stack Scraper](#stack-scraper)\n  - [Introduction](#introduction)\n  - [How to run the application?](#how-to-run-the-application)\n  - [Architecture](#architecture)\n  - [Repository Structure](#repository-structure)\n  - [Release History](#release-history)\n\n## Introduction\n\nStack Scraper is a sample repository showcasing how to write impeccably clean code that\nwill save your sanity. It accompanies my Medium article published by\ngitconnected: [How to Write Impeccably Clean Code That Will Save Your\nSanity](https://medium.com/gitconnected/how-to-write-impeccably-clean-code-that-will-save-your-sanity-7d0ea59d285c).\n\nPlease don't use it as-is in production without due diligence on security, such as\nenabling CSRF protection on a live server, running Docker containers as a non-root user,\nand so on.\n\nStack Scraper contains one scraper that can be triggered on demand to scrape\nquestions and answers from Stack Overflow. We make the following assumptions about\nStack Overflow:\n\n- Stack Overflow is a single-page website.\n- All questions are numbered with integers.\n- Each question has only one answer, which may or may not be accepted.\n\n## How to run the application?\n\nTo run the application, clone the repository and run the command `python -m src.main`\nfrom the repository root.\n\nYou can also run the service with Docker:\n\n```bash\ndocker compose build\ndocker compose up -d\n```\n\nOnce the application is up, you can run `curl localhost:8000/health` to confirm that the\nAPI is working. You should receive a response like\n`{\"message\":\"Stack Scraper at your service\",\"status\":\"SUCCESS\"}`.\n\nTo run the scraper, run `curl -X POST localhost:8000/stackoverflow`. 
If the scraper\nruns successfully, you should receive a response like `{\"status\":\"SUCCESS\"}`.\n\nOnce the scraper has run, you can try the following commands to explore\nStack Scraper:\n\n`curl localhost:8000/stackoverflow/1`: Should return a successful response\n\n`curl localhost:8000/stackoverflow/2`: Should return a successful response\n\n`curl localhost:8000/stackoverflow/3`: Should return a successful response\n\n`curl localhost:8000/stackoverflow/4`: Should return a successful response\n\n`curl localhost:8000/stackoverflow/5`: Should return a response saying the question doesn't exist\n\n## Architecture\n\n- All the scraped data is stored in an in-memory database.\n- Stack Scraper provides the following REST endpoints:\n  - [`GET`] `/health`: Checks the health of the application\n  - [`POST`] `/stackoverflow`: Runs the scraper\n  - [`GET`] `/stackoverflow/\u003cquestion_no\u003e`: Fetches a question by its number from the database\n\nPlease note that this is not a real-world project; it only showcases the ideas discussed\nin the article mentioned above.\n\n## Repository Structure\n\n```text\n.\n├── src\n│   ├── apis\n│   │   └── question_no.py\n│   ├── config.py\n│   ├── constants\n│   │   ├── api_constants.py\n│   │   └── mock_data.py\n│   ├── db_wrappers\n│   │   ├── db_models.py\n│   │   └── in_memory_db.py\n│   ├── domain_models\n│   │   ├── domain_enums.py\n│   │   └── domain_models.py\n│   ├── external_sources\n│   │   ├── apis\n│   │   ├── external_source_base.py\n│   │   ├── scrapper_tasks.py\n│   │   └── scrappers\n│   │       └── stack_overflow.py\n│   ├── main.py\n│   ├── scripts\n│   │   ├── run_api.py\n│   │   └── run_api.sh\n│   ├── secret.py\n│   └── utils\n└── test\n    └── src\n        ├── apis\n        │   └── test_question_no.py\n        ├── constants\n        │   └── test_api_constants.py\n        ├── db_wrappers\n        │   ├── test_db_models.py\n        │   └── test_in_memory_db.py\n        ├── domain_models\n        │   ├── test_domain_enums.py\n        │   └── 
test_domain_models.py\n        ├── external_sources\n        │   ├── apis\n        │   ├── scrappers\n        │   │   └── test_stack_overflow.py\n        │   ├── test_external_source_base.py\n        │   └── test_scrapper_tasks.py\n        ├── test_config.py\n        ├── test_main.py\n        ├── test_secret.py\n        └── utils\n```\n\n`src/`: Notice that all the source code lives under the `src` folder. This is not mandatory,\nbut it helps if you want to turn your project into a shareable pip package.\n\n`apis/`: This folder contains all the domain REST endpoints (except the health check) in\nthe project. We can create a new file under the `apis` folder for each new API we add to\nthe project. This keeps all the API code in one place while keeping different APIs\nseparate.\n\n`constants/`: This folder should contain all application-wide constants. It also makes it\neasy for the business to review business-level constants such as standard responses.\nMoreover, it makes it easy to change any value without having to make the same change\nin many places. Note, however, that you should not put all constants in a single file named\n`constants.py`, as over time it would grow into a mess of its own. Instead, create\nseveral files, each containing a set of related constants. For example, `stack_xpaths.py`\ncould store all XPaths used for scraping, and `api_constants.py` could store API-level\nconstants such as `HEALTH_RESPONSE`.\n\n`db_wrappers/`: This folder contains all the database models and wrappers that allow\ninteraction with the database.\n\n`domain_models/`: This folder contains all the Pydantic-based models and enum classes.\n\n`external_sources/`: This folder contains all the scrapers and/or external API\nintegrations. Our scraper for Stack Overflow lives under\n`external_sources/scrappers`.\n\n`utils/`: This folder contains all the utility code. Avoid dumping all\nutility functions into a single utility file. 
Instead, group related utilities logically,\nwith each file containing one group.\n\n`test/`: This folder contains all the test files. Note that the `test` directory follows\nthe same folder hierarchy as the code under `src`. For example, the tests for\n`src/db_wrappers/in_memory_db.py` live in `test/src/db_wrappers/test_in_memory_db.py`.\nThis makes the test files of a large number of modules easier to maintain in a big\nproject, and it makes finding the test file for a given module easy.\n\n## Release History\n\n`v0.1.0`\n\n- Made an official release of the working code to better track changes.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashu-tosh-kumar%2Fstack-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fashu-tosh-kumar%2Fstack-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashu-tosh-kumar%2Fstack-scraper/lists"}