{"id":39509794,"url":"https://github.com/codingforentrepreneurs/scrape-websites-with-python-fastapi-celery-nosql","last_synced_at":"2026-01-18T06:01:02.588Z","repository":{"id":54448931,"uuid":"408190970","full_name":"codingforentrepreneurs/Scrape-Websites-with-Python-FastAPI-Celery-NoSQL","owner":"codingforentrepreneurs","description":"Learn how to scrape websites with Python, Selenium, Requests HTML, Celery, FastAPI, \u0026 NoSQL with Cassandra via AstraDB.","archived":false,"fork":false,"pushed_at":"2021-09-23T02:36:56.000Z","size":535,"stargazers_count":92,"open_issues_count":2,"forks_count":30,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-02-04T23:29:51.752Z","etag":null,"topics":["astradb","cassandra","cassandra-driver","celery","fastapi","python","requests-html","scheduled-tasks"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/codingforentrepreneurs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-09-19T17:23:31.000Z","updated_at":"2025-01-27T07:52:44.000Z","dependencies_parsed_at":"2022-08-13T16:00:47.390Z","dependency_job_id":null,"html_url":"https://github.com/codingforentrepreneurs/Scrape-Websites-with-Python-FastAPI-Celery-NoSQL","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/codingforentrepreneurs/Scrape-Websites-with-Python-FastAPI-Celery-NoSQL","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codingforentrepreneurs%2FScrape-Websites-with-Python-FastAPI-Celery-NoSQL","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codingforentrepreneurs%2FScrape-Websites-with-Python-FastAPI-Celery-NoSQL/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codingforentrepreneurs%2FScrape-Websites-with-Python-FastAPI-Celery-NoSQL/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codingforentrepreneurs%2FScrape-Websites-with-Python-FastAPI-Celery-NoSQL/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/codingforentrepreneurs","download_url":"https://codeload.github.com/codingforentrepreneurs/Scrape-Websites-with-Python-FastAPI-Celery-NoSQL/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/codingforentrepreneurs%2FScrape-Websites-with-Python-FastAPI-Celery-NoSQL/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28531991,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-18T00:39:45.795Z","status":"online","status_checked_at":"2026-01-18T02:00:07.578Z","response_time":98,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["astradb","cassandra","cassandra-driver","celery","fastapi","python","requests-html","scheduled-tasks"],"created_at":"2026-01-18T06:00:35.931Z","updated_at":"2026-01-18T06:01:02.575Z","avatar_url":"https://github.com/codingforentrepreneurs.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Scrape Websites with Python \u0026 NoSQL\nLearn how to scrape websites with Python, Selenium, Requests HTML, Celery, FastAPI, \u0026 NoSQL.\n\nHere's what each tool is used for:\n\n- **Python 3.9** [download](https://www.python.org/download/) - programming the logic.\n- **AstraDB** [sign up](https://dtsx.io/3nQnjz1) - a highly performant and scalable database service by DataStax. AstraDB is a Cassandra NoSQL database. [Cassandra](https://cassandra.apache.org/_/index.html) is used by Netflix, Discord, Apple, and many others to handle astounding amounts of data.\n- **Selenium** [docs](https://selenium-python.readthedocs.io/) - a browser automation tool that lets us:\n  - Run all web-browser actions through code\n  - Load JavaScript-heavy websites\n  - Perform standard user interactions like clicks, form submits, logins, etc.\n- **Requests HTML** [docs](https://docs.python-requests.org/) - we're going to use this to parse the HTML documents extracted with Selenium.\n- **Celery** [docs](https://docs.celeryproject.org/) - Celery provides worker processes that allow us to schedule when we need to scrape websites. We'll be using [Redis](https://redis.io/) as the message broker for our task queue.\n- **FastAPI** [docs](https://fastapi.tiangolo.com/) - a web application framework used to display and monitor web scraping results from anywhere.\n\nThis series is broken up into 4 parts:\n\n- **Scraping** How to scrape and parse data from nearly any website with Selenium \u0026 Requests HTML.\n- **Data models** How to store and validate data with `cassandra-driver`, `pydantic`, and **AstraDB**.\n- **Worker \u0026 Scheduling** How to schedule periodic tasks (i.e., scraping) integrated with Redis \u0026 AstraDB.\n- **Presentation** How to combine the above steps into a robust web application service.\n\n## Set up your system\nBelow is a preflight checklist to ensure your system is fully set up to work with this course. All guides and setup instructions can be found in the [setup](./setup) directory of this repo.\n\n### Preflight checklist\n- [ ] Install Selenium \u0026 Chromedriver - [setup guide](./setup/Install%20Selenium%20%26%20Chromedriver%20on%20your%20System.md)\n- [ ] Install Redis - [setup guide](./setup/Setup%20Redis.md)\n- [ ] Create a virtual environment \u0026 install dependencies\n- [ ] Set up an account with DataStax\n- [ ] Create your first AstraDB database and get API credentials\n- [ ] Use `cassandra-driver` to verify your connection to AstraDB\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodingforentrepreneurs%2Fscrape-websites-with-python-fastapi-celery-nosql","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcodingforentrepreneurs%2Fscrape-websites-with-python-fastapi-celery-nosql","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcodingforentrepreneurs%2Fscrape-websites-with-python-fastapi-celery-nosql/lists"}