{"id":26283022,"url":"https://github.com/ismola/selenium-scraper-quickstarter","last_synced_at":"2025-05-07T09:43:45.765Z","repository":{"id":243211342,"uuid":"811778916","full_name":"Ismola/selenium-scraper-quickstarter","owner":"Ismola","description":"Flask API with Selenium Quickstart. Any contribution is welcome","archived":false,"fork":false,"pushed_at":"2025-04-25T12:14:48.000Z","size":225,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-25T12:42:04.083Z","etag":null,"topics":["chromedriver","docker","logging","python","selenium"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Ismola.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.md","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2024-06-07T09:30:54.000Z","updated_at":"2025-04-25T12:14:51.000Z","dependencies_parsed_at":"2025-03-10T14:27:08.358Z","dependency_job_id":"4d2ac3ac-ede5-4ba8-92dd-efda6cd92079","html_url":"https://github.com/Ismola/selenium-scraper-quickstarter","commit_stats":null,"previous_names":["ismola/selenium-scraper-starter","ismola/selenium-scraper-quickstarter"],"tags_count":4,"template":true,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ismola%2Fselenium-scraper-quickstarter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ismola%2Fselenium-scraper-quickstarter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repos
itories/Ismola%2Fselenium-scraper-quickstarter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ismola%2Fselenium-scraper-quickstarter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Ismola","download_url":"https://codeload.github.com/Ismola/selenium-scraper-quickstarter/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252852910,"owners_count":21814427,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["chromedriver","docker","logging","python","selenium"],"created_at":"2025-03-14T17:16:27.613Z","updated_at":"2025-05-07T09:43:45.749Z","avatar_url":"https://github.com/Ismola.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Selenium Scraper Starter\n\nThis repository provides a foundation for building robust and scalable web scrapers using Selenium and Flask. 
It emphasizes best practices including environment management, configuration with Docker, and a well-structured project layout.\n\n## Key Features\n\n- **Selenium Automation:** Efficiently interact with dynamic webpages using Selenium's browser automation capabilities.\n- **Flask Backend:** Create a RESTful API with Flask to manage scraper execution, authorization, and logging.\n- **Bearer Authentication:** Implement a secure mechanism for API access using bearer tokens.\n- **Environment Management:** Facilitate deployment across different environments (production, staging) using environment variables.\n- **Docker Configuration:** Streamline containerization for a consistent and portable development experience.\n- **Logging System:** Track scraper activities and errors for debugging and monitoring.\n\n## Local Setup - ``Dev Container`` (recommended)\n\nTo set up the development environment using a Dev Container, follow these steps:\n\n1. Install [Visual Studio Code](https://code.visualstudio.com/) and the [Remote - Containers extension](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers).\n2. Install [Docker](https://www.docker.com/).\n3. Clone the repository to your local machine.\n4. Open the project in Visual Studio Code.\n5. When prompted, select \"Reopen in Container\" to automatically build and open the project inside the Dev Container.\n    - **.env file:** Create a `.env` file to store environment variables (refer to `.env.example` for guidance).\n\nThis setup ensures all necessary dependencies are installed and provides an isolated environment tailored for development.\n\n## Cloud Setup - ``GitHub Codespaces``\n\nIf you prefer to use GitHub Codespaces, follow these steps:\n\n1. Navigate to the repository on [GitHub](https://github.com/Ismola/selenium-scraper-quickstarter).\n2. Click the green \"Code\" button, then select \"Open with Codespaces.\"\n3. 
GitHub will automatically build and open the project in a preconfigured environment.\n    - **.env file:** Create a `.env` file to store environment variables (refer to `.env.example` for guidance).\n\nUsing GitHub Codespaces provides a cloud-based development environment with all dependencies pre-configured and ready to use.\n\n## Local Setup - ``MANUAL``\n\n### Prerequisites\n\nBefore diving in, ensure you have the following tools installed:\n\n- **Python (version 3.x recommended):** Download and install from \u003chttps://www.python.org/downloads/\u003e.\n- **.env file:** Create a `.env` file to store environment variables (refer to `.env.example` for guidance).\n- **HTTP Client (Postman recommended):** Use an HTTP client like \u003chttps://www.postman.com/\u003e to send requests to the Flask API.\n- **Chrome Browser:** Download and install the latest version from \u003chttps://www.google.com/chrome/\u003e.\n\n### Create a Virtual Environment\n\nWe use Python's built-in `venv` module, a tool for creating **isolated Python environments**. It creates a folder that contains all the executables needed to use the packages a Python project requires.\n\n```bash\npython3 -m venv \u003cwhatever_virtual_environment_name\u003e\n```\n\n### Activate virtual environment\n\n```bash\nsource \u003cwhatever_virtual_environment_name\u003e/bin/activate   # for Unix/Linux\n.\\\u003cwhatever_virtual_environment_name\u003e\\Scripts\\activate    # for Windows\n```\n\n### Install project libraries\n\n```bash\npip install -r requirements.txt  # Works for both Unix/Linux and Windows\n```\n\n## Run app\n\n```bash\npython3 main.py  # for Unix/Linux\npython main.py   # for Windows\n```\n\nNow, the server is accessible at `http://localhost:3000`\n\n### First Call\n\n![First Call](./readmeImages/firstcall.png)\n\n![Auth Call](./readmeImages/authcall.png)\n\n### Make your first changes\n\n1. First, add your `.env` file. 
You can add an invented bearer token to get started.\n\n2. Then configure the base URL in the `utils/config.py` file.\n\n3. To work on your project, add an endpoint to `main.py`.\n\n4. Next, create a controller and add the web actions it needs. Keep each action to a few steps so that your code stays modular and you avoid repeating code later.\n\n## Project Structure\n\n```bash\n├─── main.py                   # Entry point for the Flask application\n├─── .vscode                   # Configuration for Visual Studio Code (optional)\n├─── actions                   # Contains scraper actions (logic for data extraction)\n├─── controller                # Functions handling API requests\n├─── temp_downloads            # Temporary files created during scraping\n└─── utils                     # Reusable helper functions\n```\n\n## Bibliography\n\n- [Selenium Web Page](https://selenium-python.readthedocs.io/): Main bot technology\n- [Selenium Tutorial](https://youtube.com/playlist?list=PLheIVUbpfWZ17lCcHnoaa1RD59juFR06C\u0026si=TTyB-dQQFl38tXO2)\n- [Flask](https://flask.palletsprojects.com/en/3.0.x/): Core technology for creating a REST API server\n\n## Common Errors\n\n```bash\nERROR: local variable 'driver' referenced before assignment\n```\n\n`This can happen when the script picks up the chromedriver from the wrong place. The error occurs whenever the Chrome and Chromedriver versions do not match.`\n\n### Steps\n\n#### 1. Copy the contents of [utils/init_manual.py](utils/init_manual.py) and paste them into a Python terminal. If there are any errors, continue to the next step.\n\n#### 2. To reinstall the chromedriver, delete this folder. 
This folder is where the Chromedriver files installed by Python are saved.\n\n```bash\nrm -rf ~/.wdm\n```\n\n## Bring changes from the template\n\n```bash\ngit remote add template https://github.com/Ismola/selenium-scraper-quickstarter\ngit fetch template\ngit merge template/main --allow-unrelated-histories\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fismola%2Fselenium-scraper-quickstarter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fismola%2Fselenium-scraper-quickstarter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fismola%2Fselenium-scraper-quickstarter/lists"}