{"id":26851193,"url":"https://github.com/dehyabi/py-scraper","last_synced_at":"2026-02-25T17:37:50.910Z","repository":{"id":284648159,"uuid":"955614520","full_name":"dehyabi/py-scraper","owner":"dehyabi","description":"Py-Scraper is a powerful web scraping application built using Flask, BeautifulSoup, Selenium and ScrapeGraphAI.","archived":false,"fork":false,"pushed_at":"2025-04-06T00:21:15.000Z","size":16,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-04T17:52:02.585Z","etag":null,"topics":["beautifulsoup","flask","headless","mit-license","postgresql","py-scraper","python","scrapegraphai","scraper","selenium","web","web-scraper"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dehyabi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-03-26T23:19:36.000Z","updated_at":"2025-04-06T00:21:18.000Z","dependencies_parsed_at":"2025-07-04T17:39:43.638Z","dependency_job_id":"2230524e-3c11-4962-aef8-b93a42bc025e","html_url":"https://github.com/dehyabi/py-scraper","commit_stats":null,"previous_names":["dehyabi/py-scraper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dehyabi/py-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dehyabi%2Fpy-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dehyabi%2Fpy-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitH
ub/repositories/dehyabi%2Fpy-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dehyabi%2Fpy-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dehyabi","download_url":"https://codeload.github.com/dehyabi/py-scraper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dehyabi%2Fpy-scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29832972,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-25T17:17:09.781Z","status":"ssl_error","status_checked_at":"2026-02-25T17:16:50.421Z","response_time":61,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup","flask","headless","mit-license","postgresql","py-scraper","python","scrapegraphai","scraper","selenium","web","web-scraper"],"created_at":"2025-03-30T22:18:49.475Z","updated_at":"2026-02-25T17:37:50.895Z","avatar_url":"https://github.com/dehyabi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Py-Scraper\n\nThis project is a web scraper built using Flask, BeautifulSoup, Selenium, ScrapeGraphAI and PostgreSQL. It allows users to search for information and store the results in a PostgreSQL database.\n\n## Setup Instructions\n\n1. 
**Clone the repository:**\n\n   ```bash\n   git clone https://github.com/dehyabi/py-scraper.git\n   cd py-scraper\n   ```\n\n2. **Choose a scraping tool:**\n\n   - **beautifulsoup-headless**: Uses BeautifulSoup for scraping without opening a browser.\n   - **selenium-headless**: Uses Selenium to scrape without opening a browser.\n   - **scrapegraphai-headless**: Uses ScrapeGraphAI for scraping without opening a browser (requires an OpenAI API key).\n\n   For example, to use beautifulsoup-headless, run:\n\n   ```bash\n   cd beautifulsoup-headless\n   ```\n\n3. **Set up the environment:**\n\n   - Create a `.env` file in the root directory and add your database connection details.\n   - Example:\n     ```\n     DATABASE_NAME=your_database_name\n     DATABASE_USER=your_database_user\n     DATABASE_PASSWORD=your_database_password\n     DATABASE_HOST=localhost\n     DATABASE_PORT=5432\n     ```\n\n4. **Create a virtual environment:**\n\n   ```bash\n   python3 -m venv venv\n   source venv/bin/activate\n   ```\n\n5. **Install dependencies:**\n\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n6. **Test the database connection:**\n\n   ```bash\n   python3 test-db.py\n   ```\n\n7. **Run the application:**\n\n   ```bash\n   flask run\n   ```\n\n8. 
**Test the search endpoint:**\n   Use curl to test the search functionality:\n   ```bash\n   curl -X POST http://127.0.0.1:5000/search -H \"Content-Type: application/json\" -d '{\"query\": \"technology\"}'\n   ```\n\n## Database Interaction\n\n- To connect to PostgreSQL, use the following command:\n\n  ```bash\n  sudo -u postgres psql -d your_database_name\n  ```\n\n- You can view the inserted data with:\n\n  ```sql\n  SELECT * FROM table_name;\n  ```\n\n  Example of inserted data:\n\n  ```\n  -[ RECORD 1 ]-------------------------------------------------\n  id    | 1\n  title | Ultracapacitors: why, how, and where is the technology\n  ```\n\n  **Note:** The database setup and commands may vary depending on your database system.\n\n## Success Logs\n\nCheck the logs for information on the operations performed by the application.\n\n```\n * Running on http://127.0.0.1:5000\n2025-03-27 05:49:20,478 - INFO - Press CTRL+C to quit\n2025-03-27 05:49:50,642 - INFO - Received search query: technology\n2025-03-27 05:49:50,642 - INFO - Connecting to the database to insert file...\n2025-03-27 05:49:50,679 - INFO - Fetching data from: https://scholar.google.com/scholar?hl=en\u0026as_sdt=0%2C5\u0026q=technology\n2025-03-27 05:49:52,352 - INFO - Data fetched successfully.\n2025-03-27 05:49:52,484 - INFO - file inserted successfully.\n2025-03-27 05:49:52,484 - INFO - Scraped data inserted into the database.\n```\n\n## License\n\nThis project is licensed under the MIT License. See the [LICENSE](LICENSE.md) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdehyabi%2Fpy-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdehyabi%2Fpy-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdehyabi%2Fpy-scraper/lists"}