{"id":31782070,"url":"https://github.com/beingvirus/jobminer","last_synced_at":"2025-10-10T09:14:27.259Z","repository":{"id":317561206,"uuid":"1067566888","full_name":"beingvirus/JobMiner","owner":"beingvirus","description":"JobMiner – A Python-based web scraping toolkit for extracting and organizing job listings from multiple websites into structured data.","archived":false,"fork":false,"pushed_at":"2025-10-01T17:44:59.000Z","size":32,"stargazers_count":5,"open_issues_count":13,"forks_count":9,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-10-01T18:18:14.099Z","etag":null,"topics":["automation","beautifulsoup","career","crawler","data-collection","data-mining","hacktoberfest","hacktoberfest-accepted","hacktoberfest2025","job-scraper","jobs","open-source","python","selenium","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/beingvirus.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"Security.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-01T03:42:53.000Z","updated_at":"2025-10-01T17:45:03.000Z","dependencies_parsed_at":"2025-10-01T18:31:41.143Z","dependency_job_id":null,"html_url":"https://github.com/beingvirus/JobMiner","commit_stats":null,"previous_names":["beingvirus/jobminer"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/beingvirus/JobMiner","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/reposito
ries/beingvirus%2FJobMiner","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beingvirus%2FJobMiner/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beingvirus%2FJobMiner/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beingvirus%2FJobMiner/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/beingvirus","download_url":"https://codeload.github.com/beingvirus/JobMiner/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/beingvirus%2FJobMiner/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279003388,"owners_count":26083579,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-10T02:00:06.843Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","beautifulsoup","career","crawler","data-collection","data-mining","hacktoberfest","hacktoberfest-accepted","hacktoberfest2025","job-scraper","jobs","open-source","python","selenium","web-scraping"],"created_at":"2025-10-10T09:14:21.393Z","updated_at":"2025-10-10T09:14:27.251Z","avatar_url":"https://github.com/beingvirus.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# JobMiner 🔍\n\nJobMiner is a powerful Python-based web scraping toolkit for extracting and organizing job 
listings from multiple websites into structured data. Built with modularity and extensibility in mind, it provides a robust foundation for job market analysis and automated job searching.\n\n## ✨ Features\n\n- **Modular Architecture**: Easy-to-extend scraper system with base classes\n- **Multiple Output Formats**: Export to JSON, CSV, or both\n- **Database Integration**: Optional SQLite/PostgreSQL storage with search capabilities\n- **CLI Interface**: Command-line tool for easy scraping operations\n- **Configuration Management**: Flexible configuration system with environment variables\n- **Rate Limiting**: Built-in delays and respectful scraping practices\n- **Error Handling**: Comprehensive logging and error recovery\n- **Template Generation**: Quick scraper template creation for new job sites\n\n## 🚀 Quick Start\n\n### Installation\n\n```bash\n# Clone the repository\ngit clone https://github.com/beingvirus/JobMiner.git\ncd JobMiner\n\n# Install dependencies\npip install -r requirements.txt\n\n# Optional: Install as package\npip install -e .\n```\n\n### Basic Usage\n\n```bash\n# List available scrapers\npython jobminer_cli.py list-scrapers\n\n# Run demo scraper\npython jobminer_cli.py scrape demo-company \"python developer\" --location \"san francisco\" --pages 2\n\n# Analyze scraped data\npython jobminer_cli.py analyze jobs.json\n```\n\n### Python API\n\n```python\nfrom scrapers.demo_company.demo_company import DemoCompanyScraper\n\n# Initialize scraper\nscraper = DemoCompanyScraper()\n\n# Scrape jobs\njobs = scraper.scrape_jobs(\n    search_term=\"python developer\",\n    location=\"san francisco\",\n    max_pages=2\n)\n\n# Save results\nscraper.save_to_json(jobs, \"jobs.json\")\nscraper.save_to_csv(jobs, \"jobs.csv\")\n```\n\n## 📁 Project Structure\n\n```\nJobMiner/\n├── base_scraper.py              # Base scraper class with common functionality\n├── jobminer_cli.py              # Command-line interface\n├── config.py                    # Configuration 
management\n├── database.py                  # Database integration (optional)\n├── requirements.txt             # Project dependencies\n├── setup.py                     # Package setup\n├── .env.example                 # Environment variables template\n├── scrapers/                    # Individual scraper implementations\n│   └── demo-company/\n│       ├── demo_company.py      # Demo scraper implementation\n│       ├── demo_company_readme.md\n│       └── requirements.txt\n└── output/                      # Default output directory\n```\n\n## 🛠 Creating New Scrapers\n\n### Using the Template Generator\n\n```bash\n# Generate a new scraper template\npython jobminer_cli.py init\n\n# Follow the prompts to create your scraper\n```\n\n### Manual Creation\n\n1. Create a new directory in `scrapers/`\n2. Subclass `BaseScraper` and implement its methods:\n\n```python\nfrom base_scraper import BaseScraper, JobListing\n\nclass YourScraper(BaseScraper):\n    def get_job_urls(self, search_term, location=\"\", max_pages=1):\n        # Implement job URL extraction\n        pass\n    \n    def parse_job(self, job_url):\n        # Implement job detail parsing\n        return JobListing(...)\n```\n\n3. 
Test your scraper:\n\n```bash\npython your_scraper.py\n```\n\n## ⚙️ Configuration\n\n### Environment Variables\n\nCopy `.env.example` to `.env` and customize:\n\n```bash\n# Database\nJOBMINER_DATABASE_URL=sqlite:///jobminer.db\n\n# Logging\nJOBMINER_LOG_LEVEL=INFO\n\n# Scraper settings\nJOBMINER_DEFAULT_DELAY=2.0\n```\n\n### Configuration File\n\nJobMiner automatically creates `jobminer_config.json` with default settings:\n\n```json\n{\n  \"default_output_format\": \"both\",\n  \"output_directory\": \"output\",\n  \"default_scraper_config\": {\n    \"delay\": 2.0,\n    \"timeout\": 30,\n    \"max_retries\": 3\n  }\n}\n```\n\n## 💾 Database Integration\n\nEnable database storage for persistent job data:\n\n```python\nfrom config import get_config\nfrom database import get_db_manager\n\n# Enable database in config\nconfig = get_config()\nconfig.database.enabled = True\n\n# Save jobs to database\ndb_manager = get_db_manager()\ndb_manager.save_jobs(jobs, scraper_name=\"demo-company\")\n\n# Search jobs\nresults = db_manager.search_jobs(\"python developer\")\n```\n\n## 📊 CLI Commands\n\n```bash\n# List available scrapers\njobminer list-scrapers\n\n# Scrape jobs\njobminer scrape SCRAPER_NAME \"SEARCH_TERM\" [OPTIONS]\n\n# Analyze results\njobminer analyze FILE_PATH\n\n# Generate new scraper template\njobminer init\n```\n\n### CLI Options\n\n- `--location, -l`: Search location\n- `--pages, -p`: Number of pages to scrape\n- `--output, -o`: Output filename\n- `--format, -f`: Output format (json/csv/both)\n- `--delay, -d`: Delay between requests\n\n## 🤝 Contributing\n\nWe welcome contributions! This project is **Hacktoberfest-friendly** 🎃\n\n### Ways to Contribute\n\n- **Add new scrapers** for popular job sites\n- **Improve existing scrapers** with better parsing\n- **Add features** like advanced filtering or export options\n- **Fix bugs** and improve error handling\n- **Improve documentation** and examples\n\n### Getting Started\n\n1. 
Fork the repository\n2. Create a feature branch: `git checkout -b feature/your-feature`\n3. Make your changes and test thoroughly\n4. Submit a pull request with a clear description\n\nSee [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.\n\n## 📋 Supported Job Sites\n\nCurrently implemented scrapers:\n\n- **Demo Company** - Template/example scraper for testing\n\n### Planned Scrapers\n\n- LinkedIn Jobs\n- Indeed\n- Glassdoor\n- AngelList\n- Stack Overflow Jobs\n- Remote.co\n\n*Want to add a scraper for your favorite job site? Check out our [contribution guide](CONTRIBUTING.md)!*\n\n## 🔧 Requirements\n\n- Python 3.8+\n- requests\n- beautifulsoup4\n- pandas\n- click\n- sqlalchemy (optional, for database features)\n- selenium (optional, for JavaScript-heavy sites)\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n\n## 🙏 Acknowledgments\n\n- Built for the open-source community\n- Hacktoberfest 2025 participant\n- Inspired by the need for better job market analysis tools\n\n## 📞 Support\n\n- 🐛 **Bug Reports**: [GitHub Issues](https://github.com/beingvirus/JobMiner/issues)\n- 💡 **Feature Requests**: [GitHub Discussions](https://github.com/beingvirus/JobMiner/discussions)\n- 📖 **Documentation**: [Project Wiki](https://github.com/beingvirus/JobMiner/wiki)\n\n---\n\n**Happy Job Mining! 🎯**\n\n*Made with ❤️ by the open-source community*\n\n---\n\n## Contributors ✨\n\n![Contributors Hall of Fame](https://contrib.rocks/image?repo=beingvirus/JobMiner)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeingvirus%2Fjobminer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbeingvirus%2Fjobminer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbeingvirus%2Fjobminer/lists"}