https://github.com/beingvirus/jobminer

JobMiner – A Python-based web scraping toolkit for extracting and organizing job listings from multiple websites into structured data.
https://github.com/beingvirus/jobminer

automation beautifulsoup career crawler data-collection data-mining hacktoberfest hacktoberfest-accepted hacktoberfest2025 job-scraper jobs open-source python selenium web-scraping

Last synced: 8 months ago
JSON representation

JobMiner – A Python-based web scraping toolkit for extracting and organizing job listings from multiple websites into structured data.

Host: GitHub
URL: https://github.com/beingvirus/jobminer
Owner: beingvirus
License: mit
Created: 2025-10-01T03:42:53.000Z (9 months ago)
Default Branch: main
Last Pushed: 2025-10-01T17:44:59.000Z (9 months ago)
Last Synced: 2025-10-01T18:18:14.099Z (9 months ago)
Topics: automation, beautifulsoup, career, crawler, data-collection, data-mining, hacktoberfest, hacktoberfest-accepted, hacktoberfest2025, job-scraper, jobs, open-source, python, selenium, web-scraping
Language: Python
Homepage:
Size: 31.3 KB
Stars: 5
Watchers: 1
Forks: 9
Open Issues: 13
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
- Security: Security.md

Awesome Lists containing this project

README

          # JobMiner 🔍

JobMiner is a powerful Python-based web scraping toolkit for extracting and organizing job listings from multiple websites into structured data. Built with modularity and extensibility in mind, it provides a robust foundation for job market analysis and automated job searching.

## ✨ Features

- **Modular Architecture**: Easy-to-extend scraper system with base classes

- **Multiple Output Formats**: Export to JSON, CSV, or both

- **Database Integration**: Optional SQLite/PostgreSQL storage with search capabilities

- **CLI Interface**: Command-line tool for easy scraping operations

- **Configuration Management**: Flexible configuration system with environment variables

- **Rate Limiting**: Built-in delays and respectful scraping practices

- **Error Handling**: Comprehensive logging and error recovery

- **Template Generation**: Quick scraper template creation for new job sites

## 🚀 Quick Start

### Installation

```bash

# Clone the repository

git clone https://github.com/beingvirus/JobMiner.git

cd JobMiner

# Install dependencies

pip install -r requirements.txt

# Optional: Install as package

pip install -e .

```

### Basic Usage

```bash

# List available scrapers

python jobminer_cli.py list-scrapers

# Run demo scraper

python jobminer_cli.py scrape demo-company "python developer" --location "san francisco" --pages 2

# Analyze scraped data

python jobminer_cli.py analyze jobs.json

```

### Python API

```python

from scrapers.demo_company.demo_company import DemoCompanyScraper

# Initialize scraper

scraper = DemoCompanyScraper()

# Scrape jobs

jobs = scraper.scrape_jobs(

    search_term="python developer",

    location="san francisco",

    max_pages=2

)

# Save results

scraper.save_to_json(jobs, "jobs.json")

scraper.save_to_csv(jobs, "jobs.csv")

```

## 📁 Project Structure

```

JobMiner/

├── base_scraper.py              # Base scraper class with common functionality

├── jobminer_cli.py              # Command-line interface

├── config.py                    # Configuration management

├── database.py                  # Database integration (optional)

├── requirements.txt             # Project dependencies

├── setup.py                     # Package setup

├── .env.example                 # Environment variables template

├── scrapers/                    # Individual scraper implementations

│   └── demo-company/

│       ├── demo_company.py      # Demo scraper implementation

│       ├── demo_company_readme.md

│       └── requirements.txt

└── output/                      # Default output directory

```

## 🛠 Creating New Scrapers

### Using the Template Generator

```bash

# Generate a new scraper template

python jobminer_cli.py init

# Follow the prompts to create your scraper

```

### Manual Creation

1. Create a new directory in `scrapers/`

2. Implement the `BaseScraper` class:

```python

from base_scraper import BaseScraper, JobListing

class YourScraper(BaseScraper):

    def get_job_urls(self, search_term, location="", max_pages=1):

        # Implement job URL extraction

        pass

    

    def parse_job(self, job_url):

        # Implement job detail parsing

        return JobListing(...)

```

3. Test your scraper:

```bash

python your_scraper.py

```

## ⚙️ Configuration

### Environment Variables

Copy `.env.example` to `.env` and customize:

```bash

# Database

JOBMINER_DATABASE_URL=sqlite:///jobminer.db

# Logging

JOBMINER_LOG_LEVEL=INFO

# Scraper settings

JOBMINER_DEFAULT_DELAY=2.0

```

### Configuration File

JobMiner automatically creates `jobminer_config.json` with default settings:

```json

{

  "default_output_format": "both",

  "output_directory": "output",

  "default_scraper_config": {

    "delay": 2.0,

    "timeout": 30,

    "max_retries": 3

  }

}

```

## feature/complete-jobminer-toolkit

## 💾 Database Integration

Enable database storage for persistent job data:

```python

from config import get_config

from database import get_db_manager

# Enable database in config

config = get_config()

config.database.enabled = True

# Save jobs to database

db_manager = get_db_manager()

db_manager.save_jobs(jobs, scraper_name="demo-company")

# Search jobs

results = db_manager.search_jobs("python developer")

```

## 📊 CLI Commands

```bash

# List available scrapers

jobminer list-scrapers

# Scrape jobs

jobminer scrape SCRAPER_NAME "SEARCH_TERM" [OPTIONS]

# Analyze results

jobminer analyze FILE_PATH

# Generate new scraper template

jobminer init

```

### CLI Options

- `--location, -l`: Search location

- `--pages, -p`: Number of pages to scrape

- `--output, -o`: Output filename

- `--format, -f`: Output format (json/csv/both)

- `--delay, -d`: Delay between requests

## 🤝 Contributing

We welcome contributions! This project is **Hacktoberfest-friendly** 🎃

### Ways to Contribute

- **Add new scrapers** for popular job sites

- **Improve existing scrapers** with better parsing

- **Add features** like advanced filtering or export options

- **Fix bugs** and improve error handling

- **Improve documentation** and examples

### Getting Started

1. Fork the repository

2. Create a feature branch: `git checkout -b feature/your-feature`

3. Make your changes and test thoroughly

4. Submit a pull request with a clear description

See [CONTRIBUTING.md](CONTRIBUTING.md) for detailed guidelines.

## 📋 Supported Job Sites

Currently implemented scrapers:

- **Demo Company** - Template/example scraper for testing

### Planned Scrapers

- LinkedIn Jobs

- Indeed

- Glassdoor

- AngelList

- Stack Overflow Jobs

- Remote.co

*Want to add a scraper for your favorite job site? Check out our [contribution guide](CONTRIBUTING.md)!*

## 🔧 Requirements

- Python 3.8+

- requests

- beautifulsoup4

- pandas

- click

- sqlalchemy (optional, for database features)

- selenium (optional, for JavaScript-heavy sites)

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- Built for the open-source community

- Hacktoberfest 2024 participant

- Inspired by the need for better job market analysis tools

## 📞 Support

- 🐛 **Bug Reports**: [GitHub Issues](https://github.com/beingvirus/JobMiner/issues)

- 💡 **Feature Requests**: [GitHub Discussions](https://github.com/beingvirus/JobMiner/discussions)

- 📖 **Documentation**: [Project Wiki](https://github.com/beingvirus/JobMiner/wiki)

---

**Happy Job Mining! 🎯**

*Made with ❤️ by the open-source community*

=======

---

## Contributors ✨

![Contributors Hall of Fame](https://contrib.rocks/image?repo=beingvirus/JobMiner)

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/beingvirus/jobminer

Awesome Lists containing this project

README