https://github.com/federicodeponte/openjobs

AI-powered job scraper - extract listings from any careers page in 3 lines of code

ai careers firecrawl gemini job-scraper python scraping web-scraping

Last synced: 5 months ago

Host: GitHub
URL: https://github.com/federicodeponte/openjobs
Owner: federicodeponte
License: mit
Created: 2026-01-08T15:45:29.000Z (5 months ago)
Default Branch: main
Last Pushed: 2026-01-09T04:47:13.000Z (5 months ago)
Last Synced: 2026-01-12T18:27:40.220Z (5 months ago)
Topics: ai, careers, firecrawl, gemini, job-scraper, python, scraping, web-scraping
Language: Python
Homepage: https://pypi.org/project/openjobs/
Size: 204 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Security: SECURITY.md

README

# OpenJobs

[![PyPI version](https://badge.fury.io/py/openjobs.svg)](https://pypi.org/project/openjobs/)

[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)

[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)

[![Tests](https://github.com/federicodeponte/openjobs/actions/workflows/ci.yml/badge.svg)](https://github.com/federicodeponte/openjobs/actions)

**Scrape jobs from any careers page in 3 lines of code.** No custom scrapers needed.

Works with JavaScript-heavy sites, React/Next.js SPAs, and complex ATS systems.

---

## Why OpenJobs?

| Feature | OpenJobs | Scrapy | BeautifulSoup | Selenium |
|---------|----------|--------|---------------|----------|
| Works on any site | Yes | No (custom spider per site) | No (static HTML only) | Yes (but slow) |
| Handles JavaScript | Yes (Firecrawl) | No | No | Yes |
| AI extraction | Yes (Gemini) | No | No | No |
| Setup time | 30 seconds | Hours | Hours | Minutes |
| Maintenance | Zero | High | High | Medium |

**The problem:** Every careers page has different HTML. Scrapy/BeautifulSoup need custom code per site. Selenium is slow and breaks often.

**The solution:** OpenJobs combines Firecrawl (JavaScript rendering) with Gemini AI (smart extraction), so the same code works across careers pages with no per-site maintenance.

---

## Install

```bash
pip install openjobs
```

## Quick Start

```python
from openjobs import scrape_careers_page

# Scrape any careers page
jobs = scrape_careers_page("https://linear.app/careers")

for job in jobs:
    print(f"{job['title']} - {job['location']}")
```

**Required environment variable:**

```bash
export GOOGLE_API_KEY=your_key  # Free: https://aistudio.google.com/apikey
```

That's it. No Firecrawl key is needed for basic usage, which runs on Firecrawl's cloud free tier.

---

## Features

### Find Careers Page URL

Don't know the exact URL? OpenJobs finds it:

```python
from openjobs import discover_careers_url

url = discover_careers_url("stripe.com")
# Returns: https://stripe.com/jobs/search
```

### AI Enrichment

Extract tech stacks, salary ranges, and categorize jobs:

```python
from openjobs import scrape_careers_page, process_jobs

jobs = scrape_careers_page("https://figma.com/careers")
enriched = process_jobs(jobs, enrich=True)

for job in enriched:
    print(f"{job['title_original']}")
    print(f"  Category: {job['category']}")
    print(f"  Tech: {job.get('tech_stack', [])}")
```

### Filter by Category

```python
from openjobs import process_jobs

# Only engineering jobs (`jobs` is the list returned by scrape_careers_page)
eng_jobs = process_jobs(jobs, enrich=True, filter_categories=["Software Engineering"])
```
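Jobs that are already enriched can also be filtered locally. The following is a minimal sketch, assuming each enriched job is a dict with a `category` key as in the enrichment example above. `filter_by_category` and the sample records are hypothetical and not part of OpenJobs:

```python
def filter_by_category(jobs, categories):
    """Keep only jobs whose 'category' is in `categories` (case-insensitive)."""
    wanted = {c.casefold() for c in categories}
    return [job for job in jobs if str(job.get("category", "")).casefold() in wanted]


# Hypothetical enriched records, for illustration only
sample = [
    {"title_original": "Backend Engineer", "category": "Software Engineering"},
    {"title_original": "Account Executive", "category": "Sales"},
    {"title_original": "Recruiter"},  # no category: dropped by the filter
]

eng_jobs = filter_by_category(sample, ["software engineering"])
print([job["title_original"] for job in eng_jobs])
```

Filtering locally like this avoids re-running enrichment when only the category selection changes.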

### Self-Hosted (Unlimited Free)

Run Firecrawl locally for unlimited scraping:

```bash
git clone https://github.com/federicodeponte/openjobs.git
cd openjobs && docker compose up -d
export FIRECRAWL_URL=http://localhost:3002
```

---

## Output

```json
{
  "company": "Linear",
  "title": "Senior Software Engineer",
  "department": "Engineering",
  "location": "Remote (US/EU)",
  "job_url": "https://linear.app/careers/...",
  "slug": "linear-senior-software-engineer",
  "date_scraped": "2025-01-08T10:00:00"
}
```

With enrichment, jobs also carry fields such as:

```json
{
  "category": "Software Engineering",
  "subcategory": "Backend Engineer",
  "tech_stack": ["TypeScript", "PostgreSQL", "Redis"],
  "experience_years": "5+",
  "salary_range": "$150,000 - $200,000"
}
```
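Because each job is a flat dict, post-processing needs only the standard library. The following is a minimal sketch of de-duplicating and exporting results. It assumes the field names shown in the basic output above and that `slug` identifies a posting uniquely (the README does not guarantee this). `dedupe_by_slug`, `jobs_to_csv`, and the sample record are hypothetical:

```python
import csv
import io

# Field names taken from the basic output example
FIELDS = ["company", "title", "department", "location", "job_url", "slug", "date_scraped"]


def dedupe_by_slug(jobs):
    """Drop jobs whose slug was already seen, keeping the first occurrence."""
    seen = set()
    unique = []
    for job in jobs:
        slug = job.get("slug")
        if slug is not None and slug in seen:
            continue
        seen.add(slug)
        unique.append(job)
    return unique


def jobs_to_csv(jobs):
    """Serialize jobs to CSV text; missing fields become empty cells, extra keys are ignored."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore", restval="")
    writer.writeheader()
    writer.writerows(jobs)
    return buffer.getvalue()


# Hypothetical record, duplicated to show the de-duplication step
job = {"company": "Example", "title": "Data Engineer", "slug": "example-data-engineer"}
unique_jobs = dedupe_by_slug([job, dict(job)])
print(jobs_to_csv(unique_jobs))
```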

---

## Supported Sites

Works with most careers pages:

| Type | Examples | Status |
|------|----------|--------|
| Company sites | stripe.com, linear.app, figma.com | Supported |
| JavaScript SPAs | React, Next.js, Vue apps | Supported |
| ATS platforms | Lever, Greenhouse, Ashby | Supported |
| Heavy SPAs | Retool, Airtable, Vercel, Notion | Supported |
| Job boards | LinkedIn, Indeed, Glassdoor | Blocked (ToS) |

---

## API Reference

| Function | Description |
|----------|-------------|
| `scrape_careers_page(url)` | Scrape jobs from a careers page |
| `discover_careers_url(domain)` | Find careers URL from domain |
| `process_jobs(jobs, enrich=True)` | Enrich with AI categorization |
| `scrape_with_firecrawl(url)` | Get page content as markdown |
| `extract_jobs_from_markdown(md)` | Extract jobs from markdown |
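
The two lower-level functions suggest that fetching and extraction can be run as separate steps, for example to cache the markdown between them. The sketch below shows that composition with the stages passed in as callables. It assumes only the one-argument forms listed in the table, and the real functions may take further parameters or return different types. `scrape_in_two_steps`, `fake_fetch`, and `fake_extract` are hypothetical:

```python
def scrape_in_two_steps(url, fetch_markdown, extract_jobs):
    """Fetch a page as markdown, then extract jobs from it.

    fetch_markdown: callable(url) -> markdown string
    extract_jobs:   callable(markdown) -> list of job dicts
    """
    markdown = fetch_markdown(url)
    if not markdown or not markdown.strip():
        return []  # nothing was rendered, so skip the extraction call
    return extract_jobs(markdown)


# Stand-in stages so the sketch runs offline. With OpenJobs installed, the
# assumption is that scrape_with_firecrawl and extract_jobs_from_markdown
# could be passed in their place.
def fake_fetch(url):
    return "## Open roles\n- Senior Software Engineer (Remote)"


def fake_extract(markdown):
    return [{"title": line[2:]} for line in markdown.splitlines() if line.startswith("- ")]


jobs = scrape_in_two_steps("https://example.com/careers", fake_fetch, fake_extract)
print(jobs)
```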

---

## Environment Variables

| Variable | Required | Description |
|----------|----------|-------------|
| `GOOGLE_API_KEY` | Yes | Gemini API key ([free](https://aistudio.google.com/apikey)) |
| `FIRECRAWL_URL` | No | Self-hosted Firecrawl URL |
| `FIRECRAWL_API_KEY` | No | Firecrawl cloud key ([500 free/mo](https://firecrawl.dev)) |
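
A quick pre-flight check can catch a missing key before any scraping starts. The following is a minimal sketch that only inspects the variables listed above. `check_openjobs_env` is a hypothetical helper, and it makes no claim about which Firecrawl setting OpenJobs prefers when both optional variables are set:

```python
import os


def check_openjobs_env(env=None):
    """Report which documented variables are set; raise if the required one is missing."""
    env = os.environ if env is None else env
    if not env.get("GOOGLE_API_KEY"):
        raise RuntimeError(
            "GOOGLE_API_KEY is not set; create one at https://aistudio.google.com/apikey"
        )
    return {
        "GOOGLE_API_KEY": True,
        "FIRECRAWL_URL": bool(env.get("FIRECRAWL_URL")),
        "FIRECRAWL_API_KEY": bool(env.get("FIRECRAWL_API_KEY")),
    }


# Pass a dict for a dry run; call with no arguments to inspect the real environment
status = check_openjobs_env({"GOOGLE_API_KEY": "demo", "FIRECRAWL_URL": "http://localhost:3002"})
print(status)
```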

---

## How It Works

```
URL → Firecrawl (renders JS) → Gemini AI (extracts jobs) → Structured JSON
```

1. **Firecrawl** renders JavaScript and returns clean markdown
2. **Fallback** extracts embedded JSON from React/Next.js data
3. **Gemini AI** parses job listings intelligently
4. **Output** returns structured job data
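
The fallback in step 2 can be illustrated with the `__NEXT_DATA__` script tag that Next.js pages commonly embed. This is a sketch of the general technique, not OpenJobs' actual implementation, which the README does not describe. The helper names and the heuristic that a job is any dict with both `title` and `location` keys are assumptions:

```python
import json
import re

# Matches the JSON payload inside <script id="__NEXT_DATA__" ...>...</script>
NEXT_DATA_RE = re.compile(
    r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html):
    """Return the parsed __NEXT_DATA__ payload, or None if absent or invalid."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


def find_job_like_dicts(node):
    """Recursively collect dicts that have both 'title' and 'location' keys."""
    found = []
    if isinstance(node, dict):
        if "title" in node and "location" in node:
            found.append(node)
        for value in node.values():
            found.extend(find_job_like_dicts(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_job_like_dicts(item))
    return found


# Hypothetical page, for illustration only
html = (
    '<html><body><script id="__NEXT_DATA__" type="application/json">'
    '{"props": {"pageProps": {"jobs": [{"title": "Designer", "location": "Remote"}]}}}'
    "</script></body></html>"
)
print(find_job_like_dicts(extract_next_data(html)))
```

Reading embedded JSON like this is useful when the rendered markdown is empty or incomplete, because the raw page data is often already present in the HTML.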

---

## Contributing

```bash
git clone https://github.com/federicodeponte/openjobs.git
cd openjobs
pip install -e ".[dev]"
make test
```

---

## License

MIT
