{"id":36685088,"url":"https://github.com/federicodeponte/openjobs","last_synced_at":"2026-01-15T03:13:14.278Z","repository":{"id":332070026,"uuid":"1130502433","full_name":"federicodeponte/openjobs","owner":"federicodeponte","description":"AI-powered job scraper - extract listings from any careers page in 3 lines of code","archived":false,"fork":false,"pushed_at":"2026-01-09T04:47:13.000Z","size":209,"stargazers_count":0,"open_issues_count":3,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-01-12T18:27:40.220Z","etag":null,"topics":["ai","careers","firecrawl","gemini","job-scraper","python","scraping","web-scraping"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/openjobs/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/federicodeponte.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2026-01-08T15:45:29.000Z","updated_at":"2026-01-09T04:50:17.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/federicodeponte/openjobs","commit_stats":null,"previous_names":["federicodeponte/openjobs"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/federicodeponte/openjobs","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/federicodeponte%2Fopenjobs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/federicodeponte%2Fopenjobs/tags","re
leases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/federicodeponte%2Fopenjobs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/federicodeponte%2Fopenjobs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/federicodeponte","download_url":"https://codeload.github.com/federicodeponte/openjobs/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/federicodeponte%2Fopenjobs/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28442247,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-15T00:55:22.719Z","status":"online","status_checked_at":"2026-01-15T02:00:08.019Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","careers","firecrawl","gemini","job-scraper","python","scraping","web-scraping"],"created_at":"2026-01-12T11:12:35.316Z","updated_at":"2026-01-15T03:13:14.273Z","avatar_url":"https://github.com/federicodeponte.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OpenJobs\n\n[![PyPI version](https://badge.fury.io/py/openjobs.svg)](https://pypi.org/project/openjobs/)\n[![Python 3.9+](https://img.shields.io/badge/python-3.9+-blue.svg)](https://www.python.org/downloads/)\n[![License: 
MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)\n[![Tests](https://github.com/federicodeponte/openjobs/actions/workflows/ci.yml/badge.svg)](https://github.com/federicodeponte/openjobs/actions)\n\n**Scrape jobs from any careers page in 3 lines of code.** No custom scrapers needed.\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/demo.svg\" alt=\"OpenJobs Demo\" width=\"700\"\u003e\n\u003c/p\u003e\n\nWorks with JavaScript-heavy sites, React/Next.js SPAs, and complex ATS platforms.\n\n---\n\n## Why OpenJobs?\n\n| Feature | OpenJobs | Scrapy | BeautifulSoup | Selenium |\n|---------|----------|--------|---------------|----------|\n| Works on any site | Yes | No (custom spider per site) | No (static HTML only) | Yes (but slow) |\n| Handles JavaScript | Yes (Firecrawl) | No | No | Yes |\n| AI extraction | Yes (Gemini) | No | No | No |\n| Setup time | 30 seconds | Hours | Hours | Minutes |\n| Maintenance | Zero | High | High | Medium |\n\n**The problem:** Every careers page has different HTML. Scrapy/BeautifulSoup need custom code per site. Selenium is slow and breaks often.\n\n**The solution:** OpenJobs combines Firecrawl (JS rendering) with Gemini AI (smart extraction), so it works on most careers pages without per-site scraper code.\n\n---\n\n## Install\n\n```bash\npip install openjobs\n```\n\n## Quick Start\n\n```python\nfrom openjobs import scrape_careers_page\n\n# Scrape any careers page\njobs = scrape_careers_page(\"https://linear.app/careers\")\n\nfor job in jobs:\n    print(f\"{job['title']} - {job['location']}\")\n```\n\n**Environment variables needed:**\n```bash\nexport GOOGLE_API_KEY=your_key  # Free: https://aistudio.google.com/apikey\n```\n\nThat's it. No Firecrawl key needed for basic usage (uses cloud with generous free tier).\n\n---\n\n## Features\n\n### Find Careers Page URL\n\nDon't know the exact URL? 
OpenJobs finds it:\n\n```python\nfrom openjobs import discover_careers_url\n\nurl = discover_careers_url(\"stripe.com\")\n# Returns: https://stripe.com/jobs/search\n```\n\n### AI Enrichment\n\nExtract tech stacks, salary ranges, and categorize jobs:\n\n```python\nfrom openjobs import scrape_careers_page, process_jobs\n\njobs = scrape_careers_page(\"https://figma.com/careers\")\nenriched = process_jobs(jobs, enrich=True)\n\nfor job in enriched:\n    print(f\"{job['title_original']}\")\n    print(f\"  Category: {job['category']}\")\n    print(f\"  Tech: {job.get('tech_stack', [])}\")\n```\n\n### Filter by Category\n\n```python\n# Only engineering jobs\neng_jobs = process_jobs(jobs, enrich=True, filter_categories=[\"Software Engineering\"])\n```\n\n### Self-Hosted (Unlimited Free)\n\nRun Firecrawl locally for unlimited scraping:\n\n```bash\ngit clone https://github.com/federicodeponte/openjobs.git\ncd openjobs \u0026\u0026 docker compose up -d\n\nexport FIRECRAWL_URL=http://localhost:3002\n```\n\n---\n\n## Output\n\n```json\n{\n  \"company\": \"Linear\",\n  \"title\": \"Senior Software Engineer\",\n  \"department\": \"Engineering\",\n  \"location\": \"Remote (US/EU)\",\n  \"job_url\": \"https://linear.app/careers/...\",\n  \"slug\": \"linear-senior-software-engineer\",\n  \"date_scraped\": \"2025-01-08T10:00:00\"\n}\n```\n\nWith enrichment:\n\n```json\n{\n  \"category\": \"Software Engineering\",\n  \"subcategory\": \"Backend Engineer\",\n  \"tech_stack\": [\"TypeScript\", \"PostgreSQL\", \"Redis\"],\n  \"experience_years\": \"5+\",\n  \"salary_range\": \"$150,000 - $200,000\"\n}\n```\n\n---\n\n## Supported Sites\n\nWorks with most careers pages:\n\n| Type | Examples | Status |\n|------|----------|--------|\n| Company sites | stripe.com, linear.app, figma.com | Supported |\n| JavaScript SPAs | React, Next.js, Vue apps | Supported |\n| ATS platforms | Lever, Greenhouse, Ashby | Supported |\n| Heavy SPAs | Retool, Airtable, Vercel, Notion | Supported |\n| Job boards | 
LinkedIn, Indeed, Glassdoor | Blocked (ToS) |\n\n---\n\n## API Reference\n\n| Function | Description |\n|----------|-------------|\n| `scrape_careers_page(url)` | Scrape jobs from a careers page |\n| `discover_careers_url(domain)` | Find careers URL from domain |\n| `process_jobs(jobs, enrich=True)` | Enrich with AI categorization |\n| `scrape_with_firecrawl(url)` | Get page content as markdown |\n| `extract_jobs_from_markdown(md)` | Extract jobs from markdown |\n\n---\n\n## Environment Variables\n\n| Variable | Required | Description |\n|----------|----------|-------------|\n| `GOOGLE_API_KEY` | Yes | Gemini API key ([free](https://aistudio.google.com/apikey)) |\n| `FIRECRAWL_URL` | No | Self-hosted Firecrawl URL |\n| `FIRECRAWL_API_KEY` | No | Firecrawl cloud key ([500 free/mo](https://firecrawl.dev)) |\n\n---\n\n## How It Works\n\n```\nURL → Firecrawl (renders JS) → Gemini AI (extracts jobs) → Structured JSON\n```\n\n1. **Firecrawl** renders JavaScript and returns clean markdown\n2. **Fallback** extracts embedded JSON from React/Next.js data\n3. **Gemini AI** parses job listings intelligently\n4. **Output** returns structured job data\n\n---\n\n## Contributing\n\n```bash\ngit clone https://github.com/federicodeponte/openjobs.git\ncd openjobs\npip install -e \".[dev]\"\nmake test\n```\n\n---\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffedericodeponte%2Fopenjobs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffedericodeponte%2Fopenjobs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffedericodeponte%2Fopenjobs/lists"}