{"id":20106773,"url":"https://github.com/nneji123/ycombinator-scraper","last_synced_at":"2025-09-21T00:31:10.180Z","repository":{"id":219221739,"uuid":"747209179","full_name":"Nneji123/ycombinator-scraper","owner":"Nneji123","description":"A Python library and cli tool for scraping companies, jobs, and founders data from Workatastartup.com.","archived":false,"fork":false,"pushed_at":"2024-03-06T00:35:57.000Z","size":1334,"stargazers_count":20,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-19T14:04:02.159Z","etag":null,"topics":["automation","cli","library","mkdocs-material","package","pypi","python","selenium","webscraping","ycombinator"],"latest_commit_sha":null,"homepage":"https://nneji123.github.io/ycombinator-scraper","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Nneji123.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-23T13:41:15.000Z","updated_at":"2025-08-07T17:41:15.000Z","dependencies_parsed_at":"2024-02-16T18:53:37.775Z","dependency_job_id":null,"html_url":"https://github.com/Nneji123/ycombinator-scraper","commit_stats":null,"previous_names":["nneji123/ycombinator-scraper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Nneji123/ycombinator-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nneji123%2Fycombinator-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nneji123%2Fycombinator-scraper/tags","releases_url":"https://repos
.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nneji123%2Fycombinator-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nneji123%2Fycombinator-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Nneji123","download_url":"https://codeload.github.com/Nneji123/ycombinator-scraper/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Nneji123%2Fycombinator-scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276179260,"owners_count":25598565,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-20T02:00:10.207Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","cli","library","mkdocs-material","package","pypi","python","selenium","webscraping","ycombinator"],"created_at":"2024-11-13T17:54:43.994Z","updated_at":"2025-09-21T00:31:09.843Z","avatar_url":"https://github.com/Nneji123.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# YCombinator-Scraper\n\n\u003cdiv align=\"center\"\u003e\n\n\u003cimg src=\"https://raw.githubusercontent.com/nneji123/ycombinator-scraper/main/docs/img/logo.png\" alt=\"Ycombinator_Scraper logo\" width=\"200\" height=\"200\" role=\"img\"\u003e\n\n| | |\n| --- | --- |\n| CI/CD | [![CI - 
Test](https://github.com/Nneji123/ycombinator-scraper/actions/workflows/tests.yml/badge.svg)](https://github.com/Nneji123/ycombinator-scraper/actions/workflows/tests.yml) [![publish-pypi](https://github.com/Nneji123/ycombinator-scraper/actions/workflows/pypi.yml/badge.svg)](https://github.com/Nneji123/ycombinator-scraper/actions/workflows/pypi.yml) [![Coverage](https://codecov.io/gh/Nneji123/ycombinator-scraper/graph/badge.svg?token=37muKJo0SL)](https://codecov.io/gh/Nneji123/ycombinator-scraper)|\n| Docs | [![Docs](https://github.com/Nneji123/ycombinator-scraper/actions/workflows/docs.yml/badge.svg)](https://github.com/Nneji123/ycombinator-scraper/actions/workflows/docs.yml) |\n| Package | [![PyPI - Version](https://img.shields.io/pypi/v/ycombinator-scraper.svg?logo=pypi\u0026label=PyPI\u0026logoColor=gold)](https://pypi.org/project/ycombinator-scraper/) [![PyPI - Downloads](https://img.shields.io/pypi/dm/ycombinator-scraper.svg?color=blue\u0026label=Downloads\u0026logo=pypi\u0026logoColor=gold)](https://pypi.org/project/ycombinator-scraper/) [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/ycombinator-scraper.svg?logo=python\u0026label=Python\u0026logoColor=gold)](https://pypi.org/project/ycombinator-scraper/) |\n| Meta |  [![linting - Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff) [![License - MIT](https://img.shields.io/badge/license-MIT-9400d3.svg)](./LICENSE) |\n\n\u003c/div\u003e\n\n-----\n\nYCombinator-Scraper provides a web scraping tool for extracting data from [Workatastartup](https://www.workatastartup.com/) website. 
The package uses Selenium and BeautifulSoup to navigate through the pages and extract information.\n\n---\n\n**Documentation**: \u003ca href=\"https://nneji123.github.io/ycombinator-scraper\" target=\"_blank\"\u003ehttps://nneji123.github.io/ycombinator-scraper\u003c/a\u003e\n\n**Source Code**: \u003ca href=\"https://github.com/nneji123/ycombinator-scraper\" target=\"_blank\"\u003ehttps://github.com/nneji123/ycombinator-scraper\u003c/a\u003e\n\n---\n\n## Sponsor\n[Proxycurl APIs](https://nubela.co/proxycurl/?utm_campaign=influencer_marketing\u0026utm_source=github\u0026utm_medium=social\u0026utm_content=ifeanyi_nneji_ycombinator_scraper)\n\n\n[\u003cimg src=\"https://github.com/Nneji123/ycombinator-scraper/assets/101701760/2f59fe31-f69d-41a8-ab7b-5b66fbe590ed\"\u003e](https://nubela.co/proxycurl?utm_campaign=influencer_marketing\u0026utm_source=github\u0026utm_medium=social\u0026utm_content=ifeanyi_nneji_ycombinator_scraper)\n\nScrape public LinkedIn profile data at scale with Proxycurl APIs.\n\n- Scraping of public profiles is battle-tested in court (hiQ v. LinkedIn case).\n- GDPR, CCPA, SOC2 compliant.\n- High rate limit - 300 requests/minute.\n- Fast - APIs respond in ~2s.\n- Fresh data - 88% of data is scraped in real time; the other 12% is no older than 29 days.\n- High accuracy.\n- Tons of data points returned per profile.\n\nBuilt for developers, by developers.\n\n\n## Features\n\n- **Web Scraping Capabilities:**\n  - Extract detailed information about companies, including name, description, tags, images, job links, and social media links.\n  - Scrape job-specific details such as title, salary range, tags, and description.\n\n- **Founder and Company Data Extraction:**\n  - Obtain information about company founders, including name, image, description, LinkedIn profile, and optional email addresses.\n\n- **Headless Mode:**\n  - Run the scraper in headless mode to perform web scraping without displaying a browser window.\n\n- **Configurability:**\n  - Easily configure 
scraper settings such as login credentials, the logs directory, and automatic webdriver installation for your browser (via the `webdriver-manager` package), using environment variables or a configuration file.\n\n- **Command-Line Interface (CLI):**\n  - Command-line tools to perform various scraping tasks interactively or in batch mode.\n\n- **Data Output Formats:**\n  - Save scraped data in JSON or CSV format, providing flexibility for further analysis or integration with other tools.\n\n- **Caching Mechanism:**\n  - Cache function results for a specified duration, reducing redundant web requests and improving performance.\n\n- **Docker Support:**\n  - Package the scraper as a Docker image for easy deployment and execution in containerized environments, or pull the prebuilt image with `docker pull nneji123/ycombinator_scraper`.\n\n## Requirements\n\n- Python 3.9+\n- Chrome or Chromium browser installed.\n\n## Installation\n\n```console\n$ pip install ycombinator-scraper\n$ ycscraper --help\n\n# Output\nYCombinator-Scraper Version 0.7.0\nUsage: python -m ycombinator_scraper [OPTIONS] COMMAND [ARGS]...\n\nOptions:\n  --help  Show this message and exit.\n\nCommands:\n  login\n  scrape-company\n  scrape-founders\n  scrape-job\n  version\n```\n\n### With Docker\n```bash\n$ git clone https://github.com/Nneji123/ycombinator-scraper\n$ cd ycombinator-scraper\n$ docker build -t your_name/scraper_name . 
# e.g. docker build -t nneji123/ycombinator_scraper .\n$ docker run nneji123/ycombinator_scraper python -m ycombinator_scraper --help\n```\n\n## Dependencies\n\n- **click**: Builds the command-line interface for interacting with the scraper.\n- **beautifulsoup4**: Parses HTML and XML and extracts data from it during scraping.\n- **loguru**: Provides logging to track and manage messages generated during the scraping process.\n- **pandas**: Manipulates and organizes data, particularly when generating CSV files from scraped information.\n- **pathlib**: Offers an object-oriented way to handle file system paths within the project.\n- **pydantic**: Validates data and structures the models that represent the scraped data.\n- **pydantic-settings**: Extends Pydantic to manage the project's settings.\n- **selenium**: Automates the browser, allowing interaction with dynamic web pages and extraction of information.\n\n## Usage\n\n### With CLI\n```bash\nycscraper scrape-company --company-url https://www.workatastartup.com/companies/example-inc\n```\n\nThis command will scrape data for the specified company and save it in the default output format (JSON).\n\n### With Library\n\n```python\nfrom ycombinator_scraper import Scraper\n\nscraper = Scraper()\ncompany_data = scraper.scrape_company_data(\"https://www.workatastartup.com/companies/example-inc\")\nprint(company_data.model_dump_json(by_alias=True, indent=2))\n```\nPydantic is used under the hood, so methods like `model_dump_json` are available for all the scraped data.\n\n\u003e **You can view more examples here: [Examples](https://nneji123.github.io/ycombinator-scraper/usage/examples)**\n\n\n## Contribution\n\nWe welcome contributions from the community! 
To contribute to this project, follow the steps below.\n\n### Setting Up Development Environment\n\n#### Gitpod\n\nYou can use Gitpod, a free online VS Code-like environment, to quickly start contributing.\n\n[![Open in Gitpod](https://gitpod.io/button/open-in-gitpod.svg)](https://gitpod.io/#https://github.com/nneji123/ycombinator-scraper)\n\n#### Local Setup\n\n1. Clone the repository:\n\n    ```bash\n    git clone https://github.com/nneji123/ycombinator-scraper.git\n    cd ycombinator-scraper\n    ```\n\n2. Create a virtual environment (optional but recommended):\n\n    ```bash\n    python -m venv venv\n    source venv/bin/activate  # On Windows: venv\\Scripts\\activate\n    ```\n\n3. Install dependencies:\n\n    ```bash\n    pip install -r requirements.txt\n    ```\n\n### Running Tests\n\nMake sure to run tests before submitting a pull request.\n\n```bash\npip install -r requirements-test.txt\npytest tests\n```\n\n### Installing Documentation Requirements\n\nIf you make changes to documentation, install the necessary dependencies:\n\n```bash\npip install -r requirements-docs.txt\nmkdocs serve\n```\n\n### Setting Up Pre-Commit Hooks\n\nWe use `pre-commit` to ensure code quality. Install it by running:\n\n```bash\npip install pre-commit\npre-commit install\n```\n\nNow, `pre-commit` will run automatically before each commit to check for linting and other issues.\n\n### Submitting a Pull Request\n\n1. Fork the repository and create a new branch for your contribution:\n\n    ```bash\n    git checkout -b feature-or-fix-branch\n    ```\n\n2. Make your changes and commit them:\n\n    ```bash\n    git add .\n    git commit -am \"Your meaningful commit message\"\n    ```\n\n3. Push the changes to your fork:\n\n    ```bash\n    git push origin feature-or-fix-branch\n    ```\n\n4. Open a pull request on GitHub. 
Provide a clear title and description of your changes.\n\n\n## Documentation\n\nThe [documentation](https://nneji123.github.io/ycombinator-scraper/) is made with [Material for MkDocs](https://github.com/squidfunk/mkdocs-material) and is hosted by [GitHub Pages](https://docs.github.com/en/pages).\n\n## License\n\nYCombinator-Scraper is distributed under the terms of the [MIT](./LICENSE) license.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnneji123%2Fycombinator-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnneji123%2Fycombinator-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnneji123%2Fycombinator-scraper/lists"}