https://github.com/afk-procrastinator/crunchbase-scraper
A Python-based tool for scraping company information from Crunchbase.
- Host: GitHub
- URL: https://github.com/afk-procrastinator/crunchbase-scraper
- Owner: afk-procrastinator
- Created: 2025-01-14T16:10:03.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-01-15T16:37:14.000Z (over 1 year ago)
- Last Synced: 2025-07-09T11:04:44.308Z (12 months ago)
- Topics: crunchbase, scraper, selenium
- Language: Python
- Homepage:
- Size: 13.7 KB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Crunchbase Company Scraper
A Python-based tool for scraping company information from Crunchbase.
## Features
- 🔐 Automatic and manual login support
- 📋 Batch scraping from company list
- 🤖 Anti-detection measures with randomized delays
- 💾 CSV export with detailed company information
- 💱 Currency conversion
- 🌐 Proxy support via Selenium
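The randomized delays listed above could look roughly like the following minimal sketch. The function name `polite_sleep` and the default bounds are assumptions for illustration; they are not taken from the project's code.

```python
import random
import time


def polite_sleep(min_seconds: float = 2.0, max_seconds: float = 6.0) -> float:
    """Sleep for a random duration in [min_seconds, max_seconds] and return it."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")
    # A uniform random pause makes request timing less regular than a fixed delay.
    delay = random.uniform(min_seconds, max_seconds)
    time.sleep(delay)
    return delay
```

Calling such a helper between page loads spaces requests out irregularly, which is the usual intent behind this kind of anti-detection measure.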
## Data Points Collected
- Company name and legal name
- About/Description
- Funding information
- Location
- Employee count
- Company type (Public/Private)
- Website
- Year founded
- Company ranking
- Acquisitions count
- Investments count
- Exits count
- Stock symbol
- Operating status
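As a rough illustration of how these data points might be modeled and exported, here is a sketch using a dataclass and the standard `csv` module. The class name `CompanyRecord`, the field names, and the helper `records_to_csv` are hypothetical; the actual definitions live in `src/models.py` and may differ.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class CompanyRecord:
    # Hypothetical field names mirroring the list above.
    name: str
    legal_name: Optional[str] = None
    about: Optional[str] = None
    funding: Optional[str] = None
    location: Optional[str] = None
    employee_count: Optional[str] = None
    company_type: Optional[str] = None
    website: Optional[str] = None
    year_founded: Optional[int] = None
    ranking: Optional[int] = None
    acquisitions_count: Optional[int] = None
    investments_count: Optional[int] = None
    exits_count: Optional[int] = None
    stock_symbol: Optional[str] = None
    operating_status: Optional[str] = None


def records_to_csv(records, stream) -> None:
    """Write records as CSV rows; missing values become empty cells."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(CompanyRecord)])
    writer.writeheader()
    for record in records:
        row = {key: ("" if value is None else value) for key, value in asdict(record).items()}
        writer.writerow(row)


# Usage: write one partially filled record to an in-memory buffer.
buffer = io.StringIO()
records_to_csv([CompanyRecord(name="Example Co", year_founded=2010)], buffer)
csv_text = buffer.getvalue()
```

Making every field except the name optional lets a record be saved even when a company page lacks some data points.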
## Prerequisites
- Python 3.8+
- Chrome browser
- Crunchbase account
## Installation
1. Clone the repository:
```bash
git clone https://github.com/afk-procrastinator/crunchbase-scraper
cd crunchbase-scraper
```
2. Create and activate a virtual environment:
```bash
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
```
3. Install dependencies:
```bash
pip install -r requirements.txt
```
4. Set up environment variables:
```bash
cp .env.template .env
```
Edit `.env` with your Crunchbase credentials:
```
CRUNCHBASE_EMAIL=your-email@example.com
CRUNCHBASE_PASSWORD=your-password
```
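For reference, the credentials in `.env` could be read with a small standard-library parser like the sketch below. The helpers `load_env_file` and `get_credentials` are hypothetical names; the project may instead rely on a package such as `python-dotenv`, so treat this as an illustration of the idea rather than the project's actual loading code.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_credentials(path: str = ".env"):
    """Return (email, password), preferring variables already in the environment."""
    file_values = load_env_file(path)
    email = os.environ.get("CRUNCHBASE_EMAIL") or file_values.get("CRUNCHBASE_EMAIL")
    password = os.environ.get("CRUNCHBASE_PASSWORD") or file_values.get("CRUNCHBASE_PASSWORD")
    if not email or not password:
        raise RuntimeError("CRUNCHBASE_EMAIL and CRUNCHBASE_PASSWORD must be set")
    return email, password
```

Keeping credentials in `.env` rather than in source files also makes it easier to keep them out of version control.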
## Usage
1. List the companies to scrape in `company_list.txt`, one company name per line:
```
Company Name 1
Company Name 2
```
2. Run the scraper:
```bash
python main.py
```
The script will:
- Log in to Crunchbase
- Process each company in the list
- Save results to `companies.csv`
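The input file from step 1 could be read along these lines. This is a sketch under assumptions: the function name `read_company_list` is hypothetical, and skipping blank lines and case-insensitive duplicates is a choice made here, not documented behavior of `main.py`.

```python
from pathlib import Path


def read_company_list(path: str = "company_list.txt") -> list:
    """Return company names, one per line, skipping blanks and duplicates."""
    names = []
    seen = set()
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        name = raw_line.strip()
        # Ignore empty lines and names already seen (case-insensitive).
        if not name or name.lower() in seen:
            continue
        seen.add(name.lower())
        names.append(name)
    return names
```

Deduplicating up front avoids scraping the same company page twice in one batch.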
## Project Structure
```
├── src/
│ ├── auth.py # Authentication handling
│ ├── models.py # Data models
│ ├── scraper.py # Core scraping logic
│ ├── selectors.py # CSS selectors
│ └── utils.py # Utility functions
├── main.py # Entry point
├── requirements.txt # Dependencies
├── .env.template # Environment template
└── company_list.txt # Input companies
```
## Error Handling
- Automatic retry logic for failed requests
- Manual login fallback if automatic login fails
- Graceful handling of missing data points
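Retry logic of the kind described above is often written as a small wrapper with exponential backoff. The sketch below is an assumption about the general pattern, not the scraper's actual implementation; `with_retries` and its parameters are hypothetical names.

```python
import random
import time


def with_retries(action, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action() until it succeeds or max_attempts is reached."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == max_attempts:
                # Out of attempts: surface the last error to the caller.
                raise
            # Exponential backoff (1x, 2x, 4x, ...) plus a little random jitter.
            sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5))
```

A call such as `with_retries(lambda: fetch_company(name))` would retry a failing page load, where `fetch_company` stands in for whatever scraping function is being wrapped. The `sleep` parameter is injectable so the wrapper can be tested without real pauses.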
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## Disclaimer
This tool is for educational purposes only. Please review and comply with Crunchbase's terms of service before use.