https://github.com/hasnocool/indeed-job-scraper
A web scraper built using Selenium and Python to extract job listings from Indeed.com with rate limiting and logging features.
- Host: GitHub
- URL: https://github.com/hasnocool/indeed-job-scraper
- Owner: hasnocool
- Created: 2023-09-13T17:38:06.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-18T01:02:36.000Z (3 months ago)
- Last Synced: 2024-09-18T04:53:44.649Z (3 months ago)
- Topics: indeed-job-scraper, job-listings, json-csv-export, python-data-extraction, rate-limiting, selenium-web-scraping
- Language: Python
- Homepage:
- Size: 8.79 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
**README.md**
================================

**Indeed Job Scraper**
======================

**Project Title**: Indeed Job Scraper
----------------------------------------

I built this to automate the process of scraping job listings from Indeed.com, making it easier to collect and analyze data on job postings in a specific location. This project leverages web scraping techniques using Selenium and JSON parsing with Python.

**Description**
---------------

Indeed Job Scraper is designed to fetch job listings from Indeed.com based on specified criteria (e.g., sponsorship, Chicago, IL), then parse the extracted data into a more structured format (JSON) for further analysis. The tool includes rate limiting to prevent overloading the website and ensure smooth operation.
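
As an illustration of the rate limiting mentioned above, here is a minimal sketch of a retry loop with growing delays. It is not taken from the repository: `fetch_with_retry`, its parameters, and the backoff formula are all assumptions chosen for the example, and the actual script may behave differently.

```python
import random
import time


def fetch_with_retry(fetch, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds, pausing between failed attempts.

    Hypothetical helper for illustration; not the repository's own code.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as exc:  # real code should catch narrower exceptions
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff plus up to one second of random jitter,
                # so repeated requests do not hit the site at a fixed rhythm.
                delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 1)
                sleep(delay)
    raise RuntimeError(f"gave up after {max_attempts} attempts") from last_error
```

With Selenium, `fetch` could be something like `lambda: driver.get(url)`, where `driver` and `url` are whatever the calling script defines. Passing `sleep` as a parameter keeps the helper easy to test without real waiting.
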

**Features**
------------

* **Web Scraping**: Utilizes Selenium to fetch job listings from Indeed.com.
* **Rate Limiting**: Includes a retry mechanism with delays to avoid overwhelming the website.
* **JSON Output**: Exports extracted data in JSON format for further processing.
* **CSV Conversion**: Optionally, parses JSON output into a CSV file.

**Installation**
----------------

### Prerequisites

* Python 3.x (preferably 3.9 or later)
* Selenium WebDriver (ChromeDriver)
* `json` and `csv` modules (both ship with the Python standard library, so no separate installation is needed)

### Installation Steps

1. Clone this repository using Git.
2. Install required libraries using pip: `pip install selenium`
3. Download the ChromeDriver from [here](https://chromedriver.chromium.org/downloads) and add it to your system's PATH.

**Usage**
----------

### Running the Scraper

1. Execute the `job_scraper_with_rate_limiting.py` script.
2. The tool will fetch job listings based on the specified criteria (sponsorship, Chicago, IL).
3. It will parse extracted data into JSON format and save it to a file named `log_{timestamp}.json`.

### Optional CSV Conversion
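
The conversion covered in this section can be pictured with a minimal sketch like the one below. It assumes the JSON file holds a list of flat dictionaries, one per job listing; the function name `json_to_csv` and the example field names are hypothetical, and the repository's `parse_json_file_to_csv.py` may work differently.

```python
import csv
import json


def json_to_csv(json_path, csv_path):
    """Convert a JSON file containing a list of flat records into a CSV file.

    Hypothetical helper for illustration; not the repository's own code.
    Returns the number of records written.
    """
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)

    # Collect every key seen across all records, in first-seen order,
    # so listings with extra or missing fields still fit one header row.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)  # keys missing from a record become empty cells
    return len(records)
```

A call might look like `json_to_csv("log_example.json", "job_data_extended.csv")`, where the input file name is a placeholder for whichever log file the scraper produced.
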
1. After running the scraper, execute the `parse_json_file_to_csv.py` script.
2. This will convert the JSON output from the previous step into a CSV file named `job_data_extended.csv`.

**Contributing**
---------------

Contributions are welcome! If you'd like to enhance this project or add new features, please follow these steps:

1. Fork this repository on GitHub.
2. Make your changes in a new branch (e.g., `feature/new-feature`).
3. Commit your changes with descriptive commit messages.
4. Submit a pull request for review.

**License**
----------

Indeed Job Scraper is released under the [MIT License](https://opensource.org/licenses/MIT).

**Tags/Keywords**
-----------------

Indeed, web scraping, Selenium, rate limiting, JSON parsing, CSV conversion

[![Python Version](https://img.shields.io/badge/Python-3.x-green.svg)](https://www.python.org/)
[![Selenium](https://img.shields.io/badge/Selenium-4.0-green.svg)](https://selenium.dev/)