https://github.com/hasnocool/indeed-job-scraper

A web scraper built using Selenium and Python to extract job listings from Indeed.com with rate limiting and logging features.
chromedriver indeed job json listings logging pagination python scraper scraping script selenium web webdriver

Host: GitHub
URL: https://github.com/hasnocool/indeed-job-scraper
Owner: hasnocool
Created: 2023-09-13T17:38:06.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-09-18T01:02:36.000Z (10 months ago)
Last Synced: 2024-12-25T20:41:41.792Z (6 months ago)
Topics: chromedriver, indeed, job, json, listings, logging, pagination, python, scraper, scraping, script, selenium, web, webdriver
Language: Python
Homepage:
Size: 8.79 KB
Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

**README.md**

================================

**Indeed Job Scraper**

======================

**Project Title**: Indeed Job Scraper

----------------------------------------

I built this to automate scraping job listings from Indeed.com, making it easier to collect and analyze job postings for a specific location. The project uses Selenium for the scraping and Python's JSON handling to structure the results.

**Description**

---------------

Indeed Job Scraper fetches job listings from Indeed.com based on specified criteria (e.g., sponsorship, Chicago, IL), then parses the extracted data into a structured format (JSON) for further analysis. The tool includes rate limiting so that it does not overload the website.

**Features**

------------

*   **Web Scraping**: Utilizes Selenium to fetch job listings from Indeed.com.

*   **Rate Limiting**: Includes a retry mechanism with delays to avoid overwhelming the website.

*   **JSON Output**: Exports extracted data in JSON format for further processing.

*   **CSV Conversion**: Optionally converts the JSON output into a CSV file.
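
The retry-with-delays idea behind the rate limiting feature can be sketched roughly as follows. This is a minimal illustration, not the code in this repository: the function name `fetch_with_retry`, its parameters, and the doubling delay with random jitter are all assumptions made for the example.

```python
import random
import time


def fetch_with_retry(fetch, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds, pausing between failed attempts.

    Hypothetical helper: the delay doubles after each failure and gets
    up to one second of random jitter added.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, 1)
            sleep(delay)
```

Any callable can be passed as `fetch`, for example a function that loads one results page with Selenium. The `sleep` parameter exists only so the pause can be replaced in tests.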

**Installation**

----------------

### Prerequisites

*   Python 3.x (preferably 3.9 or later)

*   Selenium WebDriver (ChromeDriver)

*   `json` and `csv` modules (both are part of the Python standard library, so no separate installation is needed)

### Installation Steps

1.  Clone this repository using Git.

2.  Install required libraries using pip: `pip install selenium`

3.  Download the ChromeDriver from [here](https://chromedriver.chromium.org/downloads) and add it to your system's PATH.
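
To confirm that step 3 worked, a quick check along these lines may help. It is only a sketch: `find_chromedriver` is a hypothetical helper, and it assumes the driver executable is named `chromedriver`.

```python
import shutil


def find_chromedriver(name="chromedriver"):
    """Return the full path of the driver executable, or None if it is not on PATH."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_chromedriver()
    print(f"chromedriver found at {path}" if path else "chromedriver is not on PATH")
```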

**Usage**

----------

### Running the Scraper

1.  Execute the `job_scraper_with_rate_limiting.py` script.

2.  The tool will fetch job listings based on the specified criteria (sponsorship, Chicago, IL).

3.  It will parse extracted data into JSON format and save it to a file named `log_{timestamp}.json`.
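
The steps above could look something like the sketch below, which builds a search URL and writes results to a `log_{timestamp}.json` file. It is not the contents of `job_scraper_with_rate_limiting.py`: the function names, the `q`, `l`, and `start` query parameters, and the timestamp format are assumptions made for illustration.

```python
import json
import time
from pathlib import Path
from urllib.parse import urlencode


def build_search_url(query, location, start=0):
    """Build a search URL; the q/l/start parameter names are assumptions."""
    params = {"q": query, "l": location, "start": start}
    return "https://www.indeed.com/jobs?" + urlencode(params)


def save_listings(listings, directory="."):
    """Write listings to log_{timestamp}.json and return the file path."""
    timestamp = time.strftime("%Y%m%d_%H%M%S")  # timestamp format is an assumption
    path = Path(directory) / f"log_{timestamp}.json"
    with path.open("w", encoding="utf-8") as fh:
        json.dump(listings, fh, indent=2, ensure_ascii=False)
    return path
```

For example, `build_search_url("sponsorship", "Chicago, IL")` produces a URL for the criteria mentioned above, and `save_listings(results)` stores whatever list of dictionaries the scraper collected.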

### Optional CSV Conversion

1.  After running the scraper, execute the `parse_json_file_to_csv.py` script.

2.  This will convert the JSON output from the previous step into a CSV file named `job_data_extended.csv`.
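
A conversion of this kind can be sketched with the standard `json` and `csv` modules. The example below is not the contents of `parse_json_file_to_csv.py`; it assumes the JSON file holds a list of flat objects, and `json_to_csv` is a hypothetical name.

```python
import csv
import json


def json_to_csv(json_path, csv_path):
    """Convert a JSON array of flat objects into a CSV file; return the row count."""
    with open(json_path, encoding="utf-8") as fh:
        records = json.load(fh)

    # Collect column names in first-seen order so records with differing keys still fit.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    with open(csv_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(records)
    return len(records)
```

Records that lack a column get an empty cell, so listings with missing fields do not break the export.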

**Contributing**

---------------

Contributions are welcome! If you'd like to enhance this project or add new features, please follow these steps:

1.  Fork this repository on GitHub.

2.  Make your changes in a new branch (e.g., `feature/new-feature`).

3.  Commit your changes with descriptive commit messages.

4.  Submit a pull request for review.

**License**

----------

Indeed Job Scraper is released under the [MIT License](https://opensource.org/licenses/MIT).

**Tags/Keywords**

-----------------

Indeed, web scraping, Selenium, rate limiting, JSON parsing, CSV conversion

[![Python Version](https://img.shields.io/badge/Python-3.x-green.svg)](https://www.python.org/)

[![Selenium](https://img.shields.io/badge/Selenium-4.0-green.svg)](https://selenium.dev/)
