https://github.com/henrylin03/job-scraper
Python web-scraper for job postings
- Host: GitHub
- URL: https://github.com/henrylin03/job-scraper
- Owner: henrylin03
- License: mit
- Created: 2022-09-03T03:32:35.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2023-05-06T08:59:45.000Z (about 3 years ago)
- Last Synced: 2025-09-07T20:02:49.879Z (10 months ago)
- Topics: python, scraper, selenium, web-scraping
- Language: Python
- Homepage:
- Size: 46.5 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# job-scraper
This project is a Python-based web scraper that extracts job listings from [Indeed.com](https://indeed.com) to accelerate the job application process. The scraper uses Selenium to extract information such as job titles, company names, and salary estimates.
## Technologies Used
- **`selenium`**
- **`ChromeDriver`**
- **`xlsxwriter`**
## How to Use the Project
1. Clone this repository to your local machine:
```bash
git clone https://github.com/henrylin03/job-scraper.git
```
2. Install the required packages by running the following command in your terminal:
```bash
pip install -r requirements.txt
```
3. In the `main` function, update the arguments passed to `search_url()` with the keywords and location you want to search. For example, to search for "data analyst" positions in "Australia":
```python
search_url("data analyst", "Australia")
```
4. In the `main` function, set the second argument of the `scrape_pages()` call to the number of search result pages to scrape. For example, the following code extracts job information from the first 5 pages of search results:
```python
scrape_pages("url", 5)
```
By default, the script extracts job information from the first page only.
5. Run the script. It installs ChromeDriver through `ChromeDriverManager().install()`, then collects the job posting information into a `pandas` DataFrame and exports it to an Excel workbook (`output.xlsx`).
```bash
python jobs-scraper.py
```
6. To change the appearance of the output Excel workbook, adjust the arguments passed to `.add_format()`.
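To make steps 3 and 4 more concrete, here is a minimal sketch of how a search URL and its per-page variants could be built. It is not the project's actual code: `build_search_url` and `page_urls` are hypothetical helper names, and the base URL, the `q`, `l`, and `start` query parameters, and the step of 10 results per page are assumptions about how Indeed search URLs are structured.

```python
from urllib.parse import urlencode

# Assumed base URL and query parameter names ("q", "l", "start");
# the real script may build its URLs differently.
BASE_URL = "https://www.indeed.com/jobs"
RESULTS_PER_PAGE = 10  # assumption: each results page advances "start" by 10


def build_search_url(keywords: str, location: str) -> str:
    """Return a search URL for the given keywords and location."""
    return f"{BASE_URL}?{urlencode({'q': keywords, 'l': location})}"


def page_urls(search_url: str, pages: int = 1) -> list[str]:
    """Return one URL per results page, starting from the first page."""
    if pages < 1:
        raise ValueError("pages must be at least 1")
    return [f"{search_url}&start={page * RESULTS_PER_PAGE}" for page in range(pages)]


if __name__ == "__main__":
    url = build_search_url("data analyst", "Australia")
    for page_url in page_urls(url, 5):
        print(page_url)
```

Defaulting `pages` to 1 mirrors the script's stated default of scraping only the first page of results.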
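As a rough illustration of what the `.add_format()` arguments control, the sketch below writes a small styled workbook with `xlsxwriter` directly. It is not the project's export code, which goes through a `pandas` DataFrame: the `write_workbook` helper, the sheet name, the colours, the column width, and the single placeholder row are all assumptions made for the example.

```python
import xlsxwriter

# Placeholder data for illustration only; the real script fills its
# rows from scraped job postings.
HEADERS = ["Job Title", "Company", "Salary Estimate"]
ROWS = [["Data Analyst", "Example Pty Ltd", "$80,000 - $90,000"]]


def write_workbook(path: str = "output.xlsx") -> None:
    """Write the placeholder rows to a formatted Excel workbook."""
    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet("Jobs")

    # Each key in the dict passed to add_format() sets one cell property.
    header_format = workbook.add_format(
        {"bold": True, "font_color": "#FFFFFF", "bg_color": "#1F4E78", "border": 1}
    )
    cell_format = workbook.add_format({"text_wrap": True, "valign": "top"})

    # Header row first, then one worksheet row per data row.
    worksheet.write_row(0, 0, HEADERS, header_format)
    for row_index, row in enumerate(ROWS, start=1):
        worksheet.write_row(row_index, 0, row, cell_format)

    # Widen every used column so longer titles stay readable.
    worksheet.set_column(0, len(HEADERS) - 1, 30)
    workbook.close()


if __name__ == "__main__":
    write_workbook()
```

Changing a value such as `bg_color` or `bold` in the dictionaries is the kind of adjustment step 6 refers to.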
## Conclusion
Through developing this job scraper, I sharpened my skills in web scraping and data extraction. Future features may include extracting job postings from other job posting sites, and extracting additional attributes such as type of work and closing date. If you have any feedback or suggestions, please feel free to raise a [GitHub Issue](https://github.com/henrylin03/job-scraper/issues). Thank you for your support!