Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/invictusaman/indeed-webscraper
I created a web scraper to fetch Indeed job data. It returns the job title, company name, job URL, salary (if present), and more.
Last synced: about 1 month ago
- Host: GitHub
- URL: https://github.com/invictusaman/indeed-webscraper
- Owner: invictusaman
- License: mit
- Created: 2024-08-03T14:54:57.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-25T03:53:04.000Z (4 months ago)
- Last Synced: 2024-08-25T04:32:33.641Z (4 months ago)
- Language: Python
- Size: 1.7 MB
- Stars: 3
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Indeed Scraper
I created a web scraper 🕸️ to fetch job data from Indeed. It returns the job title, company name, job ID, job URL, salary (if present), and the full description of each job.
## Step 1: Install dependencies
Install the required dependencies from your project folder:
```
pip install -r requirements.txt
```
## Step 2: Run Indeed_Scraper.py
Make sure the latest version of Chrome is installed on your system. This step creates `scraped_job_file.csv`, but the file does not yet contain job descriptions.
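For orientation, here is a minimal sketch of what a row of `scraped_job_file.csv` might hold after this step. The `JobPosting` class and every column name except `job_description` are hypothetical stand-ins for the fields listed above, not necessarily what `Indeed_Scraper.py` uses. The description column stays empty until Step 3.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


# Hypothetical column layout; the real script's column names may differ.
@dataclass
class JobPosting:
    job_title: str
    company_name: str
    job_id: str
    job_url: str
    salary: str = ""           # empty when the posting shows no salary
    job_description: str = ""  # left blank here; Step 3 fills it in


# Column order follows the field order of the dataclass.
COLUMNS = [field.name for field in fields(JobPosting)]


def postings_to_csv(postings: list[JobPosting]) -> str:
    """Serialize postings to CSV text: a header row, then one row per job."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(asdict(posting) for posting in postings)
    return buffer.getvalue()
```

To save the result, write the returned text to `scraped_job_file.csv` opened with `newline=""`, which keeps the `csv` module's `\r\n` line endings from being doubled on Windows.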
## Step 3: Run Extract_Description_Indeed.py
**Recommended: remove duplicate rows from `scraped_job_file.csv` before running this script.**
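A minimal standard-library sketch of that cleanup, assuming duplicates are either identical rows or rows that share a key column. The `job_id` column name used in the docstring is hypothetical.

```python
import csv
from typing import Iterable, Optional


def drop_duplicate_rows(rows: Iterable[dict], key: Optional[str] = None) -> list[dict]:
    """Keep the first occurrence of each row, preserving order.

    With `key` (e.g. a hypothetical "job_id" column), rows sharing that column's
    value count as duplicates; without it, a row must match another exactly.
    """
    seen = set()
    unique = []
    for row in rows:
        marker = row[key] if key is not None else tuple(row.items())
        if marker not in seen:
            seen.add(marker)
            unique.append(row)
    return unique


def dedupe_csv(path: str, key: Optional[str] = None) -> int:
    """Rewrite the CSV file at `path` without duplicates; return the number removed."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        fieldnames = reader.fieldnames
    unique = drop_duplicate_rows(rows, key)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(unique)
    return len(rows) - len(unique)
```

For example, `dedupe_csv("scraped_job_file.csv", key="job_id")` would drop repeated postings in place. If you already use pandas, `DataFrame.drop_duplicates(subset=["job_id"])` does the same job.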
This step extracts each job's description and writes it to the `job_description` column of the matching row. It takes a long time, so go grab a coffee ☕. The output is an updated `scraped_job_file.csv` with the job descriptions merged in.
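The merge amounts to filling one column row by row. In this sketch, `fetch_description` is a hypothetical callable standing in for whatever loads a job page in Chrome and returns its text, and `job_url` is an assumed column name. It is an illustration of the idea, not the code in `Extract_Description_Indeed.py`.

```python
from typing import Callable


def merge_descriptions(
    rows: list[dict],
    fetch_description: Callable[[str], str],
    url_column: str = "job_url",
) -> list[dict]:
    """Return copies of `rows` with the "job_description" column filled in.

    `fetch_description` is a hypothetical callable that loads one job page and
    returns its description text. Rows that already have a description are
    left alone, so re-running on a partly processed file skips finished rows.
    """
    merged = []
    for row in rows:
        updated = dict(row)  # copy so the caller's rows are not modified
        if not updated.get("job_description"):
            updated["job_description"] = fetch_description(updated[url_column])
        merged.append(updated)
    return merged
```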
*I did not implement multithreading 🧵 (which would have saved you a lot of time) because of time constraints and my limited knowledge. Feel free to fork this repo and implement it. Good luck! 🍀*
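For anyone picking this up, a minimal sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`. `fetch_description` is a hypothetical per-URL fetcher. Two caveats, offered as assumptions rather than tested advice: if the fetcher drives Chrome through a browser-automation library such as Selenium, one driver session should not be shared across threads, so each worker would need its own browser; and a small `max_workers` is less likely to get blocked by Indeed.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def fetch_all(
    urls: list[str],
    fetch_description: Callable[[str], str],
    max_workers: int = 4,
) -> dict[str, str]:
    """Fetch descriptions for many job URLs concurrently; map each URL to its text.

    `fetch_description` is a hypothetical per-URL fetcher. It runs in several
    threads at once, so it must not share one browser session between calls.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as `urls`
        descriptions = list(pool.map(fetch_description, urls))
    return dict(zip(urls, descriptions))
```

Threads suit this job because each fetch spends most of its time waiting on the network, and Python releases the GIL while a thread waits on I/O.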
### Further Work:
Implement a pretrained NER model to extract information from the `job_description` column, such as the programming languages mentioned, the type of work (remote, hybrid, in-person), and salaries. Alternatively, simple keyword matching can pick out the relevant words.
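A minimal sketch of the simple keyword-matching route: whole-word matching for work type and programming languages, plus a regular expression for dollar amounts. The keyword lists and the salary pattern are illustrative assumptions; real postings will need broader lists, and phrasings that fixed keywords miss are where an NER model would help.

```python
import re

# Illustrative keyword lists (assumptions); extend them for real postings.
WORK_TYPES = ("remote", "hybrid", "in-person")
LANGUAGES = ("python", "sql", "java", "javascript", "scala")

# Dollar amounts such as "$60,000", "$85000" or "$25.50".
SALARY_PATTERN = re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?")


def find_keywords(text: str, keywords: tuple[str, ...]) -> list[str]:
    """Return the keywords found in `text` as whole words, ignoring case."""
    lowered = text.lower()
    # The lookarounds stop "java" from matching inside "javascript".
    return [
        kw for kw in keywords
        if re.search(rf"(?<!\w){re.escape(kw)}(?!\w)", lowered)
    ]


def extract_features(description: str) -> dict[str, list[str]]:
    """Pull work type, programming languages and salary figures from one description."""
    return {
        "work_type": find_keywords(description, WORK_TYPES),
        "languages": find_keywords(description, LANGUAGES),
        "salaries": SALARY_PATTERN.findall(description),
    }
```

Applied to each row's `job_description`, the returned lists could become new columns for analysis.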
#### Follow my data-analyst journey: [Portfolio_Link](https://www.amanbhattarai.com)