Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/alehkot/job-funnel-ts
Automated tool for scraping job postings into a .xlsx files inspired by Job Funnel.
https://github.com/alehkot/job-funnel-ts
crawler hacktoberfest jobs typescript
Last synced: about 2 months ago
JSON representation
Automated tool for scraping job postings into a .xlsx files inspired by Job Funnel.
- Host: GitHub
- URL: https://github.com/alehkot/job-funnel-ts
- Owner: alehkot
- License: gpl-3.0
- Created: 2020-10-03T19:54:42.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2020-10-31T14:05:46.000Z (about 4 years ago)
- Last Synced: 2024-04-29T19:21:14.179Z (9 months ago)
- Topics: crawler, hacktoberfest, jobs, typescript
- Language: TypeScript
- Homepage:
- Size: 297 KB
- Stars: 27
- Watchers: 3
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
# Job Funnel TS
Automated tool for scraping job postings into a .xlsx files inspired by [Job Funnel](https://github.com/PaulMcInnis/JobFunnel) and written in Typescript. Currently supports [LinkedIn](https://www.linkedin.com/), [Monster](https://www.monster.com/) (experimental), [Glassdoor](https://www.glassdoor.com/) (experimental), [Indeed](https://www.indeed.com/) (experimental).
## Usage
1. Ensure that you have Node 14.x+ installed and available globally
1. Install Job Funnel as a global package: `npm install -g job-funnel`
1. Generate a new config file in a local folder: `job-funnel generate-config`
1. Update 'config.yaml' file and specify your credentials
1. Update 'pages' section in 'config.yaml' files with a list of search pages URLs. You can get a URL by running a search on a supported job website, e.g. [LinkedIn Jobs](https://www.linkedin.com/jobs/) page, and copying the URL of the results page
1. Run the crawlers: `job-funnel scan`. To run all of the supported crawlers, including experimental, specify them explicitly using '--sites' parameter: `job-funnel scan --sites linkedin monster glassdoor indeed`
1. Export the results to 'report.xlsx' file: `job-funnel export`### Optional Steps
1. If needed, run `job-funnel wipe-db` to wipe the cached job results database
1. Run `job-funnel --debug scan` to see the crawling process. It's useful for troubleshooting sometimes. Use `job-funnel --debug scan --sites linkedin monster glassdoor indeed` to use all of the available crawlers in debug mode
1. Run `job-funnel` without any parameters to see help### Config Example
```yaml
crawlers:
linkedin:
pages:
- https://www.linkedin.com/jobs/search/?f_E=2%2C3%2C4&f_TPR=r604800&geoId=90000070&keywords=qa%20analyst&location=New%20York%20City%20Metropolitan%20Area&f_TP=1%2C2&redirect=false&position=1&pageNum=0
- https://www.linkedin.com/jobs/search/?distance=50&f_E=2%2C3%2C4&f_TPR=r86400&geoId=104047727&keywords=qa%20analyst&location=Jersey%20City%2C%20New%20Jersey%2C%20United%20States&f_TP=1%2C2&redirect=false&position=1&pageNum=0
- https://www.linkedin.com/jobs/search?keywords=Qa%20tester&location=New%20York%20City%20Metropolitan%20Area&geoId=90000070&trk=public_jobs_jobs-search-bar_search-submit&redirect=false&position=1&pageNum=0
credentials:
username: [email protected]
password: foobarbaz
monster:
pages:
- https://www.monster.com/jobs/search/Full-Time_8?q=qa-analyst&where=07302&rad=20&tm=3&jobid=220754835
filters:
radius: 40 # supported values: (empty), 5, 10, 20, 30, 40, 50, 60, 75, 100, 150, 200
job_status: Full-Time # supported values: (empty), Part-Time
posted: 1 # supported values: (empty), -1 (any date), 0 (today), 1 (yesterday), 3 (last 3 days), 7 (last 7 days), 14 (last 14 days), 30 (last 30 days)
credentials:
username: [email protected]
password: foobarbaz
glassdoor:
pages:
- https://www.glassdoor.com/Job/jersey-city-qa-analyst-jobs-SRCH_IL.0,11_IC1126819_KO12,22.htm?jobType=fulltime&fromAge=1&radius=50
credentials:
username: [email protected]
password: foobarbaz
indeed:
pages:
- https://www.indeed.com/jobs?q=QA%20Analyst&l=Jersey%20City%2C%20NJ&radius=50&rbl=New%20York%2C%20NY&jlid=45f6c4ded55c00bf&jt=fulltime&vjk=a573133dd9847a53
credentials:
username: [email protected]
password: foobarbaz
```## Development
### Installation
1. Ensure that you have Node 14.x+ installed and available globally
1. Clone the repository: `git clone https://github.com/alehkot/job-funnel-ts.git`
1. Run `npm install`
1. Run `cp config.yaml.sample config.yaml` to create your configuration file## Running the crawlers
1. Update 'config.yaml' file and specify your credentials
1. Update 'pages' section in 'config.yaml' files with a list of search pages URLs. You can get a URL by running a search on a supported website copying the URL of the results page
1. Use `npm run start-dev -- scan` to run a crawler
1. Use `npm run start-dev -- export` to export results into 'report.xlsx' file
(it's possible to run this command any number of times, subsequent
runs will just append new results to the database table)
1. Use `npm run start-dev -- wipe-db` to wipe the database
1. Use `npm run start-dev -- generate-config` to generate a new config file
1. Use `--debug` global flag to disable headless Puppeteer mode and increase artificial crawling delays, for example `npm run start-dev -- --debug scan`## Notes
The development in the very early stages, so there might be considerable code changes in the future.