https://github.com/llorenspujol/linkedin-jobs-scraper

LinkedIn Jobs Scraper running in Node.js that uses Puppeteer and RxJS to scrape job offers from LinkedIn.

nodejs puppeteer rxjs

Last synced: 4 months ago

Host: GitHub
URL: https://github.com/llorenspujol/linkedin-jobs-scraper
Owner: llorenspujol
License: mit
Created: 2023-09-21T13:15:04.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2023-10-30T14:52:09.000Z (over 1 year ago)
Last Synced: 2025-01-09T22:52:06.555Z (4 months ago)
Topics: nodejs, puppeteer, rxjs
Language: TypeScript
Homepage:
Size: 10.7 MB
Stars: 52
Watchers: 3
Forks: 8
Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE.md

# LinkedIn Jobs Scraper

LinkedIn Jobs Scraper running in Node.js that uses Puppeteer and RxJS to scrape job offers from LinkedIn.

![Example video scraping linkedin job offers](/assets/video-showcase.gif)

> IMPORTANT: Web scraping can frequently violate the terms of service of a website. Always review and respect a website's robots.txt file and its Terms of Service. In this instance, this code should be used ONLY for teaching and hobby purposes. LinkedIn specifically prohibits any data extraction from its website; you can read more here: https://www.linkedin.com/legal/crawling-terms.

## Highlights
- 🔧 Parses LinkedIn job offers and returns the data in JSON format
- 📄 Loops through all the result pages for a given set of search params
- 🔁 Loops through as many sets of search params as needed
- ⚡️ Uses RxJS Observables instead of Promises
- 🛑 Handles HTTP 429 (Too Many Requests) responses
- 🛡 Handles the LinkedIn authwall
- 💾 Saves the scraped data as JSON in an auto-generated `/data` folder
- 📝 Written entirely in TypeScript
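
To make the pagination and 429 handling listed above more concrete, here is a minimal sketch of the general idea. It is not the code from this repo: it uses plain `async`/`await` instead of RxJS so that it has no dependencies, and every name in it (`PageResult`, `FetchPage`, `backoffDelayMs`, `fetchWithBackoff`, `scrapeAll`) is hypothetical. It also assumes that an empty page marks the end of the results and that exponential backoff is an acceptable reaction to a 429.

```typescript
// Hypothetical shapes, not taken from the repository.
interface PageResult {
  status: number; // HTTP status of the page request
  jobs: string[]; // job titles parsed from the page (assumed empty past the last page)
}

// Injected page loader, so the sketch does not depend on Puppeteer or a real site.
type FetchPage = (query: string, page: number) => Promise<PageResult>;

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Exponential backoff: base, 2x base, 4x base, ...
function backoffDelayMs(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * Math.pow(2, attempt);
}

// Fetch one page, waiting and retrying while the server answers 429.
async function fetchWithBackoff(
  fetchPage: FetchPage,
  query: string,
  page: number,
  maxRetries: number,
  baseDelayMs: number,
): Promise<PageResult> {
  for (let attempt = 0; ; attempt++) {
    const result = await fetchPage(query, page);
    if (result.status !== 429) {
      return result;
    }
    if (attempt >= maxRetries) {
      throw new Error(`Still rate limited after ${maxRetries} retries`);
    }
    await sleep(backoffDelayMs(attempt, baseDelayMs));
  }
}

// Walk every page of every query until a page comes back empty.
async function scrapeAll(
  fetchPage: FetchPage,
  queries: string[],
  baseDelayMs: number = 1000,
): Promise<Record<string, string[]>> {
  const out: Record<string, string[]> = {};
  for (const query of queries) {
    const collected: string[] = [];
    for (let page = 0; ; page++) {
      const { jobs } = await fetchWithBackoff(fetchPage, query, page, 3, baseDelayMs);
      if (jobs.length === 0) {
        break;
      }
      collected.push(...jobs);
    }
    out[query] = collected;
  }
  return out;
}
```

Passing the page loader in as a function keeps the retry and pagination logic testable without a browser; in a real scraper that function would wrap the Puppeteer page navigation.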

## How this code works
I wrote a blog post that explains the code in this repo and walks through all the steps involved. You can find it [here](https://gironajs.com/en/blog/web-scraping-linkedin-jobs-using-puppeteer-and-rxjs).

### Quick start
**Node version >= 12 and NPM >= 6**

```bash
# clone the repo.
git clone https://github.com/llorenspujol/linkedin-jobs-scraper.git

# go to the repo
cd linkedin-jobs-scraper

# install the dependencies via npm
npm install

# start scraping
npm run start
```

### NPM scripts

* `npm run start` - runs the scraper with Puppeteer in headless mode.
* `npm run start:debug` - runs the scraper with Puppeteer in non-headless mode, so the browser window is visible.
* `npm run clean:data` - removes the `/data` folder.
