Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dreamjet31/linkedinscraping-using-zenrow
Scraping linkedin company profile using puppeteer and zenrow
axios cheerio googlesheetapi javascript linkedin linkedin-scraper parallel puppeteer zenrows
- Host: GitHub
- URL: https://github.com/dreamjet31/linkedinscraping-using-zenrow
- Owner: dreamjet31
- Created: 2023-08-26T14:04:11.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-04-12T15:33:08.000Z (9 months ago)
- Last Synced: 2024-06-04T00:09:51.967Z (8 months ago)
- Topics: axios, cheerio, googlesheetapi, javascript, linkedin, linkedin-scraper, parallel, puppeteer, zenrows
- Language: JavaScript
- Homepage:
- Size: 15.6 KB
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# LinkedIn Company Profile Scraper
## Introduction
This project scrapes public company profile data from LinkedIn and stores it in Google Sheets. It uses the ZenRows API for web scraping in order to bypass LinkedIn's CAPTCHA challenges and avoid IP blocks caused by sending too many requests in a short period of time. Axios handles the HTTP requests and Cheerio parses the returned HTML. Requests are sent in parallel batches of eight, and the results are saved to a Google Sheet.
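In rough terms, each company page is fetched through ZenRows and then parsed locally. The following is a minimal sketch under those assumptions, not the repository's actual code; the ZenRows query parameters and the Cheerio selector are illustrative placeholders.

```javascript
// Illustrative sketch only -- not this repository's code.
// Fetch one LinkedIn company page through the ZenRows proxy endpoint,
// then parse a field out of the returned HTML with Cheerio.
const axios = require("axios");
const cheerio = require("cheerio");

async function scrapeCompany(companyUrl, apiKey) {
  // ZenRows forwards the request, handling CAPTCHAs and IP rotation.
  const { data: html } = await axios.get("https://api.zenrows.com/v1/", {
    params: {
      apikey: apiKey,   // APIKEY from the .env file
      url: companyUrl,  // public LinkedIn company profile URL
    },
  });

  const $ = cheerio.load(html);
  return {
    url: companyUrl,
    // Placeholder selector; the real page structure may differ.
    name: $("h1").first().text().trim(),
  };
}

module.exports = { scrapeCompany };
```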
## Pre-requisites
Fill in the environment variables in a `.env` file; an example is provided in `.env.example`:
```
GCLOUD_PROJECT=
GOOGLE_APPLICATION_CREDENTIALS=./service_account_credentials.json
SPREADSHEETID=
SHEETNAME=
PARALLEL=8
APIKEY=
```
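For reference, a minimal sketch of how these values might be read at runtime, assuming the common `dotenv` package (the repository may load them differently):

```javascript
// Config-loading sketch; assumes dotenv is installed (an assumption).
require("dotenv").config();

const config = {
  spreadsheetId: process.env.SPREADSHEETID,
  sheetName: process.env.SHEETNAME,
  parallel: Number(process.env.PARALLEL) || 8, // batch size for parallel requests
  zenrowsApiKey: process.env.APIKEY,
  // GCLOUD_PROJECT and GOOGLE_APPLICATION_CREDENTIALS are picked up
  // directly by the Google client libraries.
};

module.exports = config;
```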
Ensure you have Node.js installed, along with the appropriate credentials and access permissions for ZenRows and the Google Sheets API.

## Steps to run the project
## Installation
1. Clone the repository from [here](https://github.com/flurryunicorn/linkedinScraping-using-zenrow).
2. Run `npm install` to install all the necessary dependencies.
3. Create a `.env` file based on the example `.env.example` and fill it with your real credentials.
4. Run the scraper with `node index.js`. Make sure your Google Sheet is shared with the service account's client email.
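As noted in the introduction, requests go out in batches of eight (the `PARALLEL` setting). A rough sketch of how such batching could look, using hypothetical `scrapeCompany` and `saveToSheet` helpers purely for illustration:

```javascript
// Batched parallel scraping sketch; batch size comes from PARALLEL.
// scrapeCompany() and saveToSheet() are hypothetical helpers.
async function scrapeAll(companyUrls, apiKey, batchSize = 8) {
  for (let i = 0; i < companyUrls.length; i += batchSize) {
    const batch = companyUrls.slice(i, i + batchSize);

    // Fire one batch of requests in parallel and wait for all of them.
    const results = await Promise.all(
      batch.map((url) => scrapeCompany(url, apiKey))
    );

    await saveToSheet(results); // append this batch to the Google Sheet
  }
}
```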
## Data Storage
The scraped data is stored in a Google Sheet via the Google Sheets API (googleapis), which makes it easy to view, share, and work with. The spreadsheet ID and sheet name are specified in the `.env` file.
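A minimal sketch of appending rows with googleapis, assuming service-account authentication via the credentials file named in `.env` (the column layout and function name are illustrative):

```javascript
// Sketch of writing scraped rows to the sheet with googleapis.
const { google } = require("googleapis");

async function saveToSheet(rows) {
  const auth = new google.auth.GoogleAuth({
    keyFile: process.env.GOOGLE_APPLICATION_CREDENTIALS,
    scopes: ["https://www.googleapis.com/auth/spreadsheets"],
  });
  const sheets = google.sheets({ version: "v4", auth });

  // Append one row per scraped company below the existing data.
  await sheets.spreadsheets.values.append({
    spreadsheetId: process.env.SPREADSHEETID,
    range: process.env.SHEETNAME,
    valueInputOption: "RAW",
    requestBody: { values: rows.map((r) => [r.url, r.name]) },
  });
}

module.exports = { saveToSheet };
```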
## Built With
- Node.js
- Axios - Used to make HTTP requests
- Cheerio - Used for web scraping
- ZenRows API - Used to bypass CAPTCHA and IP blocking
- Google Sheets API (googleapis) - Used to store the scraped data

Feel free to fork or clone this repository for your own purposes. Contributions are also welcome!