
https://github.com/khaleddallah/LinkedinScraper

Python Scrapy project that parses people's profiles from LinkedIn search results and exports the content to Excel and JSON files.



# Linkedin Scraper using Scrapy
![](https://github.com/khaleddallah/LinkedinScraperProject/blob/master/Readme-Images/E.png)
* Scrape a given number of profiles from the results of a LinkedIn search URL.
* Export the profile content to Excel and JSON files.
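As a rough illustration of the JSON export, here is a minimal sketch assuming each scraped profile is held as a plain Python dict. The helpers `profiles_to_json` and `export_json` and any field names you pass in are hypothetical; they are not the project's actual code or data model.

```python
import json
from pathlib import Path


def profiles_to_json(profiles):
    """Serialize a list of profile dicts (hypothetical structure) to a JSON string."""
    # ensure_ascii=False keeps accented names readable instead of \u escapes
    return json.dumps(profiles, ensure_ascii=False, indent=2)


def export_json(profiles, output_name):
    """Write the profiles to '<output_name>.json' and return the file path."""
    path = Path(f"{output_name}.json")
    path.write_text(profiles_to_json(profiles), encoding="utf-8")
    return path
```

With this sketch, `export_json(profiles, "ABC")` would write `ABC.json` in the current directory, mirroring the `-o 'ABC'` example further down.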


## Installation

* Clone the project:
```
git clone https://github.com/khaleddallah/LinkedinScraperProject.git
```
* Use the package manager [pip](https://pip.pypa.io/en/stable/) to install Scrapy and the other dependencies (Anaconda recommended):
```
cd LinkedinScraperProject
pip install -r requirements.txt
```

## Usage
* Change into the project directory:
```
cd LinkedinScraperProject
```
* To show the help message:
```
python LinkedinScraper -h
```


```
usage:
python LinkedinScraper [-h] [-n NUM] [-o OUTPUT] [-p] [-f format] [-m excelMode] (searchUrl or profilesUrl)

positional arguments:
  searchUrl       URL of Linkedin search URL or Profiles URL

optional arguments:
  -h, --help      show this help message and exit
  -n NUM          num of profiles
                  ** the number must be lower or equal of result number
                  'page' will parse profiles of url page (10 profiles) (Default)
  -o OUTPUT       Output file
  -p              Enable Parse Profiles
  -f FORMAT       json   Json output file
                  excel  Excel file output
                  all    Json and Excel output files
  -m EXCELMODE    1  to make each profile in Excel file appear in one row
                  m  to make each profile in Excel file appear in multi row
```
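For readers building something similar, here is a minimal sketch of how the documented options could be declared with Python's standard `argparse` module. It is reconstructed from the help text, not taken from the project's source: the `dest` names and the `-m` default are assumptions, while the `-f` default of `all` is inferred from the README's `-o 'ABC'` example, which writes both ABC.xlsx and ABC.json without passing `-f`.

```python
import argparse


def build_parser():
    """Hypothetical argparse setup mirroring the documented help output."""
    parser = argparse.ArgumentParser(prog="LinkedinScraper")
    parser.add_argument("urls", nargs="+", metavar="searchUrl",
                        help="a LinkedIn search URL, or one or more profile URLs")
    # Kept as a string because 'page' (the documented default) is also accepted;
    # a real implementation would convert numeric values to int after parsing.
    parser.add_argument("-n", dest="num", default="page",
                        help="number of profiles, or 'page' for one result page")
    parser.add_argument("-o", dest="output",
                        help="output file name, without extension")
    parser.add_argument("-p", dest="profiles", action="store_true",
                        help="treat the positional URLs as individual profile URLs")
    parser.add_argument("-f", dest="format", choices=["json", "excel", "all"],
                        default="all", help="output format")
    # Assumed default: the help text does not say which Excel mode is used by default.
    parser.add_argument("-m", dest="excel_mode", choices=["1", "m"], default="m",
                        help="'1' for one row per profile, 'm' for multiple rows")
    return parser
```

For instance, `build_parser().parse_args(["-n", "17", "-f", "excel", URL])` yields `num="17"` and `format="excel"`, with the URL collected in `urls`.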

## Examples

* Parse the profiles https://www.linkedin.com/in/khaled-dallah/ and https://www.linkedin.com/in/linustorvalds/ and export the results to ABC.xlsx and ABC.json.

Use `-p` because the arguments are individual profile URLs rather than a search URL.
```
python LinkedinScraper -p -o 'ABC' 'https://www.linkedin.com/in/khaled-dallah/' 'https://www.linkedin.com/in/linustorvalds/'
```
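Because `-p` switches the scraper between search mode and profile mode, a check that the given URLs match the chosen mode can catch mistakes early. The helper below is hypothetical and not part of the project; it relies only on the path prefixes visible in this README's examples (`/in/` for profiles, `/search/results/` for searches).

```python
from urllib.parse import urlparse


def classify_linkedin_url(url):
    """Return 'profile', 'search', or 'unknown' based on the URL path prefix."""
    path = urlparse(url).path
    if path.startswith("/in/"):
        return "profile"
    if path.startswith("/search/results/"):
        return "search"
    return "unknown"
```

A caller could, for example, warn when `-p` is given but some URL classifies as `"search"`.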

* Parse 23 profiles from the search URL [https://www.linkedin.com/.../?keywords=Robotic&...&](https://www.linkedin.com/search/results/all/?keywords=Robotic&origin=GLOBAL_SEARCH_HEADER)

If you don't set an output name with `-o`, the result files are named after the value of the `keywords` parameter (here, Robotic).
```
python LinkedinScraper -n 23 'https://www.linkedin.com/search/results/all/?keywords=Robotic&origin=GLOBAL_SEARCH_HEADER'
```
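The default naming rule can be sketched with the standard `urllib.parse` module. This is an illustration of the described behavior, not the project's code; the function name and the `"output"` fallback for URLs without a `keywords` parameter are hypothetical, since the README does not say what happens in that case.

```python
from urllib.parse import urlparse, parse_qs


def default_output_name(search_url, fallback="output"):
    """Derive a result file name from the 'keywords' query parameter."""
    params = parse_qs(urlparse(search_url).query)
    keywords = params.get("keywords")
    # parse_qs maps each key to a list of values; use the first one if present
    return keywords[0] if keywords else fallback
```

Note that `parse_qs` also percent-decodes the value, so `keywords=machine%20learning` becomes `machine learning`, which contains a space you may want to replace before using it as a file name.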

* Parse 17 profiles from the search URL [https://www.linkedin.com/.../?keywords=Robotic&...&](https://www.linkedin.com/search/results/all/?keywords=Robotic&origin=GLOBAL_SEARCH_HEADER)

Output only an Excel file, with each profile's information on a single row:
```
python LinkedinScraper -n 17 -f excel -m 1 'https://www.linkedin.com/search/results/all/?keywords=Robotic&origin=GLOBAL_SEARCH_HEADER'
```
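The exact column layout the project uses is not documented here. The sketch below, assuming a profile is a dict of single-value fields plus list fields (such as experience entries), shows one plausible way the two `-m` layouts could differ. `profile_rows` is a hypothetical helper; it returns plain lists of cell values, which a library such as openpyxl could then write with `Worksheet.append`.

```python
def profile_rows(profile, mode="m"):
    """Lay out one profile as spreadsheet rows.

    mode "1": everything on a single row, list fields joined into one cell each.
    mode "m": one row per list item; single-value fields appear on the first row only.
    Single-value fields are placed before list fields in both modes.
    """
    scalars = [v for v in profile.values() if not isinstance(v, list)]
    lists = [v for v in profile.values() if isinstance(v, list)]
    if mode == "1":
        # Join each list field into a '; '-separated string so it fits in one cell
        return [scalars + ["; ".join(map(str, items)) for items in lists]]
    # Multi-row: as many rows as the longest list field, and at least one row
    height = max((len(items) for items in lists), default=0) or 1
    rows = []
    for i in range(height):
        row = scalars if i == 0 else [""] * len(scalars)
        # Pad shorter list fields with empty cells
        row = row + [items[i] if i < len(items) else "" for items in lists]
        rows.append(row)
    return rows
```

With `{"name": "Jane Doe", "experience": ["A", "B"]}`, mode `"1"` gives one row `["Jane Doe", "A; B"]`, while mode `"m"` gives `["Jane Doe", "A"]` followed by `["", "B"]`.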

## Built with
* Python 3.7
* Scrapy
* openpyxl

## Author

* **Khaled Dallah** - *Software Engineer* | *Python/C++ Developer*
khaled.dallah0@gmail.com

## Issues
Report bugs and feature requests
[here](https://github.com/khaleddallah/LinkedinScraperProject/issues).

## Contribute
Contributions are always welcome!

## License

This project is licensed under the LGPL-3.0 License. See the [LICENSE](https://github.com/khaleddallah/LinkedinScraperProject/blob/master/LICENSE) file for details.