https://github.com/khaleddallah/LinkedinScraper
A Python Scrapy project that parses people profiles from LinkedIn search results and exports the content to Excel and JSON files
crawler excel json linkedin python scraper scrapy spider
- Host: GitHub
- URL: https://github.com/khaleddallah/LinkedinScraper
- Owner: khaleddallah
- License: lgpl-3.0
- Created: 2018-12-18T10:45:18.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2022-09-07T14:12:23.000Z (almost 4 years ago)
- Last Synced: 2024-11-05T18:51:26.357Z (over 1 year ago)
- Topics: crawler, excel, json, linkedin, python, scraper, scrapy, spider
- Language: Python
- Size: 7.38 MB
- Stars: 6
- Watchers: 3
- Forks: 2
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Linkedin Scraper using Scrapy

* Scrape a chosen number of profiles from the results of a LinkedIn search URL.
* Export the content of the profiles to Excel and JSON files.
## Installation
* Clone the project (the target directory name is set so that it matches the `cd` commands used in this README):
```
git clone https://github.com/khaleddallah/LinkedinScraper.git LinkedinScraperProject
```
* Use the package manager [pip](https://pip.pypa.io/en/stable/) to install Scrapy and the other requirements (Anaconda recommended):
```
cd LinkedinScraperProject
pip install -r requirements.txt
```
## Usage
* Change into the project directory:
```
cd LinkedinScraperProject
```
* Show the help message:
```
python LinkedinScraper -h
```
usage:

`python LinkedinScraper [-h] [-n NUM] [-o OUTPUT] [-p] [-f FORMAT] [-m EXCELMODE] (searchUrl or profilesUrl)`

positional arguments:
* `searchUrl`: a LinkedIn search URL, or one or more profile URLs

optional arguments:
* `-h, --help`: show this help message and exit
* `-n NUM`: number of profiles to parse. The number must be less than or equal to the number of search results. The value `page` parses the profiles on the page of the given URL (10 profiles); this is the default.
* `-o OUTPUT`: output file name
* `-p`: enable parsing of profiles (used when profile URLs are passed instead of a search URL)
* `-f FORMAT`: output format
  * `json`: JSON output file
  * `excel`: Excel output file
  * `all`: JSON and Excel output files
* `-m EXCELMODE`: layout of the Excel file
  * `1`: each profile appears in one row
  * `m`: each profile appears in multiple rows
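For readers who want to see how the options above fit together, here is a minimal `argparse` sketch that mirrors the help text. It is not the project's actual code: the `dest` names, the `build_parser` function, and the defaults for `-f` and `-m` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the help text; not the project's own code."""
    parser = argparse.ArgumentParser(prog="LinkedinScraper")
    # One search URL, or one or more profile URLs when -p is given
    parser.add_argument("urls", nargs="+", metavar="searchUrl",
                        help="LinkedIn search URL, or one or more profile URLs")
    # -n takes a number or the literal 'page' (one result page, 10 profiles)
    parser.add_argument("-n", dest="num", default="page",
                        help="number of profiles, or 'page'")
    parser.add_argument("-o", dest="output", default=None,
                        help="output file name")
    parser.add_argument("-p", dest="parse_profiles", action="store_true",
                        help="treat the URLs as profile URLs")
    # Assumed defaults: the help text does not state them for -f and -m
    parser.add_argument("-f", dest="format",
                        choices=["json", "excel", "all"], default="all")
    parser.add_argument("-m", dest="excel_mode",
                        choices=["1", "m"], default="1")
    return parser


args = build_parser().parse_args([
    "-n", "17", "-f", "excel", "-m", "1",
    "https://www.linkedin.com/search/results/all/?keywords=Robotic&origin=GLOBAL_SEARCH_HEADER",
])
print(args.num, args.format, args.excel_mode)
```

Using `choices` lets `argparse` reject unsupported values for `-f` and `-m` before any scraping starts.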
## Examples
* Parse the profiles https://www.linkedin.com/in/khaled-dallah/ and https://www.linkedin.com/in/linustorvalds/ and export the result to ABC.xlsx and ABC.json.
`-p` is used because individual profile URLs are passed instead of a search URL.
```
python LinkedinScraper -p -o 'ABC' 'https://www.linkedin.com/in/khaled-dallah/' 'https://www.linkedin.com/in/linustorvalds/'
```
* Parse 23 profiles from the search URL [https://www.linkedin.com/.../?keywords=Robotic&...&](https://www.linkedin.com/search/results/all/?keywords=Robotic&origin=GLOBAL_SEARCH_HEADER).
If you don't set the output name with `-o`, the result files are named after the value of `keywords` (here, Robotic).
```
python LinkedinScraper -n 23 'https://www.linkedin.com/search/results/all/?keywords=Robotic&origin=GLOBAL_SEARCH_HEADER'
```
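The default naming rule described in the previous example (result files named after the value of `keywords`) can be sketched with the standard library. The helper name `default_output_name` and the fallback value are assumptions for illustration, not names taken from the project.

```python
from urllib.parse import urlparse, parse_qs


def default_output_name(search_url, fallback="output"):
    """Return the value of the 'keywords' query parameter, or a fallback.

    Hypothetical helper: both the name and the fallback are assumptions.
    """
    query = parse_qs(urlparse(search_url).query)
    values = query.get("keywords")
    return values[0] if values else fallback


url = "https://www.linkedin.com/search/results/all/?keywords=Robotic&origin=GLOBAL_SEARCH_HEADER"
print(default_output_name(url))  # prints: Robotic
```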
* Parse 17 profiles from the search URL [https://www.linkedin.com/.../?keywords=Robotic&...&](https://www.linkedin.com/search/results/all/?keywords=Robotic&origin=GLOBAL_SEARCH_HEADER),
write the output as an Excel file, and put the information of each profile in one row.
```
python LinkedinScraper -n 17 -f excel -m 1 'https://www.linkedin.com/search/results/all/?keywords=Robotic&origin=GLOBAL_SEARCH_HEADER'
```
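To make the difference between the two `-m` modes concrete, the sketch below lays out one profile as spreadsheet rows in pure Python. The data shape (a dict with a `name` and an `experience` list), the function name `rows_for_profile`, and the exact layout of each mode are assumptions; the project's real Excel layout may differ.

```python
def rows_for_profile(profile, mode):
    """Lay out one profile as a list of spreadsheet rows.

    Assumed data shape (not taken from the project): a dict with a 'name'
    string and an 'experience' list of strings.
    mode '1' -> a single row; mode 'm' -> one row per experience entry.
    """
    name = profile["name"]
    experience = profile.get("experience", [])
    if mode == "1":
        # Join repeated values into one cell so the profile fits on one row
        return [[name, "; ".join(experience)]]
    if mode == "m":
        # The first row carries the name; later rows leave that cell empty
        rows = [[name if i == 0 else "", item]
                for i, item in enumerate(experience)]
        return rows or [[name, ""]]
    raise ValueError(f"unknown excel mode: {mode!r}")


profile = {"name": "Example Person", "experience": ["Company A", "Company B"]}
print(rows_for_profile(profile, "1"))
print(rows_for_profile(profile, "m"))
```

Each returned row is a plain list, so it could be passed to openpyxl's `Worksheet.append` to write the sheet.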
## Built with
* Python 3.7
* Scrapy
* openpyxl
## Author
* **Khaled Dallah** - *Software Engineer* | *Python/c++ Developer*
khaled.dallah0@gmail.com
## Issues:
Report bugs and feature requests
[here](https://github.com/khaleddallah/LinkedinScraperProject/issues).
## Contribute
Contributions are always welcome!
## License
This project is licensed under the LGPL-3.0 License. See the [LICENSE](https://github.com/khaleddallah/LinkedinScraperProject/blob/master/LICENSE) file for details.