https://github.com/ankush-chander/github-crawler
Crawl information from GitHub in a friendly manner.
- Host: GitHub
- URL: https://github.com/ankush-chander/github-crawler
- Owner: Ankush-Chander
- License: mit
- Created: 2021-06-09T04:33:31.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2023-10-03T10:07:42.000Z (over 2 years ago)
- Last Synced: 2025-06-20T01:09:53.509Z (12 months ago)
- Topics: human-resource-analytics, web-crawling
- Language: Python
- Homepage:
- Size: 17.6 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
[![Contributors][contributors-shield]][contributors-url]
[![Forks][forks-shield]][forks-url]
[![Stargazers][stars-shield]][stars-url]
[![Issues][issues-shield]][issues-url]
[![MIT License][license-shield]][license-url]
[![LinkedIn][linkedin-shield]][linkedin-url]
github-crawler
Friendly GitHub crawler.
# Setup
1. Install requirements
```
pip install -r requirement.txt
```
2. Update the source URL as needed in `github/github/spiders/github-user.py`
```
def start_requests(self):
    urls = [
        "your search url here"
    ]
```
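One way to produce the search URL for the list above is to build it from search qualifiers. The sketch below is only an illustration: the helper name `build_user_search_url` and the `location`/`language` parameters are hypothetical, and it assumes the spider is meant to start from GitHub's public user-search page (`https://github.com/search` with `type=users`).

```python
from urllib.parse import urlencode


def build_user_search_url(location, language=None):
    """Return a GitHub user-search URL (hypothetical helper, not part of the repo)."""
    qualifiers = [f"location:{location}"]
    if language:
        qualifiers.append(f"language:{language}")
    # urlencode percent-encodes ':' and turns the space between qualifiers into '+'
    query = urlencode({"q": " ".join(qualifiers), "type": "users"})
    return f"https://github.com/search?{query}"


if __name__ == "__main__":
    print(build_user_search_url("india", "python"))
```

The printed URL can then be pasted into the `urls` list in `start_requests`.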
## For CSV (default)
Set the following variables in `settings.py`
```
ITEM_PIPELINES = {
    'GithubCsvPipeline': 300,
}
```
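For readers unfamiliar with Scrapy item pipelines, the sketch below shows roughly how a CSV-writing pipeline can work. It is not the repository's `GithubCsvPipeline`: the class name `CsvPipelineSketch`, the default file name `users.csv`, and the choice to take column names from the first item are all assumptions made for illustration. It relies only on the standard `open_spider` / `process_item` / `close_spider` hooks that Scrapy calls on a pipeline.

```python
import csv


class CsvPipelineSketch:
    """Hypothetical stand-in; the real GithubCsvPipeline may differ."""

    def __init__(self, path="users.csv"):
        self.path = path
        self.file = None
        self.writer = None

    def open_spider(self, spider):
        # newline="" lets the csv module control line endings
        self.file = open(self.path, "w", newline="", encoding="utf-8")

    def process_item(self, item, spider):
        row = dict(item)
        if self.writer is None:
            # Use the first item's keys as the header; ignore unknown keys later
            self.writer = csv.DictWriter(
                self.file, fieldnames=list(row), extrasaction="ignore"
            )
            self.writer.writeheader()
        self.writer.writerow(row)
        return item  # pass the item on to any later pipeline

    def close_spider(self, spider):
        if self.file:
            self.file.close()
```

The number `300` in `ITEM_PIPELINES` is the pipeline's order value: Scrapy runs enabled pipelines from the lowest value to the highest.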
## For Elasticsearch
Set the following variables in `settings.py`
```
ELASTICSEARCH_HOST = ''
ELASTICSEARCH_PORT = 9200
ITEM_PIPELINES = {
    'GithubElasticsearchPipeline': 300,
}
```
Note: This option requires the index to already exist on the Elasticsearch server.
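If the index does not exist yet, it can be created with a `PUT /<index>` request against the Elasticsearch REST API. The following is a minimal sketch using only the standard library. The index name `github-users` is hypothetical (the README does not say which index the pipeline writes to), and the sketch assumes an unsecured HTTP endpoint.

```python
import json
from urllib.request import Request, urlopen


def build_create_index_request(host, port, index_name):
    """Build (but do not send) a PUT request that creates an index."""
    url = f"http://{host}:{port}/{index_name}"
    body = json.dumps({}).encode("utf-8")  # empty body: default settings and mappings
    return Request(url, data=body, method="PUT",
                   headers={"Content-Type": "application/json"})


def create_index(host, port, index_name):
    """Send the request; raises urllib.error.HTTPError if the server rejects it,
    for example when the index already exists."""
    with urlopen(build_create_index_request(host, port, index_name)) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # "github-users" is an assumed index name, not taken from the repository
    print(create_index("localhost", 9200, "github-users"))
```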
## For Google Sheets
1. Set the following variables in `settings.py`
```
GOOGLE_SHEET = ""
ITEM_PIPELINES = {
    'github.pipeline.GithubExcelPipeline': 300,
}
```
2. Store Google API credentials in `utility/gsheets_credentials.json`
Note: This option requires an existing Google Sheet with its sharing permission set to "Editable by anyone who has the link".
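Before starting a crawl, it can help to confirm that the credentials file is in place and parses as JSON. The sketch below is a hypothetical pre-flight check, not part of the repository. The function name `load_credentials` is invented, the path is resolved relative to the current working directory, and nothing is assumed about which keys the file contains.

```python
import json
from pathlib import Path


def load_credentials(path="utility/gsheets_credentials.json"):
    """Return the parsed credentials, or raise if the file is missing or malformed."""
    cred_path = Path(path)
    if not cred_path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {cred_path}")
    with cred_path.open(encoding="utf-8") as fh:
        data = json.load(fh)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    return data
```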
# Run instructions
```
cd github
scrapy crawl github-user-search
```
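Since the crawler aims to be friendly, it may be worth checking the politeness-related options in `settings.py` before running it. The names below are standard Scrapy settings, but the values are illustrative assumptions and not necessarily what this repository ships with.

```python
# settings.py (sketch): values are illustrative, not the repository's defaults
ROBOTSTXT_OBEY = True               # respect robots.txt rules
DOWNLOAD_DELAY = 2                  # seconds to wait between requests to the same site
CONCURRENT_REQUESTS_PER_DOMAIN = 1  # one request at a time per domain
AUTOTHROTTLE_ENABLED = True         # adapt the delay to observed server latency
```

A longer delay and lower concurrency slow the crawl down but reduce load on the target site.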
[contributors-shield]: https://img.shields.io/github/contributors/Ankush-Chander/github-crawler.svg?style=for-the-badge
[contributors-url]: https://github.com/Ankush-Chander/github-crawler/graphs/contributors
[forks-shield]: https://img.shields.io/github/forks/Ankush-Chander/github-crawler.svg?style=for-the-badge
[forks-url]: https://github.com/Ankush-Chander/github-crawler/network/members
[stars-shield]: https://img.shields.io/github/stars/Ankush-Chander/github-crawler.svg?style=for-the-badge
[stars-url]: https://github.com/Ankush-Chander/github-crawler/stargazers
[issues-shield]: https://img.shields.io/github/issues/Ankush-Chander/github-crawler.svg?style=for-the-badge
[issues-url]: https://github.com/Ankush-Chander/github-crawler/issues
[license-shield]: https://img.shields.io/github/license/Ankush-Chander/github-crawler.svg?style=for-the-badge
[license-url]: https://github.com/Ankush-Chander/github-crawler/blob/main/LICENSE.txt
[linkedin-shield]: https://img.shields.io/badge/-LinkedIn-black.svg?style=for-the-badge&logo=linkedin&colorB=555
[linkedin-url]: https://www.linkedin.com/in/ankush-chander-8248a876/