Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mewim/githubstargazerscrawler
Get the user profile of all stargazers of a specified GitHub repository
- Host: GitHub
- URL: https://github.com/mewim/githubstargazerscrawler
- Owner: mewim
- License: Unlicense
- Created: 2023-08-28T04:56:02.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2023-08-28T10:32:28.000Z (about 1 year ago)
- Last Synced: 2024-10-15T05:54:07.312Z (22 days ago)
- Language: Python
- Homepage:
- Size: 9.77 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# GitHubStargazersCrawler
A simple crawler that fetches the user profile of every stargazer of a specified GitHub repository via the GitHub API. The crawler persists the data in a SQLite database.

## Usage
1. Install the requirements: `pip install -r requirements.txt`
1. Create a GitHub API token and store it as an environment variable: `export GITHUB_TOKEN=<YOUR_TOKEN>`
1. Initialize the database: `python init_db.py`
1. Run the crawler: `python crawler.py <owner>/<repo>`, for example `python3 crawler.py kuzudb/kuzu`

## Notes
- If the crawler is stopped, the current state will be saved in the SQLite database. It is possible to run the crawler multiple times for the same repository. It will automatically skip already crawled users and continue from the last saved state.
- The crawler will also automatically handle the GitHub API rate limit. If the rate limit is reached, the crawler will wait until the limit is reset and then continue the crawling process.
- It is possible to run the crawler for multiple repositories over the same database.
- It is possible to run the crawler without a GitHub API token, but the rate limit is much lower. Also, some information, such as the email address, will not be available without a token.
- The crawler will try to crawl each user only once: if an error occurs during crawling, the request is not retried within that run. When the crawler is restarted, however, users skipped due to errors will be crawled again.
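The resume and rate-limit behaviors described in the notes above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the repository's actual code: the `users` table schema, the `seconds_until_reset` helper, and the pagination loop are assumptions made for the sketch.

```python
import json
import sqlite3
import time
import urllib.request

API = "https://api.github.com"


def seconds_until_reset(headers, now=None):
    """How long to sleep once the rate limit is exhausted (0 if calls remain).

    GitHub reports the remaining quota and the reset time (Unix epoch) in the
    X-RateLimit-Remaining and X-RateLimit-Reset response headers.
    """
    now = time.time() if now is None else now
    if int(headers.get("X-RateLimit-Remaining", 1)) > 0:
        return 0
    reset = int(headers.get("X-RateLimit-Reset", now))
    return max(0, reset - now) + 1  # small buffer past the reset time


def ensure_schema(conn):
    # Hypothetical schema: one row per crawled user, keyed by login.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users (login TEXT PRIMARY KEY, profile TEXT)"
    )


def already_crawled(conn, login):
    # Resume support: a user present in the table was fetched on a prior run.
    row = conn.execute("SELECT 1 FROM users WHERE login = ?", (login,)).fetchone()
    return row is not None


def crawl(repo, db_path="stargazers.db", token=None):
    """Fetch profiles of all stargazers of `repo` ("owner/name") into SQLite."""
    conn = sqlite3.connect(db_path)
    ensure_schema(conn)
    page = 1
    while True:
        url = f"{API}/repos/{repo}/stargazers?per_page=100&page={page}"
        req = urllib.request.Request(url)
        if token:  # unauthenticated requests work too, with a much lower limit
            req.add_header("Authorization", f"token {token}")
        with urllib.request.urlopen(req) as resp:
            stargazers = json.load(resp)
            # If the quota is now exhausted, wait for the reset before continuing.
            time.sleep(seconds_until_reset(dict(resp.headers)))
        if not stargazers:
            break  # past the last page
        for sg in stargazers:
            login = sg["login"]
            if already_crawled(conn, login):
                continue  # skip users saved by a previous (possibly interrupted) run
            with urllib.request.urlopen(f"{API}/users/{login}") as u:
                conn.execute(
                    "INSERT INTO users VALUES (?, ?)", (login, u.read().decode())
                )
            conn.commit()  # commit per user so an interruption loses little work
        page += 1
```

Committing after each user is what makes the restart behavior cheap: the saved state is simply the set of rows already in the table.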