https://github.com/samber/github-stackoverflow-email-scrapping
Scrape top GitHub and Stack Overflow users to find their email addresses
- Host: GitHub
- URL: https://github.com/samber/github-stackoverflow-email-scrapping
- Owner: samber
- License: other
- Created: 2015-04-30T12:26:20.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2015-04-30T16:23:14.000Z (over 10 years ago)
- Last Synced: 2025-02-16T09:31:35.445Z (11 months ago)
- Language: Go
- Size: 2.44 MB
- Stars: 2
- Watchers: 4
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
GitHub and Stack Overflow email scraping
========================================
## WARNING
This is my first Go program, so please be lenient ;-)
## HOW-TO
- Set your configuration in config.js (PostgreSQL DB + scraping constants)
- Open src/app.go
- Comment or uncomment the calls for what you want to scrape:
```
scrape_repos() // uses GitHub search pages (ordered by star and fork counts)
scrape_repos_contributors() // scrape_repos() MUST be called first!
scrape_repos_owner() // scrape_repos() MUST be called first!
scrape_orga_members() // scrape_repos_owner() MUST be called first!
```
- Then compile and run:
```
make vendor_get
make build
make run
```
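
The call-order constraints listed above are easy to break when commenting lines in and out. As a rough sketch only (not how `src/app.go` is actually written), the ordering could be checked at run time. The `step` type and `runSteps` helper are hypothetical names introduced for illustration, and the `noop` function stands in for the real scraping calls:

```go
package main

import "fmt"

// step is a hypothetical wrapper around one scraping function.
type step struct {
	name  string
	needs []string // names of steps that must already have run
	run   func()
}

// runSteps runs the steps in the given order and stops with an error
// as soon as a step's prerequisite has not run yet.
func runSteps(steps []step) ([]string, error) {
	done := map[string]bool{}
	var order []string
	for _, s := range steps {
		for _, dep := range s.needs {
			if !done[dep] {
				return order, fmt.Errorf("%s requires %s to run first", s.name, dep)
			}
		}
		s.run()
		done[s.name] = true
		order = append(order, s.name)
	}
	return order, nil
}

func main() {
	noop := func() {} // placeholder for the real scraping call
	steps := []step{
		{"scrape_repos", nil, noop},
		{"scrape_repos_contributors", []string{"scrape_repos"}, noop},
		{"scrape_repos_owner", []string{"scrape_repos"}, noop},
		{"scrape_orga_members", []string{"scrape_repos_owner"}, noop},
	}
	order, err := runSteps(steps)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("ran:", order)
}
```

Removing the `scrape_repos` entry from the slice makes `runSteps` return an error before any dependent step runs, instead of failing later on missing data.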
## TODO list
### GitHub:
- Check all errors and nil pointers
- Remove redundant writes to the DB
- Store the origin of each scraped entry in the DB (repo owner? committer? organization member?) and associate this information with the repo/organization ID
- Search repos by maximum star and fork counts to get past the 100-page limit
- Scrape only the top contributors of a repo
- Build a "points" algorithm to detect top contributors. Example: 5 commits in a repo with 10,000 stars and 1,000 commits = 0.005 * 1,000 * 10,000 = 50,000 points
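
One possible reading of the "points" idea in the last item is: the contributor's share of commits, times the repo's total commits, times its stars. The sketch below assumes that reading. `contributorScore` is a hypothetical helper, not a function from this repository:

```go
package main

import "fmt"

// contributorScore is a hypothetical helper, not part of the repository.
// It assumes the score is: commit share * total repo commits * repo stars.
func contributorScore(userCommits, repoCommits, repoStars int) float64 {
	if repoCommits <= 0 {
		return 0 // avoid dividing by zero for empty repos
	}
	share := float64(userCommits) / float64(repoCommits)
	return share * float64(repoCommits) * float64(repoStars)
}

func main() {
	// 5 commits in a repo with 1,000 commits and 10,000 stars.
	fmt.Printf("%.0f points\n", contributorScore(5, 1000, 10000))
}
```

Under this reading the score reduces to the contributor's commit count times the star count, so a real implementation might weight the commit share differently.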
### Others:
- Develop the Stack Overflow equivalent