https://github.com/tienisto/pr-scraper
Github PR Scraper
- Host: GitHub
- URL: https://github.com/tienisto/pr-scraper
- Owner: Tienisto
- Created: 2020-12-13T19:58:01.000Z (about 4 years ago)
- Default Branch: master
- Last Pushed: 2023-02-21T22:20:35.000Z (almost 2 years ago)
- Last Synced: 2024-12-07T20:41:43.288Z (about 1 month ago)
- Language: Kotlin
- Size: 91.8 KB
- Stars: 1
- Watchers: 1
- Forks: 5
- Open Issues: 1
Metadata Files:
- Readme: README.md
# Github PR Scraper
Get useful information about the pull requests of a repository.

## Getting Started

Using the JAR:
1. Add an `application.properties` file next to the JAR:
   ```properties
   # scraping config
   scraping.repository=elastic/elasticsearch
   scraping.auth=token e8c931efc42c74c2e4e027a2a82cc2cf2a58246b

   # database config
   spring.datasource.url=jdbc:postgresql://localhost:5432/pr
   spring.datasource.username=myuser
   spring.datasource.password=123456
   ```
2. Run the JAR:
   `java -jar prscraper-1.0.0.jar`

To use a custom path for the properties file:
`java -jar prscraper-1.0.0.jar --spring.config.location=`
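
The scraping properties can be sanity-checked before launching the JAR. Below is a minimal Kotlin sketch, assuming only the two `scraping.*` keys shown in step 1: it reads them with `java.util.Properties`, splits `scraping.repository` into owner and repository name, and keeps `scraping.auth` verbatim, since `token <personal access token>` is the format GitHub accepts in an `Authorization` header. `ScrapingConfig` and `parseScrapingConfig` are hypothetical names, not part of the project, which appears to bind these keys through Spring Boot instead.

```kotlin
import java.io.StringReader
import java.util.Properties

// Hypothetical holder for the scraping settings (not a class from the project).
data class ScrapingConfig(val owner: String, val repo: String, val authHeader: String)

// Parses the text of an application.properties file and validates the scraping keys.
fun parseScrapingConfig(text: String): ScrapingConfig {
    val props = Properties()
    // Properties.load ignores '#' comment lines and splits each entry into key and value.
    props.load(StringReader(text))

    val repository = requireNotNull(props.getProperty("scraping.repository")) {
        "missing scraping.repository"
    }
    val auth = requireNotNull(props.getProperty("scraping.auth")) {
        "missing scraping.auth"
    }

    // Expected form is "owner/name", e.g. "elastic/elasticsearch".
    val parts = repository.split("/")
    require(parts.size == 2 && parts.all { it.isNotBlank() }) {
        "scraping.repository must look like owner/name, got '$repository'"
    }
    // The auth value is kept verbatim, e.g. "token <personal access token>".
    return ScrapingConfig(owner = parts[0], repo = parts[1], authHeader = auth)
}
```

For the example file in step 1, this yields owner `elastic`, repository `elasticsearch`, and the unchanged `token ...` value; a repository value without exactly one `/` is rejected with an `IllegalArgumentException`.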
## Database Schema
The schema is generated automatically.

**pull_request**

column|type|nullable|description
---|---|---|---
id|BIGINT|NO|PostgreSQL id
gh_number|BIGINT|NO|GitHub number, unique across issues and pull requests
title|TEXT|NO|pull request title
state|VARCHAR|NO|either `open` or `closed`
author|TEXT|NO|author's GitHub username
created_at|TIMESTAMPTZ|NO|pull request creation timestamp

**commit**
column|type|nullable|description
---|---|---|---
id|BIGINT|NO|PostgreSQL id
message|TEXT|NO|commit message
pull_request_id|BIGINT|NO|foreign key referencing a pull request
hash|VARCHAR|NO|Git hash (SHA) of the commit
hash_parent|VARCHAR|YES|Git hash (SHA) of the parent commit
tree|VARCHAR|NO|hash of the commit's Git tree
author|TEXT|NO|author's GitHub username
created_at|TIMESTAMPTZ|NO|commit creation timestamp

**comment**
column|type|nullable|description
---|---|---|---
id|BIGINT|NO|PostgreSQL id
message|TEXT|NO|comment body
gh_id|BIGINT|NO|GitHub id of the comment
gh_reply_id|BIGINT|YES|GitHub id of the parent comment
pull_request_id|BIGINT|NO|foreign key referencing a pull request
commit_id|BIGINT|YES|foreign key referencing the corresponding commit
commit_fallback_id|BIGINT|YES|foreign key referencing the last commit of the PR
hunk_diff|TEXT|YES|referenced diff hunk
hunk_file|TEXT|YES|file path of the diff hunk
author|TEXT|NO|author's GitHub username
created_at|TIMESTAMPTZ|NO|comment creation timestamp

**git_file**
column|type|nullable|description
---|---|---|---
id|BIGINT|NO|PostgreSQL id
file_path|TEXT|NO|file path
file_content|TEXT|NO|file content
commit_id|BIGINT|NO|foreign key referencing the corresponding commit

**scraping_status**
column|type|nullable|description
---|---|---|---
key|VARCHAR|NO|status key
value|VARCHAR|NO|status value

Possible entries for **scraping_status**:

key|value|description
---|---|---
STAGE|`PULL_REQUESTS`, `COMMITS`, `COMMENTS`, `FILES`, `DONE`|on startup, the server reads this value and starts at that stage
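
The `STAGE` entry acts as a checkpoint for resuming an interrupted scrape. The following Kotlin sketch illustrates that behavior under two assumptions the schema does not state: the stages run in the order listed above, and a missing `STAGE` row means a fresh start. `Stage`, `remainingStages`, and `runStages` are hypothetical names, and the `saveStage` callback stands in for whatever actually writes to `scraping_status`.

```kotlin
// Values of the STAGE key, assumed to run in the order listed in the table.
enum class Stage { PULL_REQUESTS, COMMITS, COMMENTS, FILES, DONE }

// Stages still to run for a stored STAGE value; null is assumed to mean a fresh database.
// Stage.valueOf throws IllegalArgumentException for an unknown value.
fun remainingStages(storedValue: String?): List<Stage> {
    val start = storedValue?.let { Stage.valueOf(it) } ?: Stage.PULL_REQUESTS
    return Stage.values().filter { it.ordinal >= start.ordinal && it != Stage.DONE }
}

// Runs the remaining stages and persists the next stage after each one completes,
// so a restart resumes at the first unfinished stage instead of starting over.
fun runStages(
    storedValue: String?,
    runStage: (Stage) -> Unit,
    saveStage: (Stage) -> Unit
) {
    for (stage in remainingStages(storedValue)) {
        runStage(stage)
        // DONE is never in the remaining list, so ordinal + 1 is always a valid index.
        saveStage(Stage.values()[stage.ordinal + 1])
    }
}
```

With a stored value of `COMMITS`, for instance, the sketch runs `COMMITS`, `COMMENTS`, and `FILES`, saving `COMMENTS`, `FILES`, and finally `DONE`; a stored `DONE` runs nothing.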