https://github.com/anuvyklack/github-issues-processing
- Host: GitHub
- URL: https://github.com/anuvyklack/github-issues-processing
- Owner: anuvyklack
- Created: 2018-11-30T14:33:26.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2019-01-22T12:18:49.000Z (almost 6 years ago)
- Last Synced: 2024-10-29T21:13:24.240Z (about 2 months ago)
- Language: Python
- Size: 596 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# GitHub-Issues-Processing
This script uses the bag-of-words approach. Each issue is represented by a vector in a vector space in which every unique word is a separate dimension.
The weight of each term is calculated with tf-idf. The degree of similarity between two issues is the cosine of the angle between their vectors, so the more words two issues share, the higher their degree of similarity.

Instruction:
* First, run the gitHub-collect-issues.py script. Authenticate with your GitHub login and password; otherwise the GitHub API will block you.
* Then run the duplicates.py script.

### WARNING:
If you use two-factor authentication, you need to create a personal access token and use it in the password field.
https://github.com/settings/tokens
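
To make the similarity measure described at the top of this README concrete, here is a minimal, standard-library-only sketch of bag-of-words vectors weighted by tf-idf and compared by cosine similarity. It is an illustration under assumptions, not the code in duplicates.py: the function names, the tokenizer, the `log(N / df) + 1` idf variant, and the sample issue titles are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents):
    """Return one sparse {term: weight} vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append({
            # tf = relative frequency; idf = log(N / df) + 1 (one common variant)
            term: (count / total) * (math.log(n_docs / df[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors; 0.0 if either is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical issue titles, used only for illustration.
issues = [
    "App crashes when opening settings page",
    "Crash when opening the settings page of the app",
    "Add dark theme support",
]
vectors = tfidf_vectors(issues)
sim_duplicate = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
print(f"likely duplicates: {sim_duplicate:.3f}")
print(f"unrelated issues:  {sim_unrelated:.3f}")
```

The first two titles share most of their words, so their score is higher than the score for the first and third titles, which share none. Note that "crashes" and "crash" count as different words here; a stemmer would be needed to match them.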