https://github.com/williamfzc/git-file-keyword
Extract keywords from git history for better understanding your code files. For human and LLM.
codebase git llm openai
Last synced: 7 months ago
- Host: GitHub
- URL: https://github.com/williamfzc/git-file-keyword
- Owner: williamfzc
- License: apache-2.0
- Created: 2023-09-04T14:22:51.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2023-11-29T06:17:08.000Z (almost 2 years ago)
- Last Synced: 2025-03-14T02:40:07.720Z (8 months ago)
- Topics: codebase, git, llm, openai
- Language: Python
- Homepage:
- Size: 65.4 KB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# git-file-keyword
> Auto-generate Code File Descriptions with Git History and LLM.
Extract keywords from git history to better understand your code files. For humans and LLMs.
## What is it?
We use https://github.com/axios/axios.git as an example.
With a simple command:
```bash
gfk --repo ./axios --include "**/*.js" --output_csv ./output.csv
```
You get a list of keywords for each of your code files, extracted from your git history.

These keywords can guide developers and maintainers in understanding the potential functionality and history of each file.
They can guide an LLM as well. If you provide an OpenAI key:
```bash
gfk --repo ../axios --include "**/*.js" --output_csv ./output.csv --openai_key="sk-***"
```

You will see a human-readable description for every source file. For example:
```text
lib/core/Axios.js
A core file that handles Axios configuration, interceptors, and request defaults.
```
We use the LLM as a keyword parser to analyze, organize, and summarize the functionality of each file, and to present it in a human-readable format.
## Usage
```commandline
pip3 install git-file-keyword
```
### In terminal
```commandline
gfk --repo ../axios --include "**/*.js" --output_csv ./output.csv --openai_key="sk-***"
```
Of course, commit records contain a significant number of meaningless phrases.
We use extensive existing stop-word libraries to filter out some of them, but the same word may carry different meanings in different repositories, so there is no universal solution.
You can exclude repository-specific words by listing them in a `your_stopwords.txt` file, one per line:
```text
stop_word1
stop_word2
stop_word3
```
And add it to your command:
```commandline
--stopword_txt your_stopwords.txt
```
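To illustrate what such a file does, here is a minimal Python sketch of stop-word filtering. It is not gfk's actual implementation: the function names `load_stopwords` and `drop_stopwords` are hypothetical, and case-insensitive matching is an assumption made for the example.

```python
from pathlib import Path


def load_stopwords(path):
    """Read one stop word per line, skipping blank lines.

    Lowercasing is an assumption of this sketch, not documented gfk behavior.
    """
    text = Path(path).read_text(encoding="utf-8")
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def drop_stopwords(words, stopwords):
    """Keep only the words that are not in the stop-word set."""
    return [word for word in words if word.lower() not in stopwords]
```

Words that survive this filter are the candidates a keyword finder would rank afterwards.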
### As a lib
We provide some examples:
- [example/diff.py](example/diff.py): Get diff files and extract what they actually mean
- [example/stopword_extractor.py](example/stopword_extractor.py): Extract global keywords
- [git_file_keyword/cli/__init__.py](git_file_keyword/cli/__init__.py): Our command-line client
## Motivation
- Automatic maintenance of an always up-to-date document.
- By extracting sufficient business context from the git history, git-file-keyword allows developers and LLMs to quickly understand the meaning behind each code file at a lower cost.
- Enable clear positive feedback loops within the team through the use of commit messages.
## How does it work?
gfk consists of 3 layers:
- Word extractor: extracts words from git history and related platforms like JIRA
- Keyword finder: finds the keywords among those words
- LLM connector: builds prompts and communicates with the LLM
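The three layers can be sketched in plain Python as follows. This is an illustration under stated assumptions, not gfk's real code: the function names are hypothetical, the word extractor only handles commit messages passed in as strings (no git or JIRA access), TF-IDF is a ranking method picked for the example, and the LLM layer only builds a prompt string without calling any API.

```python
import math
import re
from collections import Counter


def extract_words(messages):
    """Layer 1 (sketch): split commit messages into lowercase words."""
    words = []
    for message in messages:
        words.extend(re.findall(r"[a-z][a-z0-9_]+", message.lower()))
    return words


def find_keywords(words_by_file, top_n=3):
    """Layer 2 (sketch): rank each file's words by TF-IDF.

    TF-IDF is an assumption chosen for illustration only.
    """
    n_files = len(words_by_file)
    # Document frequency: in how many files does each word occur?
    doc_freq = Counter()
    for words in words_by_file.values():
        doc_freq.update(set(words))

    keywords = {}
    for path, words in words_by_file.items():
        counts = Counter(words)
        scores = {
            word: (count / len(words)) * math.log(n_files / doc_freq[word])
            for word, count in counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords[path] = ranked[:top_n]
    return keywords


def build_prompt(path, keywords):
    """Layer 3 (sketch): turn keywords into a prompt string for an LLM."""
    return (
        f"File: {path}\n"
        f"Keywords from git history: {', '.join(keywords)}\n"
        "Describe in one sentence what this file is likely responsible for."
    )
```

Words that appear in every file's history (such as "fix") score zero under this ranking, which is why frequency across files matters as much as frequency within one file.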
## Contribution
Issues and PRs are always welcome :)
## License
[Apache 2.0](LICENSE)