https://github.com/williamfzc/git-file-keyword
Extract keywords from git history for better understanding your code files. For human and LLM.
https://github.com/williamfzc/git-file-keyword
codebase git llm openai
Last synced: about 2 months ago
JSON representation
Extract keywords from git history for better understanding your code files. For human and LLM.
- Host: GitHub
- URL: https://github.com/williamfzc/git-file-keyword
- Owner: williamfzc
- License: apache-2.0
- Created: 2023-09-04T14:22:51.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2023-11-29T06:17:08.000Z (over 1 year ago)
- Last Synced: 2025-03-14T02:40:07.720Z (2 months ago)
- Topics: codebase, git, llm, openai
- Language: Python
- Homepage:
- Size: 65.4 KB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 5
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# git-file-keyword
> Auto-generate Code File Descriptions with Git History and LLM.
Extract keywords from git history for better understanding your code files. For human and LLM.
## What is it?
We use https://github.com/axios/axios.git for example.
With a simple command:
```bash
gfk --repo ./axios --include "**/*.js" --output_csv ./output.csv
```You can get a keywords list of all your code files, which is extracted from your git history:
These keywords can be used to guide developers/maintainers in understanding the potential functionality and history associated with these files.
And, also LLM. If you provide an openai key ...
```bash
gfk --repo ../axios --include "**/*.js" --output_csv ./output.csv --openai_key="sk-***"
```
You will see the human-readable descriptions for every source files. For example:
```text
lib/core/Axios.jsA core file that handles Axios configuration, interceptors, and request defaults.
```We used LLM as a keyword parser to analyze, organize, and summarize the functionality of each file, and present it in a human-readable format.
## Usage
```commandline
pip3 install git-file-keyword
```### In terminal
```commandline
gfk --repo ../axios --include "**/*.js" --output_csv ./output.csv --openai_key="sk-***"
```Of course, there will be a significant number of meaningless phrases in the commit records.
While we have utilized extensive existing stop-word libraries to address some of them, the same words may carry different meanings in different repositories, and there is no universal solution.So you can simply exclude them by adding `your_stopwords.txt` file:
```text
stop_word1
stop_word2
stop_word3
```And add it to your command:
```commandline
--stopword_txt your_stopwords.txt
```### As a lib
We provided some examples:
- [example/diff.py](example/diff.py): Get diff files and extract what they actually mean
- [example/stopword_extractor.py](example/stopword_extractor.py): Extract global keywords
- [git_file_keyword/cli/__init__.py](git_file_keyword/cli/__init__.py): Our cmd client## Motivation
- Automatic maintenance of an always up-to-date document.
- By extracting sufficient business context from the git history, git-file-keyword allows developers and LLMs to quickly understand the meaning behind each code file at a lower cost.
- Enable clear positive feedback loops within the team through the use of commit messages.## How it works?
gfk consists of 3 layers:
- Word extractor: extract words from git history and related platforms like JIRA
- Keyword finder: find the keywords from words
- LLM connector: prompt and communication with llm## Contribution
Issues and PRs are always welcome :)
## License
[Apache 2.0](LICENSE)