{"id":16391740,"url":"https://github.com/williamfzc/srctag","last_synced_at":"2025-03-23T04:31:44.983Z","repository":{"id":208225072,"uuid":"721108949","full_name":"williamfzc/srctag","owner":"williamfzc","description":"Tag source files with real-world stories.","archived":false,"fork":false,"pushed_at":"2024-01-30T09:20:20.000Z","size":202,"stargazers_count":3,"open_issues_count":3,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-03-18T07:04:36.411Z","etag":null,"topics":["code-analysis","code-understanding","git","llm"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/williamfzc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-20T11:26:03.000Z","updated_at":"2024-07-01T22:00:28.000Z","dependencies_parsed_at":"2024-10-28T15:25:10.826Z","dependency_job_id":"74c95bb9-cc59-44c3-b4cb-fac7f2168ff0","html_url":"https://github.com/williamfzc/srctag","commit_stats":null,"previous_names":["williamfzc/srctag"],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/williamfzc%2Fsrctag","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/williamfzc%2Fsrctag/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/williamfzc%2Fsrctag/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/williamfzc%2Fsrctag/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wi
lliamfzc","download_url":"https://codeload.github.com/williamfzc/srctag/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245056889,"owners_count":20553855,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["code-analysis","code-understanding","git","llm"],"created_at":"2024-10-11T04:47:12.842Z","updated_at":"2025-03-23T04:31:44.400Z","avatar_url":"https://github.com/williamfzc.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# srctag\n\n[![PyPI version](https://badge.fury.io/py/srctag.svg)](https://badge.fury.io/py/srctag)\n[![Smoke Test](https://github.com/williamfzc/srctag/actions/workflows/python-package.yml/badge.svg)](https://github.com/williamfzc/srctag/actions/workflows/python-package.yml)\n\nTag source files with real-world stories.\n\n## What' s it?\n\nBased on user-provided tag lists, srctag associates files with relevant tags and provides a measure of relevance by\nmining the commits.\n\nFor example, axios is a famous JavaScript library. 
We can extract some features (tags) it provides from the README and\npass them to srctag:\n\n| File                                            | XMLHttpRequests | HTTP requests (Node.js) | Promise API support | Request/response interception | Request/response data transformation | Request cancellation | Automatic JSON data transforms | Automatic serialization of data objects | Client-side XSRF protection |\n|-------------------------------------------------|-----------------|-------------------------|---------------------|-------------------------------|--------------------------------------|----------------------|--------------------------------|-----------------------------------------|-----------------------------|\n| lib/adapters/http.js                            | 1               | 1                       | 1                   | 1                             | 1                                    | 1                    | 1                              | 1                                       | 1                           |\n| lib/adapters/xhr.js                             | 0.980769231     | 0.981132075             | 0.980769231         | 0.979591837                   | 0.981132075                          | 0.98                 | 0.98                           | 0.960784314                             | 0.981132075                 |\n| lib/utils.js                                    | 0.961538462     | 0.962264151             | 0.961538462         | 0.959183673                   | 0.962264151                          | 0.94                 | 0.96                           | 0.980392157                             | 0.962264151                 |\n| lib/platform/browser/index.js                   | 0.942307692     | 0.924528302             | 0.846153846         | 0.795918367                   | 0.830188679                          | 0.72                 | 0.84                           | 0.705882353                             | 0.924528302                 
|\n| lib/helpers/buildURL.js                         | 0.923076923     | 0.867924528             | 0.884615385         | 0.836734694                   | 0.811320755                          | 0.88                 | 0.82                           | 0.843137255                             | 0.773584906                 |\n| lib/core/dispatchRequest.js                     | 0.903846154     | 0.943396226             | 0.903846154         | 0.897959184                   | 0.905660377                          | 0.96                 | 0.86                           | 0.882352941                             | 0.886792453                 |\n| lib/helpers/toFormData.js                       | 0.884615385     | 0.905660377             | 0.923076923         | 0.857142857                   | 0.943396226                          | 0.78                 | 0.94                           | 0.941176471                             | 0.867924528                 |\n| lib/axios.js                                    | 0.865384615     | 0.773584906             | 0.942307692         | 0.918367347                   | 0.924528302                          | 0.9                  | 0.92                           | 0.901960784                             | 0.943396226                 |\n| lib/defaults/index.js                           | 0.846153846     | 0.830188679             | 0.826923077         | 0.87755102                    | 0.886792453                          | 0.86                 | 0.9                            | 0.862745098                             | 0.849056604                 |\n| lib/core/Axios.js                               | 0.826923077     | 0.886792453             | 0.865384615         | 0.93877551                    | 0.867924528                          | 0.92                 | 0.88                           | 0.921568627                             | 0.830188679                 |\n| lib/core/AxiosError.js                          | 0.807692308     | 0.849056604        
     | 0.673076923         | 0.816326531                   | 0.773584906                          | 0.84                 | 0.78                           | 0.803921569                             | 0.811320755                 |\n| lib/helpers/parseHeaders.js                     | 0.788461538     | 0.811320755             | 0.653846154         | 0.551020408                   | 0.452830189                          | 0.68                 | 0.4                            | 0.450980392                             | 0.339622642                 |\n| lib/helpers/isURLSameOrigin.js                  | 0.769230769     | 0.698113208             | 0.403846154         | 0.571428571                   | 0.641509434                          | 0                    | 0.7                            | 0.352941176                             | 0.905660377                 |\n| lib/platform/node/index.js                      | 0.75            | 0.735849057             | 0.788461538         | 0.653061224                   | 0.735849057                          | 0.44                 | 0.64                           | 0.529411765                             | 0.735849057                 |\n| lib/platform/browser/classes/FormData.js        | 0.730769231     | 0.716981132             | 0.711538462         | 0.428571429                   | 0.716981132                          | 0                    | 0.56                           | 0.078431373                             | 0.698113208                 |\n| lib/helpers/fromDataURI.js                      | 0.711538462     | 0.754716981             | 0.769230769         | 0.428571429                   | 0.509433962                          | 0.34                 | 0.42                           | 0.078431373                             | 0.679245283                 |\n| lib/platform/index.js                           | 0.692307692     | 0.660377358             | 0.519230769         | 0.367346939                   | 0.566037736                    
      | 0.44                 | 0.5                            | 0.529411765                             | 0.641509434                 |\n| lib/platform/browser/classes/URLSearchParams.js | 0.673076923     | 0.641509434             | 0.807692308         | 0.591836735                   | 0.698113208                          | 0.44                 | 0.74                           | 0.764705882                             | 0.509433962                 |\n| lib/helpers/cookies.js                          | 0.653846154     | 0.679245283             | 0.692307692         | 0.306122449                   | 0.641509434                          | 0.42                 | 0.68                           | 0.352941176                             | 0.79245283                  |\n| lib/core/transformData.js                       | 0.634615385     | 0.79245283              | 0.75                | 0.734693878                   |\n\nThen we can obtain the relevance of each code file with these tags. You can choose your preferred format to process this\ndata: CSV, pandas, or even networkx with Graphviz.\n\n![my_graph](https://github.com/williamfzc/srctx/assets/13421694/f6d239b4-a1cc-42f4-bfb0-38bf6421505f)\n\n## How to use?\n\n### Installation\n\nRequires Python 3.8 or later and the sentence-transformers library.\n\n```shell\n# For full installation with dependencies\npip install \"srctag[embedding]\"\n\n# For manual installation of sentence-transformers\npip install srctag\n```\n\n### Use as LIB\n\nYou can check the links below for more detailed information:\n\n- [examples](./examples)\n- [test cases](./tests)\n\n```python\nimport pathlib\nimport sys\nimport warnings\n\nimport networkx\n\nfrom srctag.collector import Collector\nfrom srctag.storage import Storage\nfrom srctag.tagger import Tagger\n\naxios_repo = pathlib.Path(__file__).parent.parent / \"axios\"\nif not axios_repo.is_dir():\n    warnings.warn(f\"clone axios to {axios_repo} first\")\n    sys.exit(0)\n\ncollector = 
Collector()\ncollector.config.repo_root = axios_repo\ncollector.config.max_depth_limit = -1\ncollector.config.include_regex = r\"lib.*\"\n\nctx = collector.collect_metadata()\nstorage = Storage()\nstorage.embed_ctx(ctx)\ntagger = Tagger()\ntagger.config.tags = [\n    \"XMLHttpRequests from browser\",\n    \"HTTP requests from node.js\",\n    \"Promise API support\",\n    \"Request and response interception\",\n    \"Request and response data transformation\",\n    \"Request cancellation\",\n    \"Automatic JSON data transforms\",\n    \"Automatic serialization of data objects\",\n    \"Client-side XSRF protection\"\n]\ntag_result = tagger.tag(storage)\n\n# access the pandas.DataFrame\nprint(tag_result.scores_df)\n\n# csv dump\ntag_result.export_csv()\n\n# dot file dump\ngraph = tag_result.export_networkx()\nnetworkx.drawing.nx_pydot.write_dot(graph, sys.stdout)\n```\n\n### Use as CLI\n\n```shell\n➜  examples git:(main) ✗ srctag tag --help\nUsage: srctag tag [OPTIONS]\n\n  tag your repo\n\nOptions:\n  --repo-root TEXT             Repository root directory\n  --max-depth-limit INTEGER    Maximum depth limit\n  --include-regex TEXT         File include regex pattern\n  --tags-file FILENAME         Path to a text file containing tags\n  --output-path TEXT           Output file path for CSV\n  --file-level TEXT            Scan file level, FILE or DIR, default to FILE\n  --st-model TEXT              Sentence Transformer Model\n  --commit-include-regex TEXT  Commit message include regex pattern\n  --help                       Show this message and exit.\n```\n\n## Goal \u0026 Motivation\n\n### Diff Analysis\n\nThis project was initially created to address the following issue. In complex business projects, there are often\nnumerous modules with many contributors. The tight coupling between modules can easily lead to changes affecting each\nother among developers. 
Detecting such issues through code review is time-consuming, labor-intensive, and prone to\noversights.\n\nWe aim to help evaluate the potential impact of a change on various functionalities, guiding subsequent testing efforts.\n\nWe also have a work-in-progress GitHub Actions project for supporting PR evaluations:\nhttps://github.com/williamfzc/srctag-action\n\n### API for LLM\n\nWith the rise of large language models (LLMs), many teams are considering how to make LLMs understand the entire\ncodebase.\nSo far, LLMs understand implementation-level details well, but their understanding of\nthe business functionality that the code represents is limited.\n\nWe also hope to use this approach to enable LLMs to establish associations between code files and specific business\nfunctionalities at a lower cost, enhancing their overall understanding of the code repository.\n\n## How does it actually work?\n\n- Collector: Collects sufficient metadata from the code repository, such as commit messages.\n- Storage: Organizes this metadata and embeds it into a vector database in an appropriate form.\n- Tagger: Searches for relevant files based on the existing tag list and further establishes associations.\n\n## License\n\n[Apache 2.0](LICENSE)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwilliamfzc%2Fsrctag","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwilliamfzc%2Fsrctag","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwilliamfzc%2Fsrctag/lists"}