{"id":19228126,"url":"https://github.com/unit-mesh/unit-gen","last_synced_at":"2025-10-16T11:18:16.818Z","repository":{"id":210545925,"uuid":"725993089","full_name":"unit-mesh/unit-gen","owner":"unit-mesh","description":"UnitGen 是一个用于生成微调代码的数据框架 —— 直接从你的代码库中生成微调数据：代码补全、测试生成、文档生成等。UnitGen is a code fine-tuning data framework that generates data from your existing codebase.","archived":false,"fork":false,"pushed_at":"2024-07-01T13:27:39.000Z","size":1323,"stargazers_count":54,"open_issues_count":0,"forks_count":11,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-01T07:54:21.740Z","etag":null,"topics":["data-engineering","evaluating","finetuning","llm"],"latest_commit_sha":null,"homepage":"https://gen.unitmesh.cc/","language":"Kotlin","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mpl-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/unit-mesh.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-12-01T09:59:03.000Z","updated_at":"2025-03-05T01:39:00.000Z","dependencies_parsed_at":"2024-11-09T15:28:30.185Z","dependency_job_id":"e4ae3c9d-e207-48f1-bcd4-9e0bf977a3ec","html_url":"https://github.com/unit-mesh/unit-gen","commit_stats":null,"previous_names":["unit-mesh/unit-eval","unit-mesh/unit-sets","unit-mesh/unit-gen"],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unit-mesh%2Funit-gen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unit-mesh%2Funit-gen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unit-me
sh%2Funit-gen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/unit-mesh%2Funit-gen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/unit-mesh","download_url":"https://codeload.github.com/unit-mesh/unit-gen/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":249982560,"owners_count":21355719,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-engineering","evaluating","finetuning","llm"],"created_at":"2024-11-09T15:26:49.509Z","updated_at":"2025-10-16T11:18:11.751Z","avatar_url":"https://github.com/unit-mesh.png","language":"Kotlin","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/logo.svg\" width=\"160px\" height=\"160px\" alt=\"UnitGen Logo\"\u003e\n\u003c/p\u003e\n\u003ch1 align=\"center\"\u003eUnitGen\u003c/h1\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/unit-mesh/unit-gen/actions/workflows/build.yml\"\u003e\n    \u003cimg src=\"https://github.com/unit-mesh/unit-gen/actions/workflows/build.yml/badge.svg\" alt=\"CI/CD\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/unit-mesh/chocolate-factory\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/powered_by-chocolate_factory-blue?logo=kotlin\u0026logoColor=fff\" alt=\"Powered By\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://central.sonatype.com/artifact/cc.unitmesh/unit-picker\"\u003e\n    \u003cimg 
src=\"https://img.shields.io/maven-central/v/cc.unitmesh/unit-picker\"  alt=\"Maven\"/\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://openbayes.com/console/signup?r=phodal_uVxU\"\u003e\n    \u003cimg src=\"https://openbayes.com/img/badge-open-in-openbayes.svg\" alt=\"Open In OpenBayes\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://openbayes.com/console/signup?r=phodal_uVxU\"\u003e\n    \u003cimg src=\"https://openbayes.com/img/badge-built-with-openbayes.svg\" alt=\"Built with OpenBayes\" /\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://codecov.io/gh/unit-mesh/unit-gen\"\u003e\n    \u003cimg src=\"https://codecov.io/gh/unit-mesh/unit-gen/branch/master/graph/badge.svg?token=nt22RX52DV\" alt=\"codecov\" /\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\u003e UnitGen is a data framework for code fine-tuning: it generates fine-tuning data directly from your codebase, covering code completion, test generation, documentation generation, and more.\n\nDocs: [https://gen.unitmesh.cc/](https://gen.unitmesh.cc/)\n\nThanks to [OpenBayes](https://openbayes.com/console/signup?r=phodal_uVxU) for providing computing resources.\n\nFine-tuned model examples:\n\n| name          | model download (HuggingFace)                                              | finetune Notebook                    | model download (OpenBayes)                                                          |\n|---------------|---------------------------------------------------------------------------|--------------------------------------|-------------------------------------------------------------------------------------|\n| DeepSeek 6.7B | [unit-mesh/autodev-coder](https://huggingface.co/unit-mesh/autodev-coder) | [finetune.ipynb](finetunes/deepseek) | [AutoDev Coder](https://openbayes.com/console/phodal/models/rCmer1KQSgp/9/overview) |\n\nLanguage support is provided by [Chapi](https://github.com/phodal/chapi):\n\n- supported:\n    - [x] Java\n    - [x] Kotlin\n- in progress:\n    - [x] TypeScript/JavaScript\n    - [x] Rust\n- future:\n    - [ ] Go\n    - [ ] Python\n    - [ ] C/C++\n    - [ ] C#\n    - [ ] Scala\n\nFeatures:\n\n- Code context\n  
strategy: [Related code completion](https://gen.unitmesh.cc/instruction/related-code-completion), [Similar code completion](https://gen.unitmesh.cc/instruction/similar-code-completion)\n- Instruction builder types: inline, block, after block, documentation, test generation\n- [Code quality](https://gen.unitmesh.cc/quality) filter and pipeline: code smells, test smells, estimation, and more.\n\n## Architecture\n\nLayered Architecture\n\n![Architecture](docs/architecture.svg)\n\nWorkflow\n\n![UnitGen Workflow](docs/workflow.svg)\n\n### Design Philosophy\n\n- Unique prompt. One prompt shared across fine-tuning, evaluation, and tooling.\n- Code quality pipeline. Estimation of code complexity, bad smells, test bad smells, and more rules.\n- Extensible, customizable quality thresholds. Custom rules, custom thresholds, custom quality types, and more.\n\n### Unique Prompt\n\nKeep the same prompt across tools: AutoDev \u003c-\u003e UnitGen \u003c-\u003e UnitEval\n\n#### AutoDev prompt\n\nAutoDev prompt template example:\n\n    Write unit test for following code.\n    \n    ${context.coc}\n    \n    ${context.framework}\n    \n    ${context.related_model}\n    \n    ```${context.language}\n    ${context.selection}\n    ```\n\n#### Unit Picker prompt\n\nThe Unit Picker prompt should keep the same structure as the AutoDev prompt. Prompt example:\n\n```kotlin\nInstruction(\n    instruction = \"Complete ${it.language} code, return rest code, no explaining\",\n    output = it.output,\n    input = \"\"\"\n    |```${it.language}\n    |${it.relatedCode}\n    |```\n    |\n    |Code:\n    |```${it.language}\n    |${it.beforeCursor}\n    |```\"\"\".trimMargin()\n)\n```\n\n#### UnitGen prompt\n\nThe UnitGen prompt should keep the same structure as the AutoDev prompt. 
Prompt example:\n\n    Complete ${language} code, return rest code, no explaining\n    \n    ```${language}\n    ${relatedCode}\n    ```\n    \n    Code:\n    ```${language}\n    ${beforeCursor}\n    ```\n\n### Code quality pipeline\n\n![Code Quality Workflow](docs/workflow.svg)\n\n### Extensible, customizable quality thresholds\n\nAvailable quality types:\n\n```kotlin\nenum class CodeQualityType {\n    BadSmell,\n    TestBadSmell,\n    JavaController,\n    JavaRepository,\n    JavaService,\n}\n```\n\nCustom threshold configuration:\n\n```kotlin\ndata class BsThresholds(\n    val bsLongParasLength: Int = 5,\n    val bsIfSwitchLength: Int = 8,\n    val bsLargeLength: Int = 20,\n    val bsMethodLength: Int = 30,\n    val bsIfLinesLength: Int = 3,\n)\n```\n\nCustom rules:\n\n```kotlin\n// apiAnalyser is assumed to be set up elsewhere\nval apis = apiAnalyser.toContainerServices()\nval ruleset = RuleSet(\n    RuleType.SQL_SMELL,\n    \"normal\",\n    UnknownColumnSizeRule(),\n    LimitTableNameLengthRule()\n    // more rules\n)\n\nval issues = WebApiRuleVisitor(apis).visitor(listOf(ruleset))\n// a non-empty issues list means the code has bad smells\n```\n\n## Quick Start\n\nFor examples, see the [examples](https://github.com/unit-mesh/unit-gen/tree/master/examples) folder.\n\n### Use the CLI\n\nSee the configuration examples in [config-examples](https://github.com/unit-mesh/unit-gen/tree/master/examples/config-examples/).\n\nDownload the latest version from [GitHub Releases](https://github.com/unit-mesh/unit-gen/releases).\n\n#### Generate Instructions\n\n1. Configure the project in `processor.yml`.\n2. 
Run the picker: `java -jar unit-gen.jar`\n\n### Use the Java API\n\nSee [project-example](examples/project-example/) for a complete example.\n\n1. Add the dependencies:\n\n```groovy\ndependencies {\n    implementation(\"cc.unitmesh:unit-picker:0.1.5\")\n    implementation(\"cc.unitmesh:code-quality:0.1.5\")\n}\n```\n\n2. Configure the `unit-gen.yml` and `connection.yml` files.\n\n3. Write the code:\n\n```java\nimport java.util.ArrayList;\nimport java.util.List;\n\n// Imports for InstructionType, CodeQualityType, PickerOption, BuilderConfig,\n// SimpleCodePicker and Instruction (from the cc.unitmesh artifacts) are omitted here.\n\npublic class App {\n    public static void main(String[] args) {\n        // instruction types to generate\n        List\u003cInstructionType\u003e builderTypes = new ArrayList\u003c\u003e();\n        builderTypes.add(InstructionType.RELATED_CODE_COMPLETION);\n\n        // quality filters applied to the picked code\n        List\u003cCodeQualityType\u003e codeQualityTypes = new ArrayList\u003c\u003e();\n        codeQualityTypes.add(CodeQualityType.BadSmell);\n        codeQualityTypes.add(CodeQualityType.JavaService);\n\n        // repository URL, branch and language first, then the options above\n        PickerOption pickerOption = new PickerOption(\n                \"https://github.com/unit-mesh/unit-gen-testing\", \"master\", \"java\",\n                \".\", builderTypes, codeQualityTypes, new BuilderConfig()\n        );\n\n        SimpleCodePicker simpleCodePicker = new SimpleCodePicker(pickerOption);\n        List\u003cInstruction\u003e output = simpleCodePicker.blockingExecute();\n\n        // handle the generated instructions here\n    }\n}\n```\n\n## Thanks to\n\n- abstract syntax tree: [Chapi](https://github.com/phodal/chapi). Used features: mapping multiple languages to the same data structure.\n- legacy system analysis: [Coca](https://github.com/phodal/coca). Inspired features: Bad Smell, Test Bad Smell\n- architecture governance tool: [ArchGuard](https://github.com/archguard/archguard).\n  Used features: Estimation, Rule Lint (API, SQL)\n- code database: [CodeDB](https://github.com/archguard/codedb). Used features: Code analysis pipeline\n\n## LICENSE\n\nThis code is distributed under the MPL 2.0 license. 
See `LICENSE` in this directory.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funit-mesh%2Funit-gen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Funit-mesh%2Funit-gen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Funit-mesh%2Funit-gen/lists"}