# Neuron Kernel Generator

A modular C++ tool for generating HLO and NEFF files from GGML/llama.cpp kernels for AWS Neuron (Inferentia/Trainium).

## Features

- **Modular Architecture**: Clean base class with extensible kernel implementations
- **GGML Support**: Essential kernels for llama.cpp backend development
- **Mock Testing**: Build and test without a full Neuron SDK installation
- **Docker Ready**: Designed for the Neuron SDK Docker environment
- **Automated Generation**: Scripts to generate all kernels with organized output

## Available Kernels

### Arithmetic Operations
- `add` - Element-wise addition
- `mul` - Element-wise multiplication
- `sub` - Element-wise subtraction

### Matrix Operations
- `matmul` - Matrix multiplication
- `transpose` - Matrix transpose
- `reshape` - Tensor reshape

### Activation Functions
- `relu` - Rectified Linear Unit
- `gelu` - Gaussian Error Linear Unit
- `silu` - Sigmoid Linear Unit (Swish)
- `softmax` - Softmax normalization

## Quick Start

### Generate All Kernels (Recommended)
```bash
# Local generation
./generate_all_kernels.sh trn1

# Docker generation
./build_and_generate.sh trn1 true

# Docker Compose
docker-compose up
```

### Manual Build and Generate
```bash
mkdir build && cd build
cmake ..
make

# Generate individual kernels
./kernel-generator add 1024 1024 my_add_kernel
./kernel-generator matmul 512 512 llama_matmul
```

## Output Structure

```
output/
├── generic/hlo/          # HLO files (device-independent)
│   ├── add.hlo
│   ├── matmul.hlo
│   └── ...
└── trn1/neff/            # NEFF files (device-specific)
    ├── add.neff
    ├── matmul.neff
    └── ...
```

## Docker Usage (Production)

### Build and Run
```bash
# Build image
docker build -t neuron-kernel-generator .

# Generate for trn1
docker run --rm -v "$(pwd)/output:/workspace/output" neuron-kernel-generator

# Generate for inf2
docker run --rm -v "$(pwd)/output:/workspace/output" neuron-kernel-generator ./generate_all_kernels.sh inf2
```

### Docker Compose
```bash
# Generate for trn1
docker-compose up neuron-kernel-generator

# Generate for inf2
docker-compose up neuron-kernel-generator-inf2
```

## Architecture

```
kernel-generator.cpp         # Main entry point
base_kernel.h/cpp            # Abstract base class
mock_xla.h                   # Mock XLA for testing
kernels/
├── arithmetic_kernels.h/cpp # Add, mul, sub
├── matrix_kernels.h/cpp     # MatMul, transpose, reshape
└── activation_kernels.h/cpp # ReLU, GELU, SiLU, softmax
scripts/
├── generate_all_kernels.sh  # Generate all kernels
├── build_and_generate.sh    # Build and generate wrapper
└── docker-compose.yml       # Docker orchestration
```

## Adding New Kernels

1. **Create kernel class**:
```cpp
#include <cstdint>
#include <memory>
#include <vector>

#include "base_kernel.h"  // BaseKernel

// "my_kernel" is passed to BaseKernel as this kernel's name.
class MyKernel : public BaseKernel {
public:
    explicit MyKernel(const std::vector<int64_t>& shape) : BaseKernel(shape, "my_kernel") {}
protected:
    // Builds the HLO graph for this kernel; defined in the .cpp file.
    std::unique_ptr<xla::HloModule> build_hlo() override;
};
```

2. **Register in factory**:
```cpp
// Factory entry: kernel name -> function that constructs the kernel for a shape
{"my_kernel", [](const std::vector<int64_t>& shape) {
    return std::make_unique<MyKernel>(shape);
}}
```

3. **Add to generation script**:
```bash
# In generate_all_kernels.sh
KERNELS="... my_kernel:1024,1024"
```

## Output Files

- `output/generic/hlo/<kernel>.hlo` - HLO intermediate representation
- `output/<device>/neff/<kernel>.neff` - Neuron Executable File Format

## Requirements

- **Development**: CMake 3.16+, C++17
- **Production**: AWS Neuron SDK with neuronx-cc compiler
- **Testing**: Mock XLA headers (included)
- **Docker**: Docker Engine for containerized builds

## GGML Backend Integration

This tool generates the NEFF files needed for a GGML Neuron backend.
Each kernel corresponds to a GGML operation used in llama.cpp.

Generated files can be integrated directly into your GGML backend by loading the NEFF files built for your target device.

## License

[MIT License](LICENSE.md).

## Thank You and Feedback

Reach out to us for any feedback or contributions.

Now Enjoy!

* Author: Karthik Kumar Viswanathan
* Web   : https://karthikkumar.org
* Email : me@karthikkumar.org