{"id":24103195,"url":"https://github.com/seemueller-io/toak","last_synced_at":"2025-06-11T20:37:24.939Z","repository":{"id":267866424,"uuid":"902600586","full_name":"seemueller-io/toak","owner":"seemueller-io","description":"instantly tokenize a git repository","archived":false,"fork":false,"pushed_at":"2025-01-09T22:10:36.000Z","size":752,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-01-09T22:32:07.773Z","etag":null,"topics":["cli","code","code-assistant","dev","markdown","prompt","repository","tokenize"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/seemueller-io.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-12T22:21:29.000Z","updated_at":"2025-01-09T22:30:34.000Z","dependencies_parsed_at":null,"dependency_job_id":"6dc3363e-6206-4a23-a51f-ad63e66b876b","html_url":"https://github.com/seemueller-io/toak","commit_stats":null,"previous_names":["seemueller-io/toak"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seemueller-io%2Ftoak","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seemueller-io%2Ftoak/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seemueller-io%2Ftoak/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/seemueller-io%2Ftoak/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/seemueller-io","download_url":"https://codeload.github.com/seemueller-io/toak/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241062557,"owners_count":19902910,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","code","code-assistant","dev","markdown","prompt","repository","tokenize"],"created_at":"2025-01-10T19:04:27.543Z","updated_at":"2025-06-11T20:37:24.933Z","avatar_url":"https://github.com/seemueller-io.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# toak\nit's no joke\n\n[![npm version](https://img.shields.io/npm/v/toak)](https://www.npmjs.com/package/toak)\n![Tests](https://github.com/seemueller-io/toak/actions/workflows/tests.yml/badge.svg)\n[![License: AGPL v3](https://img.shields.io/badge/License-AGPL_v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0.html)\n\n## Overview\n\n`toak` is an intentionally simple yet powerful tool that processes git repository files, cleans code, redacts sensitive information, and generates markdown documentation with token counts using the Llama 3 tokenizer.\n\n```shell\n$ cd your-git-repo\n$ npx toak\n```\n\n![toak](https://github.com/seemueller-io/toak/blob/471c2a359e342c0103d2074650afe1f1b2b5f71d/toak.jpg?raw=true)\n\n## Philosophy\n1. _Human-first_ technologies for a better future.\n2. If you don't like the name...good.\n---\n\n## Features\n\n### Data Processing\n- Reads tracked files from git repository\n- Removes comments, imports, and unnecessary whitespace\n- Redacts sensitive information (API keys, tokens, JWT, hashes)\n- Counts tokens using llama3-tokenizer-js\n- Supports nested .toak-ignore files\n\n### Token Cleaning\n- Removes single-line and multi-line comments\n- Strips console.log statements\n- Removes import statements\n- Cleans up whitespace and empty lines\n\n### Security Features\n- Redacts API keys and secrets\n- Masks JWT tokens\n- Hides authorization tokens\n- Redacts Base64 encoded strings\n- Masks cryptographic hashes\n\n## Requirements\n\n- Node.js (\u003e=14.0.0)\n- Git repository\n- Bun runtime (for development)\n\n## Installation\n\n```bash\nnpm install toak\n```\n\n## Usage\n\n### CLI\n```bash\nnpx toak\n```\n\n### Programmatic Usage\n\n```typescript\nimport { MarkdownGenerator } from 'toak';\n\nconst generator = new MarkdownGenerator({\n  dir: './project',\n  outputFilePath: './output.md',\n  verbose: true\n});\n\nconst result = await generator.createMarkdownDocument();\n```\n\n## Configuration\n\n### MarkdownGenerator Options\n\n```typescript\ninterface MarkdownGeneratorOptions {\n  dir?: string;                    // Project directory (default: '.')\n  outputFilePath?: string;         // Output file path (default: './prompt.md')\n  fileTypeExclusions?: Set\u003cstring\u003e;// File types to exclude\n  fileExclusions?: string[];      // File patterns to exclude\n  customPatterns?: Record\u003cstring, any\u003e;      // Custom cleaning patterns\n  customSecretPatterns?: Record\u003cstring, any\u003e;// Custom redaction patterns\n  verbose?: boolean;              // Enable verbose logging (default: true)\n}\n```\n\n### Ignore File Configuration\n\nCreate a `.toak-ignore` file in any directory to specify exclusions. The tool supports nested ignore files that affect their directory and subdirectories.\n\nExample `.toak-ignore`:\n```plaintext\n# Ignore specific files\nsecrets.json\nconfig.private.ts\n\n# Ignore directories\nbuild/\ntemp/\n\n# Glob patterns\n**/*.test.ts\n**/._*\n```\n\n#### Default Exclusions\n\nThe tool automatically excludes common file types and patterns:\n\nFile Types:\n- Images: .jpg, .jpeg, .png, .gif, .bmp, .svg, .webp, etc.\n- Fonts: .ttf, .woff, .woff2, .eot, .otf\n- Binaries: .exe, .dll, .so, .dylib, .bin\n- Archives: .zip, .tar, .gz, .rar, .7z\n- Media: .mp3, .mp4, .avi, .mov, .wav\n- Data: .db, .sqlite, .sqlite3\n- Config: .lock\n\nFile Patterns:\n- Configuration files: .*rc, tsconfig.json, package-lock.json\n- Version control: .git*, .hg*, .svn*\n- Environment files: .env*\n- Build outputs: build/, dist/, out/\n- Dependencies: node_modules/\n- Documentation: docs/, README*, CHANGELOG*\n- IDE settings: .idea/, .vscode/\n- Test files: test/, spec/, __tests__/\n\n## Development\n\nThis project uses [Bun](https://bun.sh) for development. To contribute:\n\n### Setup\n```bash\ngit clone \u003crepository\u003e\ncd toak\nbun install\n```\n\n### Scripts\n```bash\n# Build the project\nbun run build\n\n# Run tests\nbun test\n\n# Lint code\nbun run lint\n\n# Fix linting issues\nbun run lint:fix\n\n# Format code\nbun run format\n\n# Fix all (format + lint)\nbun run fix\n\n# Development mode\nbun run dev\n\n# Publish development version\nbun run deploy:dev\n```\n\n### Project Structure\n```\nsrc/\n├── index.ts              # Main exports\n├── TokenCleaner.ts       # Code cleaning and redaction\n├── MarkdownGenerator.ts  # Markdown generation logic\n├── cli.ts               # CLI implementation\n├── fileExclusions.ts    # File exclusion patterns\n└── fileTypeExclusions.ts # File type exclusions\n```\n\n## Contributing\n\n1. Fork the repository\n2. Create a feature branch\n3. Commit your changes\n4. Push to the branch\n5. Open a Pull Request\n\n### Guidelines\n- Write TypeScript code following the project's style\n- Include appropriate error handling\n- Add documentation for new features\n- Include tests for new functionality\n- Update the README for significant changes\n\n\n## Note\n\nThis tool requires a git repository to function properly as it uses `git ls-files` to identify tracked files.\n\n## License\n\n### GNU AFFERO GENERAL PUBLIC LICENSE\nVersion 3, 19 November 2007\n© 2024 Geoff Seemueller\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseemueller-io%2Ftoak","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fseemueller-io%2Ftoak","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fseemueller-io%2Ftoak/lists"}