Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/seemueller-io/toak
instantly tokenize a git repository
cli code code-assistant dev markdown prompt repository tokenize
- Host: GitHub
- URL: https://github.com/seemueller-io/toak
- Owner: seemueller-io
- License: other
- Created: 2024-12-12T22:21:29.000Z (29 days ago)
- Default Branch: main
- Last Pushed: 2025-01-09T22:10:36.000Z (1 day ago)
- Last Synced: 2025-01-09T22:32:07.773Z (1 day ago)
- Topics: cli, code, code-assistant, dev, markdown, prompt, repository, tokenize
- Language: TypeScript
- Homepage:
- Size: 734 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# toak
it's no joke

[![npm version](https://img.shields.io/npm/v/toak)](https://www.npmjs.com/package/toak)
![Tests](https://github.com/seemueller-io/toak/actions/workflows/tests.yml/badge.svg)
[![License: AGPL v3](https://img.shields.io/badge/License-AGPL_v3-blue.svg)](https://www.gnu.org/licenses/agpl-3.0.html)

## Overview
`toak` is an intentionally simple yet powerful tool that processes git repository files, cleans code, redacts sensitive information, and generates markdown documentation with token counts using the Llama 3 tokenizer.
```shell
$ cd your-git-repo
$ npx toak
```

![toak](https://github.com/seemueller-io/toak/blob/471c2a359e342c0103d2074650afe1f1b2b5f71d/toak.jpg?raw=true)
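
To make the cleaning and redaction steps from the overview more concrete, here is a minimal TypeScript sketch. The regular expressions and the function names `cleanCode` and `redactSecrets` are assumptions chosen for illustration; they are not toak's actual rules or API, and token counting is left out.

```typescript
// Hypothetical cleaning patterns; toak's real rules may differ.
const cleaningPatterns: RegExp[] = [
  /\/\*[\s\S]*?\*\//g,                      // multi-line comments
  /^[ \t]*\/\/.*$/gm,                       // whole-line single-line comments
  /^[ \t]*import\b.*$/gm,                   // single-line import statements
  /^[ \t]*console\.log\(.*\);?[ \t]*$/gm,   // console.log statements
];

// Hypothetical redaction patterns, each paired with its replacement text.
const secretPatterns: Array<[RegExp, string]> = [
  // JWT-like value: three base64url segments separated by dots
  [/\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+/g, '[REDACTED_JWT]'],
  // Quoted values assigned to names such as apiKey, secret, or token
  [/\b(api[_-]?key|secret|token)(\s*[:=]\s*)(['"])[^'"]+\3/gi, '$1$2$3[REDACTED]$3'],
];

function cleanCode(source: string): string {
  let out = source;
  for (const pattern of cleaningPatterns) {
    out = out.replace(pattern, '');
  }
  // Strip trailing whitespace and drop the empty lines left behind.
  return out
    .split('\n')
    .map((line) => line.replace(/[ \t]+$/, ''))
    .filter((line) => line !== '')
    .join('\n');
}

function redactSecrets(source: string): string {
  let out = source;
  for (const [pattern, replacement] of secretPatterns) {
    out = out.replace(pattern, replacement);
  }
  return out;
}
```

This sketch only removes comments that occupy a whole line, so a `//` inside a URL or string literal is left alone. A production cleaner would need a tokenizer-aware approach to handle trailing comments safely.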
## Philosophy
1. _Human-first_ technologies for a better future.
2. If you don't like the name...good.
---

## Features
### Data Processing
- Reads tracked files from git repository
- Removes comments, imports, and unnecessary whitespace
- Redacts sensitive information (API keys, tokens, JWT, hashes)
- Counts tokens using llama3-tokenizer-js
- Supports nested `.toak-ignore` files

### Token Cleaning
- Removes single-line and multi-line comments
- Strips console.log statements
- Removes import statements
- Cleans up whitespace and empty lines

### Security Features
- Redacts API keys and secrets
- Masks JWT tokens
- Hides authorization tokens
- Redacts Base64 encoded strings
- Masks cryptographic hashes

## Requirements
- Node.js (>=14.0.0)
- Git repository
- Bun runtime (for development)

## Installation
```bash
npm install toak
```

## Usage
### CLI
```bash
npx toak
```

### Programmatic Usage
```typescript
import { MarkdownGenerator } from 'toak';

const generator = new MarkdownGenerator({
  dir: './project',
  outputFilePath: './output.md',
  verbose: true
});

const result = await generator.createMarkdownDocument();
```

## Configuration
### MarkdownGenerator Options
```typescript
interface MarkdownGeneratorOptions {
  dir?: string;                                // Project directory (default: '.')
  outputFilePath?: string;                     // Output file path (default: './prompt.md')
  fileTypeExclusions?: Set<string>;            // File types to exclude
  fileExclusions?: string[];                   // File patterns to exclude
  customPatterns?: Record<string, any>;        // Custom cleaning patterns
  customSecretPatterns?: Record<string, any>;  // Custom redaction patterns
  verbose?: boolean;                           // Enable verbose logging (default: true)
}
```

### Ignore File Configuration
Create a `.toak-ignore` file in any directory to specify exclusions. The tool supports nested ignore files that affect their directory and subdirectories.
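
One way to picture the nested behavior is to collect, for a given file, the patterns from every `.toak-ignore` between the repository root and the file's own directory. The sketch below assumes gitignore-like scoping, where a pattern from a nested ignore file is anchored to that file's directory. The helper names `ancestorDirs` and `patternsFor` and the `ignoreFiles` shape are hypothetical, not toak's internals.

```typescript
// Returns '.' (the repository root) followed by each directory on the way
// down to the one containing the file, e.g. 'src/utils/a.ts' gives
// ['.', 'src', 'src/utils'].
function ancestorDirs(relativeFilePath: string): string[] {
  const parts = relativeFilePath.split('/').slice(0, -1);
  const dirs: string[] = ['.'];
  let current = '';
  for (const part of parts) {
    current = current === '' ? part : current + '/' + part;
    dirs.push(current);
  }
  return dirs;
}

// ignoreFiles maps a directory (relative to the repository root) to the
// patterns found in that directory's .toak-ignore file.
function patternsFor(
  relativeFilePath: string,
  ignoreFiles: Record<string, string[]>,
): string[] {
  const patterns: string[] = [];
  for (const dir of ancestorDirs(relativeFilePath)) {
    for (const pattern of ignoreFiles[dir] || []) {
      // Assumption: nested patterns are scoped to their own directory.
      patterns.push(dir === '.' ? pattern : dir + '/' + pattern);
    }
  }
  return patterns;
}
```

With this scoping, an ignore file in `docs/` has no effect on files under `src/`, which matches the description of ignore files affecting only their directory and subdirectories.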
Example `.toak-ignore`:
```plaintext
# Ignore specific files
secrets.json
config.private.ts

# Ignore directories
build/
temp/

# Glob patterns
**/*.test.ts
**/._*
```

#### Default Exclusions
The tool automatically excludes common file types and patterns:
File Types:
- Images: .jpg, .jpeg, .png, .gif, .bmp, .svg, .webp, etc.
- Fonts: .ttf, .woff, .woff2, .eot, .otf
- Binaries: .exe, .dll, .so, .dylib, .bin
- Archives: .zip, .tar, .gz, .rar, .7z
- Media: .mp3, .mp4, .avi, .mov, .wav
- Data: .db, .sqlite, .sqlite3
- Config: .lock, .yaml, .yml, .toml, .conf

File Patterns:
- Configuration files: .*rc, tsconfig.json, package-lock.json
- Version control: .git*, .hg*, .svn*
- Environment files: .env*
- Build outputs: build/, dist/, out/
- Dependencies: node_modules/
- Documentation: docs/, README*, CHANGELOG*
- IDE settings: .idea/, .vscode/
- Test files: test/, spec/, __tests__/

## Development
This project uses [Bun](https://bun.sh) for development. To contribute:
### Setup
```bash
git clone https://github.com/seemueller-io/toak.git
cd toak
bun install
```

### Scripts
```bash
# Build the project
bun run build

# Run tests
bun test

# Lint code
bun run lint

# Fix linting issues
bun run lint:fix

# Format code
bun run format

# Fix all (format + lint)
bun run fix

# Development mode
bun run dev

# Publish development version
bun run deploy:dev
```

### Project Structure
```
src/
├── index.ts # Main exports
├── TokenCleaner.ts # Code cleaning and redaction
├── MarkdownGenerator.ts # Markdown generation logic
├── cli.ts # CLI implementation
├── fileExclusions.ts # File exclusion patterns
└── fileTypeExclusions.ts # File type exclusions
```

## Contributing
1. Fork the repository
2. Create a feature branch
3. Commit your changes
4. Push to the branch
5. Open a Pull Request

### Guidelines
- Write TypeScript code following the project's style
- Include appropriate error handling
- Add documentation for new features
- Include tests for new functionality
- Update the README for significant changes

## Note
This tool requires a git repository to function properly as it uses `git ls-files` to identify tracked files.
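
As a rough illustration of that dependency, tracked files can be listed from Node.js by invoking `git ls-files` as a child process. The helper names `parseLsFilesOutput` and `listTrackedFiles` are hypothetical and the code is a sketch under those assumptions, not toak's implementation.

```typescript
import { execFileSync } from 'node:child_process';

// Splits NUL-separated `git ls-files -z` output into individual paths.
function parseLsFilesOutput(output: string): string[] {
  return output.split('\0').filter((file) => file.length > 0);
}

// Lists the files git tracks in repoDir. Throws if git is not installed
// or repoDir is not inside a git repository.
function listTrackedFiles(repoDir: string): string[] {
  // -z separates paths with NUL bytes, so unusual file names stay intact.
  const output = execFileSync('git', ['ls-files', '-z'], {
    cwd: repoDir,
    encoding: 'utf8',
    maxBuffer: 64 * 1024 * 1024, // leave room for large repositories
  });
  return parseLsFilesOutput(output);
}
```

Calling `listTrackedFiles('.')` outside a git repository raises an error, which is why running the tool in a plain directory does not work.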
## License
### GNU AFFERO GENERAL PUBLIC LICENSE
Version 3, 19 November 2007
© 2024 Geoff Seemueller