{"id":26245523,"url":"https://github.com/timakin/llmstxt-gen","last_synced_at":"2025-06-15T02:10:09.627Z","repository":{"id":279928747,"uuid":"940471691","full_name":"timakin/llmstxt-gen","owner":"timakin","description":null,"archived":false,"fork":false,"pushed_at":"2025-04-13T15:50:44.000Z","size":2523,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-13T16:40:36.189Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/timakin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-02-28T08:32:44.000Z","updated_at":"2025-04-13T15:49:46.000Z","dependencies_parsed_at":"2025-04-13T16:29:39.620Z","dependency_job_id":"c2ed695f-8598-435c-b6b4-d8bf4a13e80d","html_url":"https://github.com/timakin/llmstxt-gen","commit_stats":null,"previous_names":["timakin/llmstxt-gen"],"tags_count":5,"template":false,"template_full_name":null,"purl":"pkg:github/timakin/llmstxt-gen","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timakin%2Fllmstxt-gen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timakin%2Fllmstxt-gen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timakin%2Fllmstxt-gen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timakin%2Fllmstxt-gen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/timakin","download_url":"https://codeload.github.com/timakin/llmstxt-gen/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/timakin%2Fllmstxt-gen/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259910742,"owners_count":22930713,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-03-13T12:35:31.066Z","updated_at":"2025-06-15T02:10:09.615Z","avatar_url":"https://github.com/timakin.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# llmstxt-gen: HTML to LLMsTXT Converter\n\n[![CI](https://github.com/timakin/llmstxt-gen/actions/workflows/ci.yml/badge.svg)](https://github.com/timakin/llmstxt-gen/actions/workflows/ci.yml)\n[![Release](https://github.com/timakin/llmstxt-gen/actions/workflows/release.yml/badge.svg)](https://github.com/timakin/llmstxt-gen/actions/workflows/release.yml)\n\nThis tool converts HTML files into the LLMsTXT format, designed to make website content more accessible to Large Language Models (LLMs). It extracts readable content from HTML using the `go-readability` library.\n\n## Overview\n\nThe LLMsTXT format is a standardized way to provide information to help LLMs understand and utilize website content effectively. This tool processes HTML files, either by scanning a directory or using a sitemap, extracts the main textual content, and formats it according to the LLMsTXT specification.\n\n## Installation\n\n### Option 1: Download pre-built binaries (Recommended)\n\nDownload the pre-built binaries from the [Releases](https://github.com/timakin/llmstxt-gen/releases) page.\n\n### Option 2: Install with go install\n\n#### From GitHub\n\n```bash\n# Install the latest version\ngo install github.com/timakin/llmstxt-gen@latest\n\n# The binary will be installed to your $GOPATH/bin directory\n# Make sure $GOPATH/bin is in your PATH\n```\n\n#### From local repository\n\n```bash\n# Clone the repository\ngit clone git@github.com:timakin/llmstxt-gen.git\ncd llmstxt-gen\n\n# Install the tool\ngo install .\n\n# The binary will be installed to your $GOPATH/bin directory\n# Make sure $GOPATH/bin is in your PATH\n```\n\n### Option 3: Clone and Build\n\n```bash\n# Clone the repository\ngit clone https://github.com/timakin/llmstxt-gen.git\ncd llmstxt-gen\n\n# Build the tool\ngo build -o llmstxt-gen\n```\n\n## Usage\n\n### Basic Usage (Scanning a Directory)\n\n```bash\n# Generate llms.txt from HTML files in ./public directory\nllmstxt-gen --html-dir ./public --output-file ./llms.txt\n\n# Customize the project name\nllmstxt-gen --html-dir ./public --output-file ./llms.txt --project-name \"My Website\"\n\n# Enable verbose logging\nllmstxt-gen --html-dir ./public --output-file ./llms.txt --verbose\n```\n\n### Using a Sitemap\n\n```bash\n# Generate llms.txt using a sitemap.xml located in ./public\n# Assumes HTML files are also in ./public and sitemap URLs map accordingly\nllmstxt-gen --html-dir ./public --sitemap ./public/sitemap.xml --output-file ./llms.txt\n\n# Specify a different project name with sitemap\nllmstxt-gen --html-dir ./public --sitemap ./public/sitemap.xml --output-file ./llms.txt --project-name \"My Blog\"\n```\n\n### Command-line Options\n\n- `--html-dir`: Input directory containing HTML files (default: \"./html\"). This directory is scanned if `--sitemap` is not provided. It's also used to find local files corresponding to sitemap URLs.\n- `--sitemap`: Path to the sitemap XML file (optional). If provided, only URLs listed in the sitemap will be processed.\n- `--output-file`: Output file path (default: \"./llms.txt\").\n- `--project-name`: Project name for the LLMsTXT output (default: \"Documentation\").\n- `--verbose`: Enable verbose logging.\n- `--version`, `-v`: Display version information.\n\n## How It Works\n\n1.  **Input Source Determination**: Checks if a `--sitemap` path is provided.\n2.  **File List Generation**:\n    *   **Sitemap Mode**: Parses the sitemap, extracts URLs, and attempts to map each URL to a corresponding local HTML file within the `--html-dir`.\n    *   **Directory Scan Mode**: Recursively scans the `--html-dir` for `.html` and `.htm` files.\n3.  **Content Extraction**: For each identified HTML file:\n    *   Opens the file.\n    *   Uses `go-readability` (`github.com/mackee/go-readability`) to extract the main readable content (title, plain text content, excerpt).\n4.  **Formatting**: Organizes the extracted content (title, URL, excerpt, full text) into sections based on the directory structure relative to `--html-dir`. Formats the collected information according to the LLMsTXT specification.\n5.  **Output**: Writes the formatted content to the specified `--output-file`.\n\n## LLMsTXT Format\n\nThe LLMsTXT format includes:\n\n1.  An H1 with the name of the project or site (required).\n2.  A blockquote with a short summary of the project.\n3.  Markdown sections with detailed information, often grouped by topic or directory structure.\n4.  Sections delimited by H2 headers containing file lists (links to specific pages/documents) with summaries.\n5.  Detailed content for each page under H3 headers.\n\nFor more information about the LLMsTXT format, see [llmstxt.org](https://llmstxt.org/).\n\n## Release Process\n\nThis project uses [GoReleaser](https://goreleaser.com/) to automate the release process. Here's how to create a new release:\n\n1.  Make sure all your changes are committed and pushed to the repository.\n2.  Create and push a new tag with the version number:\n    ```bash\n    git tag -a vX.Y.Z -m \"Release vX.Y.Z\"\n    git push origin vX.Y.Z\n    ```\n3.  The GitHub Actions workflow will automatically build and publish the release.\n\nYou can also test the release process locally without publishing:\n\n```bash\n# Install GoReleaser if you haven't already\n# go install github.com/goreleaser/goreleaser@latest\n\n# Test the release process (dry run)\ngoreleaser release --snapshot --clean --skip=publish\n```\n\nThis will create a release in the `dist/` directory without publishing it to GitHub.\n\n## License\n\nThis project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimakin%2Fllmstxt-gen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftimakin%2Fllmstxt-gen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimakin%2Fllmstxt-gen/lists"}