{"id":24850245,"url":"https://github.com/bitnom/path2md","last_synced_at":"2025-10-08T06:39:50.148Z","repository":{"id":271795002,"uuid":"914589621","full_name":"bitnom/path2md","owner":"bitnom","description":"Input file or dir path, output markdown file(s).","archived":false,"fork":false,"pushed_at":"2025-01-09T23:45:40.000Z","size":67,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-26T12:12:44.380Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bitnom.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-01-09T22:30:35.000Z","updated_at":"2025-01-09T23:45:43.000Z","dependencies_parsed_at":"2025-01-09T23:27:32.220Z","dependency_job_id":"5ef49116-b736-4462-88fa-9a0904b5fdc2","html_url":"https://github.com/bitnom/path2md","commit_stats":null,"previous_names":["bitnom/path2md"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/bitnom/path2md","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bitnom%2Fpath2md","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bitnom%2Fpath2md/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bitnom%2Fpath2md/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bitnom%2Fpath2md/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bitnom","download_url":"https://codeload.github.com
/bitnom/path2md/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bitnom%2Fpath2md/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278903014,"owners_count":26065786,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-08T02:00:06.501Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-31T13:17:55.413Z","updated_at":"2025-10-08T06:39:50.133Z","avatar_url":"https://github.com/bitnom.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# path2md\n\n**Version**: 0.4.0  \n**Author**: [bitnom](https://github.com/bitnom)  \n**License**: Apache License 2.0  \n\n## Table of Contents\n\n- [Overview](#overview)\n- [Installation](#installation)\n  - [Install via uv (Recommended)](#install-via-uv-recommended)\n  - [Install via pip](#install-via-pip)\n  - [Install via Poetry](#install-via-poetry)\n- [Usage](#usage)\n  - [Basic Example](#basic-example)\n  - [Specifying File Extensions](#specifying-file-extensions)\n  - [Omitting Files or Directories](#omitting-files-or-directories)\n  - [Truncating Lines or Strings](#truncating-lines-or-strings)\n  - [Removing Comments](#removing-comments)\n  - [Limiting Recursion Depth](#limiting-recursion-depth)\n  - [Whitelisting Files or Directories](#whitelisting-files-or-directories)\n  - 
[Using Gitignore](#using-gitignore)\n  - [Output Options](#output-options)\n  - [File Size Limit and Binary Files](#file-size-limit-and-binary-files)\n- [How It Works](#how-it-works)\n- [Known Limitations and Caveats](#known-limitations-and-caveats)\n- [Contributing](#contributing)\n- [License](#license)\n\n---\n\n## Overview\n\n`path2md` is a command-line tool designed to collect files from a given directory (and its subdirectories) and wrap each file’s content in Markdown code fences. This lets you quickly generate documentation or share your code snippets in a Markdown-friendly format. You can:\n\n- Restrict which files to include (by file extension, whitelists, or `.gitignore`).  \n- **Parse all file extensions by default** (unless you explicitly provide `--extensions`).  \n- Omit files by extension or filename but still note their presence in the output.  \n- Optionally strip comments to reduce clutter.  \n- Truncate lines and/or strings to limit overly long content.  \n- Limit consecutive empty lines to make the output more compact.  \n- **Skip files larger than a specified maximum size** (`--max-size`, default 100 KB).  \n- **Always skip binary files** automatically.  \n- Produce either a single Markdown file or multiple Markdown files (one per source file).\n\nThis tool is especially helpful if you want to share or document multiple files (e.g., sample code, config files) with an LLM (Large Language Model) without manually copying and pasting them into code blocks.\n\n[See output demo in example.md](example.md)\n\n---\n\n## Installation\n\n### Install via uv (Recommended)\n\nIf you haven’t used **uv** before, it’s a lightweight Python project management tool that helps isolate dependencies. You can install **uv** system-wide with:\n\n```bash\npip install uv\n```\n\nThen follow these steps:\n\n1. **Clone** or **download** this repository.\n2. In the project directory (where your `pyproject.toml` is located), run:\n\n   ```bash\n   uv sync\n   ```\n\n3. 
Once installed, you can run:\n\n   ```bash\n   uv run path2md --help\n   ```\n\n### Install via pip\n\n1. **Clone** or **download** this repository.\n2. From the top-level directory (with the `pyproject.toml`), run:\n\n   ```bash\n   pip install .\n   ```\n\n   This will build and install the package into your current Python environment.\n3. Once installed, you can run:\n\n   ```bash\n   path2md --help\n   ```\n\n\u003e **Note:** If you want to install in “editable”/dev mode, use:\n\u003e ```bash\n\u003e pip install -e .\n\u003e ```\n\u003e Then any local changes to the code take effect immediately.\n\n### Install via Poetry\n\n1. **Clone** or **download** this repository.\n2. In the project directory, run:\n\n   ```bash\n   poetry install\n   ```\n\n3. Once installed, you can:\n   - Use it directly via:\n     ```bash\n     poetry run path2md --help\n     ```\n   - Or activate the virtual environment (`poetry shell`) and then run:\n     ```bash\n     path2md --help\n     ```\n\n---\n\n## Usage\n\nAfter installation (via uv, pip, or Poetry), you’ll have a `path2md` CLI command in your PATH. 
Run:\n\n```bash\npath2md \u003cdirectory\u003e [options]\n```\n\nBelow is a summary of all the available options:\n\n```txt\npositional arguments:\n  directory             Directory containing files to process.\n\noptional arguments:\n  --output-file OUTPUT_FILE\n                        Output markdown file path.\n  --output-dir OUTPUT_DIR\n                        Output directory for individual markdown files.\n  --extensions EXTENSIONS\n                        Comma-separated list of file extensions to process.\n                        If not provided, ALL file extensions are processed\n                        (excluding binary files).\n  --omit OMIT\n                        Comma-separated list of file extensions to omit\n                        (source omitted but file is noted).\n  --omit-files OMIT_FILES\n                        Comma-separated list of filenames to omit (source\n                        omitted but file is noted).\n  --omit-dirs OMIT_DIRS\n                        Comma-separated list of directory names to omit\n                        from traversal entirely.\n  --truncln TRUNCLN\n                        Truncate lines longer than this many characters.\n  --truncstr TRUNCSTR\n                        Truncate strings longer than this many characters.\n  --nocom\n                        Omit all line/block comments from the output.\n  --maxlnspace MAXLNSPACE\n                        Maximum number of consecutive empty lines allowed.\n  --depth DEPTH\n                        Limit directory recursion depth.\n  --whitelist-files WHITELIST_FILES\n                        Comma-separated list of files to parse.\n  --whitelist-dirs WHITELIST_DIRS\n                        Comma-separated list of directory names to traverse.\n  --whitelist WHITELIST\n                        Comma-separated list of files/dirs to process.\n  --gitignore GITIGNORE\n                        Path to a .gitignore file (global).\n  --obey-gitignores\n                        
Obey .gitignore files found in traversed directories.\n  --max-size MAX_SIZE\n                        Maximum file size in bytes to process (default: 100 KB).\n  --version\n                        Show program's version number and exit.\n```\n\n### Basic Example\n\n```bash\npath2md my_project --output-file project_snippets.md\n```\n\n- Traverses `my_project/` looking for **all file extensions** by default (skipping binary files and those over 100 KB).  \n- Outputs all discovered (non-binary) files into a single Markdown file named `project_snippets.md`.\n\n### Specifying File Extensions\n\nTo only include certain extensions:\n\n```bash\npath2md my_project --extensions py,js,json\n```\n\nThis processes **only** `.py`, `.js`, and `.json` files (still skipping binary files and those over 100 KB).\n\n### Omitting Files or Directories\n\nYou can omit files by extension or by exact filename:\n\n```bash\n# Omit .env and .lock files, but still note them in the output\npath2md my_project --omit env,lock\n```\n\nTo **completely skip** a directory, use `--omit-dirs`:\n\n```bash\npath2md my_project --omit-dirs node_modules,build\n```\n\nAny directory named `node_modules` or `build` will not be entered during traversal.\n\n### Truncating Lines or Strings\n\n- `--truncln` truncates individual lines if they exceed a certain length.\n- `--truncstr` truncates string literals (e.g., `\"...\"`, `'...'`, triple quotes, backticks).\n\nExample:\n\n```bash\npath2md my_project --truncln 120 --truncstr 200\n```\n\nLines over 120 characters will be shortened, and string literals over 200 characters will be truncated.\n\n### Removing Comments\n\nUse `--nocom` to strip out comments:\n\n```bash\npath2md my_project --nocom\n```\n\nCurrently, this removes:\n- `# ...` lines in Python.\n- `// ...` lines and `/* ... 
*/` blocks in JS/TS/CSS/HTML.\n\nIt is a naive removal (simple regex-based) and won’t handle advanced edge cases (like `#` in a string).\n\n### Limiting Recursion Depth\n\nIf you only want to scan subdirectories up to a certain depth from the initial directory:\n\n```bash\npath2md my_project --depth 2\n```\n\n- `depth=0` means only the directory itself.  \n- `depth=1` means the directory and its immediate subdirectories.  \n\n### Whitelisting Files or Directories\n\nIf you only want to process specific files or directories:\n\n```bash\npath2md my_project --whitelist-files main.py,settings.py\n```\n\nThis will only process `main.py` and `settings.py` (within the given directory). Similarly, `--whitelist-dirs` only traverses directories whose names match the whitelist. The more general `--whitelist` applies to both file and directory names.\n\n### Using Gitignore\n\nYou can specify a global `.gitignore` to skip certain files:\n\n```bash\npath2md my_project --gitignore /path/to/.gitignore\n```\n\nOr, if you want the script to obey any `.gitignore` found inside subdirectories:\n\n```bash\npath2md my_project --obey-gitignores\n```\n\nThis means each subdirectory’s `.gitignore` rules are also applied.\n\n### Output Options\n\n1. **Output to a single Markdown file**:\n\n   ```bash\n   path2md my_project --output-file output.md\n   ```\n\n2. **Output to multiple Markdown files (one per source file)**:\n\n   ```bash\n   path2md my_project --output-dir output_folder\n   ```\n\n   This creates `output_folder/` if it doesn’t exist, then places individual `.md` files for each source file. The filenames are based on the relative paths of the source files but sanitized for filesystem safety.\n\n3. **Output to STDOUT** (default if neither `--output-file` nor `--output-dir` is specified):\n\n   ```bash\n   path2md my_project\n   ```\n\n### File Size Limit and Binary Files\n\n- By default, files larger than **100 KB** are skipped. 
You can customize this limit with `--max-size \u003cBYTES\u003e`.\n- The script **always skips binary files**. Any file containing a null byte (`\\0`) in the first 1024 bytes is treated as binary, and it won’t be included in the output.\n\n---\n\n## How It Works\n\n1. **Argument Parsing**  \n   The script reads all CLI options and determines which files/directories to traverse or skip.\n\n2. **File Collection**  \n   - Uses `os.walk()` to scan the specified directory.  \n   - Checks optional recursion depth, directory whitelists/omits, `.gitignore` rules, and maximum file size to filter out unwanted paths.  \n   - Automatically **skips binary files**.  \n   - By default, if `--extensions` is **not** provided, **all** non-binary files under the `max-size` limit are processed.\n\n3. **Fencing Content**  \n   For each file that passes the filters:\n   - If its extension or filename is in the “omit” lists, it’s only *referenced* (with a note that content is omitted).  \n   - Otherwise, the script reads the file content, optionally removes comments, truncates lines/strings, and limits consecutive empty lines.  \n   - Wraps the processed text in Markdown fences.\n\n4. **Output**  \n   - All fenced content is joined into a single string or separated into multiple Markdown files as requested.  \n   - If splitting into multiple files, the script uses a simple split logic on the combined string and writes each chunk to an individual `.md` file.\n\n---\n\n## Known Limitations and Caveats\n\n1. **Naive Regex for Comments and Strings**  \n   - The regex approach may remove content that merely *resembles* a comment (e.g., `//` in a string).  \n   - Similarly, string truncation might behave unexpectedly with nested quotes or escaped characters.\n\n2. **Splitting Output in `--output-dir` Mode**  \n   - The script splits combined content on `\\n**`, which might conflict if your files legitimately contain that exact sequence in code. This is unlikely but worth noting.  \n\n3. 
**Overwriting Files**  \n   - If two different source files sanitize to the same name, the second will overwrite the first in the output directory. (For example, `foo/bar.py` and `foo:bar.py` would both become `foo_bar_py.md`.)\n\n4. **Case Sensitivity**  \n   - On case-insensitive filesystems (e.g., the Windows default), output filenames that differ only by case may collide in `--output-dir` mode.\n\nIf these caveats don’t affect your typical use, the script should work fine.\n\n---\n\n## Contributing\n\nContributions, bug reports, and feature requests are welcome. Please open an issue or submit a pull request on the [GitHub repository](https://github.com/bitnom/path2md).\n\nWhen submitting code changes, please ensure you:\n\n1. Write clear commit messages.  \n2. Include tests or sample usage if you introduce new features.  \n3. Adhere to Pythonic style (PEP 8).\n\n---\n\n## License\n\nThis project is under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). You may modify and distribute it under the terms of that license.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbitnom%2Fpath2md","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbitnom%2Fpath2md","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbitnom%2Fpath2md/lists"}