{"id":28719184,"url":"https://github.com/hamedfathi/recursivetextsplitter","last_synced_at":"2026-03-11T04:32:30.084Z","repository":{"id":298552601,"uuid":"1000365160","full_name":"HamedFathi/RecursiveTextSplitter","owner":"HamedFathi","description":"A smart C# text splitting library that intelligently chunks text while preserving semantic boundaries. Uses a hierarchical approach with configurable overlap and detailed metadata.","archived":false,"fork":false,"pushed_at":"2025-06-18T08:49:32.000Z","size":28,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-18T09:56:57.792Z","etag":null,"topics":["csharp","dotnet","dotnet-core","dotnet-library","dotnetcore","recursive","recursive-algorithm","recursive-text-splitter","text","text-split","text-splitter","text-splitting"],"latest_commit_sha":null,"homepage":"","language":"C#","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HamedFathi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-11T17:03:18.000Z","updated_at":"2025-07-11T08:10:24.000Z","dependencies_parsed_at":"2025-06-11T19:06:18.490Z","dependency_job_id":null,"html_url":"https://github.com/HamedFathi/RecursiveTextSplitter","commit_stats":null,"previous_names":["hamedfathi/recursivetextsplitter"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/HamedFathi/RecursiveTextSplitter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HamedFathi%2FRecursiveTextSplitter","tags_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HamedFathi%2FRecursiveTextSplitter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HamedFathi%2FRecursiveTextSplitter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HamedFathi%2FRecursiveTextSplitter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HamedFathi","download_url":"https://codeload.github.com/HamedFathi/RecursiveTextSplitter/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HamedFathi%2FRecursiveTextSplitter/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266058481,"owners_count":23870157,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csharp","dotnet","dotnet-core","dotnet-library","dotnetcore","recursive","recursive-algorithm","recursive-text-splitter","text","text-split","text-splitter","text-splitting"],"created_at":"2025-06-15T06:00:36.045Z","updated_at":"2026-03-11T04:32:30.045Z","avatar_url":"https://github.com/HamedFathi.png","language":"C#","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RecursiveTextSplitter User Guide\n\n## Overview\n\nThe **RecursiveTextSplitter** is a C# library that provides intelligent text splitting functionality with semantic awareness. 
Unlike simple character-based splitting, this library attempts to preserve meaningful boundaries by using a hierarchical approach to text segmentation, from paragraph breaks down to character-level splitting as a last resort.\n\n## Key Features\n\n- **Semantic Awareness**: Maintains natural text boundaries (paragraphs, sentences, words)\n- **Configurable Overlap**: Supports overlapping chunks for better context preservation\n- **Flexible Separators**: Allows custom separator hierarchies or uses intelligent defaults\n- **Detailed Metadata**: Provides comprehensive information about each chunk, including its text, overlap, separator, and position\n- **Word-Safe Overlap**: Ensures overlap occurs at natural word boundaries\n- **Position Tracking**: Tracks both character positions and line/column coordinates in the original text\n\n## Installation\n\n### Via NuGet Package Manager\n\nInstall the RecursiveTextSplitter package from NuGet:\n\n```bash\ndotnet add package RecursiveTextSplitter\n```\n\nOr via the Package Manager Console in Visual Studio:\n\n```powershell\nInstall-Package RecursiveTextSplitter\n```\n\nOr search for \"RecursiveTextSplitter\" in the Visual Studio NuGet Package Manager UI.\n\n**NuGet Package:** https://www.nuget.org/packages/RecursiveTextSplitter/\n\n### Importing the Namespace\n\nAdd a `using` directive for the library's namespace to your C# files:\n\n```csharp\nusing RecursiveTextSplitting;\n```\n\n## Basic Usage\n\n### Simple Text Splitting\n\nThe most straightforward way to split text is to use the `RecursiveSplit` extension method:\n\n```csharp\nstring document = \"Artificial intelligence is transforming every industry.\\nFrom healthcare to finance, automation is becoming smarter and more adaptive.\\n\\nHowever, challenges like bias, interpretability, and safety remain important areas of research.\";\n\nvar chunks = document.RecursiveSplit(chunkSize: 80, chunkOverlap: 0);\n\nforeach (var chunk in chunks)\n{\n    Console.WriteLine($\"Chunk: {chunk}\");\n    Console.WriteLine(\"---\");\n}\n```\n\n### 
Advanced Splitting with Metadata\n\nFor more detailed information about each chunk, including line and column positions, use the `AdvancedRecursiveSplit` method:\n\n```csharp\nstring document = \"Artificial intelligence is transforming every industry.\\nFrom healthcare to finance, automation is becoming smarter and more adaptive.\\n\\nHowever, challenges like bias, interpretability, and safety remain important areas of research.\";\n\nvar chunks = document.AdvancedRecursiveSplit(chunkSize: 80, chunkOverlap: 0);\n\nforeach (var chunk in chunks)\n{\n    Console.WriteLine($\"Chunk {chunk.ChunkIndex}: {chunk.Text}\");\n    Console.WriteLine($\"Start Position: {chunk.StartPosition} (Line {chunk.StartLine}, Column {chunk.StartColumn})\");\n    Console.WriteLine($\"End Position: {chunk.EndPosition} (Line {chunk.EndLine}, Column {chunk.EndColumn})\");\n    Console.WriteLine($\"Separator Used: {chunk.SeparatorUsed}\");\n    Console.WriteLine(\"---\");\n}\n```\n\n## Working with Overlap\n\nOverlap lets consecutive chunks share text at their boundaries, so content near a split point appears in both chunks. This helps preserve context in applications such as search indexing or machine learning.\n\n### Basic Overlap Example\n\n```csharp\nstring document = \"Artificial intelligence is transforming every industry.\\nFrom healthcare to finance, automation is becoming smarter and more adaptive.\\n\\nHowever, challenges like bias, interpretability, and safety remain important areas of research.\";\n\n// Split with 25 characters of overlap\nvar chunks = document.RecursiveSplit(chunkSize: 80, chunkOverlap: 25);\n\nforeach (var chunk in chunks)\n{\n    Console.WriteLine($\"Chunk: {chunk}\");\n    Console.WriteLine(\"---\");\n}\n```\n\n### Advanced Overlap with Metadata\n\n```csharp\nstring document = \"Artificial intelligence is transforming every industry.\\nFrom healthcare to finance, automation is becoming smarter and more adaptive.\\n\\nHowever, challenges like bias, interpretability, and safety remain important areas 
of research.\";\n\nvar chunks = document.AdvancedRecursiveSplit(chunkSize: 80, chunkOverlap: 25);\n\nforeach (var chunk in chunks)\n{\n    Console.WriteLine($\"Chunk {chunk.ChunkIndex}:\");\n    Console.WriteLine($\"  Full Text: {chunk.Text}\");\n    Console.WriteLine($\"  Overlap: '{chunk.OverlapText}'\");\n    Console.WriteLine($\"  Original Content: '{chunk.ChunkText}'\");\n    Console.WriteLine($\"  Position: {chunk.StartPosition}-{chunk.EndPosition}\");\n    Console.WriteLine($\"  Location: Lines {chunk.StartLine}-{chunk.EndLine}\");\n    Console.WriteLine(\"---\");\n}\n```\n\n## Understanding the TextChunk Class\n\nThe `TextChunk` class provides comprehensive metadata about each split segment:\n\n```csharp\npublic class TextChunk\n{\n    public string Text { get; set; }           // Complete text including overlap\n    public string OverlapText { get; set; }    // Only the overlap portion\n    public string ChunkText { get; set; }      // Original chunk without overlap\n    public int StartPosition { get; set; }     // 1-based start position in original text\n    public int EndPosition { get; set; }       // 1-based end position in original text\n    public string SeparatorUsed { get; set; }  // Separator that created this chunk\n    public int ChunkIndex { get; set; }        // Sequential chunk number (1-based)\n    public int StartColumn { get; set; }       // 1-based column where chunk starts\n    public int StartLine { get; set; }         // 1-based line where chunk starts\n    public int EndColumn { get; set; }         // 1-based column where chunk ends\n    public int EndLine { get; set; }           // 1-based line where chunk ends\n}\n```\n\n### Position Tracking Features\n\nThe library tracks each chunk's position both as character offsets and as line/column coordinates:\n\n- **Character Positions**: `StartPosition` and `EndPosition` provide 1-based character indices in the original text\n- **Line/Column Tracking**: `StartLine`, `StartColumn`, 
`EndLine`, `EndColumn` provide 1-based line and column coordinates\n- **Overlap-Aware Positions**: Positions remain accurate even when overlap is applied\n\n## Custom Separators\n\nYou can supply your own separator hierarchy, ordered from highest to lowest priority, for specialized splitting needs:\n\n```csharp\nstring document = \"Section 1|Subsection A;Item 1,Item 2|Section 2;Item 3\";\n\n// Custom separators prioritizing sections, then subsections, then items\nstring[] customSeparators = { \"|\", \";\", \",\" };\n\nvar chunks = document.AdvancedRecursiveSplit(\n    chunkSize: 20,\n    chunkOverlap: 0,\n    separators: customSeparators\n);\n\nforeach (var chunk in chunks)\n{\n    Console.WriteLine($\"Chunk: {chunk.Text}\");\n    Console.WriteLine($\"Split using: {chunk.SeparatorUsed}\");\n    Console.WriteLine($\"At line {chunk.StartLine}, column {chunk.StartColumn}\");\n    Console.WriteLine(\"---\");\n}\n```\n\n## Separator Hierarchy\n\nBy default, the library splits hierarchically, trying larger semantic units first and falling back to smaller ones as needed:\n\n1. **Paragraph breaks** (`\\r\\n\\r\\n`, `\\n\\n`) - Largest semantic units\n2. **Sentence or clause endings followed by CRLF** (`.\\r\\n`, `!\\r\\n`, `?\\r\\n`, `:\\r\\n`, `;\\r\\n`)\n3. **Single CRLF line breaks** (`\\r\\n`)\n4. **Sentence or clause endings followed by LF** (`.\\n`, `!\\n`, `?\\n`, `:\\n`, `;\\n`)\n5. **Single LF newlines** (`\\n`)\n6. **Sentence endings followed by a space** (`. `, `! `, `? `)\n7. **Clause punctuation followed by a space** (`; `, `, `)\n8. **Word boundaries** (` `) - Single spaces\n9. **Character-by-character** (empty separator `\"\"`) - Last resort\n\n## Contributing\n\nWe welcome contributions to make RecursiveTextSplitter even better! Here are some ways you can help:\n\n### 🌟 **Star this repository** if you find it useful!\n\nYour star helps others discover this library and motivates continued development.\n\n### 🔧 **Pull Requests Welcome**\n\nWe're open to pull requests! 
Contributions might include, for example:\n\n- Fixing bugs or improving existing functionality\n- Adding new features or splitting strategies\n- Improving documentation or examples\n- Optimizing performance\n\nPlease feel free to fork the repository and submit a pull request. For larger changes, consider opening an issue first to discuss your approach.\n\n### 📝 **Reporting Issues**\n\nFound a bug or have a suggestion? Please open an issue with:\n\n- A clear description of the problem or enhancement\n- Steps to reproduce (for bugs)\n- Sample code demonstrating the issue\n- Expected vs. actual behavior\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhamedfathi%2Frecursivetextsplitter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhamedfathi%2Frecursivetextsplitter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhamedfathi%2Frecursivetextsplitter/lists"}