{"id":31920218,"url":"https://github.com/iamgerwin/csharp-razor-docx-parser-poc","last_synced_at":"2026-02-18T15:32:09.977Z","repository":{"id":317941968,"uuid":"1069450171","full_name":"iamgerwin/csharp-razor-docx-parser-poc","owner":"iamgerwin","description":"A proof of concept Blazor web application that accepts DOCX file uploads and provides intelligent parsing with multiple output formats. Built with .NET 9 and modern web technologies.","archived":false,"fork":false,"pushed_at":"2025-10-04T02:07:14.000Z","size":810,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-22T09:52:20.805Z","etag":null,"topics":["csharp","docx","dotnet","parser"],"latest_commit_sha":null,"homepage":"","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iamgerwin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-10-04T00:52:57.000Z","updated_at":"2025-10-04T05:14:49.000Z","dependencies_parsed_at":"2025-10-04T03:29:50.576Z","dependency_job_id":"a8381845-b129-413e-8a83-fa6437162893","html_url":"https://github.com/iamgerwin/csharp-razor-docx-parser-poc","commit_stats":null,"previous_names":["iamgerwin/csharp-razor-docx-parser-poc"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/iamgerwin/csharp-razor-docx-parser-poc","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamgerwin%2Fcsharp-razor-docx-parser-poc","tag
s_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamgerwin%2Fcsharp-razor-docx-parser-poc/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamgerwin%2Fcsharp-razor-docx-parser-poc/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamgerwin%2Fcsharp-razor-docx-parser-poc/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iamgerwin","download_url":"https://codeload.github.com/iamgerwin/csharp-razor-docx-parser-poc/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iamgerwin%2Fcsharp-razor-docx-parser-poc/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":29583918,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-02-18T13:56:48.962Z","status":"ssl_error","status_checked_at":"2026-02-18T13:54:34.145Z","response_time":162,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csharp","docx","dotnet","parser"],"created_at":"2025-10-13T21:56:24.100Z","updated_at":"2026-02-18T15:32:09.961Z","avatar_url":"https://github.com/iamgerwin.png","language":"HTML","readme":"# DocX Parser - Blazor Web 
Application\r\n\r\n[![.NET](https://img.shields.io/badge/.NET-9.0-512BD4)](https://dotnet.microsoft.com/)\r\n[![Blazor](https://img.shields.io/badge/Blazor-Web-512BD4)](https://dotnet.microsoft.com/apps/aspnet/web-apps/blazor)\r\n[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)\r\n\r\n## 🎥 Demo\r\n\r\n\u003cdiv\u003e\r\n    \u003ca href=\"https://www.loom.com/share/75a806613fb24c1385534d9352d938c2\"\u003e\r\n      \u003cp\u003eDocX Parser - 4 October 2025 - Watch Video\u003c/p\u003e\r\n    \u003c/a\u003e\r\n    \u003ca href=\"https://www.loom.com/share/75a806613fb24c1385534d9352d938c2\"\u003e\r\n      \u003cimg style=\"max-width:300px;\" src=\"https://cdn.loom.com/sessions/thumbnails/75a806613fb24c1385534d9352d938c2-8c429fde1cc6cafd-full-play.gif\"\u003e\r\n    \u003c/a\u003e\r\n  \u003c/div\u003e\r\n\r\n---\r\n\r\nA proof of concept Blazor web application that accepts DOCX file uploads and provides intelligent parsing with multiple output formats. Built with .NET 9 and modern web technologies.\r\n\r\n## ✨ Features\r\n\r\n- **📄 DOCX File Upload** - Click to upload .docx files (up to 10MB)\r\n- **🔍 Intelligent Parsing** - Extract and categorize document content using DocumentFormat.OpenXml\r\n- **📊 Multiple Output Formats**:\r\n  - **Raw Text** - Simple plain text extraction\r\n  - **Categorized** - Organized view with headings, paragraphs, and tables\r\n  - **HTML** - Full HTML conversion with styling\r\n  - **JSON** - Structured JSON output for programmatic use\r\n  - **Markdown** - Formatted Markdown output with proper syntax\r\n- **🎨 Modern UI** - shadcn-inspired design with clean, responsive interface\r\n- **📋 Clipboard Support** - Copy HTML/JSON/Markdown output with visual feedback\r\n- **⚡ Real-time Processing** - Fast client-side processing with loading states\r\n- **🔒 Secure** - Temporary file handling with automatic cleanup\r\n\r\n## 🚀 Quick Start\r\n\r\n### Prerequisites\r\n\r\n- [.NET 9 
SDK](https://dotnet.microsoft.com/download/dotnet/9.0) installed\r\n- A modern web browser (Chrome, Firefox, Safari, Edge)\r\n\r\n### Installation\r\n\r\n1. Clone the repository:\r\n```bash\r\ngit clone https://github.com/iamgerwin/csharp-razor-docx-parser-poc.git\r\ncd csharp-razor-docx-parser-poc\r\n```\r\n\r\n2. Restore dependencies:\r\n```bash\r\ndotnet restore\r\n```\r\n\r\n3. Run the application:\r\n```bash\r\ndotnet run\r\n```\r\n\r\n4. Open your browser and navigate to:\r\n```\r\nhttp://localhost:5000\r\n```\r\n\r\n## 🏗️ Project Structure\r\n\r\n```\r\nDocxParserApp/\r\n├── Components/\r\n│   ├── Layout/          # Layout components (Nav, Main)\r\n│   └── Pages/           # Page components (Home, About)\r\n├── Services/            # Business logic layer\r\n│   └── DocxParserService.cs\r\n├── wwwroot/             # Static assets\r\n│   ├── app.css          # Application styles\r\n│   └── clipboard.js     # Clipboard \u0026 toast functionality\r\n├── DocxParserApp.Tests/ # Unit tests\r\n└── Program.cs           # Application entry point\r\n```\r\n\r\n## 🛠️ Technologies Used\r\n\r\n### Backend\r\n- **.NET 9** - Latest .NET framework\r\n- **Blazor Server** - Interactive server-side rendering\r\n- **DocumentFormat.OpenXml 3.3.0** - DOCX parsing and manipulation\r\n- **System.IO.Packaging** - Document package handling\r\n\r\n### Frontend\r\n- **Blazor Components** - Reusable UI components\r\n- **shadcn-inspired CSS** - Modern, accessible design system\r\n- **JavaScript Interop** - Clipboard API integration\r\n- **Responsive Design** - Mobile-friendly interface\r\n\r\n## 📖 How to Use\r\n\r\n1. **Upload a File**\r\n   - Click \"Choose DOCX File\" and select a .docx file (drag-and-drop is not supported)\r\n   - Maximum file size: 10MB\r\n   - Supported format: .docx only\r\n\r\n2. **Select Output Format**\r\n   - Choose from Raw Text, Categorized, HTML, JSON, or Markdown\r\n   - Switch between formats instantly without re-uploading\r\n\r\n3. 
**View Results**\r\n   - Categorized view shows headings, paragraphs, and tables separately\r\n   - HTML view provides ready-to-use HTML markup\r\n   - JSON view offers structured data for integration\r\n\r\n4. **Copy to Clipboard**\r\n   - Click \"Copy HTML\" or \"Copy JSON\" to copy the output\r\n   - Toast notification confirms successful copy\r\n\r\n## 🔧 Configuration\r\n\r\n### Port Configuration\r\n\r\nEdit `Properties/launchSettings.json` to change the default port:\r\n\r\n```json\r\n{\r\n  \"profiles\": {\r\n    \"http\": {\r\n      \"commandName\": \"Project\",\r\n      \"dotnetRunMessages\": true,\r\n      \"launchBrowser\": true,\r\n      \"applicationUrl\": \"http://localhost:5000\",\r\n      \"environmentVariables\": {\r\n        \"ASPNETCORE_ENVIRONMENT\": \"Development\"\r\n      }\r\n    }\r\n  }\r\n}\r\n```\r\n\r\n### File Upload Limits\r\n\r\nModify the max file size in `Components/Pages/Home.razor`:\r\n\r\n```csharp\r\nawait file.OpenReadStream(maxAllowedSize: 10 * 1024 * 1024) // 10MB\r\n```\r\n\r\n## 🧪 Testing\r\n\r\nUnit tests are included in the `DocxParserApp.Tests` project:\r\n\r\n```bash\r\ndotnet test\r\n```\r\n\r\nTests cover:\r\n- HTML generation with various content types\r\n- JSON serialization and formatting\r\n- Special character escaping\r\n- Data model initialization\r\n- Edge cases and error handling\r\n\r\n## 📝 API Reference\r\n\r\n### DocxParserService\r\n\r\nThe core service for document parsing:\r\n\r\n```csharp\r\npublic class DocxParserService\r\n{\r\n    // Parse a DOCX file and extract structured content\r\n    public DocxParseResult ParseDocument(string filePath)\r\n\r\n    // Generate HTML from parsed result\r\n    public string GenerateHtml(DocxParseResult parseResult)\r\n\r\n    // Generate JSON from parsed result\r\n    public string GenerateJson(DocxParseResult parseResult)\r\n}\r\n```\r\n\r\n### Data Models\r\n\r\n```csharp\r\npublic class DocxParseResult\r\n{\r\n    public string RawText { get; set; }\r\n    
public List\u003cHeadingElement\u003e Headings { get; set; }\r\n    public List\u003cParagraphElement\u003e Paragraphs { get; set; }\r\n    public List\u003cTableElement\u003e Tables { get; set; }\r\n}\r\n\r\npublic class HeadingElement\r\n{\r\n    public int Level { get; set; }      // 1-6\r\n    public string Text { get; set; }\r\n}\r\n\r\npublic class ParagraphElement\r\n{\r\n    public string Text { get; set; }\r\n    public bool IsBold { get; set; }\r\n    public bool IsItalic { get; set; }\r\n}\r\n\r\npublic class TableElement\r\n{\r\n    public List\u003cList\u003cstring\u003e\u003e Rows { get; set; }\r\n}\r\n```\r\n\r\n## 🤝 Contributing\r\n\r\nContributions are welcome! Please feel free to submit a Pull Request.\r\n\r\n## 📄 License\r\n\r\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\r\n\r\n## 👨‍💻 Author\r\n\r\n**Gerwin**\r\n- GitHub: [@iamgerwin](https://github.com/iamgerwin)\r\n- LinkedIn: [iamgerwin](https://ph.linkedin.com/in/iamgerwin)\r\n\r\n## 🙏 Acknowledgments\r\n\r\n- [DocumentFormat.OpenXml](https://github.com/OfficeDev/Open-XML-SDK) - Microsoft's Open XML SDK\r\n- [Blazor](https://dotnet.microsoft.com/apps/aspnet/web-apps/blazor) - Microsoft's Blazor framework\r\n- [shadcn/ui](https://ui.shadcn.com/) - Design inspiration\r\n\r\n## 🐛 Known Issues\r\n\r\n- Tests require additional configuration due to project structure\r\n- Large files (\u003e10MB) may cause performance issues\r\n- Complex DOCX formatting may not be fully preserved\r\n- Drag-and-drop file upload not supported (Blazor limitation)\r\n\r\n## 🗺️ Roadmap\r\n\r\n- [ ] Add support for .doc files\r\n- [ ] Implement batch file processing\r\n- [ ] Add export to PDF functionality\r\n- [ ] Improve table formatting preservation\r\n- [ ] Add support for images and embedded objects\r\n- [ ] Implement file size optimization\r\n- [ ] Add progress indicators for large files\r\n\r\n---\r\n\r\n**Note**: This is a proof of concept application built for 
demonstration purposes. For production use, additional security hardening and performance optimization may be required.\r\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiamgerwin%2Fcsharp-razor-docx-parser-poc","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiamgerwin%2Fcsharp-razor-docx-parser-poc","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiamgerwin%2Fcsharp-razor-docx-parser-poc/lists"}