{"id":17767808,"url":"https://github.com/amrsa1/uniparser","last_synced_at":"2025-04-21T19:32:06.586Z","repository":{"id":257951441,"uuid":"873908811","full_name":"amrsa1/UniParser","owner":"amrsa1","description":null,"archived":false,"fork":false,"pushed_at":"2024-10-19T12:13:45.000Z","size":228,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-10-26T21:35:14.885Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/amrsa1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-16T23:43:39.000Z","updated_at":"2024-10-19T12:12:54.000Z","dependencies_parsed_at":"2024-10-26T21:16:31.498Z","dependency_job_id":"f3b8efa0-ab5f-4c29-b86f-0848f6b22c50","html_url":"https://github.com/amrsa1/UniParser","commit_stats":null,"previous_names":["amrsa1/uniparser"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amrsa1%2FUniParser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amrsa1%2FUniParser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amrsa1%2FUniParser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/amrsa1%2FUniParser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/amrsa1","download_url":"https://codeload.github.com/amrsa1/UniParser/tar.gz/refs/heads/main","host":{"name"
:"GitHub","url":"https://github.com","kind":"github","repositories_count":223876417,"owners_count":17218387,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-26T20:51:23.452Z","updated_at":"2024-11-09T20:04:24.054Z","avatar_url":"https://github.com/amrsa1.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# 📜 **UniParser**: Universal File Parsing for Node.js\n\n**UniParser** is a powerful, lightweight Node.js library designed to handle parsing of multiple file formats—such as **PDF**, **DOCX**, **TXT**, **HTML**, and **Markdown**—and convert them into **plain text** with ease.\n\n🚀 **Say goodbye to file format limitations!** UniParser extracts text content from all these formats, providing a consistent text output for your applications.\n\n---\n\n## ✨ **Features**\n\n- 🔍 **PDF Parsing**: Extracts plain text from PDF documents.\n- 📝 **DOCX Parsing**: Reads and extracts text from Microsoft Word `.docx` files.\n- 📄 **TXT Parsing**: Handles plain text files with no special formatting.\n- 🌐 **HTML Parsing**: Extracts text from the body of HTML documents.\n- 🎨 **Markdown Parsing**: Converts Markdown files to plain text, stripping out all formatting syntax.\n- 🔄 **Auto-detection**: Automatically detects the file format and parses it using the `autoParse` function.\n\n---\n\n## 📦 **Installation**\n\nTo install **UniParser**, simply run:\n\n```bash\nnpm install uniparser\n```\n\n---\n\n## 🛠️ **Usage**\n\n### **CommonJS (CJS) Example**\n\nIf you’re working in a Node.js environment with CommonJS (CJS), use `require()` to 
import UniParser:\n\n```javascript\nconst { autoParse, parsePDF, parseDOCX, parseTXT, parseHTML, parseMarkdown } = require('uniparser');\n\n// Example: Automatically detect and parse a file\n(async () =\u003e {\n    const parsedText = await autoParse('./path/to/sample-file.pdf');\n    console.log(parsedText);\n})();\n\n// Example: Parse specific file types\n// CommonJS has no top-level await, so the async parsers run inside an async function\n(async () =\u003e {\n    const pdfText = await parsePDF('./path/to/sample-file.pdf');\n    const docxText = await parseDOCX('./path/to/sample-file.docx');\n    const txtText = parseTXT('./path/to/sample-file.txt');\n    const htmlText = parseHTML('./path/to/sample-file.html');\n    const markdownText = parseMarkdown('./path/to/sample-file.md');\n})();\n```\n\n### **ES Modules (ESM) Example**\n\nIf you’re working in an ES Module environment (modern JavaScript), use `import` to load the functions:\n\n```javascript\nimport { autoParse, parsePDF, parseDOCX, parseTXT, parseHTML, parseMarkdown } from 'uniparser';\n\n// Example: Automatically detect and parse a file\n(async () =\u003e {\n    const parsedText = await autoParse('./path/to/sample-file.pdf');\n    console.log(parsedText);\n})();\n\n// Example: Parse specific file types (top-level await is available in ES modules)\nconst pdfText = await parsePDF('./path/to/sample-file.pdf');\nconst docxText = await parseDOCX('./path/to/sample-file.docx');\nconst txtText = parseTXT('./path/to/sample-file.txt');\nconst htmlText = parseHTML('./path/to/sample-file.html');\nconst markdownText = parseMarkdown('./path/to/sample-file.md');\n```\n\n### ⚡ **Synchronous Usage (for small files)**\n\n`parseTXT` and `parseMarkdown` can be called synchronously. A synchronous call blocks the event loop until it returns, so reserve this for small, lightweight files.\n\n#### CommonJS (CJS):\n```javascript\nconst { parseTXT, parseMarkdown } = require('uniparser');\n\n// Synchronously read small text files\nconst txtContent = parseTXT('./path/to/sample-file.txt');\nconsole.log(txtContent);\n\nconst markdownContent = parseMarkdown('./path/to/sample-file.md');\nconsole.log(markdownContent);\n```\n\n#### ES Modules 
(ESM):\n```javascript\nimport { parseTXT, parseMarkdown } from 'uniparser';\n\n// Synchronously read small text files\nconst txtContent = parseTXT('./path/to/sample-file.txt');\nconsole.log(txtContent);\n\nconst markdownContent = parseMarkdown('./path/to/sample-file.md');\nconsole.log(markdownContent);\n```\n\n---\n\n## 🔗 **Supported File Formats**\n\n- 📄 **PDF** (`.pdf`): Converts PDF documents to plain text.\n- 📝 **DOCX** (`.docx`): Extracts text from Microsoft Word `.docx` files.\n- 🖋️ **TXT** (`.txt`): Reads plain text from simple text files.\n- 🌐 **HTML** (`.html`): Strips HTML tags and returns the text content.\n- ✍️ **Markdown** (`.md`): Converts Markdown files to plain text, removing all formatting.\n- 🔄 **Auto-detection**: Detects file types automatically via `autoParse` and processes them accordingly.\n\n---\n\n## 🎯 **Example**\n\nHere's a quick example to get you started with DOCX parsing:\n\n### CommonJS (CJS):\n```javascript\nconst { parseDOCX } = require('uniparser');\n\n(async () =\u003e {\n    const docxText = await parseDOCX('./path/to/sample-file.docx');\n    console.log(docxText);\n})();\n```\n\n### ES Modules (ESM):\n```javascript\nimport { parseDOCX } from 'uniparser';\n\n(async () =\u003e {\n    const docxText = await parseDOCX('./path/to/sample-file.docx');\n    console.log(docxText);\n})();\n```\n\n---\n\n## 🔑 **License**\n\nThis project is licensed under the **MIT License**. See the [LICENSE](./LICENSE) file for more information.\n\n---\n\n## 🤝 **Contributing**\n\nContributions are welcome! If you'd like to improve UniParser, feel free to fork the repository and submit a pull request. 
We appreciate your feedback and contributions!\n\n---\n\n💡 **UniParser** makes it easier than ever to extract content from a wide range of file formats—**Try it now and streamline your file processing tasks!** 🌟\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famrsa1%2Funiparser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Famrsa1%2Funiparser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Famrsa1%2Funiparser/lists"}