Projects in Awesome Lists tagged with doc2x
A curated list of projects in awesome lists tagged with doc2x.
https://github.com/wisupai/e2m
E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with dedicated parsers and converters, supporting custom configs. E2M offers an all-in-one, flexible, and open-source solution.
doc2x e2m llm markdown pdf-to-markdown text-cleaning
Last synced: 15 May 2025
https://github.com/NoEdgeAI/pdfdeal
A Python wrapper for the Doc2X API that also includes local text processing (to improve PDF recall in RAG).
Last synced: 16 Feb 2025