{"id":24746485,"url":"https://github.com/imadnajam/frida","last_synced_at":"2025-03-23T01:12:00.307Z","repository":{"id":277363947,"uuid":"932191471","full_name":"Imadnajam/Frida","owner":"Imadnajam","description":"All Files to Markdown Converter","archived":false,"fork":false,"pushed_at":"2025-03-19T22:34:16.000Z","size":355,"stargazers_count":6,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-19T23:28:59.078Z","etag":null,"topics":["ai","api","fastapi","framer-motion","gpt-neox","javascript","markdown","mongodb","nextjs","nodejs","python","restful-api","shadcn-ui","transformer"],"latest_commit_sha":null,"homepage":"https://frida2.vercel.app","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Imadnajam.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-13T14:18:41.000Z","updated_at":"2025-03-13T11:45:11.000Z","dependencies_parsed_at":"2025-03-11T19:23:44.681Z","dependency_job_id":"6ee0fd04-8772-4ccb-a83f-a75ef7a9c3ca","html_url":"https://github.com/Imadnajam/Frida","commit_stats":null,"previous_names":["imadnajam/frida2"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Imadnajam%2FFrida","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Imadnajam%2FFrida/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Imadnajam%2FFrida/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Ima
dnajam%2FFrida/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Imadnajam","download_url":"https://codeload.github.com/Imadnajam/Frida/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245040697,"owners_count":20551308,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","api","fastapi","framer-motion","gpt-neox","javascript","markdown","mongodb","nextjs","nodejs","python","restful-api","shadcn-ui","transformer"],"created_at":"2025-01-28T04:23:40.919Z","updated_at":"2025-03-23T01:12:00.292Z","avatar_url":"https://github.com/Imadnajam.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"#  File to Markdown Converter\n\n![Cover](https://raw.githubusercontent.com/Imadnajam/Frida/main/Cover.png)\n\nA **cutting-edge**, **web-based platform** built with **Next.js** for the frontend and **FastAPI** for the backend, designed to convert office documents and files (e.g., Word, Excel, PowerPoint, PDF) into **Markdown** format. This tool simplifies document conversion for developers, writers, and project managers by quickly transforming complex files into lightweight and easy-to-use Markdown files.\n\n\u003e **Powered by GPT-NeoX**: This project harnesses the capabilities of GPT-NeoX, an open-source autoregressive language model, to intelligently parse and convert documents while preserving their semantic structure and formatting.\n\n---\n\n## 🚀 **Why Contribute?**\n\nThis project is **open-source** and thrives on community contributions. 
Whether you're a developer, designer, or documentation enthusiast, your contributions can help make this tool even better! Here's why you should join us:\n\n- **Impact**: Help thousands of users streamline their document conversion process.\n- **Learn \u0026 Grow**: Work with modern technologies like Next.js, FastAPI, and GPT-NeoX.\n- **Collaborate**: Join a vibrant community of developers and contributors.\n- **Recognition**: Get your name featured in our contributors' list and gain visibility in the open-source world.\n\n---\n\n## 💡 **What is it?**\n\nThe **File-to-Markdown Converter** is a **fast, reliable, and efficient web tool** that allows users to upload documents (e.g., DOCX, PDF, XLSX, PPTX), convert them into Markdown format, and download the output instantly. Markdown is a popular format for documentation and content creation, making this tool perfect for anyone looking to streamline content management.\n\nThis tool is built using:\n\n- **Next.js**: A React-based framework for server-side rendering, ensuring a fast and responsive frontend experience.\n- **FastAPI**: A high-performance, asynchronous Python web framework that powers the backend API for file conversion.\n- **GPT-NeoX**: An advanced language model that enhances document parsing and conversion accuracy.\n\n---\n\n## 🎯 **Key Features**\n\n- **AI-Powered Conversion**: GPT-NeoX intelligently processes documents to maintain semantic meaning and structure.\n- **Convert DOCX to Markdown**: Effortlessly convert Word documents to structured Markdown.\n- **Excel to Markdown**: Extract data from Excel spreadsheets into Markdown tables.\n- **PowerPoint to Markdown**: Generate Markdown from PowerPoint slides (text, bullet points, images).\n- **PDF to Markdown**: Extract text and structure from PDF files to Markdown format.\n- **Real-time Preview**: Preview your converted Markdown before downloading.\n- **Customizable Conversion Options**: Choose specific sections, formats, and customizations when 
converting.\n- **FastAPI Backend**: Built with high-speed, non-blocking FastAPI, handling multiple conversions simultaneously.\n- **Next.js Frontend**: Provides a fast, responsive, and dynamic UI for users.\n- **File History**: Users can view their previously converted files and download them anytime.\n\n---\n\n## 🛠️ **How It Works**\n\n1. **Upload Your File**: Drag and drop your document (Word, Excel, PowerPoint, PDF) into the upload section.\n2. **AI Processing**: Your document is analyzed by GPT-NeoX to understand its structure and content.\n3. **Conversion**: The file is processed by the FastAPI backend, converting the document into Markdown.\n4. **Preview**: View the Markdown output directly in the browser with live updates.\n5. **Download**: Download the Markdown file for use in your documentation, blog, or project.\n\n---\n\n## 🧩 **Code Examples**\n\n### Frontend File Upload Component (Next.js)\n\n```jsx\nimport { useState } from 'react';\nimport { Upload } from 'lucide-react';\n\nexport default function FileUploader() {\n  const [file, setFile] = useState(null);\n  const [converting, setConverting] = useState(false);\n  const [preview, setPreview] = useState('');\n  \n  const handleUpload = async (e) =\u003e {\n    const selectedFile = e.target.files[0];\n    setFile(selectedFile);\n    \n    if (!selectedFile) return;\n    \n    // Create form data\n    const formData = new FormData();\n    formData.append('file', selectedFile);\n    \n    try {\n      setConverting(true);\n      const response = await fetch('/api/convert', {\n        method: 'POST',\n        body: formData,\n      });\n      \n      const data = await response.json();\n      setPreview(data.markdown);\n      setConverting(false);\n    } catch (error) {\n      console.error('Conversion failed:', error);\n      setConverting(false);\n    }\n  };\n  \n  return (\n    \u003cdiv className=\"w-full max-w-md mx-auto\"\u003e\n      \u003clabel className=\"flex flex-col items-center p-6 bg-white 
rounded-lg shadow-lg cursor-pointer border-2 border-dashed border-gray-300 hover:border-blue-500\"\u003e\n        \u003cUpload className=\"w-10 h-10 text-blue-500\" /\u003e\n        \u003cspan className=\"mt-2 text-sm text-gray-600\"\u003e\n          {file ? file.name : 'Upload your file'}\n        \u003c/span\u003e\n        \u003cinput\n          type=\"file\"\n          className=\"hidden\"\n          onChange={handleUpload}\n          accept=\".docx,.xlsx,.pptx,.pdf,.txt\"\n        /\u003e\n      \u003c/label\u003e\n      \n      {converting \u0026\u0026 \u003cp className=\"mt-4 text-center\"\u003eConverting...\u003c/p\u003e}\n      \n      {preview \u0026\u0026 (\n        \u003cdiv className=\"mt-6\"\u003e\n          \u003ch3 className=\"text-lg font-medium\"\u003ePreview:\u003c/h3\u003e\n          \u003cdiv className=\"mt-2 p-4 bg-gray-100 rounded overflow-auto\"\u003e\n            \u003cpre\u003e{preview}\u003c/pre\u003e\n          \u003c/div\u003e\n          \u003cbutton \n            className=\"mt-4 px-4 py-2 bg-blue-500 text-white rounded-lg\"\n            onClick={() =\u003e {\n              // Download logic\n              const blob = new Blob([preview], { type: 'text/markdown' });\n              const url = URL.createObjectURL(blob);\n              const a = document.createElement('a');\n              a.href = url;\n              a.download = `${file.name.split('.')[0]}.md`;\n              document.body.appendChild(a);\n              a.click();\n              document.body.removeChild(a);\n              URL.revokeObjectURL(url);\n            }}\n          \u003e\n            Download Markdown\n          \u003c/button\u003e\n        \u003c/div\u003e\n      )}\n    \u003c/div\u003e\n  );\n}\n```\n\n### FastAPI Backend Conversion Endpoint\n\n```python\nfrom fastapi import FastAPI, File, UploadFile, HTTPException\nfrom fastapi.middleware.cors import CORSMiddleware\nimport os\nimport tempfile\nfrom typing import Optional\nimport mammoth\nimport pandas as 
pd\nfrom pptx import Presentation\nimport fitz  # PyMuPDF\nfrom transformers import AutoModelForCausalLM, AutoTokenizer\n\napp = FastAPI()\n\n# Setup CORS\napp.add_middleware(\n    CORSMiddleware,\n    allow_origins=[\"http://localhost:3000\"],\n    allow_credentials=True,\n    allow_methods=[\"*\"],\n    allow_headers=[\"*\"],\n)\n\n# Load the GPT-NeoX model for advanced document understanding\nmodel_name = \"EleutherAI/gpt-neox-20b\"  # 20B-parameter checkpoint; loading it needs tens of GB of memory\ntokenizer = AutoTokenizer.from_pretrained(model_name)\nmodel = AutoModelForCausalLM.from_pretrained(model_name)\n\n@app.post(\"/api/convert\")\nasync def convert_file(file: UploadFile = File(...)):\n    # Get file extension\n    file_extension = os.path.splitext(file.filename)[1].lower()\n    \n    # Create temp file\n    with tempfile.NamedTemporaryFile(delete=False, suffix=file_extension) as temp_file:\n        temp_file.write(await file.read())\n        temp_file_path = temp_file.name\n    \n    try:\n        # Process based on file type\n        if file_extension == \".docx\":\n            markdown = convert_docx(temp_file_path)\n        elif file_extension == \".xlsx\":\n            markdown = convert_xlsx(temp_file_path)\n        elif file_extension == \".pptx\":\n            markdown = convert_pptx(temp_file_path)\n        elif file_extension == \".pdf\":\n            markdown = convert_pdf(temp_file_path)\n        elif file_extension == \".txt\":\n            with open(temp_file_path, \"r\", encoding=\"utf-8\") as f:\n                content = f.read()\n            markdown = enhance_with_gpt_neox(content)\n        else:\n            raise HTTPException(status_code=400, detail=\"Unsupported file format\")\n        \n        # Clean up\n        os.unlink(temp_file_path)\n        \n        return {\"markdown\": markdown}\n    \n    except HTTPException:\n        # Clean up, then re-raise deliberate HTTP errors (such as the 400 above) unchanged\n        os.unlink(temp_file_path)\n        raise\n    except Exception as e:\n        # Clean up on unexpected errors and report them as a 500\n        os.unlink(temp_file_path)\n        raise HTTPException(status_code=500, 
detail=str(e))\n\ndef convert_docx(file_path: str) -\u003e str:\n    \"\"\"Convert DOCX to Markdown using mammoth and enhance with GPT-NeoX\"\"\"\n    with open(file_path, \"rb\") as docx_file:\n        result = mammoth.convert_to_markdown(docx_file)\n        markdown = result.value\n    \n    # Enhance with GPT-NeoX\n    return enhance_with_gpt_neox(markdown)\n\ndef convert_xlsx(file_path: str) -\u003e str:\n    \"\"\"Convert Excel to Markdown tables\"\"\"\n    xl = pd.ExcelFile(file_path)\n    markdown = \"\"\n    \n    for sheet_name in xl.sheet_names:\n        df = pd.read_excel(file_path, sheet_name=sheet_name)\n        markdown += f\"## Sheet: {sheet_name}\\n\\n\"\n        markdown += df.to_markdown() + \"\\n\\n\"\n    \n    return markdown\n\ndef convert_pptx(file_path: str) -\u003e str:\n    \"\"\"Extract text from PowerPoint and convert to Markdown\"\"\"\n    prs = Presentation(file_path)\n    markdown = \"# Presentation\\n\\n\"\n    \n    for i, slide in enumerate(prs.slides):\n        markdown += f\"## Slide {i+1}\\n\\n\"\n        \n        for shape in slide.shapes:\n            if hasattr(shape, \"text\") and shape.text:\n                markdown += f\"{shape.text}\\n\\n\"\n    \n    # Enhance with GPT-NeoX\n    return enhance_with_gpt_neox(markdown)\n\ndef convert_pdf(file_path: str) -\u003e str:\n    \"\"\"Extract text from PDF and convert to Markdown\"\"\"\n    doc = fitz.open(file_path)\n    markdown = \"\"\n    \n    for page_num in range(len(doc)):\n        page = doc[page_num]\n        markdown += f\"## Page {page_num+1}\\n\\n\"\n        markdown += page.get_text() + \"\\n\\n\"\n    \n    # Enhance with GPT-NeoX\n    return enhance_with_gpt_neox(markdown)\n\ndef enhance_with_gpt_neox(text: str) -\u003e str:\n    \"\"\"Use GPT-NeoX to improve document structure and formatting\"\"\"\n    # Prepare prompt for the model\n    prompt = f\"Convert the following text to well-formatted Markdown:\\n\\n{text[:500]}...\"  # Limit input size\n    \n    # 
Generate improved markdown\n    inputs = tokenizer(prompt, return_tensors=\"pt\", truncation=True, max_length=1024)\n    outputs = model.generate(\n        inputs.input_ids,\n        max_length=1500,\n        num_return_sequences=1,\n        temperature=0.7,\n        top_p=0.9,\n    )\n    \n    # Extract the generated text and clean up\n    enhanced_text = tokenizer.decode(outputs[0], skip_special_tokens=True)\n    # Remove the prompt from the output\n    enhanced_text = enhanced_text.replace(prompt, \"\").strip()\n    \n    return enhanced_text\n```\n\n### Database Schema for File History\n\n```python\nfrom sqlalchemy import Column, Integer, String, DateTime, create_engine\nfrom sqlalchemy.ext.declarative import declarative_base\nfrom sqlalchemy.sql import func\n\nBase = declarative_base()\n\nclass ConversionHistory(Base):\n    __tablename__ = \"conversion_history\"\n    \n    id = Column(Integer, primary_key=True, index=True)\n    original_filename = Column(String, index=True)\n    file_type = Column(String)\n    md_filename = Column(String)\n    created_at = Column(DateTime(timezone=True), server_default=func.now())\n    user_id = Column(String, index=True, nullable=True)  # For logged-in users\n    file_size = Column(Integer)  # Size in bytes\n    \n    def __repr__(self):\n        return f\"\u003cConversionHistory(id={self.id}, filename={self.original_filename})\u003e\"\n\n# Database connection setup\nDATABASE_URL = \"sqlite:///./file_conversion.db\"\nengine = create_engine(DATABASE_URL)\nBase.metadata.create_all(bind=engine)\n```\n\n---\n\n## 📝 **Supported Formats**\n\n- **Microsoft Word (.docx)**\n- **Excel Spreadsheets (.xlsx)**\n- **PowerPoint Presentations (.pptx)**\n- **PDF Documents (.pdf)**\n- **Plain Text Files (.txt)**\n\n---\n\n## 🌟 **Why Use This Tool?**\n\n- **AI Enhancement**: GPT-NeoX ensures intelligent, high-quality conversions that preserve document meaning.\n- **Speed \u0026 Efficiency**: Convert large files in seconds, thanks to FastAPI's 
high-performance backend.\n- **Flexible**: Converts various file types and preserves structure, tables, and images.\n- **User-friendly**: Simple UI built with Next.js for a smooth, intuitive experience.\n- **Live Preview**: See how your converted file will look in Markdown format before downloading.\n- **Open-Source**: Customizable and extendable by the community.\n\n---\n\n## 💼 **Use Cases**\n\n- **Documentation**: Convert office documents to Markdown for use in GitHub repos, project wikis, or blog posts.\n- **Content Creation**: Bloggers, writers, and developers can convert Word and PDF files into Markdown for easy publishing.\n- **Project Management**: Convert presentations and reports into Markdown to be shared and tracked in repositories.\n- **AI Research**: Researchers can use our GPT-NeoX integration to study document comprehension and transformation.\n\n---\n\n## 🛠️ **Tech Stack**\n\n- **Frontend**: Next.js, React, Tailwind CSS\n- **Backend**: FastAPI, Python\n- **AI Engine**: GPT-NeoX\n- **Document Libraries**: mammoth (DOCX), pandas (XLSX), python-pptx (PPTX), PyMuPDF (PDF)\n- **Database**: SQLite (for file history)\n- **Deployment**: Vercel (Frontend), Render/Heroku (Backend)\n\n---\n\n## 🚀 **Get Started**\n\n### **For Users**\n\n1. Visit the [live site](frida.vercel.app) (link to be added).\n2. Upload your file and start converting!\n\n### **For Contributors**\n\n1. **Fork the repository** on [GitHub](#).\n2. 
Clone your forked repository:\n\n   ```bash\n   git clone https://github.com/\u003cyour-username\u003e/Frida.git\n   ```\n\n3. Create and activate a virtual environment:\n\n   ```bash\n   python -m venv venv\n   source venv/bin/activate\n   ```\n\n4. Install the dependencies:\n\n   ```bash\n   npm install\n   # or\n   yarn\n   # or\n   pnpm install\n   ```\n\n5. Run the development server (Python dependencies are installed automatically at this step):\n\n   ```bash\n   npm run dev\n   # or\n   yarn dev\n   # or\n   pnpm dev\n   ```\n\n   Open [http://localhost:3000](http://localhost:3000) in your browser to see the result.\n\n   The FastAPI server will be running on [http://127.0.0.1:8000](http://127.0.0.1:8000). You can change the port in `package.json` (you'll also need to update it in `next.config.js`).\n\n6. Make your changes and submit a **pull request**!\n\n---\n\n## 🤝 **How to Contribute**\n\nWe welcome contributions of all kinds! Here are some ways you can help:\n\n### **For Developers**\n\n- **Add New Features**: Implement support for additional file formats or enhance the conversion logic.\n- **Improve AI Integration**: Help optimize our GPT-NeoX implementation or add other AI capabilities.\n- **Improve Performance**: Optimize the backend for faster conversions.\n- **Fix Bugs**: Help us squash those pesky bugs!\n\n### **For Designers**\n\n- **UI/UX Improvements**: Make the interface more intuitive and visually appealing.\n- **Branding**: Help us design a logo or improve the overall branding.\n\n### **For Writers**\n\n- **Documentation**: Improve the README, write tutorials, or create user guides.\n- **Translation**: Help translate the tool into multiple languages.\n\n---\n\n## 📜 **Code of Conduct**\n\nWe follow the **Contributor Covenant Code of Conduct**. Please read it [here](#) before contributing.\n\n---\n\n## 🙏 **Acknowledgments**\n\nA big shoutout to all our contributors, the open-source community, and the GPT-NeoX team for making this project possible!\n\n---\n\n## 📄 **License**\n\nThis project is licensed under the **MIT License**. See the [LICENSE](#) file for details.\n\n---\n\n## 💬 **Join the Community**\n\nHave questions or ideas? Join our [Discord server](#) or open an issue on [GitHub](#). Let's build something amazing together!\n\n---\n\n**Happy Converting!** 🎉\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimadnajam%2Ffrida","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fimadnajam%2Ffrida","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fimadnajam%2Ffrida/lists"}