{"id":23094876,"url":"https://github.com/anant2003jain/textextractify","last_synced_at":"2025-08-21T10:18:19.067Z","repository":{"id":264337109,"uuid":"864222943","full_name":"Anant2003jain/TextExtractify","owner":"Anant2003jain","description":"TextExtractify is an AI-powered tool that extracts text from images and PDFs using both Azure OCR and EasyOCR. It offers features like multi-image upload, text entity extraction, and .docx export for premium users. Designed to streamline document processing with fast, accurate text extraction.","archived":false,"fork":false,"pushed_at":"2024-11-23T14:12:59.000Z","size":89497,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-03T19:26:00.630Z","etag":null,"topics":["azure","login-system","ocr","ocr-python","pillow","python3","streamlit","text-extraction"],"latest_commit_sha":null,"homepage":"https://text-extractify.streamlit.app/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Anant2003jain.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-27T18:13:11.000Z","updated_at":"2024-11-26T19:32:18.000Z","dependencies_parsed_at":"2024-11-23T14:45:06.535Z","dependency_job_id":null,"html_url":"https://github.com/Anant2003jain/TextExtractify","commit_stats":null,"previous_names":["anant2003jain/textextractify"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Anant2003jain/TextExtractify","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Anant2
003jain%2FTextExtractify","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Anant2003jain%2FTextExtractify/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Anant2003jain%2FTextExtractify/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Anant2003jain%2FTextExtractify/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Anant2003jain","download_url":"https://codeload.github.com/Anant2003jain/TextExtractify/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Anant2003jain%2FTextExtractify/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271462097,"owners_count":24763860,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-21T02:00:08.990Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["azure","login-system","ocr","ocr-python","pillow","python3","streamlit","text-extraction"],"created_at":"2024-12-16T22:18:32.533Z","updated_at":"2025-08-21T10:18:19.047Z","avatar_url":"https://github.com/Anant2003jain.png","language":"Python","readme":"# **TextExtractify 📄🔍**\nTextExtractify is a cutting-edge application designed to extract text from images and PDFs using powerful OCR technologies. 
With Azure OCR and EasyOCR at its core, TextExtractify offers a streamlined experience for users across different roles, whether you’re a free user or a premium subscriber looking for advanced features like PDF conversion and text extraction to .docx.\n\n## Demo 🎥\nhttps://github.com/user-attachments/assets/b3432af0-8f2e-43b5-814d-fda2723a8f46\n## 🚀 Features\n### 🔑 Free User Features:\n* Single Image Upload: Upload an image to extract text using Azure OCR or EasyOCR.\n* Text Entity Extraction: Detect entities within the extracted text for structured information.\n* Basic PDF Processing: Convert and extract text from PDFs.\n### 💼 Premium User Features:\n* Batch Image Upload: Upload and process multiple images in one go.\n* PDF to Text Conversion: Extract text from PDFs and convert it to .docx format.\n* Download .docx Files: Text extracted from each image or PDF can be downloaded as a .docx document.\n* Optimized for Performance: Faster extraction time and priority access to new features.\n### 🔥 Coming Soon:\n* Multiple Language Support: Translate extracted text into various languages.\n* Additional File Format Support: Expand beyond PDFs to Word, Excel, and other document types.\n## 💡 How It Works\n1. Upload Files: Users can upload image or PDF files from their system.\n2. Choose OCR Engine: Select either Azure OCR or EasyOCR for processing.\n3. Extract \u0026 Display: The app extracts text and displays it on the result page.\n4. Download Options: Free users can copy or view the text, while premium users can download .docx files or process multiple images in one go.\n## 🎨 User Interface\nTextExtractify has a modern and responsive UI designed for a seamless user experience. The interface adapts to different devices and ensures smooth navigation for both free and premium users.\n\n### Screenshots\n\n**1. Login Page:**\n\n   ![Login Page](https://github.com/user-attachments/assets/b7b2a81b-4c92-4cb2-b071-823eb4bb9172)\n\n**2. 
Signup Page:**\n  \n  ![SignUp Page](https://github.com/user-attachments/assets/44fc4f2d-688c-47bf-b1d6-538a82de852e)\n\n**3. Home Page:**\n\n  ![Home Page](https://github.com/user-attachments/assets/4279c830-7423-4570-834c-a180115ee1fa)\n\n**4. Free User Features:**\n\n![Free Features](https://github.com/user-attachments/assets/65f7f004-3387-44d0-ab51-6103729f754f)\n\n**5. Premium User PDF View:**\n  \n  ![Premium PDF](https://github.com/user-attachments/assets/24963b86-8d75-40fa-aed8-f1f7e338c942)\n\n\n## 🛠️ Tech Stack\n### Backend:\n* Python: Core language for all processing.\n* Azure OCR / EasyOCR: OCR engines for text extraction.\n* Streamlit: Web framework for creating interactive UIs.\n* Pillow: For image handling.\n### Frontend:\n* HTML/CSS: For custom designs and styling.\n### Database:\n* JSON: Used for managing user authentication and subscription data.\n## 🧑‍💻 Installation \u0026 Setup\n### Requirements:\n* Python 3.7+\n* Azure OCR API Key (for Azure OCR functionality)\n### Instructions:\n#### 1. Clone the repository:\n\n* git clone https://github.com/Anant2003jain/TextExtractify.git\n\n* cd TextExtractify\n\n#### 2. Install the required packages:\n\n* pip install -r requirements.txt\n  \n#### 3. Set up environment variables for Azure OCR:\n\n* export AZURE_OCR_KEY=your_key_here\n* export AZURE_OCR_ENDPOINT=your_endpoint_here\n#### 4. 
Run the application:\n\n* python -m streamlit run textex_app.py\n\n* Visit http://localhost:8501 in your browser.\n\n## 🔐 User Roles\n* Free Users: Access basic OCR and text extraction features.\n* Premium Users: Unlock advanced functionalities like batch image processing and downloadable .docx files.\n## 🎯 Future Roadmap\n* AI-powered Translations: Expanding language detection and translation capabilities.\n* Improved Performance: Reducing processing time for large PDFs and image batches.\n## 📝 License\n* This project is licensed under the MIT License - see the [LICENSE](https://github.com/Anant2003jain/TextExtractify/blob/main/LICENSE) file for details.\n\n## 🤝 Contributing\nWe welcome contributions from the community! To contribute:\n\n1. Fork the repo.\n2. Create your feature branch: git checkout -b feature/your-feature.\n3. Commit your changes: git commit -m 'Add feature'.\n4. Push to the branch: git push origin feature/your-feature.\n5. Open a pull request.\n## 🌟 Acknowledgements\n* Azure OCR for their comprehensive OCR API.\n* EasyOCR for providing a flexible open-source OCR solution.\n* Streamlit for making app deployment seamless.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanant2003jain%2Ftextextractify","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fanant2003jain%2Ftextextractify","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fanant2003jain%2Ftextextractify/lists"}