{"id":30462472,"url":"https://github.com/shahin-ro/table-detection","last_synced_at":"2026-04-08T18:02:33.922Z","repository":{"id":310308035,"uuid":"1039405540","full_name":"shahin-ro/Table-Detection","owner":"shahin-ro","description":"Python tool for table extraction \u0026 Persian OCR. Uses OpenCV for table detection, Tesseract for text extraction, \u0026 Pandas for data output. Visualizes cells \u0026 text. Ideal for Persian documents! 📄✨ ","archived":false,"fork":false,"pushed_at":"2025-08-17T07:13:05.000Z","size":272,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-08-17T08:33:26.601Z","etag":null,"topics":["colab","computer-vision","data-extraction","data-visualization","document-processing","image-analysis","image-processing","machine-learning","matplotlib","numpy","ocr","opencv","pandas","persian-ocr","persian-text","python","table-detection","table-extraction","tesseract","text-recognition"],"latest_commit_sha":null,"homepage":"","language":"Jupyter 
Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/shahin-ro.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-17T06:38:12.000Z","updated_at":"2025-08-17T07:25:11.000Z","dependencies_parsed_at":"2025-08-17T08:43:36.516Z","dependency_job_id":null,"html_url":"https://github.com/shahin-ro/Table-Detection","commit_stats":null,"previous_names":["shahin-ro/table-detection"],"tags_count":null,"template":false,"template_full_name":null,"purl":"pkg:github/shahin-ro/Table-Detection","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shahin-ro%2FTable-Detection","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shahin-ro%2FTable-Detection/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shahin-ro%2FTable-Detection/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shahin-ro%2FTable-Detection/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/shahin-ro","download_url":"https://codeload.github.com/shahin-ro/Table-Detection/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/shahin-ro%2FTable-Detection/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31567227,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-08T14:31:17.711Z","status":"ssl_error","status_checked_at":"2026-04-08T14:31:1
7.202Z","response_time":54,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["colab","computer-vision","data-extraction","data-visualization","document-processing","image-analysis","image-processing","machine-learning","matplotlib","numpy","ocr","opencv","pandas","persian-ocr","persian-text","python","table-detection","table-extraction","tesseract","text-recognition"],"created_at":"2025-08-23T23:01:31.916Z","updated_at":"2026-04-08T18:02:33.913Z","avatar_url":"https://github.com/shahin-ro.png","language":"Jupyter Notebook","readme":"# Table-Detection\n\nTable Extraction and OCR for Persian Documents 📄✨\n\nThis project provides a Python-based solution for detecting table\nstructures in images and extracting Persian text using Optical Character\nRecognition (OCR). It uses OpenCV for table detection and Tesseract OCR\nfor text extraction, with proper rendering of Persian text. 
🚀\n\n## Features 🌟\n\n-   **Table Detection** 📊: Identifies table cells in images using\n    OpenCV line detection (adaptive thresholding and morphological\n    operations).\n-   **OCR Support** 🔍: Extracts Persian text from table cells using\n    Tesseract OCR with Persian language support.\n-   **Data Structuring** 📈: Organizes extracted text into a Pandas\n    DataFrame for easy analysis.\n-   **Visualization** 🎨: Displays detected table cells, line masks, and\n    intersection points for verification.\n\n## Requirements 🛠️\n\nTo run this project, you need the following dependencies:\n\n-   Python 3.7+ 🐍\n-   OpenCV (`cv2`)\n-   NumPy\n-   Matplotlib\n-   Pandas\n-   Pytesseract (for Tesseract OCR)\n-   Tesseract-OCR with Persian language support\n    (`tesseract-ocr-fas`)\n\nInstall the Python dependencies using:\n\n``` bash\npip install opencv-python numpy matplotlib pandas pytesseract\n```\n\nFor Tesseract OCR (Debian/Ubuntu, including Google Colab):\n\n``` bash\napt-get install -y tesseract-ocr tesseract-ocr-fas\n```\n\n## Usage 🚀\n\n1.  **Clone the Repository** 📂:\n\n    ``` bash\n    git clone https://github.com/shahin-ro/Table-Detection.git\n    cd Table-Detection\n    ```\n\n2.  **Prepare an Image** 🖼️:\n\n    -   Ensure you have an image containing a table with Persian text\n        (e.g., a scanned document or screenshot).\n    -   Place the image in the project directory or provide its path to\n        the script.\n\n3.  **Run the Script** ▶️:\n\n    -   The script (`jadval.py`) processes the image, detects table\n        cells, extracts text, and visualizes the results.\n\n    -   Run the script:\n\n        ``` bash\n        python jadval.py\n        ```\n\n4.  **Output** 📜:\n\n    -   The script outputs:\n        -   A count of detected table cells ✅.\n        -   Extracted text for each cell with coordinates 📍.\n        -   A Pandas DataFrame representing the table structure 🗃️.\n        -   Visualizations showing detected cells, line masks, and\n            intersection points 🖼️.\n\n## How It Works 🧠\n\n1.  
**Table Detection** 📏:\n    -   Uses OpenCV to preprocess the image (grayscale, adaptive\n        thresholding, morphological operations).\n    -   Detects horizontal and vertical lines to identify table\n        boundaries.\n    -   Clusters line intersections to determine cell coordinates.\n2.  **Text Extraction** 📝:\n    -   Crops each detected cell and processes it with Tesseract OCR\n        (`lang='fas'`) for Persian text extraction.\n    -   Stores text and coordinates for each cell.\n3.  **Data Structuring** 📚:\n    -   Maps extracted text to a grid based on cell positions.\n    -   Creates a Pandas DataFrame to represent the table structure.\n4.  **Visualization** 🖌️:\n    -   Displays three plots:\n        -   **Detected Cells** 🟢: Original image with green rectangles\n            around table cells.\n        -   **Line Mask** ⚪: Inverted mask showing detected horizontal\n            and vertical lines.\n        -   **Joints** 🔲: Intersection points of table lines.\n\n## Example 📋\n\n``` python\n# Example output for a table with 6 cells\n✅ Detected 6 cells.\nمتن سلول 1: نام\nمختصات: (50, 30, 150, 80)\n---\nمتن سلول 2: سن\nمختصات: (150, 30, 250, 80)\n---\n...\nجدول استخراج شده (متن داخل سلول‌ها):\n     0    1    2\n0  نام  سن  شغل\n1  علی  30  مهندس\n```\n\n## Notes 📌\n\n-   **Tesseract OCR** 🔍: Requires `tesseract-ocr-fas` for Persian\n    language support.\n-   **Colab Compatibility** ☁️: The script is designed to work in Google\n    Colab, with file upload support and Tesseract installation commands.\n-   **Image Quality** 🖼️: OCR accuracy depends on clear table lines and\n    readable text.\n\n## Limitations ⚠️\n\n-   The table detection algorithm assumes well-defined table lines.\n-   OCR accuracy depends on image quality and text clarity.\n-   Persian text rendering in visualizations may require additional font\n    support for non-Colab environments.\n\n## Contributing 🤝\n\nContributions are welcome! 
Please submit a pull request or open an issue\nfor bug reports, feature requests, or improvements. 🙌\n\n## License 📜\n\nThis project is licensed under the MIT License. See the\n[LICENSE](LICENSE) file for details.\n\n## Acknowledgments 💖\n\n-   [OpenCV](https://opencv.org/) for image processing 🖼️.\n-   [Tesseract OCR](https://github.com/tesseract-ocr/tesseract) for\n    Persian text extraction 🔍.\n-   [Pandas](https://pandas.pydata.org/) for data structuring 📚.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshahin-ro%2Ftable-detection","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fshahin-ro%2Ftable-detection","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fshahin-ro%2Ftable-detection/lists"}