{"id":24639745,"url":"https://github.com/jpleorx/detectron2-publaynet","last_synced_at":"2025-05-10T04:36:31.803Z","repository":{"id":65307027,"uuid":"406546463","full_name":"JPLeoRX/detectron2-publaynet","owner":"JPLeoRX","description":"Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset","archived":false,"fork":false,"pushed_at":"2023-04-16T19:14:05.000Z","size":8137,"stargazers_count":48,"open_issues_count":0,"forks_count":7,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-31T21:42:31.401Z","etag":null,"topics":["artificial-intelligence","computer-vision","deep-learning","detectron2","document-analysis","document-classification","document-layout","document-layout-analysis","faster-rcnn","instance-segmentation","layout-analysis","machine-learning","neural-network","neural-networks","object-detection","publaynet","python","python3","pytorch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/JPLeoRX.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-09-14T23:07:56.000Z","updated_at":"2025-02-05T02:58:53.000Z","dependencies_parsed_at":"2023-01-16T15:15:36.351Z","dependency_job_id":null,"html_url":"https://github.com/JPLeoRX/detectron2-publaynet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JPLeoRX%2Fdetectron2-publaynet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JPLeoRX%2Fdetectron2-publaynet/tags","releases_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/repositories/JPLeoRX%2Fdetectron2-publaynet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/JPLeoRX%2Fdetectron2-publaynet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/JPLeoRX","download_url":"https://codeload.github.com/JPLeoRX/detectron2-publaynet/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":253187110,"owners_count":21868065,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","computer-vision","deep-learning","detectron2","document-analysis","document-classification","document-layout","document-layout-analysis","faster-rcnn","instance-segmentation","layout-analysis","machine-learning","neural-network","neural-networks","object-detection","publaynet","python","python3","pytorch"],"created_at":"2025-01-25T11:12:38.014Z","updated_at":"2025-05-09T03:42:04.729Z","avatar_url":"https://github.com/JPLeoRX.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Warning!!!\nThis repository is no longer maintained, as it was moved to another GitHub. Please see [CaseDrive](https://github.com/CaseDrive/publaynet-models) GitHub profile for any updates related to this research. 
\n\n---\n\nHere I present [Detectron2](https://github.com/facebookresearch/detectron2) object detection models trained on the [PubLayNet](https://developer.ibm.com/exchanges/data/all/publaynet/) dataset, with validation box AP scores ranging from 81.139 to 86.690 (longer training may yield even better results).\n\n![Preview](https://github.com/JPLeoRX/detectron2-publaynet/blob/master/prediction_example/preview.png?raw=True)\n\n# Dataset overview\n\nPubLayNet is a very large (over 300k images \u0026 over 90 GB in size) dataset for document layout analysis. It contains images of research papers and articles, with annotations for various page elements such as “text”, “list”, “figure”, etc. The dataset was obtained by automatically matching the XML representations and the content of over 1 million PDF articles that are publicly available on PubMed Central. It was originally provided by IBM [here](https://developer.ibm.com/exchanges/data/all/publaynet).\n\n| Data Description | Zipped File Name | Purpose |\n| ------------- | ------------- | ------------- |\n| Train 0 Dataset, 13 GB | [train-0.tar.gz](https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/train-0.tar.gz) | Training |\n| Train 1 Dataset, 13 GB | [train-1.tar.gz](https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/train-1.tar.gz) | Training |\n| Train 2 Dataset, 13 GB | [train-2.tar.gz](https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/train-2.tar.gz) | Training |\n| Train 3 Dataset, 13 GB | [train-3.tar.gz](https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/train-3.tar.gz) | Training |\n| Train 4 Dataset, 13 GB | [train-4.tar.gz](https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/train-4.tar.gz) | Training |\n| Train 5 Dataset, 13 GB | [train-5.tar.gz](https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/train-5.tar.gz) | Training |\n| Train 6 Dataset, 13 GB | 
[train-6.tar.gz](https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/train-6.tar.gz) | Training |\n| Evaluation Dataset, 3 GB | [val.tar.gz](https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/val.tar.gz) | Evaluation |\n| Labels Dataset, 314 MB | [labels.tar.gz](https://dax-cdn.cdn.appdomain.cloud/dax-publaynet/1.0.0/labels.tar.gz) | |\n\n# Models overview\n\nModels were trained on the `train` part of the dataset, consisting of 335 703 images, and evaluated on the `val` part of the dataset with 11 245 images. All the AP scores were obtained on the `val` dataset. Inference times were taken from the official Detectron2 model zoo descriptions.\n\nTo provide some options, and to experiment with different models, I have trained 4 versions on this dataset: 2 are basic object detection models, and 2 also predict segmentation masks. Compare the accuracy scores and inference times of these models to choose the one that better suits your task.\n\n### Models from [COCO Object Detection](https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md#coco-object-detection-baselines):\n\n| Name  | Inference time (s/im) | Box AP | Folder name | Model zoo config | Trained model |\n| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |\n| R50-FPN  | 0.038 | 81.139 | faster_rcnn_R_50_FPN_3x | [Click me!](https://github.com/facebookresearch/detectron2/blob/main/configs/COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml) | [Click me!](https://keybase.pub/jpleorx/detectron2-publaynet/faster_rcnn_R_50_FPN_3x) |\n| R101-FPN  | 0.051 | 84.295 | faster_rcnn_R_101_FPN_3x | [Click me!](https://github.com/facebookresearch/detectron2/blob/main/configs/COCO-Detection/faster_rcnn_R_101_FPN_3x.yaml) | [Click me!](https://keybase.pub/jpleorx/detectron2-publaynet/faster_rcnn_R_101_FPN_3x) |\n\n\n### Models from [COCO Instance Segmentation with Mask 
R-CNN](https://github.com/facebookresearch/detectron2/blob/main/MODEL_ZOO.md#coco-instance-segmentation-baselines-with-mask-r-cnn):\n\n| Name  | Inference time (s/im) | Box AP | Mask AP | Folder name | Model zoo config | Trained model |\n| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- |\n| R50-FPN  | 0.043 | 83.666 | 82.268 | mask_rcnn_R_50_FPN_3x | [Click me!](https://github.com/facebookresearch/detectron2/blob/main/configs/COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml) | [Click me!](https://keybase.pub/jpleorx/detectron2-publaynet/mask_rcnn_R_50_FPN_3x) |\n| R101-FPN  | 0.056 | 86.690 | 82.105 | mask_rcnn_R_101_FPN_3x | [Click me!](https://github.com/facebookresearch/detectron2/blob/main/configs/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml) | [Click me!](https://keybase.pub/jpleorx/detectron2-publaynet/mask_rcnn_R_101_FPN_3x) |\n\n\n### Model folder\n\nEach model's directory in the repository contains these files:\n\n| File name  | Description |\n| ------------- | ------------- |\n| download.txt  | A text file that contains a link from which you can download the trained model |\n| train.py  | Training script that trains the model found in the `training_output` sub-folder for a given number of epochs |\n| test.py  | Testing script that runs the model found in `training_output` in inference mode on 6 randomly preselected images (3 from the training and 3 from the evaluation dataset) and displays all predictions on each image in a pop-up window |\n| eval.py  | Evaluation script that runs the model found in `training_output` in inference mode on all images found in the dataset's `val` folder and `val.json`. 
Evaluation is performed through Detectron's `COCOEvaluator` |\n| evaluation.txt | Because `eval.py` takes 10 to 20 minutes to run, I've recorded the last evaluation output in this separate text file |\n\nAll trained model `.pth` files should be available [here](https://keybase.pub/jpleorx/detectron2-publaynet/) or [here](https://drive.google.com/drive/folders/11BeTAb8BlS9DiEb_ndoAh1fKyq16i6FC?usp=sharing); if not, refer to the individual `download.txt` links.\n\n# Using the models\n\nIn the `usage_example` module I've provided a sample script `example.py` that builds Detectron objects (config and predictor) for my trained model and runs inference on it, with a sample interpretation of Detectron's outputs. Please note that `test.py` and `eval.py` in the model folders use common functions shared between models in this project, while `example.py` provides a completely clean setup, assuming you only want to download the models and use them for inference directly. It requires only Pillow, OpenCV, numpy and Detectron2 to run.\n\n# Hardware used\n\nBelieve it or not, these models were trained on a regular consumer-grade personal gaming PC with one NVIDIA 2070 SUPER (8 GB) GPU, an Intel Core i5-10600K CPU and 32 GB RAM.\n\n# Links\n\nIn case you’d like to check my other work or contact me:\n* [Personal website](https://tekleo.net/)\n* [GitHub](https://github.com/jpleorx)\n* [PyPI](https://pypi.org/user/JPLeoRX/)\n* [DockerHub](https://hub.docker.com/u/jpleorx)\n* [Articles on Medium](https://medium.com/@leo.ertuna)\n* [LinkedIn (feel free to connect)](https://www.linkedin.com/in/leo-ertuna-14b539187/) ","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjpleorx%2Fdetectron2-publaynet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjpleorx%2Fdetectron2-publaynet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjpleorx%2Fdetectron2-publaynet/lists"}