{"id":15908432,"url":"https://github.com/johnnv1/ccagt-utils","last_synced_at":"2025-04-07T23:48:10.128Z","repository":{"id":36995321,"uuid":"456516174","full_name":"johnnv1/CCAgT-utils","owner":"johnnv1","description":"Some code to work with CCAgT dataset. Annotations format conversion, mask generation, plotting samples, etc.","archived":false,"fork":false,"pushed_at":"2024-05-05T14:27:20.000Z","size":17564,"stargazers_count":2,"open_issues_count":11,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-05-05T15:33:31.054Z","etag":null,"topics":["computer-vision","computer-vision-datasets","dataset","object-detection","panoptic-segmentation","segmentation"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/johnnv1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-07T13:26:14.000Z","updated_at":"2024-05-29T23:43:20.478Z","dependencies_parsed_at":"2024-01-02T23:59:56.487Z","dependency_job_id":"9b780c20-78e4-4d82-92ac-c02246ba2d90","html_url":"https://github.com/johnnv1/CCAgT-utils","commit_stats":{"total_commits":195,"total_committers":2,"mean_commits":97.5,"dds":"0.14871794871794874","last_synced_commit":"8ab32e459b311cbac482a27d46e383725f9793af"},"previous_names":["johnnv1/ccagt_dataset_utils"],"tags_count":12,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnnv1%2FCCAgT-utils","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/j
ohnnv1%2FCCAgT-utils/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnnv1%2FCCAgT-utils/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnnv1%2FCCAgT-utils/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/johnnv1","download_url":"https://codeload.github.com/johnnv1/CCAgT-utils/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247749968,"owners_count":20989713,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","computer-vision-datasets","dataset","object-detection","panoptic-segmentation","segmentation"],"created_at":"2024-10-06T14:21:29.294Z","updated_at":"2025-04-07T23:48:10.106Z","avatar_url":"https://github.com/johnnv1.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![PyPI](https://img.shields.io/pypi/v/CCAgT-utils?color=blue\u0026label=pypi%20version)](https://pypi.org/project/CCAgT-utils/)\n[![Code coverage Status](https://codecov.io/gh/johnnv1/CCAgT-utils/branch/main/graph/badge.svg?token=HB8P4BKTZ7)](https://codecov.io/gh/johnnv1/CCAgT-utils)\n[![pre-commit.ci status](https://results.pre-commit.ci/badge/github/johnnv1/CCAgT-utils/main.svg)](https://results.pre-commit.ci/latest/github/johnnv1/CCAgT-utils/main)\n[![main 
status](https://github.com/johnnv1/CCAgT-utils/actions/workflows/main.yml/badge.svg)](https://github.com/johnnv1/CCAgT-utils/actions/workflows/main.yml)\n[![DOI](https://zenodo.org/badge/456516174.svg)](https://zenodo.org/badge/latestdoi/456516174)\n\n# CCAgT-utils\n\nCCAgT-utils is a package for working with the **CCAgT dataset**:\n`Images of Cervical Cells with AgNOR Stain Technique`. The\npackage provides code for annotation format conversion,\nmask generation, plotting samples, etc.\n\n\n## Package context\nI have been working with images of cervical cells stained\nwith AgNOR since January 2020 for my master's thesis. The\nresults of the thesis can be found at [CCAgT-benchmarks](https://github.com/johnnv1/CCAgT-benchmarks).\nIn general, the objective of the thesis is to automate the\nmain steps that support the diagnosis/prognosis of these\ncells. I have therefore also developed some code to\npreprocess the dataset, or simply to make it easier to use.\n\n\nThe code for working with the dataset is available in this\npackage.\n\n## Contents\n\n1. [Links to download the dataset](#links-to-download-the-ccagt-dataset)\n2. [What does this dataset look like?](#what-does-this-dataset-look-like)\n3. [Examples of use of this package](#examples-of-use)\n\n\n# Links to download the CCAgT dataset\n\n1. Version 1.1 - [drive](https://drive.google.com/drive/folders/1TBpYCv6S1ydASLauSzcsvO7Wc5O-WUw0?usp=sharing) or [UFSC repository](https://arquivos.ufsc.br/d/373be2177a33426a9e6c/)\n2. Version 2 - [Mendeley data](https://doi.org/10.17632/wg4bpm33hj.2)\n\n## Using HF datasets\n\nThe [Hugging Face datasets API](https://github.com/huggingface/datasets) provides an easy way to download and use various datasets. 
One of these datasets is the CCAgT dataset, which can be found at [huggingface.co/datasets/lapix/CCAgT/](https://huggingface.co/datasets/lapix/CCAgT/).\n\n```python\nfrom datasets import load_dataset\n\ndataset = load_dataset(\"lapix/CCAgT\")\n\n# ...\n```\n\n# What does this dataset look like?\nThe explanations and examples here refer to version `\u003e=2.0` of the\ndataset. If you want to use older versions of the dataset,\nyou will need to adapt a few things, such as the organization of\nthe data directory.\n\n\nThis is a computer vision dataset, created by collaborators\nfrom different departments at [Universidade Federal de Santa Catarina (UFSC)](https://en.ufsc.br/).\nThe dataset contains images annotated/labelled for semantic\nsegmentation and other tasks. The annotation tool is [labelbox](https://labelbox.com/).\nThe data repositories contain the images, masks (semantic\nsegmentation) and COCO annotations for object detection. The\ncode to convert annotations from the labelbox format to others\nis in this package.\n\nEach slide can have some differences in stain coloration.\nFigure 1 shows an image assembled from different images\nof different slides.\n\n![Image sample created from samples from different slides](./data/static_images/Figure1.jpg)\n\nThe directory [./data/samples/images/](./data/samples/images/)\ncontains the original images of each tile from different\nslides/patients. The dataset presents a wide variety of colors,\ntextures, nuclei shapes, and other characteristics of the cell nuclei. This\nvariety depends on different factors, such as: type of lesion, staining\nprocess, sample acquisition, sensor/microscope setup for image\nacquisition and others.\n\nThe dataset at version `1.x` has 3 annotated categories, and at\nversion `2.x` has 7 categories. However, the main\nobjective in supporting the diagnosis/prognosis of these samples is\nto detect/identify/measure the Nucleolus Organizer Regions\n(NORs) inside each nucleus. 
The NORs (the black dots/parts\ninside the nuclei) were labeled as two different categories:\nsatellites and clusters.\n\nFigure 2 shows an example with two highlighted nuclei. The\nnucleus on the left (highlighted in black) is a nucleus with three\nclusters. The nucleus on the right (highlighted in gray) is a\nnucleus with one cluster (the black dot at the top of the\nnucleus) and two satellites (the other two black dots).\n\n![Image from a tile highlighting two nuclei](./data/static_images/Figure2.jpg)\n\nFor more details about the dataset, see the dataset pages\nor their papers.\n\n\n# Examples of use\n\n## Converter\nTo use the dataset with different approaches, different\n“formats” are required. This module provides the\ntransformation between the format produced by the annotation\ntool (LabelBox) and the current state-of-the-art formats (e.g.\nCOCO). It also makes it possible to work with the data in\nDataFrame format, which I consider to be the easiest way\nto manipulate these annotations. The DataFrame\nformat is not recommended or built for use in any specific deep\nlearning library or approach. 
It was built only for\nmanipulating the dataset, facilitating conversions between\ndifferent formats, performing analysis, and internal use by this\nlibrary.\n\n```console\n$ CCAgT-converter -h  # to show help message\n```\n\n### Labelbox to COCO format\n```console\n$ CCAgT-converter labelbox_to_COCO \\\n                    -t OD \\\n                    -r ./data/samples/sanitized_sample_labelbox.json \\\n                    -a ./data/samples/CCAgT_dataset_metadata.json \\\n                    -o ./data/samples/out/CCAgT_COCO_OD.json\n```\n\n### Labelbox to CCAgT format\n```console\n$ CCAgT-converter labelbox_to_CCAgT \\\n                    -r ./data/samples/sanitized_sample_labelbox.json \\\n                    -a ./data/samples/CCAgT_dataset_metadata.json \\\n                    -o ./data/samples/out/CCAgT.parquet.gzip \\\n                    -p True\n```\n\n### CCAgT to masks (categorical masks for semantic segmentation)\n```console\n$ CCAgT-converter generate_masks \\\n                    -l ./data/samples/out/CCAgT.parquet.gzip \\\n                    -o ./data/samples/masks/semantic_segmentation/ \\\n                    --split-by-slide\n```\n\n### CCAgT to Panoptic segmentation COCO\n```console\n$ CCAgT-converter CCAgT_to_COCO \\\n                    -t PS \\\n                    -l ./data/samples/out/CCAgT.parquet.gzip \\\n                    -o ./data/samples/masks/panoptic_segmentation \\\n                    --out-file ./data/samples/out/CCAgT_COCO_PS.json\n```\n\n\n## Create subdatasets\nThis module is responsible for creating customized versions of the\ndataset with the desired modifications. Things that can be\ndone: slice the images into smaller parts, select just images\nthat have specific categories, create images with a specific\ncategory. 
This tool will copy or generate the images, and\nalso generate a new CCAgT annotations file, based on the\ndesired options!\n\nFirst, if desired, the tool will remove images that do not have\nthe desired categories:\n\u003e- `--remove-images-without` with the category IDs, will\nremove all images that don't have the categories passed as\nparameter.\n\u003e- `--remove-annotations-different` with the category IDs,\nwill remove all annotations whose categories differ\nfrom the parameter.\n\nSecond, the tool allows selecting the format\nof the images:\n\u003e- `--slice-images` to slice the images into sub parts;\n\u003e- `--extract` to create images with a unique category\n(centered in the new image);\n\u003e- `--labels` (to be used with `--extract`) path to the\nCCAgT file with the labels;\n\u003e- `--paddings` (to be used with `--extract`) selects the size of the\npaddings to apply, in percent (float values) or pixels\n(integer values);\n\u003e- Without any parameter, will just copy the original dataset.\n\nThird, and last, the tool can (re)check whether all images have the desired\ncategories, and delete those that don't:\n\u003e- `--check-if-all-have-at-least-one-of` to verify whether the\nimage has at least one of the category IDs passed as\nparameter;\n\u003e- `--delete` if desired, delete images that don't have at\nleast one of the categories;\n\u003e- `--generate-masks` if desired, will generate the masks\nbased on the new CCAgT annotations file.\n\n**Check all options with:** `CCAgT-utils create-subdataset -h`\n\nThe example below creates a subdataset with images sliced into 2x2\n(1 image (1600x1200) -\u003e 4 images (800x600)) and removes images\nthat do not have any information (images with just background).\n\n```console\n# Create a directory with the same structure of the dataset\n$ mkdir /tmp/example_dataset\n$ mkdir /tmp/example_dataset/images/\n$ mkdir /tmp/example_dataset/masks/\n$ cp -r ./data/samples/images/ /tmp/example_dataset/images/\n$ cp -r 
./data/samples/masks/semantic_segmentation/ /tmp/example_dataset/masks/\n\n# Create the subdataset\n$ CCAgT-utils create-subdataset \\\n                    -name dataset_sliced_into2x2 \\\n                    --original /tmp/example_dataset/ \\\n                    --output /tmp/ \\\n                    --remove-images-without 1 2 3 4 5 6 7 \\\n                    --slice-images 2 2\n```\n\nWith this tool, various datasets (based on the original\ndataset) can be created. Be creative 😊 in your experiments.\n\n## Visualization\nModule responsible for assisting in the display or creation\nof figures from the dataset.\n\n```console\n$ CCAgT-visualization -h  # to show help message\n```\n\n### Show images with boxes\n```console\n$ CCAgT-visualization show \\\n                        -l ./data/samples/out/CCAgT.parquet.gzip \\\n                        -a ./data/samples/CCAgT_dataset_metadata.json \\\n                        -d ./data/samples/images/\n```\n\n### Show images and mask\n```console\n$ CCAgT-visualization show \\\n                        -t image-and-mask \\\n                        -l ./data/samples/out/CCAgT.parquet.gzip \\\n                        -a ./data/samples/CCAgT_dataset_metadata.json \\\n                        -d ./data/samples/images/ \\\n                        -m ./data/samples/masks/semantic_segmentation/\n```\n\n### Show image with boxes and mask\n```console\n$ CCAgT-visualization show \\\n                        -t image-with-boxes-and-mask \\\n                        -l ./data/samples/out/CCAgT.parquet.gzip \\\n                        -a ./data/samples/CCAgT_dataset_metadata.json \\\n                        -d ./data/samples/images/ \\\n                        -m 
./data/samples/masks/semantic_segmentation/\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohnnv1%2Fccagt-utils","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjohnnv1%2Fccagt-utils","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohnnv1%2Fccagt-utils/lists"}