{"id":19440193,"url":"https://github.com/kghandour/dd3d","last_synced_at":"2025-04-14T19:21:39.322Z","repository":{"id":194404498,"uuid":"632843338","full_name":"kghandour/dd3d","owner":"kghandour","description":"Dataset Distillation on 3D Point Clouds using Gradient Matching","archived":false,"fork":false,"pushed_at":"2023-09-12T20:49:38.000Z","size":17788,"stargazers_count":6,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-28T07:41:34.523Z","etag":null,"topics":["dataset-distillation","gradient-matching"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/kghandour.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2023-04-26T08:45:15.000Z","updated_at":"2024-12-09T10:54:17.000Z","dependencies_parsed_at":null,"dependency_job_id":"c5ea13a8-c152-43d2-8cb3-7b0a6964be74","html_url":"https://github.com/kghandour/dd3d","commit_stats":null,"previous_names":["kghandour/dd3d"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kghandour%2Fdd3d","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kghandour%2Fdd3d/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kghandour%2Fdd3d/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/kghandour%2Fdd3d/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/kghandour","download_url":"https://codeload.github.com/kghandour/dd3d/tar.gz/refs/heads/main","host":{"name":"GitHub","ur
l":"https://github.com","kind":"github","repositories_count":248943415,"owners_count":21186958,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset-distillation","gradient-matching"],"created_at":"2024-11-10T15:28:21.069Z","updated_at":"2025-04-14T19:21:39.296Z","avatar_url":"https://github.com/kghandour.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Dataset Distillation on 3D Point Clouds using Gradient Matching\n![Overall Pipeline](assets/Pipeline.png)\nFigure 1: Overview of the main pipeline\n\n![Distillation Block](assets/DistillationBlock.png)\nFigure 2: Pipeline of the Distillation Block\n\n![Distillation Results from using Pytorch Implementation (Top) and MinkowskiEngine (Bottom)](assets/MNIST.png)\n\nFigure 3: Distillation results of MNIST point clouds after projecting to image space. Top: PyTorch implementation. Bottom: MinkowskiEngine implementation.\n\n[Report](assets/Report.pdf)\n\n## Abstract\nAs the demand for better deep learning models increases, the need for larger datasets rises. This growth requires more resources to store such datasets and use them in training. Dataset distillation is a relatively new research focus that attempts to condense a large dataset by learning a smaller synthetic dataset that, when used to train a neural network, reaches performance similar to training on the original dataset. Several approaches distill dense data representations such as images, whether by bi-level optimization, feature matching, or gradient matching. 
However, this work applies gradient-matching dataset distillation to a sparse data representation, specifically 3D point clouds with a variable number of input points. Instead of turning voxels on or off in a dense grid, the method updates the point coordinates directly, starting from points scattered randomly in 3D space, so that they converge to capture the geometric features of each class. Although the results do not match state-of-the-art approaches, they provide a proof of concept that updating coordinates directly is a viable method for distilling data.\n\n## Getting Started\n### 1. Installing MinkowskiEngine\nThis implementation focuses on sparse tensors and relies heavily on MinkowskiEngine.\nFollow the steps found [here](https://github.com/NVIDIA/MinkowskiEngine) to install MinkowskiEngine\n\n### 2. Installing the dependencies\n- Install Conda on your system\n- Install the environment\n```\nconda env create -f environment.yml\n```\n\n### 3. Update configuration\nUpdate the important fields in `configs/default.ini`:\n\n- `logging_parent` sets the destination of your log files\n- `distillation_checkpoint_dir` sets the destination of your checkpoint saves\n\n### 4. Running the Distillation\n```\npython mnist_distill.py\n```\n\nUseful parameters:\n\n- `--no-log` prevents creating TensorBoard logs, which is useful when debugging\n- `--exp \u003cNAME\u003e` sets the experiment name for the run\n- `--pytorch` uses the PyTorch architecture instead of MinkowskiEngine (default: false)\n\n## Important Files\n- `mnist_distill.py` \n\nCovers the main pipeline of the distillation and evaluation\n\n- `distillation_loss.py`\n\nIncludes the list of distance measurements implemented\n\n- `models/MEConv.py`\n\nIncludes the different network architectures. 
\n`MEPytorch` covers the architecture used with pure PyTorch.\n`MEConvExp` covers the architecture used with MinkowskiEngine.\n\n- `models/MNIST.py`\n\nIncludes the classifier network\n\n- `configs/settings.py`\n\nParses the config file and the CLI arguments and sets the default values for multiple global variables.\n\n- `configs/default.ini`\n\nThe default configuration file. Includes the directory paths, naming, and some network parameters.\n\n## Irrelevant files\n- `run_distill.py`\n\nIncludes initial experiments for distilling ShapeNet using PointNet++ \n\n- `train_classification.py`, `test_classification.py`, and `models/pointnet2_ssg_wo_normals/`\n\nInclude the training pipeline for a baseline classifier on ShapeNet that uses PointNet++ as its base. These files are heavily based on https://github.com/yanx27/Pointnet_Pointnet2_pytorch\n\n- Additional files in other branches were used to convert ShapeNet to PCD and to run some analysis on ShapeNet during initial experiments.\n\n## Citations\n- Zhao et al. Dataset Condensation using Gradient Matching (2020) [Github Link](https://github.com/VICO-UoE/DatasetCondensation)\n- MNIST dataset \n- Choy et al. 4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks [Github Link](https://github.com/NVIDIA/MinkowskiEngine)\n- Xu Yan: PointNet/Pointnet++ Pytorch [Github Link](https://github.com/yanx27/Pointnet_Pointnet2_pytorch) (This implementation is no longer mainly used but is kept in the repository for potential future use.)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkghandour%2Fdd3d","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkghandour%2Fdd3d","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkghandour%2Fdd3d/lists"}