{"id":22794558,"url":"https://github.com/colintr/practicalncd","last_synced_at":"2026-02-27T22:36:16.197Z","repository":{"id":207782422,"uuid":"720084911","full_name":"ColinTr/PracticalNCD","owner":"ColinTr","description":"[DMKD 2024] A Practical Approach to Novel Class Discovery in Tabular Data","archived":false,"fork":false,"pushed_at":"2024-08-20T09:14:53.000Z","size":57,"stargazers_count":1,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-23T21:25:39.044Z","etag":null,"topics":["class-discovery","clustering","deep-learning","ncd","novel-class-discovery"],"latest_commit_sha":null,"homepage":"https://arxiv.org/abs/2311.05440","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ColinTr.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-17T14:47:20.000Z","updated_at":"2024-08-20T09:14:56.000Z","dependencies_parsed_at":null,"dependency_job_id":"50549ddd-7af7-4a71-ad6a-f02ec04d39cf","html_url":"https://github.com/ColinTr/PracticalNCD","commit_stats":null,"previous_names":["colintr/practicalncd"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ColinTr/PracticalNCD","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ColinTr%2FPracticalNCD","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ColinTr%2FPracticalNCD/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ColinTr%2FPracticalNCD/releases","manifests_url":"https://repos.ecosyste.ms/a
pi/v1/hosts/GitHub/repositories/ColinTr%2FPracticalNCD/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ColinTr","download_url":"https://codeload.github.com/ColinTr/PracticalNCD/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ColinTr%2FPracticalNCD/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269649228,"owners_count":24453499,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-09T02:00:10.424Z","response_time":111,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["class-discovery","clustering","deep-learning","ncd","novel-class-discovery"],"created_at":"2024-12-12T04:09:16.309Z","updated_at":"2026-02-27T22:36:11.173Z","avatar_url":"https://github.com/ColinTr.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003e\n  PracticalNCD\n\u003c/h1\u003e\n  \n\u003cp align=\"center\"\u003e\n  Code used to generate the results of the DMKD journal paper \u003ca href=\"https://arxiv.org/abs/2311.05440\"\u003eA Practical Approach to Novel Class Discovery in Tabular Data\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cdiv align=\"center\"\u003e\n \n  [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)\n\u003c/div\u003e\n\n\n## 🔍 Overview\n\nThis python library proposes an 
ensemble of tools for the Machine Learning problem of [Novel Class Discovery](https://arxiv.org/pdf/2302.12028.pdf).\n\nIn this library, you will find the following tools illustrated through Jupyter Notebooks:\n - A hyperparameter optimization procedure tailored to transfer the results from the known classes to the novel classes.\n - An estimation of the number of clusters by applying clustering quality metrics in the latent space of NCD methods.\n - Two unsupervised clustering algorithms modified to utilize the data available in the NCD setting.\n - A novel method called PBN (for Projection-Based NCD).\n\n\n## 🐍 Setting up the Python environment\n\n### Option 1 - With [Anaconda](https://www.anaconda.com/download):\n\n```bash\n# Create the virtual environment and install the packages with conda\nconda env create --file environment.yml --prefix ./venvpracticalncd\n\n# Activate the virtual environment\nconda activate .\\venvpracticalncd\n\n# Add package missing from conda repositories\npip install iteration-utilities==0.11.0\n```\n\n### Option 2 - Without Anaconda:\n\nPrerequisite: having [Python 3.10.9](https://www.python.org/downloads/release/python-3109/) as the default Python 3.10 version.\n\n```bash\n# Create the empty virtual environment\npy -3.10 -m venv venvpracticalncd\n\n# Activate the virtual environment\n# On Windows:\n  .\\venvpracticalncd\\Scripts\\activate\n# On Linux:\n  source venvpracticalncd/bin/activate\n  \n# Install the needed packages\npip install -r requirements.txt\n\n# And finish by installing PyTorch independently\npip install torch==1.12.1 --index-url https://download.pytorch.org/whl/cu113\n```\n\n\n### Finishing touches\n\n```bash\n# Add the virtual environment as a Jupyter kernel\nipython kernel install --name \"venvpracticalncd\" --user\n\n# Check if torch supports GPU (you need CUDA 11 installed)\npython -c \"import torch; print(torch.cuda.is_available())\"\n```\n\n\n## 💻 Usage\n\nThree notebooks are available:\n- **Full_notebook.ipynb** 
lets you train and evaluate the models when the number of clusters *k* is known in advance.\n- **Full_notebook_with_k_estimation.ipynb** does the same when *k* is not known in advance and has to be estimated.\n- **results_wrt_n_unknown_classes.ipynb** is used to evaluate the performance of all the models when the number of novel classes increases. It was used to generate Figure C1 of Appendix C.\n\n\n## 📊 Datasets\n\nThe datasets will be \u003cu\u003eautomatically downloaded\u003c/u\u003e from https://archive.ics.uci.edu/ on the first execution.\u003cbr/\u003e\nIf the download fails, please try disabling proxies.\n\n**Note**: the data splits for some datasets are random, so the results can differ from those reported in the paper.\n\nThe most impacted datasets are:\n- LetterRecognition\n- USCensus1990\n- multiple_feature\n\n\n## 📜 Citation\n\nIf you find this work useful, please use the following citation:\n```\n@article{tr2024practical,\n   title = {A Practical Approach to Novel Class Discovery in Tabular Data},\n   author = {Troisemaine, Colin and Reiffers{-}Masson, Alexandre and Gosselin, St{\\'{e}}phane and Lemaire, Vincent and Vaton, Sandrine},\n   journal = {Data Mining and Knowledge Discovery},\n   year = {2024},\n   month = {May},\n   day = {31},\n   issn = {1573-756X},\n   doi = {10.1007/s10618-024-01025-y}\n}\n```\n\n## ⚖️ License\n\nCopyright (c) 2023 Orange.\n\nThis code is released under the MIT license. See the LICENSE file for more information.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcolintr%2Fpracticalncd","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcolintr%2Fpracticalncd","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcolintr%2Fpracticalncd/lists"}