{"id":13573980,"url":"https://github.com/stat-ml/ncvis","last_synced_at":"2025-04-04T13:30:40.805Z","repository":{"id":38408937,"uuid":"194685080","full_name":"stat-ml/ncvis","owner":"stat-ml","description":"Noise-Contrastive Visualization","archived":false,"fork":false,"pushed_at":"2023-11-25T17:57:22.000Z","size":1863,"stargazers_count":51,"open_issues_count":3,"forks_count":2,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-04-24T15:42:49.912Z","etag":null,"topics":["dimensionality-reduction","machine-learning","ncvis","visualization"],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/stat-ml.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2019-07-01T14:08:25.000Z","updated_at":"2024-03-19T13:39:37.000Z","dependencies_parsed_at":"2024-01-14T03:51:53.231Z","dependency_job_id":"8526986b-3542-43a8-9cd4-8f226e3b22ae","html_url":"https://github.com/stat-ml/ncvis","commit_stats":{"total_commits":228,"total_committers":2,"mean_commits":114.0,"dds":"0.030701754385964897","last_synced_commit":"debaf84496c54666b8fc613c87fa9f5c28c2574a"},"previous_names":[],"tags_count":10,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stat-ml%2Fncvis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stat-ml%2Fncvis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stat-ml%2Fncvis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/stat-ml%2Fncvis/manifests","owner_u
rl":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/stat-ml","download_url":"https://codeload.github.com/stat-ml/ncvis/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247184876,"owners_count":20897845,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dimensionality-reduction","machine-learning","ncvis","visualization"],"created_at":"2024-08-01T15:00:44.545Z","updated_at":"2025-04-04T13:30:36.301Z","avatar_url":"https://github.com/stat-ml.png","language":"C++","funding_links":[],"categories":["C++"],"sub_categories":[],"readme":"[![Conda Version](https://img.shields.io/conda/vn/conda-forge/ncvis.svg)](https://anaconda.org/conda-forge/ncvis)\n[![PyPI](https://img.shields.io/pypi/v/ncvis.svg)](https://pypi.python.org/pypi/ncvis/)\n[![GitHub](https://img.shields.io/github/license/alartum/ncvis.svg)](https://github.com/alartum/ncvis/blob/master/LICENSE)\n[![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/ncvis.svg)](https://anaconda.org/conda-forge/ncvis)\n[![Build Status](https://dev.azure.com/conda-forge/feedstock-builds/_apis/build/status/ncvis-feedstock?branchName=main)](https://dev.azure.com/conda-forge/feedstock-builds/_build/latest?definitionId=8934\u0026branchName=main)\n[![Conda Platforms](https://img.shields.io/conda/pn/conda-forge/ncvis.svg)](https://anaconda.org/conda-forge/ncvis)\n\n# ncvis\n\n**NCVis** is an efficient solution for data visualization and dimensionality reduction. 
It uses [HNSW](https://github.com/nmslib/hnswlib) to quickly construct the nearest neighbors graph and a parallel (batched) approach to build its embedding. Efficient random sampling is achieved via [PCGRandom](https://github.com/imneme/pcg-cpp). Detailed application examples can be found [here](https://github.com/alartum/ncvis-examples).\n\n# Why NCVis?\n\n## It is Fast\n\nWe use preprocessed samples from the [News Headlines Of India dataset](https://www.kaggle.com/therohk/india-headlines-news-dataset) to perform the comparison. Test cases are generated by taking the first 1000, 2 · 1000, ..., 2¹⁰ · 1000 samples from the dataset. Given the same amount of time, **NCVis** can process more than twice as many samples as other methods, visualizing **10⁶** points in only **6** minutes (12 × Intel® Core™ i7-8700K CPU @ 3.70GHz, 64 GB RAM).\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"400\" alt=\"Speed Comparison\" src=\"https://github.com/stat-ml/ncvis-examples/blob/master/img/time_all.png?raw=true\"\u003e\n\u003c/p\u003e\n\n## It is Efficient\n\nOne can define efficiency as the ratio of the time to execute the task on a single processor to the time on multiple processors. Ideally, the efficiency should be equal to the number of threads. **NCVis** does not achieve this limit but significantly outperforms other methods. We used 10000 samples from the [News Headlines Of India dataset](https://www.kaggle.com/therohk/india-headlines-news-dataset).\n\n\u003cp align=\"center\"\u003e\n  \u003cimg width=\"400\" alt=\"Efficiency Comparison\" src=\"https://github.com/stat-ml/ncvis-examples/blob/master/img/efficiency.png?raw=true\"\u003e\n\u003c/p\u003e\n\n## It is Predictable\n\nIt is important that the proposed method has predictable behavior on simple datasets. 
We used the [Optical Recognition of Handwritten Digits Data Set](https://archive.ics.uci.edu/ml/datasets/optical+recognition+of+handwritten+digits), which comprises 5620 preprocessed handwritten digits and thus has a simple structure that a visualization is expected to reveal. **NCVis** shows behavior consistent with classical methods like t-SNE while producing the visualization up to an order of magnitude faster.\n\n| t-SNE (29.5s) | FIt-SNE (17.4s) |\n:-------------------------:|:-------------------------:\n\u003cimg width=\"300\" alt=\"t-SNE\" src=\"https://github.com/stat-ml/ncvis-examples/blob/master/img/t-SNE.png?raw=true\"\u003e | \u003cimg width=\"300\" alt=\"FIt-SNE\" src=\"https://github.com/stat-ml/ncvis-examples/blob/master/img/FIt-SNE.png?raw=true\"\u003e\n\n| Multicore t-SNE (14.3s) | LargeVis (9.7s) |\n:-------------------------:|:-------------------------:\n\u003cimg width=\"300\" alt=\"Multicore t-SNE\" src=\"https://github.com/stat-ml/ncvis-examples/blob/master/img/Multicore%20t-SNE.png?raw=true\"\u003e | \u003cimg width=\"300\" alt=\"LargeVis\" src=\"https://github.com/stat-ml/ncvis-examples/blob/master/img/LargeVis.png?raw=true\"\u003e\n\n| UMAP (7.5s) | NCVis (0.9s) |\n:-------------------------:|:-------------------------:\n\u003cimg width=\"300\" alt=\"Umap\" src=\"https://github.com/stat-ml/ncvis-examples/blob/master/img/Umap.png?raw=true\"\u003e | \u003cimg width=\"300\" alt=\"NCVis\" src=\"https://github.com/stat-ml/ncvis-examples/blob/master/img/NCVis.png?raw=true\"\u003e\n\n# Using\n\n```python\nimport ncvis\n\n# X is the input data: a 2D array of shape (n_samples, n_features)\nvis = ncvis.NCVis()\nY = vis.fit_transform(X)\n```\n\nMore detailed examples can be found [here](https://github.com/alartum/ncvis-examples).\n\n# Installation\n\n## Conda [recommended]\n\nYou do not need to set up the environment when using *conda*; all dependencies are installed automatically. 
The *conda-forge* channel is preferred, but the *alartum* channel can also be used in case of any issues with *conda-forge*.\n```bash\n$ conda install conda-forge::ncvis\n```\nor\n```bash\n$ conda install alartum::ncvis\n```\n\n## Pip [not recommended]\n\n**Important**: be sure to have a compiler with *OpenMP* support. *GCC* has it by default, which is not the case with *clang*. You may need to install the *llvm-openmp* library beforehand.\n\n1. Install the **numpy**, **cython** and **pybind11** packages (compile-time dependencies):\n    ```bash\n    $ pip install numpy cython pybind11\n    ```\n2. Install the **ncvis** package:\n    ```bash\n    $ pip install ncvis\n    ```\n\n## From source [not recommended]\n\n**Important**: be sure to have *OpenMP* available.\n\nFirst, download the *pcg-cpp* and *hnswlib* libraries:\n```bash\n$ make libs\n```\n\n### Python Wrapper\n\nIf a *conda* environment is used, it replaces the library search paths. To prevent compilation errors, you need to either use the compilers provided by *conda* or switch to *pip* and the system compilers.\n\n* Conda\n    ```bash\n    $ conda install -c conda-forge cxx-compiler c-compiler conda-build numpy cython pybind11 scipy\n    $ conda-develop -bc .\n    ```\n\n* Pip\n    ```bash\n    $ pip install numpy cython pybind11\n    $ make wrapper\n    ```\n\nYou can then use *pytest* to run some basic checks:\n```bash\n$ pytest -v recipe/test.py\n```\n\n### C++ Binary\n\n* Release\n    ```bash\n    $ make ncvis\n    ```\n\n* Debug\n    ```bash\n    $ make debug\n    ```\n\n# Citation\n\nThe original paper can be found [here](https://dl.acm.org/doi/abs/10.1145/3366423.3380061). 
If you use **NCVis**, we kindly ask you to cite:\n\n```\n@inproceedings{10.1145/3366423.3380061,\nauthor = {Artemenkov, Aleksandr and Panov, Maxim},\ntitle = {NCVis: Noise Contrastive Approach for Scalable Visualization},\nyear = {2020},\nisbn = {9781450370233},\npublisher = {Association for Computing Machinery},\naddress = {New York, NY, USA},\nurl = {https://doi.org/10.1145/3366423.3380061},\ndoi = {10.1145/3366423.3380061},\nbooktitle = {Proceedings of The Web Conference 2020},\npages = {2941–2947},\nnumpages = {7},\nkeywords = {dimensionality reduction, noise contrastive estimation, embedding algorithms, visualization},\nlocation = {Taipei, Taiwan},\nseries = {WWW ’20}\n}\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstat-ml%2Fncvis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fstat-ml%2Fncvis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fstat-ml%2Fncvis/lists"}