{"id":13603211,"url":"https://github.com/nomic-ai/nomic","last_synced_at":"2025-05-13T19:05:28.148Z","repository":{"id":55272893,"uuid":"516505460","full_name":"nomic-ai/nomic","owner":"nomic-ai","description":"Interact, analyze and structure massive text, image, embedding, audio and video datasets","archived":false,"fork":false,"pushed_at":"2025-03-27T21:09:04.000Z","size":25224,"stargazers_count":1642,"open_issues_count":56,"forks_count":185,"subscribers_count":28,"default_branch":"main","last_synced_at":"2025-04-26T11:59:47.530Z","etag":null,"topics":["clustering","duplicate-detection","embeddings","python","text","topic-modeling","unstructured-data"],"latest_commit_sha":null,"homepage":"https://atlas.nomic.ai","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nomic-ai.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-07-21T19:57:43.000Z","updated_at":"2025-04-25T09:23:07.000Z","dependencies_parsed_at":"2023-12-22T04:31:25.506Z","dependency_job_id":"bdc34350-1a25-4b2e-88ab-66648094f11a","html_url":"https://github.com/nomic-ai/nomic","commit_stats":{"total_commits":324,"total_committers":6,"mean_commits":54.0,"dds":0.191358024691358,"last_synced_commit":"4e446242b66714cd23c0d3956d17e734e1e5a925"},"previous_names":[],"tags_count":17,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nomic-ai%2Fnomic","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nomic-ai%2Fnomic/tags","releases_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/repositories/nomic-ai%2Fnomic/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nomic-ai%2Fnomic/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/nomic-ai","download_url":"https://codeload.github.com/nomic-ai/nomic/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251089408,"owners_count":21534512,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["clustering","duplicate-detection","embeddings","python","text","topic-modeling","unstructured-data"],"created_at":"2024-08-01T18:01:57.697Z","updated_at":"2025-04-27T04:51:22.860Z","avatar_url":"https://github.com/nomic-ai.png","language":"Python","funding_links":[],"categories":["Python","向量数据库_向量搜索_最近邻搜索"],"sub_categories":["资源传输下载"],"readme":"\u003ch1 align=\"center\"\u003eNomic Atlas Python Client\u003c/h1\u003e\n\u003ch3 align=\"center\"\u003eExplore, label, search and share massive datasets in your web browser.\u003c/h3\u003e\n\u003cp\u003eThis repository contains Python bindings for working with \u003ca href=\"https://atlas.nomic.ai/\"\u003eNomic Atlas\u003c/a\u003e, the world’s most powerful unstructured data interaction platform. Atlas supports datasets from hundreds to tens of millions of points, and supports data modalities ranging from text to image to audio to video. 
\u003c/p\u003e\n\nWith Nomic Atlas, you can:\n\n- Generate, store and retrieve embeddings for your unstructured data.\n- Find insights in your unstructured data and embeddings, all from your web browser.\n- Share and present your datasets and data findings to anyone.\n\n### Where to find us?\n\n[https://atlas.nomic.ai/](https://atlas.nomic.ai/)\n\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003ctd\u003e\n      \u003cimg src=\"./assets/arxiv_map.png\" alt=\"Atlas Map of Arxiv Data\"\u003e\n      \u003cbr\u003e\n      \u003ccenter\u003e\u003ci\u003e\u003ca href=\"https://atlas.nomic.ai/map/ad82766d-3519-4c93-94c6-931dee0a7016/fab1d389-7d83-4bc6-b2bc-abc9fb50f808\"\u003eArticles Submitted to Arxiv (10/12/2023 - 10/19/2023)\u003c/a\u003e\u003c/i\u003e\u003c/center\u003e\n    \u003c/td\u003e\n    \u003ctd\u003e\n      \u003cimg src=\"./assets/tiktok_map.png\" alt=\"Atlas Map of TikTok Data\"\u003e\n      \u003cbr\u003e\n      \u003ccenter\u003e\u003ci\u003e\u003ca href=\"https://atlas.nomic.ai/map/eef7bc87-0c68-4d14-be83-157327d1e355/e3b74502-c9a4-4b24-9bbf-c9f708688ac6\"\u003eHistorical TikTok Dataset (Indexed on Metadata Descriptions)\u003c/a\u003e\u003c/i\u003e\u003c/center\u003e\n    \u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\n## Table of Contents\n\n- [Quick resources](#quick-resources)\n  - [Example maps](#example-maps)\n- [Features](#features)\n- [Quickstart](#quickstart)\n  - [Installation](#installation)\n  - [Make your first map](#make-your-first-map)\n- [Atlas usage examples](#atlas-usage-examples)\n  - [Access your embeddings](#access-your-embeddings)\n  - [View your data's topic model](#view-your-datas-topic-model)\n  - [Search for data semantically](#search-for-data-semantically)\n- [Background](#background)\n- [Discussion](#discussion)\n- [Community](#community)\n\n## Quick Resources\n\n\u003cp\u003e\n  Try the \u003ca href=\"https://colab.research.google.com/drive/1CZBo3LV0FoRTVRN3v068tvNJgbeWpcSX?usp=sharing\"\u003e:notebook: Colab 
Demo\u003c/a\u003e to get started in Python\n\u003c/p\u003e\n\n\u003cp\u003e\n  Read the \u003ca href=\"https://docs.nomic.ai\"\u003e:closed_book: Atlas Docs\u003c/a\u003e\n\u003c/p\u003e\n\n\u003cp\u003e\n  Join our \u003ca href=\"https://discord.gg/myY5YDR8z8\"\u003e:hut: Discord\u003c/a\u003e to start chatting and get help\n\u003c/p\u003e\n\n#### Example maps\n\n\u003ca href=\"https://atlas.nomic.ai/map/twitter\"\u003e:world_map: Map of Twitter\u003c/a\u003e (5.4 million tweets)\n\u003cbr\u003e \u003cbr\u003e\n\u003ca href=\"https://atlas.nomic.ai/map/stablediffusion\"\u003e:world_map: Map of StableDiffusion Generations\u003c/a\u003e (6.4 million images)\n\u003cbr\u003e \u003cbr\u003e\n\u003ca href=\"https://atlas.nomic.ai/map/neurips\"\u003e:world_map: Map of NeurIPS Proceedings\u003c/a\u003e (16,623 abstracts)\n\n## Features\n\nHere are just a few of the features that Atlas offers:\n\n- Organize your **text, image, and embedding data**\n- Create **beautiful and shareable** maps **with or without coding knowledge**\n- Have easy access to both **high-level data structures** and **individual datapoints**\n- **Search** millions of datapoints **instantly**\n- **Cluster data** into semantic topics\n- **Tag and clean** your dataset\n- **Deduplicate** text, images, video, and audio\n\n\u003cimg src=\"./assets/nomic-banner 3.png\" alt=\"Nomic banner logo\"\u003e\n\n## Quickstart\n\n### Installation\n\n1. Install the Nomic library:\n\n```bash\npip install nomic\n```\n\n2. Log in or create your Nomic account:\n\n```bash\nnomic login\n```\n\n3. 
Follow the instructions to obtain your access token, then pass it to the login command:\n\n```bash\nnomic login [token]\n```\n\n### Make your first map\n\n```python\nfrom nomic import atlas\nimport numpy as np\n\n# Randomly generate a set of 10,000 high-dimensional embeddings\nnum_embeddings = 10000\nembeddings = np.random.rand(num_embeddings, 256)\n\n# Create an Atlas dataset and map from the embeddings\ndataset = atlas.map_data(embeddings=embeddings)\n\nprint(dataset)\n```\n\n## Atlas usage examples\n\n### Access your embeddings\n\nAtlas stores, manages and generates embeddings for your unstructured data.\n\nYou can access the Atlas latent embeddings (i.e. high-dimensional) or the two-dimensional embeddings generated for web display.\n\n```python\n# Access your Atlas map and download your embeddings\nmap = dataset.maps[0]\n\nprojected_embeddings = map.embeddings.projected\nlatent_embeddings = map.embeddings.latent\n```\n\n```python\nprint(projected_embeddings)\n```\n\n```\n# Response:\nid \tx \ty\n0 \t9.815330 \t-8.105308\n1 \t-8.725819 \t5.980628\n2 \t13.199472 \t-1.103389\n... \t... \t...\n```\n\n```python\nprint(latent_embeddings)\n```\n\n```\n# Response:\nn x d numpy.ndarray where n = number of datapoints and d = number of latent dimensions\n```\n\n### View your data’s topic model\n\nAtlas automatically organizes your data into topics informed by the latent contents of your embeddings. Visually, these are represented by regions of homogeneous color on an Atlas map.\n\nYou can access and operate on topics programmatically by using the `topics` attribute\nof an AtlasMap.\n\n```python\n# Access your Atlas map\nmap = dataset.maps[0]\n\n# Access a pandas DataFrame associating each datum on your map with its topics at each topic depth.\ntopic_df = map.topics.df\n\nprint(topic_df)\n\n```\n\n```\nResponse:\n\nid topic_depth_1 topic_depth_2\n0 Oil Prices mergers and acquisitions\n1 Iraq War Trial of Thatcher\n2 Oil Prices Economic Growth\n... ... ... 
...\n9997 Oil Prices Economic Growth\n9998 Baseball Giambi's contract\n9999 Olympic Gold Medal European Football\n\n```\n\n### Search for data semantically\n\nUse Atlas to automatically find nearest neighbors in your vector database.\n\n```python\n# Load map and perform vector search for the five nearest neighbors of datum with id \"my_query_point\"\nmap = dataset.maps[0]\n\nwith dataset.wait_for_dataset_lock():\n  neighbors, _ = map.embeddings.vector_search(ids=['my_query_point'], k=5)\n\n# Return similar data points\nsimilar_datapoints = dataset.get_data(ids=neighbors[0])\n\nprint(similar_datapoints)\n```\n\n```\nResponse:\n\nOriginal query point:\n\"Intel abandons digital TV chip project NEW YORK, October 22 (newratings.com) - Global semiconductor giant Intel Corporation (INTC.NAS) has called off its plan to develop a new chip for the digital projection televisions.\"\n\nNearest neighbors:\n\"Intel awaits government move on expensing options Figuring it's had enough of fighting over options, the chip giant is waiting to see what Congress comes up with.\"\n\"Citigroup Takes On Intel The financial services giant takes over non-memory semiconductor chip production.\"\n\"Intel Seen Readying New Wi-Fi Chips  SAN FRANCISCO (Reuters) - Intel Corp. this week is  expected to introduce a chip that adds support for a relatively  obscure version of Wi-Fi, analysts said on Monday, in a move  that could help ease congestion on wireless networks.\"\n\"Intel pledges to bring Itanic down to Xeon price-point EM64T a stand-in until the real anti-AMD64 kit arrives\"\n```\n\n## Background\n\nAtlas is developed by the [Nomic AI](https://home.nomic.ai/) team, which is based in NYC. Nomic also developed and maintains [GPT4All](https://gpt4all.io/index.html), an open-source LLM chatbot ecosystem.\n\n## Discussion\n\nJoin the discussion on our [:hut: Discord](https://discord.gg/myY5YDR8z8) to ask questions, get help, and chat with others about Atlas, Nomic, GPT4All, and related topics. 
Our doors are open to enthusiasts of all skill levels.\n\n## Community\n\n- Blog: [https://blog.nomic.ai/](https://blog.nomic.ai/)\n- Twitter: [https://twitter.com/nomic_ai](https://twitter.com/nomic_ai)\n- Nomic Website: [https://home.nomic.ai/](https://home.nomic.ai/)\n- Atlas Website: [https://atlas.nomic.ai/](https://atlas.nomic.ai/)\n- GPT4All Website: [https://gpt4all.io/index.html](https://gpt4all.io/index.html)\n- LinkedIn: [https://www.linkedin.com/company/nomic-ai](https://www.linkedin.com/company/nomic-ai)\n\n\u003cbr\u003e\n\n[Go to top](#)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnomic-ai%2Fnomic","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnomic-ai%2Fnomic","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnomic-ai%2Fnomic/lists"}