{"id":23186608,"url":"https://github.com/sidhyaashu/md_genai_rough","last_synced_at":"2025-04-05T04:46:00.766Z","repository":{"id":268214584,"uuid":"903663581","full_name":"sidhyaashu/MD_GenAI_Rough","owner":"sidhyaashu","description":"General code for Gen-Ai RAG using python and openAI","archived":false,"fork":false,"pushed_at":"2024-12-15T08:08:40.000Z","size":9,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"sidhya","last_synced_at":"2025-02-10T12:45:25.632Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sidhyaashu.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-15T08:05:20.000Z","updated_at":"2024-12-15T08:08:44.000Z","dependencies_parsed_at":"2024-12-15T09:19:50.418Z","dependency_job_id":null,"html_url":"https://github.com/sidhyaashu/MD_GenAI_Rough","commit_stats":null,"previous_names":["sidhyaashu/md_genai_rough"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sidhyaashu%2FMD_GenAI_Rough","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sidhyaashu%2FMD_GenAI_Rough/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sidhyaashu%2FMD_GenAI_Rough/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sidhyaashu%2FMD_GenAI_Rough/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sidhyaashu","download_
url":"https://codeload.github.com/sidhyaashu/MD_GenAI_Rough/tar.gz/refs/heads/sidhya","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247289399,"owners_count":20914464,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-18T10:16:53.668Z","updated_at":"2025-04-05T04:46:00.748Z","avatar_url":"https://github.com/sidhyaashu.png","language":"Python","readme":"### Code Summary: Managing a Vector Search Index with the Atlas CLI\n\n```plaintext\nDefine the Vector Search Index in a JSON file:\n\nSpecify the database and collection you want to index. Set the type to vectorSearch and give the index a name that makes its purpose easy to identify. Finally, list the fields to index, specifying the type, number of dimensions, and similarity function for each vector field. The numDimensions value must match the length of the embedding vectors stored in the indexed field.\n\n{\n  \"database\": \"test_mflix\",\n  \"collectionName\": \"movies\",\n  \"type\": \"vectorSearch\",\n  \"name\": \"movies_vector_index\",\n  \"fields\": [\n    {\n      \"type\": \"vector\",\n      \"path\": \"embedding\",\n      \"numDimensions\": 1536,\n      \"similarity\": \"cosine\"\n    }\n  ]\n}\n\n\nCreate the Index:\n\nUse the atlas clusters search indexes create command to create the index from a JSON file like the example above (saved here as index.json). You’ll need to pass in the name of the cluster and the path to the file. 
Note that depending on how you authenticate, you may first need to specify the appropriate project ID with the --projectId flag.\n\natlas clusters search indexes create \\\n    --clusterName vector \\\n    --file index.json\n\n\nConfirmation:\n\nSuccessful creation of the index should return a confirmation message like this:\n\nIndex movies_vector_index created.\n\n\nChecking Your Indexes:\n\nTo check the status of one or more indexes, use the atlas clusters search indexes list command. You’ll need to specify the names of the cluster, database, and collection. In this example, --output json requests JSON-formatted output.\n\natlas clusters search indexes list \\\n    --clusterName vector \\\n    --db test_mflix \\\n    --collection movies \\\n    --output json\n\n\nThis returns an array with one entry for each search index on the specified collection.\n\n[\n  {\n    \"collectionName\": \"movies\",\n    \"database\": \"test_mflix\",\n    \"indexID\": \"66720dec75b489672353910b\",\n    \"name\": \"movies_vector_index\",\n    \"status\": \"STEADY\",\n    \"type\": \"vectorSearch\",\n    \"fields\": [\n      {\n        \"numDimensions\": 1536,\n        \"path\": \"embedding\",\n        \"similarity\": \"cosine\",\n        \"type\": \"vector\"\n      }\n    ]\n  }\n]\n\n\nLooking Up a Specific Index:\n\nTo see information for a specific index, use the atlas clusters search indexes describe command and pass in the index ID (the indexID value from the list output):\n\natlas clusters search indexes describe \u003cid_placeholder\u003e \\\n    --clusterName vector \\\n    --output json\n\n\nThis returns information about the specified index. 
Note that the result is a single document rather than an array.\n\n{\n  \"collectionName\": \"movies\",\n  \"database\": \"test_mflix\",\n  \"indexID\": \"66720dec75b489672353910b\",\n  \"name\": \"movies_vector_index\",\n  \"status\": \"STEADY\",\n  \"type\": \"vectorSearch\",\n  \"fields\": [\n    {\n      \"numDimensions\": 1536,\n      \"path\": \"embedding\",\n      \"similarity\": \"cosine\",\n      \"type\": \"vector\"\n    }\n  ]\n}\n\n\nUpdating an Existing Index:\n\nTo change an index, edit its definition file. Here, we’ve added a filter field on year to the index definition JSON file, so that vector search queries can pre-filter on that field:\n\n{\n  \"database\": \"test_mflix\",\n  \"collectionName\": \"movies\",\n  \"type\": \"vectorSearch\",\n  \"name\": \"movies_vector_index\",\n  \"fields\": [\n    {\n      \"type\": \"vector\",\n      \"path\": \"embedding\",\n      \"numDimensions\": 1536,\n      \"similarity\": \"cosine\"\n    },\n    {\n      \"type\": \"filter\",\n      \"path\": \"year\"\n    }\n  ]\n}\n\n\nWe can then use the atlas clusters search indexes update command to overwrite the existing index (identified by its index ID) with the new definition:\n\natlas clusters search indexes update \u003cid_placeholder\u003e \\\n    --clusterName vector \\\n    --file index.json \\\n    --output json\n\n\nThe command returns the updated index, which will look like this:\n\n{\n  \"collectionName\": \"movies\",\n  \"database\": \"test_mflix\",\n  \"indexID\": \"66720dec75b489672353910b\",\n  \"name\": \"movies_vector_index\",\n  \"status\": \"IN_PROGRESS\",\n  \"type\": \"vectorSearch\",\n  \"fields\": [\n    {\n      \"numDimensions\": 1536,\n      \"path\": \"embedding\",\n      \"similarity\": \"cosine\",\n      \"type\": \"vector\"\n    },\n    {\n      \"path\": \"year\",\n      \"type\": \"filter\"\n    }\n  ]\n}\n\n\nNote that the filter we added to the JSON definition file is now included in the index, and that the status is listed as “IN_PROGRESS”. The status will change to “STEADY” once the index has finished rebuilding. 
Queries that use the index will continue to run against the previous version until the update is complete.\n\nDeleting an Index:\n\nTo delete an index, use the atlas clusters search indexes delete command. You’ll need to specify the index ID and the name of the cluster the index resides on.\n\natlas clusters search indexes delete \u003cid_placeholder\u003e \\\n    --clusterName vector\n\n\nAfter running the command, you’ll be prompted to enter y to confirm the deletion, after which a confirmation message is printed. (Adding the --force flag skips this prompt.)\n\n? Are you sure you want to delete: \u003cid_placeholder\u003e (y/N) y\nIndex '\u003cid_placeholder\u003e' deleted\n\n\nTo confirm the deletion, run the atlas clusters search indexes list command again. Once the deletion has completed, movies_vector_index should no longer appear in the output:\n\natlas clusters search indexes list \\\n    --clusterName vector \\\n    --db test_mflix \\\n    --collection movies \\\n    --output json\n\n```\n\n\n### Code Summary: Managing a Vector Search Index Using the MongoDB Shell\n\n```plaintext\nCreating a Vector Search Index Using the MongoDB Shell:\n\nUse the db.collection.createSearchIndex() method to build an index on a collection. 
Pass the index name, the index type vectorSearch, and a definition of the fields you want to index, including the type, number of dimensions, and similarity of each vector field.\n\ndb.movies.createSearchIndex(\n  \"movies_vector_index\",\n  \"vectorSearch\",\n  {\n    \"fields\": [\n      {\n        \"type\": \"vector\",\n        \"numDimensions\": 1536,\n        \"path\": \"plot_embedding\",\n        \"similarity\": \"cosine\",\n      }\n    ],\n  }\n);\n\n\nAfter running the command, you will receive a confirmation message with the name of the new index:\n\nmovies_vector_index\n\n\nViewing an Existing Index:\n\nCall the db.collection.getSearchIndexes() method with no arguments to see all search indexes on the collection:\n\ndb.movies.getSearchIndexes();\n\n\nTo view a particular index, pass its name as an argument:\n\ndb.movies.getSearchIndexes(\"movies_vector_index\");\n\n\nThis returns detailed information about the index, including its current definition and its status, which tells you whether the index is ready to query or still being built.\n\n[\n  {\n    id: \"6671e934b362ed3c6ad84512\",\n    name: \"movies_vector_index\",\n    type: \"vectorSearch\",\n    status: \"READY\",\n    queryable: true,\n    latestDefinitionVersion: { version: 0, createdAt: ISODate(\"2024-06-18T20:08:20.678Z\") },\n    latestDefinition: {\n      fields: [\n        {\n          type: \"vector\",\n          numDimensions: 1536,\n          path: \"plot_embedding\",\n          similarity: \"cosine\"\n        },\n      ],\n    },\n    statusDetail: [\n      . . .\n    ],\n  },\n];\n\n\nNote the statusDetail array in the example above. It contains a status entry for each node in the cluster:\n\n[\n  {\n    . . 
.\n    statusDetail: [\n      {\n        hostname: \"atlas-11yyiw-shard-00-02\",\n        status: \"READY\",\n        queryable: true,\n        mainIndex: {\n          status: \"READY\",\n          queryable: true,\n          definitionVersion: {\n            version: 0,\n            createdAt: ISODate(\"2024-06-18T20:08:20.000Z\"),\n          },\n          definition: { fields: [[Object]] },\n        },\n      },\n      . . .\n    ]\n  },\n];\n\n\nEditing an Existing Vector Search Index:\n\nUpdate the definition of an existing vector search index with the db.collection.updateSearchIndex() method. The new definition replaces the existing one, so it must include the original vector field as well as any new fields. In this example, we’ll add a filter on the year field.\n\ndb.movies.updateSearchIndex(\n  \"movies_vector_index\",\n  {\n    \"fields\": [\n      {\n        \"type\": \"vector\",\n        \"numDimensions\": 1536,\n        \"path\": \"plot_embedding\",\n        \"similarity\": \"cosine\",\n      },\n      {\n        \"type\": \"filter\",\n        \"path\": \"year\",\n      }\n    ],\n  }\n);\n\n\nUsing db.movies.getSearchIndexes(\"movies_vector_index\"), we can confirm that the update was successful. Note that latestDefinitionVersion has been incremented to 1:\n\n[\n  {\n    id: \"6671e934b362ed3c6ad84512\",\n    name: \"movies_vector_index\",\n    type: \"vectorSearch\",\n    status: \"READY\",\n    queryable: true,\n    latestDefinitionVersion: { version: 1, createdAt: ISODate(\"2024-06-18T20:08:25.348Z\") },\n    latestDefinition: {\n      fields: [\n        {\n          type: \"vector\",\n          numDimensions: 1536,\n          path: \"plot_embedding\",\n          similarity: \"cosine\"\n        },\n        { type: 'filter', path: 'year' }\n      ],\n    },\n    statusDetail: [\n      . . 
.\n    ],\n  },\n];\n\n\nDeleting a Vector Search Index:\n\nDelete a vector search index with the db.collection.dropSearchIndex() method, passing the name of the index as an argument:\n\ndb.movies.dropSearchIndex(\"movies_vector_index\");\n\n\nThis command produces no output in the console, so to confirm the deletion, use the db.collection.getSearchIndexes() method:\n\ndb.movies.getSearchIndexes(\"movies_vector_index\");\n\n\nThis returns the same kind of output as before, but the status is now listed as “DELETING”, which shows that the deletion is under way. Once it has completed, running getSearchIndexes() without an index name will show that the index is no longer present on the collection.\n\n[\n  {\n    id: \"6671e934b362ed3c6ad84512\",\n    name: \"movies_vector_index\",\n    type: \"vectorSearch\",\n    status: \"DELETING\",\n    queryable: true,\n    latestDefinitionVersion: { version: 1, createdAt: ISODate(\"2024-06-18T20:08:25.348Z\") },\n    latestDefinition: {\n      fields: [\n        {\n          type: \"vector\",\n          numDimensions: 1536,\n          path: \"plot_embedding\",\n          similarity: \"cosine\"\n        },\n        { type: 'filter', path: 'year' }\n      ],\n    },\n    statusDetail: [\n      . . .\n    ],\n  },\n];\n\n```","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsidhyaashu%2Fmd_genai_rough","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsidhyaashu%2Fmd_genai_rough","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsidhyaashu%2Fmd_genai_rough/lists"}