{"id":15020963,"url":"https://github.com/decisionfacts/semantic-ai","last_synced_at":"2026-03-17T14:20:23.445Z","repository":{"id":205278880,"uuid":"702391306","full_name":"decisionfacts/semantic-ai","owner":"decisionfacts","description":"An open source framework for Retrieval-Augmented  System (RAG) uses semantic search helps to retrieve the expected results and generate human readable conversational response with the help of LLM (Large Language Model).","archived":false,"fork":false,"pushed_at":"2024-07-19T11:38:45.000Z","size":4752,"stargazers_count":21,"open_issues_count":2,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-07-11T22:01:56.357Z","etag":null,"topics":["approximate-nearest-neighbor-search","deep-neural-networks","document-parser","docx","fastapi","inference-api","llama2","llm","machine-learning","ocr","openai","openai-api","pdf","rag","retrieval-augmented-generation","semantic-search","vector-database"],"latest_commit_sha":null,"homepage":"https://docs-semantic-ai.decisionfacts.ai/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/decisionfacts.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-10-09T08:28:31.000Z","updated_at":"2025-06-12T15:06:48.000Z","dependencies_parsed_at":"2023-12-26T08:44:38.643Z","dependency_job_id":"5e9ee36c-c602-4fe9-9d04-5f54fafb9e60","html_url":"https://github.com/decisionfacts/semantic-ai","commit_stats":{"total_commits":119,"total_committers":9,"mean_commits":"13.222222222222221","dds":"0.45378151260504207","last_synced
_commit":"3531dfa83f032e16b84c0e088eae309def066733"},"previous_names":["decisionfacts/semantic-ai"],"tags_count":8,"template":false,"template_full_name":null,"purl":"pkg:github/decisionfacts/semantic-ai","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/decisionfacts%2Fsemantic-ai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/decisionfacts%2Fsemantic-ai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/decisionfacts%2Fsemantic-ai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/decisionfacts%2Fsemantic-ai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/decisionfacts","download_url":"https://codeload.github.com/decisionfacts/semantic-ai/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/decisionfacts%2Fsemantic-ai/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":265643242,"owners_count":23804045,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["approximate-nearest-neighbor-search","deep-neural-networks","document-parser","docx","fastapi","inference-api","llama2","llm","machine-learning","ocr","openai","openai-api","pdf","rag","retrieval-augmented-generation","semantic-search","vector-database"],"created_at":"2024-09-24T19:55:56.991Z","updated_at":"2026-03-17T14:20:23.388Z","avatar_url":"https://github.com/decisionfacts.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"![Semantic AI 
Logo](https://github.com/decisionfacts/semantic-ai/blob/master/docs/source/_static/images/createLLM.png?raw=True)\n# Semantic AI Lib\n\n[![Python version](https://img.shields.io/badge/python-3.10-green)](https://img.shields.io/badge/python-3.10-green)[![PyPI version](https://badge.fury.io/py/semantic-ai.svg)](https://badge.fury.io/py/semantic-ai)[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)\n\nAn open-source framework for Retrieval-Augmented Generation (RAG) systems. It uses semantic search to retrieve relevant results and generates human-readable conversational responses with the help of an LLM (Large Language Model).\n\n**Semantic AI Library Documentation [Docs here](https://docs-semantic-ai.decisionfacts.ai/)**\n\n## Requirements\n\nPython 3.10+ asyncio\n\n## Installation\n```shell\n# Using pip\n$ python -m pip install semantic-ai\n\n# Manual install (from a local copy of the repository)\n$ python -m pip install .\n```\n# Set the environment variables\nSet the credentials in a .env file. Provide credentials for only one connector, one indexer, and one LLM model config. 
Leave the other fields empty.\n```shell\n# Default\nFILE_DOWNLOAD_DIR_PATH= # default directory name 'download_file_dir'\nEXTRACTED_DIR_PATH= # default directory name 'extracted_dir'\n\n# Connector (SharePoint, S3, GCP Bucket, GDrive, Confluence, etc.)\nCONNECTOR_TYPE=\"connector_name\" # e.g. sharepoint\nSHAREPOINT_CLIENT_ID=\"client_id\"\nSHAREPOINT_CLIENT_SECRET=\"client_secret\"\nSHAREPOINT_TENANT_ID=\"tenant_id\"\nSHAREPOINT_HOST_NAME='\u003ctenant_name\u003e.sharepoint.com'\nSHAREPOINT_SCOPE='https://graph.microsoft.com/.default'\nSHAREPOINT_SITE_ID=\"site_id\"\nSHAREPOINT_DRIVE_ID=\"drive_id\"\nSHAREPOINT_FOLDER_URL=\"folder_url\" # /My_folder/child_folder/\n\n# Indexer\nINDEXER_TYPE=\"\u003cvector_db_name\u003e\" # elasticsearch, qdrant, opensearch\nELASTICSEARCH_URL=\"\u003celasticsearch_url\u003e\" # a valid URL\nELASTICSEARCH_USER=\"\u003celasticsearch_user\u003e\" # a valid user\nELASTICSEARCH_PASSWORD=\"\u003celasticsearch_password\u003e\" # a valid password\nELASTICSEARCH_INDEX_NAME=\"\u003cindex_name\u003e\"\nELASTICSEARCH_SSL_VERIFY=\"\u003cssl_verify\u003e\" # True or False\n\n# Qdrant\nQDRANT_URL=\"\u003cqdrant_url\u003e\"\nQDRANT_INDEX_NAME=\"\u003cindex_name\u003e\"\nQDRANT_API_KEY=\"\u003capikey\u003e\"\n\n# OpenSearch\nOPENSEARCH_URL=\"\u003copensearch_url\u003e\"\nOPENSEARCH_USER=\"\u003copensearch_user\u003e\"\nOPENSEARCH_PASSWORD=\"\u003copensearch_password\u003e\"\nOPENSEARCH_INDEX_NAME=\"\u003cindex_name\u003e\"\n\n# LLM\nLLM_MODEL=\"\u003cllm_model\u003e\" # llama, openai\nLLM_MODEL_NAME_OR_PATH=\"\" # model name or path\nOPENAI_API_KEY=\"\u003copenai_api_key\u003e\" # if using openai\n\n# SQLite\nSQLITE_SQL_PATH=\"\u003cdatabase_path\u003e\" # SQLite database path\n\n# MySQL\nMYSQL_HOST=\"\u003chost_name\u003e\" # localhost or IP address\nMYSQL_USER=\"\u003cuser_name\u003e\"\nMYSQL_PASSWORD=\"\u003cpassword\u003e\"\nMYSQL_DATABASE=\"\u003cdatabase_name\u003e\"\nMYSQL_PORT=\"\u003cport\u003e\" # default port is 3306\n\n```\nMethod 1:\n    To load the .env 
file. The .env file should contain the credentials.\n```shell\n# In IPython or Jupyter, via the python-dotenv extension\n%load_ext dotenv\n%dotenv\n%dotenv relative/or/absolute/path/to/.env\n\n# (or) from a shell, via the python-dotenv CLI\ndotenv -f .env run -- python\n```\nMethod 2:\n```python\nfrom semantic_ai.config import Settings\nsettings = Settings()\n```\n\n# Unstructured data\n### 1. Import the module\n```python\nimport asyncio\nimport semantic_ai\n```\n\n### 2. Download the files from the given source, extract their content, and index the extracted data in the configured vector database.\n```python\n# Top-level await works in Jupyter/IPython; in a script, wrap these calls\n# in an async function and run it with asyncio.run()\nawait semantic_ai.download()\nawait semantic_ai.extract()\nawait semantic_ai.index()\n```\nOnce download, extraction, and indexing are complete, answers can be generated from the indexed vector database, as shown in the next step.\n### 3. Generate an answer from the indexed vector database using the retrieval LLM.\n```python\nsearch_obj = await semantic_ai.search()\nquery = \"\"  # your question\nsearch = await search_obj.generate(query)\n```\nIf the job runs for a long time, you can track its progress in the 'EXTRACTED_DIR_PATH/meta' directory, which stores, in text files, the number of files processed, the number of files that failed, and the names of the processed and failed files.\n\n### Example\nThe examples folder shows how to connect to a source and get a connection object.\nExample: SharePoint connector\n```python\nfrom semantic_ai.connectors import Sharepoint\n\nCLIENT_ID = '\u003cclient_id\u003e'  # SharePoint client ID\nCLIENT_SECRET = '\u003cclient_secret\u003e'  # SharePoint client secret\nTENANT_ID = '\u003ctenant_id\u003e'  # SharePoint tenant ID\nSCOPE = 'https://graph.microsoft.com/.default'  # Microsoft Graph scope\nHOST_NAME = \"\u003ctenant_name\u003e.sharepoint.com\"  # for example 'contoso.sharepoint.com'\n\n# Create the SharePoint connection object\nconnection = Sharepoint(\n    client_id=CLIENT_ID,\n    client_secret=CLIENT_SECRET,\n    tenant_id=TENANT_ID,\n    host_name=HOST_NAME,\n    scope=SCOPE\n)\n```\n\n# Structured data\n\n### 1. 
Import the module\n```python\nimport asyncio\nimport semantic_ai\n```\n\n### 2. Create the database connection\n\n#### SQLite:\n```python\nfrom semantic_ai.connectors import Sqlite\n\nfile_path = \"\u003cdatabase_file_path\u003e\"  # path to the SQLite database file\n\nsql = Sqlite(sql_path=file_path)\n```\n\n#### MySQL:\n```python\nfrom semantic_ai.connectors import Mysql\n\n# Replace the placeholder values with your own connection details\nsql = Mysql(\n    host=\"\u003chost_name\u003e\",\n    user=\"\u003cuser_name\u003e\",\n    password=\"\u003cpassword\u003e\",\n    database=\"\u003cdatabase\u003e\",\n    port=3306  # default MySQL port; replace with yours if different\n)\n```\n\n### 3. Generate an answer from the database using the retrieval LLM.\n```python\nquery = \"\"  # your question\nsearch_obj = await semantic_ai.db_search(query=query)\n```\n\n## Run the server\n```shell\n$ semantic_ai serve -f .env\n\nINFO:     Loading environment from '.env'\nINFO:     Started server process [43973]\nINFO:     Waiting for application startup.\nINFO:     Application startup complete.\nINFO:     Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)\n```\nOpen your browser at http://127.0.0.1:8000/semantic-ai\n\n### Interactive API docs\nNow go to http://127.0.0.1:8000/docs.\nYou will see the automatic interactive API documentation (provided by Swagger UI):\n![Swagger UI](https://github.com/decisionfacts/semantic-ai/blob/master/docs/source/_static/images/img.png?raw=True)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdecisionfacts%2Fsemantic-ai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdecisionfacts%2Fsemantic-ai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdecisionfacts%2Fsemantic-ai/lists"}