{"id":23567141,"url":"https://github.com/godeltech/semantic_qa","last_synced_at":"2025-11-01T22:30:43.492Z","repository":{"id":224134411,"uuid":"696526618","full_name":"GodelTech/semantic_qa","owner":"GodelTech","description":"A small semantic Q\u0026A demo using langchain","archived":false,"fork":false,"pushed_at":"2024-05-11T09:53:03.000Z","size":68,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2024-12-26T18:19:05.666Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GodelTech.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-09-25T23:34:51.000Z","updated_at":"2024-05-11T09:53:06.000Z","dependencies_parsed_at":null,"dependency_job_id":"5655a126-1ac9-466a-8b55-86c31e14f20b","html_url":"https://github.com/GodelTech/semantic_qa","commit_stats":null,"previous_names":["godeltech/semantic_qa"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GodelTech%2Fsemantic_qa","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GodelTech%2Fsemantic_qa/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GodelTech%2Fsemantic_qa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GodelTech%2Fsemantic_qa/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GodelTech","download_url":"https://codeload.github.com/GodelTech/semantic_qa/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239340504,"owners_count":19622702,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-26T18:19:07.559Z","updated_at":"2025-11-01T22:30:43.445Z","avatar_url":"https://github.com/GodelTech.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# semantic_qa\nA small semantic Q\u0026A demo using LangChain and OpenAI.\n\n## Prerequisites\n\nThe only hard requirements are:\n\n1. Python 3.10+ with pip and virtualenv.\n2. An [OpenAI API](https://openai.com/product) key. Although this requires setting up a payment plan with a credit card, per-call costs are [very low](https://openai.com/pricing).\n\nA CUDA-compatible GPU is not a prerequisite, but it is strongly advised if you want to generate text embeddings locally with larger models.\n\n## Dependencies\n\nAfter cloning this repo, create and activate a Python virtual environment, then install the required Python packages using pip:\n\n```Powershell\nPS \u003e virtualenv venv\nPS \u003e venv\\scripts\\activate.ps1\n(venv) PS \u003e pip install -r requirements_dev.txt\n(venv) PS \u003e pip install -r requirements.txt\n```\n\n```sh\n$ virtualenv venv\n$ source venv/bin/activate\n(venv) $ pip install -r requirements_dev.txt\n(venv) $ pip install -r requirements.txt\n```\n\nIf you have a CUDA-capable GPU, install the torch packages that match your CUDA version as described at https://pytorch.org/get-started/locally/, e.g. for CUDA 11.8:\n\n```\npip install -U --force-reinstall --no-deps torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118\n```\n\n## Document corpus\n\nChoose whatever document folder you fancy. Why not try a local copy of Godel's security policies from SharePoint?\n\n## Vector database\n\nThe demo currently supports the following vector stores:\n\n * [ChromaDB](https://www.trychroma.com/)\n * [pgvector](https://github.com/pgvector/pgvector). [Set-up instructions](vector_stores_howtos/pgvector.md)\n * [Redis Stack](https://redis.io/docs/about/about-stack/). [Set-up instructions](vector_stores_howtos/redis-stack.md)\n * [Pinecone](https://www.pinecone.io/). [Set-up instructions](vector_stores_howtos/pinecone.md)\n * [MongoDB Atlas](https://www.mongodb.com/atlas/database). [Set-up instructions](vector_stores_howtos/mongodb_atlas.md)\n * [Elasticsearch](https://www.elastic.co/elasticsearch/vector-database). [Set-up instructions](vector_stores_howtos/elasticsearch.md)\n * [Neo4j](https://neo4j.com/docs/cypher-manual/current/indexes-for-vector-search/). [Set-up instructions](vector_stores_howtos/neoj4.md)\n\nChromaDB uses file-based SQLite storage and needs no setup. All the others require some setup, detailed in the links above.\n\n## Embeddings generator models\n\nThe demo currently supports:\n\n * Calling the [OpenAI embeddings API](https://platform.openai.com/docs/api-reference/embeddings), which requires an API key and a payment plan, using the model [\"text-embedding-ada-002\"](https://openai.com/blog/new-and-improved-embedding-model) by default\n * Generating embeddings locally using torch and a pre-trained model downloaded from [Hugging Face](https://huggingface.co/models). The default model is [\"all-MiniLM-L6-v2\"](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2)\n * Generating embeddings locally using torch and one of the pre-trained [Instructor](https://github.com/HKUNLP/instructor-embedding) models. The default is [\"hkunlp/instructor-large\"](https://github.com/HKUNLP/instructor-embedding#model-list)\n\n## Running the demo\n\n```\n(venv) $\u003e python semantic_qa.py\n```\n\nOn the first run, leave `REBUILD = True` so that the script iterates over the files in the corpus and generates the embeddings. On subsequent runs, you can set `REBUILD = False` and just test different values of `QUERY_STR` or tweaks to the GPT prompt.\n\n## Web UI\n\n```\n(venv) $\u003e chainlit run ./chainlit_app.py -w\n```\n\nThis runs a small web UI, on port 8000 by default.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgodeltech%2Fsemantic_qa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgodeltech%2Fsemantic_qa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgodeltech%2Fsemantic_qa/lists"}