{"id":19517301,"url":"https://github.com/tomlin7/ai-research-assistant","last_synced_at":"2026-02-19T00:03:31.770Z","repository":{"id":261929474,"uuid":"885689501","full_name":"tomlin7/AI-research-assistant","owner":"tomlin7","description":"Semantic document search system with pgvector and PGAI","archived":false,"fork":false,"pushed_at":"2024-11-09T23:46:30.000Z","size":52,"stargazers_count":2,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-07T11:08:22.925Z","etag":null,"topics":["ai","assistant","document-search","machine-learning","natural-language-processing","ollama","pgai","pgvector","postgres","postgresql","research-assistant","semantic-search","sentence-embeddings","sentence-transformers","sentiment-analysis","summarization","text-similarity","text-summarization"],"latest_commit_sha":null,"homepage":"https://semantic-doc-search.streamlit.app","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tomlin7.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-09T06:00:10.000Z","updated_at":"2025-03-10T07:07:23.000Z","dependencies_parsed_at":"2024-11-10T00:22:45.483Z","dependency_job_id":null,"html_url":"https://github.com/tomlin7/AI-research-assistant","commit_stats":null,"previous_names":["tomlin7/pgv-test","tomlin7/ai-research-assistant"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/tomlin7/AI-research-assistant","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomlin7%2FAI-research-assistant","t
ags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomlin7%2FAI-research-assistant/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomlin7%2FAI-research-assistant/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomlin7%2FAI-research-assistant/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tomlin7","download_url":"https://codeload.github.com/tomlin7/AI-research-assistant/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tomlin7%2FAI-research-assistant/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":276085212,"owners_count":25582509,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-20T02:00:10.207Z","response_time":63,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["ai","assistant","document-search","machine-learning","natural-language-processing","ollama","pgai","pgvector","postgres","postgresql","research-assistant","semantic-search","sentence-embeddings","sentence-transformers","sentiment-analysis","summarization","text-similarity","text-summarization"],"created_at":"2024-11-11T00:01:32.338Z","updated_at":"2025-09-20T10:36:44.456Z","avatar_url":"https://github.com/tomlin7.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AI Research 
Assistant with Semantic Document Search System\n\n*This is a submission for the [Open Source AI Challenge with pgai and Ollama](https://dev.to/challenges/pgai)*\n\n## What I Built\n\nThis is an AI-based research assistant with a semantic document search system for storing documents and retrieving them with natural language queries. [**Ollama**](https://ollama.com/) is integrated into the assistant to summarise provided content and to generate sentiment analysis, key points, and related topics for it. [Streamlit](https://streamlit.io/) provides a minimalistic user interface.\n\nYou can use natural language to search data stored in the PostgreSQL database. The system uses **pgvector** for vector similarity search and [**pgai**](https://github.com/timescale/pgai) through **TimescaleDB** for AI-powered search features. It is most helpful when you have to manage and search through large collections of documents based on meaning rather than just keywords.\n\n**Key Features:**\n- Uses Ollama to summarise docs and generate sentiment analysis, key points, and related topics\n- Semantic search capability using document embeddings, powered by pgai\n- Batch document processing (directly upload CSV files)\n- User-friendly interface built with Streamlit\n- Document addition and indexing from the GUI\n- Rich metadata support for categorization\n- Simple table view and a detailed view for data\n- Scalable vector search using both pgvector's IVFFlat indexing and the [**pgvectorscale**](https://github.com/timescale/pgvectorscale) extension\n\nAlthough the initial idea was to develop a semantic document search system, I later decided to extend it into an AI research assistant that features the same document search system along with Ollama integration.\n\n## Demo\n\nBecause of problems with hosting Ollama along with the assistant app, only the semantic document search tool demo is hosted.\n- Thanks to Streamlit Community Cloud, [**visit the demo**](https://semantic-doc-search.streamlit.app) 
⭐\n\n![assistant](https://github.com/user-attachments/assets/c58ff3ae-b122-466b-a088-c9e31b80b60f)\n\n![document search tool](https://github.com/user-attachments/assets/ecad7f26-ac7c-4aff-8a2e-d6ad44ba406a)\n\n## Tools Used\n\n### Ollama + pgvector + pgai + Streamlit\n- [**Ollama**](https://ollama.com/) to summarise provided content and generate sentiment analysis, key points, and related topics\n- [TimescaleDB](https://www.timescale.com/) (PostgreSQL) as the primary database (a self-hosted PostgreSQL instance can be configured as well)\n- [pgvector](https://github.com/pgvector/pgvector) for efficient vector similarity search\n- [pgai](https://github.com/timescale/pgai) through TimescaleDB for AI features\n- [Streamlit](https://streamlit.io/) for the web interface\n\n### Key Technologies\n1. **Database Layer**\n   - pgvector extension for vector operations\n   - pgai extension for AI features\n   - IVFFlat indexing for efficient similarity search\n   - JSONB data type for flexible metadata storage\n\n2. **Machine Learning**\n   - [Sentence-Transformers](https://github.com/UKPLab/sentence-transformers) (`all-MiniLM-L6-v2` model)\n   - 384-dimensional embeddings for semantic representation\n\n3. **Backend**\n   - Python 3.12+\n   - psycopg2 for PostgreSQL interaction\n   - Vector similarity calculations using cosine distance\n\n4. **Frontend**\n   - Streamlit for the web interface\n   - Pandas for data display\n   - Download data as CSV files\n\n## Installation\n\n### Using Timescale Cloud\n\n1. **Create a Timescale Service**\n   - Open [Timescale Cloud Console](https://console.cloud.timescale.com/) and create a service\n   - In the **AI** tab, enable the `ai` and `vector` extensions\n   - Pick Python app and copy the database connection URL\n\n2. **Configure Environment**\n   Edit the `src/.env` file with the copied URL\n   ```bash\n   PSQL_URL=postgres://username:password@hostname:port/dbname?sslmode=require\n   ```\n3. 
Install Ollama and pull any model for the assistant (make sure the model name is set in the script)\n   ```bash\n   curl -fsSL https://ollama.com/install.sh | sh\n   ollama pull mistral\n   ollama serve\n   ```\n4. **Install Requirements**\n   ```bash\n   pip install -r requirements.txt\n   # or if you have poetry\n   poetry install \u0026\u0026 poetry shell\n   ```\n\n5. **Run the Assistant**\n   ```bash\n   cd src\n   streamlit run assistant.py\n   ```\n   **Run the Document Search Tool**\n   ```bash\n   cd src\n   streamlit run main.py\n   ```\n\n### Self-Hosted PostgreSQL\n\n1. **Install PostgreSQL and Extensions**\n   ```bash\n   # Install PostgreSQL\n   sudo apt-get install postgresql postgresql-common\n\n   # Install pgvector\n   sudo /usr/share/postgresql-common/pgdg/apt.postgresql.org.sh\n   sudo apt install postgresql-12-pgvector\n\n   # Install pgai\n   # https://github.com/timescale/pgai/tree/main?tab=readme-ov-file#install-from-source\n   ```\n\n2. **Configure Database**\n   ```sql\n   CREATE EXTENSION IF NOT EXISTS vector;\n   CREATE EXTENSION IF NOT EXISTS ai CASCADE;\n   ```\n\n3. **Configure Environment**\n   ```bash\n   PSQL_URL=postgresql://user:password@localhost:5432/dbname\n   ```\n   or configure the connection within the script\n   ```py\n   db_params = {\n       'dbname': 'dbname',\n       'user': 'postgres',\n       'password': 'your_password',\n       'host': 'localhost',\n       'port': '5432'\n   }\n   ```\n\n## Final Thoughts\n\nThis project integrates AI vector search features with a traditional database (which can be hard to get used to). The same search tool underpins an AI research assistant with Ollama integration. It is useful for content management systems where you need to manage and search through large collections of documents. 
Integration of pgvector and pgai provides a strong solution.\n\n### TODO\n\n- [ ] Better visualization of results using charts and stuff\n- [x] Batch document processing (import CSV)\n- [ ] Delete, update documents functionality\n- [ ] Filtering based on metadata as well\n- [ ] More use cases of pgai\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomlin7%2Fai-research-assistant","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftomlin7%2Fai-research-assistant","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftomlin7%2Fai-research-assistant/lists"}