{"id":15175566,"url":"https://github.com/redis-developer/redis-arxiv-search","last_synced_at":"2025-09-17T23:23:17.283Z","repository":{"id":61554225,"uuid":"530374092","full_name":"redis-developer/redis-arXiv-search","owner":"redis-developer","description":"Vector search demo with the arXiv paper dataset, RedisVL, HuggingFace, OpenAI, Cohere, FastAPI, React, and Redis.","archived":false,"fork":false,"pushed_at":"2025-02-25T20:53:33.000Z","size":1024,"stargazers_count":143,"open_issues_count":7,"forks_count":23,"subscribers_count":6,"default_branch":"main","last_synced_at":"2025-04-05T01:05:27.667Z","etag":null,"topics":["arxiv","arxiv-papers","cohere","document-retrieval","document-search","huggingface","machine-learning","nlp","openai","react","redis","vector-database","vector-search"],"latest_commit_sha":null,"homepage":"https://docsearch.redisvl.com","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"bsd-3-clause","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/redis-developer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-08-29T19:58:55.000Z","updated_at":"2025-02-25T20:53:34.000Z","dependencies_parsed_at":"2024-12-03T17:55:42.097Z","dependency_job_id":"adb83c6b-4821-486c-b952-9fd19a1bed93","html_url":"https://github.com/redis-developer/redis-arXiv-search","commit_stats":{"total_commits":20,"total_committers":3,"mean_commits":6.666666666666667,"dds":0.09999999999999998,"last_synced_commit":"0fa0d9f78e259316c91b4455424bb8f325b87895"},"previous_names":["redis-developer/redis-arxiv-search"],"tags_count":0,"template":false,"template_full_name":null,"repository_
url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/redis-developer%2Fredis-arXiv-search","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/redis-developer%2Fredis-arXiv-search/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/redis-developer%2Fredis-arXiv-search/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/redis-developer%2Fredis-arXiv-search/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/redis-developer","download_url":"https://codeload.github.com/redis-developer/redis-arXiv-search/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247271526,"owners_count":20911587,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["arxiv","arxiv-papers","cohere","document-retrieval","document-search","huggingface","machine-learning","nlp","openai","react","redis","vector-database","vector-search"],"created_at":"2024-09-27T12:39:29.865Z","updated_at":"2025-09-17T23:23:17.267Z","avatar_url":"https://github.com/redis-developer.png","language":"Python","funding_links":[],"categories":["Demos"],"sub_categories":[],"readme":"\n\u003cdiv align=\"center\"\u003e\n    \u003ca href=\"https://github.com/redis-developer/redis-arxiv-search\"\u003e\u003cimg src=\"https://redis.io/wp-content/uploads/2024/04/Logotype.svg?raw=true\" width=\"30%\"\u003e\u003cimg\u003e\u003c/a\u003e\n    \u003cbr /\u003e\n    \u003cbr /\u003e\n    \u003ch1\u003e🔎 arXiv Search API\u003c/h1\u003e\n\u003cdiv 
display=\"inline-block\"\u003e\n    \u003ca href=\"https://docsearch.redisvl.com\"\u003e\u003cb\u003eHosted Demo\u003c/b\u003e\u003c/a\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\n    \u003ca href=\"https://github.com/redis-developer/redis-arxiv-search\"\u003e\u003cb\u003eCode\u003c/b\u003e\u003c/a\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\n    \u003ca href=\"https://github.com/redis-developer/redis-ai-resources\"\u003e\u003cb\u003eMore AI Recipes\u003c/b\u003e\u003c/a\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\n    \u003ca href=\"https://datasciencedojo.com/blog/ai-powered-document-search/\"\u003e\u003cb\u003eBlog Post\u003c/b\u003e\u003c/a\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\n    \u003ca href=\"https://redis.io/docs/interact/search-and-query/advanced-concepts/vectors/\"\u003e\u003cb\u003eRedis Vector Search Documentation\u003c/b\u003e\u003c/a\u003e\u0026nbsp;\u0026nbsp;\u0026nbsp;\n  \u003c/div\u003e\n    \u003cbr /\u003e\n    \u003cbr /\u003e\n\u003c/div\u003e\n\n\n*This repository is the official codebase for the arXiv paper search app hosted at: **https://docsearch.redisvl.com***\n\n\n[Redis](https://redis.com) is a highly performant, production-ready vector database that can be used for many types of applications. Here we showcase Redis vector search applied to a document retrieval use case. Read more about AI-powered search in [the technical blog post](https://datasciencedojo.com/blog/ai-powered-document-search/) published by our partners, *[Data Science Dojo](https://datasciencedojo.com)*.\n\n### Dataset\n\nThe arXiv papers dataset was sourced from the following [Kaggle link](https://www.kaggle.com/Cornell-University/arxiv). arXiv is commonly used for scientific research in a variety of fields. 
Exposing a semantic search layer lets users discover relevant papers with natural-language queries.\n\n\n## Application\n\nThis app was built as a Single Page Application (SPA) with the following components:\n\n- **[Redis Stack](https://redis.io/docs/stack/)** for vector database\n- **[RedisVL](https://redisvl.com)** for Python vector db client\n- **[FastAPI](https://fastapi.tiangolo.com/)** for Python API\n- **[Pydantic](https://pydantic-docs.helpmanual.io/)** for schema and validation\n- **[React](https://reactjs.org/)** (with TypeScript)\n- **[Docker Compose](https://docs.docker.com/compose/)** for development\n- **[MaterialUI](https://material-ui.com/)** for some UI elements/components\n- **[React-Bootstrap](https://react-bootstrap.github.io/)** for some UI elements\n- **[HuggingFace](https://huggingface.co/sentence-transformers)**, **[OpenAI](https://platform.openai.com)**, and **[Cohere](https://cohere.com)** for vector embedding creation\n\nSome inspiration was taken from the [tiangolo/full-stack-fastapi-template](https://github.com/tiangolo/full-stack-fastapi-template)\nproject, adapted here into a SPA instead of a separate front-end server approach.\n\n### General Project Structure\n\n```\n/backend\n    /arxivsearch\n        /api\n            /routes\n                papers.py # primary paper search logic lives here\n        /db\n            load.py # seeds Redis DB\n            redis_helpers.py # redis util\n        /schema\n            # pydantic models for serialization/validation from API\n        /tests\n        /utils\n        config.py\n        spa.py # logic for serving compiled react project\n        main.py # entrypoint\n/frontend\n    /public\n        # index, manifest, logos, etc.\n    /src\n        /config\n        /styles\n        /views\n            # primary components live here\n\n        api.ts # logic for connecting with BE\n        App.tsx # project entry\n        Routes.tsx # route definitions\n        ...\n/data\n    # folder 
mounted as a volume in Docker\n    # load script auto-populates initial data from S3\n\n```\n\n### Embedding Providers\nEmbeddings represent the semantic properties of the raw text and enable vector similarity search. This application supports `HuggingFace`, `OpenAI`, and `Cohere` embeddings out of the box.\n\n| Provider | Embedding Model | Required? |\n| ------------- | ------------- | ----- |\n| HuggingFace | `sentence-transformers/all-mpnet-base-v2` | Yes |\n| OpenAI | `text-embedding-ada-002` | Yes |\n| Cohere | `embed-multilingual-v3.0` | Yes |\n\n**Interested in a different embedding provider?** Feel free to open a PR with a suggested addition.\n\n**Want to use a different model than the one listed?** Set the following environment variables in your `.env` file (see below) to change it:\n\n- `SENTENCE_TRANSFORMER_MODEL`\n- `OPENAI_EMBEDDING_MODEL`\n- `COHERE_EMBEDDING_MODEL`\n\n\n## 🚀 Running the App\n1. Before running the app, install [Docker Desktop](https://www.docker.com/products/docker-desktop/).\n2. Clone (and optionally fork) this GitHub repo to your machine.\n    ```bash\n    $ git clone https://github.com/redis-developer/redis-arxiv-search\n    ```\n3. Make a copy of the `.env.template` file:\n    ```bash\n    $ cd redis-arxiv-search/\n    $ cp .env.template .env\n    ```\n    - Add your `OPENAI_API_KEY` to the `.env` file. **Need one?** [Get an API key](https://platform.openai.com)\n    - Add your `COHERE_API_KEY` to the `.env` file. **Need one?** [Get an API key](https://cohere.ai)\n\n### Run locally with Redis 8 CE\n```bash\nmake deploy\n```\n\n\n## Customizing (optional)\n\n### Run local Redis with Docker\n```bash\ndocker run -d --name redis -p 6379:6379 -p 8001:8001 redis:8.0-M03\n```\n\n### FastAPI with Poetry\nTo run the backend locally:\n\n1. `cd backend`\n2. `poetry install`\n3. 
`poetry run start-app`\n\n*`poetry run start-app` runs the initial DB load script and launches the API.*\n\n### React Dev Environment\nIt's typically easier to build the front end in an interactive environment, testing changes in real time.\n\n1. Deploy the app using the steps above.\n2. Install packages:\n    ```bash\n    $ cd frontend/\n    $ npm install\n    ```\n3. Use `npm` to serve the application from your machine:\n    ```bash\n    $ npm run start\n    ```\n4. Navigate to `http://localhost:3000` in a browser.\n\nAll changes to your frontend code will be reflected in your browser in near real time.\n\n\n### Troubleshooting\nEvery once in a while, you need to clear out some cached Docker artifacts. Run `docker system prune`, restart Docker Desktop, and try again.\n\nThis project is maintained by Redis on a good faith basis. Please open an issue here on GitHub and we will try to respond.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fredis-developer%2Fredis-arxiv-search","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fredis-developer%2Fredis-arxiv-search","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fredis-developer%2Fredis-arxiv-search/lists"}