{"id":24709102,"url":"https://github.com/leettools-dev/leettools","last_synced_at":"2026-04-02T03:19:37.134Z","repository":{"id":269108266,"uuid":"905538292","full_name":"leettools-dev/leettools","owner":"leettools-dev","description":"AI Search tools.","archived":false,"fork":false,"pushed_at":"2025-11-03T19:51:20.000Z","size":1355,"stargazers_count":340,"open_issues_count":10,"forks_count":28,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-12-15T17:29:17.371Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/leettools-dev.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-19T03:16:10.000Z","updated_at":"2025-12-03T13:16:37.000Z","dependencies_parsed_at":null,"dependency_job_id":"f4d624aa-c1a8-4199-9b9c-d564df9157d0","html_url":"https://github.com/leettools-dev/leettools","commit_stats":null,"previous_names":["leettools-dev/leettools"],"tags_count":4,"template":false,"template_full_name":null,"purl":"pkg:github/leettools-dev/leettools","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leettools-dev%2Fleettools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leettools-dev%2Fleettools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leettools-dev%2Fleettools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leettools-dev%2Fleettools/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/
hosts/GitHub/owners/leettools-dev","download_url":"https://codeload.github.com/leettools-dev/leettools/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/leettools-dev%2Fleettools/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31295055,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-02T01:43:37.129Z","status":"online","status_checked_at":"2026-04-02T02:00:08.535Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-27T07:00:59.199Z","updated_at":"2026-04-02T03:19:37.114Z","avatar_url":"https://github.com/leettools-dev.png","language":"Python","funding_links":[],"categories":["\u003ca name=\"ai\"\u003e\u003c/a\u003eAI / ChatGPT"],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"docs/assets/LTC-Logo-leettools-normal.png\" alt=\"Logo\" width=\"200\"/\u003e\n\u003c/p\u003e\n\n\n[![Follow on X](https://img.shields.io/twitter/follow/LeetTools?logo=X\u0026color=%20%23f5f5f5)](https://twitter.com/intent/follow?screen_name=LeetTools)\n[![GitHub license](https://img.shields.io/badge/License-Apache_2.0-blue.svg?labelColor=%20%23155EEF\u0026color=%20%23528bff)](https://github.com/leettools-dev/leettools)\n\n- [AI Search Assistant with Local Knowledge Bases](#ai-search-assistant-with-local-knowledge-bases)\n- [Quick Start](#quick-start)\n- [Use Different LLM and Search 
Providers](#use-different-llm-and-search-providers)\n  - [Use local Ollama service for inference and embedding](#use-local-ollama-service-for-inference-and-embedding)\n  - [Use DeepSeek API with different embedding services](#use-deepseek-api-with-different-embedding-services)\n  - [Use Google / FireCrawl as the default web retriever](#use-google--firecrawl-as-the-default-web-retriever)\n- [Usage Examples](#usage-examples)\n  - [Build a local knowledge base using PDFs from the web](#build-a-local-knowledge-base-using-pdfs-from-the-web)\n  - [Generate analytical research reports like OpenAI/Google's Deep Research](#generate-analytical-research-reports-like-openaigoogles-deep-research)\n  - [Generate news list from web search results](#generate-news-list-from-web-search-results)\n- [Main Components](#main-components)\n- [Community](#community)\n\n\n# AI Search Assistant with Local Knowledge Bases\n\nLeetTools is an AI search assistant that can perform highly customizable search workflows\nand generate customized format results based on both web and local knowledge bases. With an\nautomated document pipeline that handles data ingestion, indexing, and storage, we can\nfocus on implementing the workflow without worrying about the underlying infrastructure.\n\nLeetTools can run with minimal resource requirements on the command line with a \nDuckDB-backend and configurable LLM settings. It can also use other dedicated \ndatabases for different functions, e.g., we can use MongoDB for document storage,\nMilvus for vector search, and Neo4j for graph search. 
We can configure different\nfunctions in the same workflow to use different LLM providers and models.\n\nHere is an illustration of the LeetTools **digest** flow where it can search the web\n(or local KB) and generate a digest article from the search results:\n\n![LeetTools Digest Flow](docs/assets/process-digest.drawio.svg)\n\nAnd here is an example output article generated by the **digest** flow for the query\n[How does Ollama work?](docs/examples/ollama.md).\n\nCurrently LeetTools provides the following workflows:\n\n* answer  : Answer the query directly with source references (similar to Perplexity). [📖](https://leettools-dev.github.io/Flow/answer)\n* digest  : Generate a multi-section digest article from search results (similar to Google Deep Research). [📖](https://leettools-dev.github.io/Flow/digest)\n* search  : Search for the top segments that match the query. [📖](https://leettools-dev.github.io/Flow/search)\n* news    : Generate a list of news items for the specified topic. [📖](https://leettools-dev.github.io/Flow/news)\n* extract : Extract and store structured data for a given schema. [📖](https://leettools-dev.github.io/Flow/extract)\n* opinions: Generate sentiment analysis and facts from the search results.  [📖](https://leettools-dev.github.io/Flow/opinions)\n\nWe are in the process of implementing a fully automated flow generation pipeline that allows\nusers to generate their own customized flows with natural language prompts. Stay tuned for more updates!\n\n# Quick Start\n\n**Before you start**\n\n- .env file: We can use any OpenAI-compatible LLM endpoint, such as a local Ollama service\n  or a public provider such as Gemini or DeepSeek. We can switch the service easily by \n  [defining environment variables or switching .env files](#use-different-llm-and-search-providers). 
\n\n- LeetHome: By default the data is saved under ${HOME}/leettools. You can set the\n  LEET_HOME environment variable to change the location:\n\n```bash\n% export LEET_HOME=\u003cyour_leet_home\u003e\n% mkdir -p ${LEET_HOME}\n```\n\n**🚀 New: Run LeetTools Web UI with Docker 🚀**\n\nLeetTools now provides a Docker container that includes the web UI. You can start the \ncontainer by running the following command:\n\n```bash\ndocker/start.sh\n```\n\nThis will start the LeetTools service and the web UI. You can access the web UI at \n[http://localhost:3000](http://localhost:3000). The web UI app is currently under development\nand is not open source yet. We plan to open source it in the near future.\n\n**Run with pip**\n\nIf you are using an OpenAI-compatible LLM endpoint, you can install and run LeetTools \nwith pip as follows (using Conda/Venv is recommended):\n\n```bash\n% conda create -y -n leettools python=3.11\n% conda activate leettools\n% pip install leettools\n% export EDS_LLM_API_KEY=\u003cyour_api_key\u003e\n% leet flow -t answer -q \"How does GraphRAG work?\" -k graphrag -l info\n```\n\nThe above `flow -t answer` command will run the `answer` flow with the query \"How does\nGraphRAG work?\" and save the scraped web pages to the knowledge base `graphrag`. 
The\n`-l info` option will show the essential log messages.\n\nThe default API endpoint is set to the OpenAI API endpoint, which you can modify by\nchanging the `EDS_DEFAULT_LLM_BASE_URL` environment variable:\n\n```bash\n% export EDS_DEFAULT_LLM_BASE_URL=https://api.openai.com/v1\n```\n\n**Run with source code**\n\n```bash\n% git clone https://github.com/leettools-dev/leettools.git\n% cd leettools\n\n% conda create -y -n leettools python=3.11\n% conda activate leettools\n% pip install -r requirements.txt\n% pip install -e .\n# add the scripts directory to the PATH\n% export PATH=`pwd`/scripts:${PATH}\n% export EDS_LLM_API_KEY=\u003cyour_api_key\u003e\n\n% leet flow -t answer -q \"How does GraphRAG work?\" -k graphrag -l info\n```\n\n# Use Different LLM and Search Providers\n\nWe can run LeetTools with different env files to use different LLM providers and other\nrelated settings.\n\n## Use local Ollama service for inference and embedding\n\n```bash\n# you may need to pull the models first\n% ollama pull llama3.2\n% ollama pull nomic-embed-text\n% ollama serve\n\n% cat \u003e .env.ollama \u003c\u003cEOF\nEDS_DEFAULT_LLM_BASE_URL=http://localhost:11434/v1\nEDS_LLM_API_KEY=dummy-llm-api-key\nEDS_DEFAULT_INFERENCE_MODEL=llama3.2\nEDS_DEFAULT_EMBEDDING_MODEL=nomic-embed-text\nEDS_EMBEDDING_MODEL_DIMENSION=768\nEOF\n\n# Then run the command with the -e option to specify the .env file to use\n% leet flow -e .env.ollama -t answer -q \"How does GraphRAG work?\" -k graphrag.ollama -l info\n```\n\n## Use DeepSeek API with different embedding services\n\nAs another example, since DeepSeek does not provide an embedding endpoint yet, we can\nuse the \"EDS_DEFAULT_DENSE_EMBEDDER\" setting to specify a local embedder with a default\n`all-MiniLM-L6-v2` model:\n\n```bash\n# you can put the settings in the .env.deepseek file\n% cat \u003e .env.deepseek 
\u003c\u003cEOF\nLEET_HOME=\u003c/Users/myhome/leettools\u003e\nEDS_DEFAULT_LLM_BASE_URL=https://api.deepseek.com/v1\nEDS_LLM_API_KEY=\u003cyour-api-key\u003e\nEDS_DEFAULT_INFERENCE_MODEL=deepseek-chat\nEDS_DEFAULT_DENSE_EMBEDDER=dense_embedder_local_mem\nEOF\n\n# Then run the command with the -e option to specify the .env file to use\n% leet flow -e .env.deepseek -t answer -q \"How does GraphRAG work?\" -k graphrag -l info\n```\n\nIf you want to use another API provider (OpenAI compatible) for embedding, say a local\nOllama embedder, you can set the embedding endpoint URL and API key separately as follows:\n\n```bash\n% cat \u003e .env.deepseek \u003c\u003cEOF\nEDS_DEFAULT_LLM_BASE_URL=https://api.deepseek.com/v1\nEDS_LLM_API_KEY=\u003cyour-api-key\u003e\nEDS_DEFAULT_INFERENCE_MODEL=deepseek-chat\n\n# this specifies to use an OpenAI compatible embedding endpoint\nEDS_DEFAULT_DENSE_EMBEDDER=dense_embedder_openai\n\n# the following specifies the embedding endpoint URL and model to use\nEDS_DEFAULT_EMBEDDING_BASE_URL=http://localhost:11434/v1\nEDS_DEFAULT_EMBEDDING_MODEL=nomic-embed-text\nEDS_EMBEDDING_MODEL_DIMENSION=768\nEOF\n```\n\n## Use Google / FireCrawl as the default web retriever\n\nThe search engine is `google` by default, which can be set by the following environment\nvariable:\n\n```bash\nexport EDS_WEB_RETRIEVER=google\nexport EDS_SEARCH_API_URL=https://www.googleapis.com/customsearch/v1\nexport EDS_GOOGLE_CX_KEY=\u003cyour-google-cx-key\u003e\nexport EDS_GOOGLE_API_KEY=\u003cyour-google-api-key\u003e\n```\n\nWe can also use the FireCrawl search as the default web retriever instead of the default\nGoogle search by setting the following environment variables:\n\n```bash\nexport EDS_WEB_RETRIEVER=firecrawl\nexport EDS_FIRECRAWL_API_URL=https://api.firecrawl.dev\nexport EDS_FIRECRAWL_API_KEY=your_firecrawl_api_key\n```\n\nHere is a detailed example of [using FireCrawl with Ollama to run a deep research](docs/use_firecrawl.md).\n\nBy default we provide a 
shared proxy search service that can be used for testing purposes.\nUsers should use their own search services for production use.\n\n\n# Usage Examples\n\n## Build a local knowledge base using PDFs from the web\n\nWe can build a local knowledge base with PDFs from the web. Once the local Ollama\nservice is set up as described [above](#use-local-ollama-service-for-inference-and-embedding),\nwe can use the following commands to build the knowledge base:\n\n```bash\n# create a KB with a URL\n# the book downloaded here is \"Foundations of Large Language Models\" \n# it has 231 pages and takes some time to process\n% leet kb add-url -e .env.ollama -k llmbook -r \"https://arxiv.org/pdf/2501.09223\"\n\n# now you can query the KB with any topic you want to explore\n% leet kb flow -e .env.ollama -t answer -k llmbook -l info \\\n    -q \"How does LLM Finetuning process work?\" \n```\n\nA more [detailed example](docs/run_ollama_with_deepseek_r1.md) shows how to\nuse the local Ollama service with the DeepSeek-r1:1.5B model to build a local knowledge\nbase.\n\n## Generate analytical research reports like OpenAI/Google's Deep Research\n\nWe can generate analytical research reports like OpenAI/Google's Deep Research by using\nthe `digest` flow. Here is an example:\n\n```bash\n% leet flow -e .env.fireworks -t digest -k aijob.fireworks \\\n    -p search_max_results=30 -p days_limit=360 \\\n    -q \"How will agentic AI and generative AI affect our non-tech jobs?\"  \\\n    -l info -o outputs/aijob.fireworks.md\n```\n\nAn example of the output is available [here](docs/examples/deepseek/aijob.fireworks.md),\nand the tutorial on using the DeepSeek API from fireworks.ai for the above command is \navailable [here](docs/run_deepsearch_with_firework_deepseek.md).\n\n## Generate news list from web search results\n\nWe can create a knowledge base from a web search with a date limit, and then generate\na list of news items from the KB. 
Here is an example:\n\n```bash\nleet flow -t news -q \"LLM GenAI Startups\" -k genai -l info \\\n    -p days_limit=3  -p search_iteration=3 -p search_max_results=100 \\\n    -o llm_genai_news.md\n```\n\nThe query retrieves up to 100 search result pages from the past 3 days\nand generates a list of news items from the search results. The output is saved to \nthe `llm_genai_news.md` file. An example of the output is available [here](docs/examples/llm_genai_news.md).\n\n# Main Components\n\nThe main components of the backend include:\n* 🚀 Automated document pipeline to ingest, convert, chunk, embed, and index documents.\n* 🗂️ Knowledge base to manage and serve the indexed documents.\n* 🔍 Search and retrieval library to fetch documents from the web or local KB.\n* 🤖 Workflow engine to implement search-based AI workflows.\n* ⚙ Configuration system to support dynamic configurations used for every component.\n* 📝 Query history system to manage the history and the context of the queries.\n* 💻 Scheduler for automatic execution of the pipeline tasks.\n* 🧩 Accounting system to track the usage of the LLM APIs.\n\nThe architecture of the document pipeline is shown below:\n\n![LeetTools Document Pipeline](https://gist.githubusercontent.com/pengfeng/4b2e36bda389e0a3c338b5c42b5d09c1/raw/6bc06db40dadf995212270d914b46281bf7edae9/leettools-eds-arch.svg)\n\nSee the [Documentation](docs/documentation.md) for more details.\n\n\n# Community\n\n**Acknowledgements**\n\nWe currently use the following open source libraries and tools, among others:\n\n- [DuckDB](https://github.com/duckdb/duckdb)\n- [Docling](https://github.com/DS4SD/docling)\n- [Chonkie](https://github.com/bhavnicksm/chonkie)\n- [Ollama](https://github.com/ollama/ollama)\n- [Jinja2](https://jinja.palletsprojects.com/en/3.0.x/)\n- [BS4](https://www.crummy.com/software/BeautifulSoup/bs4/doc/)\n- [FastAPI](https://github.com/fastapi/fastapi)\n- [Pydantic](https://github.com/pydantic/pydantic)\n\nWe 
plan to add more plugins for different components to support different workloads.\n\n**Get help and support**\n\nPlease feel free to connect with us using the [discussion section](https://github.com/leettools-dev/leettools/discussions).\n\n\n**Contributing**\n\nPlease read [Contributing to LeetTools](CONTRIBUTING.md) for details.\n\n**License**\n\nLeetTools is licensed under the Apache License, Version 2.0. See [LICENSE](LICENSE) \nfor the full license text.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleettools-dev%2Fleettools","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fleettools-dev%2Fleettools","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fleettools-dev%2Fleettools/lists"}