{"id":17526961,"url":"https://github.com/OpenInterpreter/aifs","last_synced_at":"2025-03-06T06:31:15.936Z","repository":{"id":217454203,"uuid":"743912205","full_name":"OpenInterpreter/aifs","owner":"OpenInterpreter","description":"Local semantic search. Stupidly simple.","archived":false,"fork":false,"pushed_at":"2024-04-24T03:27:32.000Z","size":331,"stargazers_count":326,"open_issues_count":7,"forks_count":40,"subscribers_count":9,"default_branch":"main","last_synced_at":"2024-05-22T00:19:58.682Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/OpenInterpreter.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":"ROADMAP.md","authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-16T08:54:21.000Z","updated_at":"2024-06-01T20:41:01.903Z","dependencies_parsed_at":"2024-04-24T04:43:05.815Z","dependency_job_id":null,"html_url":"https://github.com/OpenInterpreter/aifs","commit_stats":null,"previous_names":["killianlucas/aifs"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenInterpreter%2Faifs","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenInterpreter%2Faifs/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenInterpreter%2Faifs/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/OpenInterpreter%2Faifs/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/OpenInterpreter","download_url":"https://codeload.github.com/OpenInterpreter/aifs/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":220753131,"owners_count":16697311,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-20T15:02:39.600Z","updated_at":"2024-10-20T15:07:06.154Z","avatar_url":"https://github.com/OpenInterpreter.png","language":"Python","funding_links":[],"categories":["Python","Learning"],"sub_categories":["Repositories"],"readme":"# AI Filesystem\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1QdXPchTDnzW6I_3HTZFpSeak_XoH81v5?usp=sharing)\n\nLocal semantic search over folders. Why didn't this exist?\n\n```shell\npip install aifs\npip install \"unstructured[all-docs]\" # If you want to parse all doc types. Includes large packages!\n```\n\n```python\nfrom aifs import search\n\nsearch(\"How does AI Filesystem work?\", path=\"/path/to/folder\")\nsearch(\"It's not unlike how Spotlight works.\") # Path defaults to CWD\n```\n\n# How it works\n\n\u003cbr\u003e\n\n![aifs](https://github.com/KillianLucas/aifs/assets/63927363/c61599a9-aad8-483d-b6a4-3671629cd5f4)\n\nRunning `aifs.search` will chunk and embed all nested supported files (`.txt`, `.py`, `.sh`, `.docx`, `.pptx`, `.jpg`, `.png`, `.eml`, `.html`, and `.pdf`) in `path`. It will then store these embeddings into an `_.aifs` file in `path`.\n\nBy storing the index, you only have to chunk/embed once. This makes semantic search **very** fast after the first time you search a path.\n\nIf a file has changed or been added, `aifs.search` will update or add those chunks. We still need to handle file deletions (we welcome PRs).\n\n### In detail:\n\n1. If a folder hasn't been indexed, we first use [`unstructured`](https://github.com/Unstructured-IO/unstructured/tree/main) to parse and chunk every file in the `path`.\n2. Then we use [`chroma`](https://github.com/chroma-core/chroma) to embed the chunks locally and save them to a `_.aifs` file in `path`.\n3. Finally, `chroma` is used again to semantically search the embeddings.\n\nIf an `_.aifs` file _is_ found in a directory, it uses that instead of indexing it again. If some files have been updated, it will re-index those.\n\n# Goals\n\n- We should always have SOTA parsing and chunking. The logic for this should be swapped out as new methods arise.\n  - Chunking should be semantic — as in, `python` and `markdown` files should have _different_ chunking algorithms based on the expected content of those filetypes. Who has this solution?\n  - For parsing, I think Unstructured is the best of the best. Is this true?\n- We should always have SOTA embedding. If a better local embedding model is found, we should automatically download and use it.\n  - I think Chroma will always do this (is this true?) so we depend on Chroma.\n- This project should stay **minimally scoped** — we want `aifs` to be the best local semantic search in the universe.\n\n# Why?\n\nWe built this to let [`open-interpreter`](https://openinterpreter.com/) quickly semantically search files/folders.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOpenInterpreter%2Faifs","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FOpenInterpreter%2Faifs","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FOpenInterpreter%2Faifs/lists"}