{"id":13712527,"url":"https://github.com/lancedb/yoloexplorer","last_synced_at":"2025-04-05T05:06:39.041Z","repository":{"id":180723055,"uuid":"665015643","full_name":"lancedb/yoloexplorer","owner":"lancedb","description":"YOLOExplorer : Iterate on your YOLO / CV datasets using SQL, Vector semantic search, and more within seconds","archived":false,"fork":false,"pushed_at":"2025-03-03T16:38:39.000Z","size":16464,"stargazers_count":126,"open_issues_count":6,"forks_count":19,"subscribers_count":8,"default_branch":"main","last_synced_at":"2025-03-29T04:08:05.333Z","etag":null,"topics":["computer-vision","object-detection","yolov5","yolov8"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lancedb.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-07-11T08:52:31.000Z","updated_at":"2025-03-13T14:27:58.000Z","dependencies_parsed_at":"2023-07-12T15:36:15.883Z","dependency_job_id":"c95300b5-f528-4b79-bb73-0474f189f0ed","html_url":"https://github.com/lancedb/yoloexplorer","commit_stats":null,"previous_names":["lancedb/yoloexplorer"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Fyoloexplorer","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Fyoloexplorer/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Fyoloexplorer/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lancedb%2Fyoloexplor
er/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lancedb","download_url":"https://codeload.github.com/lancedb/yoloexplorer/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247289427,"owners_count":20914464,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["computer-vision","object-detection","yolov5","yolov8"],"created_at":"2024-08-02T23:01:19.436Z","updated_at":"2025-04-05T05:06:39.012Z","avatar_url":"https://github.com/lancedb.png","language":"Python","funding_links":[],"categories":["Summary","Object Detection Datasets"],"sub_categories":[],"readme":"# YOLOExplorer\n\nExplore, manipulate and iterate on Computer Vision datasets with precision using simple APIs.\nSupports SQL filters, vector similarity search, native interface with Pandas and more.\n\n\n* Analyse your datasets with powerful custom queries\n* Find and remove bad images (duplicates, out of domain data and more)\n* Enrich datasets by adding more examples from another datasets\n* And more\n\n🌟 NEW: Supports GUI Dashboard, Pythonic and notebook workflows\n### Dashboard Workflows\n\u003cdetails open\u003e\n\u003csummary\u003eMultiple dataset support\u003c/summary\u003e\nYou can now explore multiple datasets, search across them, add/remove images across multiple datasets to enrich bad examples. 
Start training on the new dataset within seconds.\n  Here's an example of using the VOC, coco128 and coco8 datasets together, with VOC being the primary.\n\u003cpre\u003e\nfrom yoloexplorer import Explorer\n\nexp = Explorer(\"VOC.yaml\")\nexp.build_embeddings()\n\ncoco_exp = Explorer(\"coco128.yaml\")\ncoco_exp.build_embeddings()\n# Init coco8 similarly\n\nexp.dash([coco_exp, coco8])\n# Automatic analysis coming soon with dash(..., analysis=True)\n\n\u003c/pre\u003e\n\n  ![ezgif com-optimize (3)](https://github.com/lancedb/yoloexplorer/assets/15766192/3422a536-138a-4fce-af2c-cef97f171aed)\n\n\u003c/details\u003e\n\n\n\u003cdetails open\u003e\n\u003csummary\u003eMultiple model support\u003c/summary\u003e\n\nYou can now choose among multiple pretrained models,\n`\"resnet18\", \"resnet50\", \"efficientnet_b0\", \"efficientnet_v2_s\", \"googlenet\", \"mobilenet_v3_small\"`, to extract better features from images and improve searching across multiple datasets.\n\u003cpre\u003e\nfrom yoloexplorer import Explorer\n\nexp = Explorer(\"coco128.yaml\", model=\"resnet50\")\nexp.build_embeddings()\n\ncoco_exp = Explorer(\"coco128.yaml\", model=\"mobilenet_v3_small\")\ncoco_exp.build_embeddings()\n\n# Use force=True as a parameter in build_embeddings if embeddings already exist\n\nexp.dash([coco_exp])\n# Automatic analysis coming soon with dash(..., analysis=True)\n\u003c/pre\u003e\n\u003c/details\u003e\n\n\u003cdetails open\u003e\n\u003csummary\u003eQuery using SQL and semantic search, view the dataset as a pandas DataFrame and explore embeddings\u003c/summary\u003e\n\n![ezgif com-optimize (4)](https://github.com/lancedb/yoloexplorer/assets/15766192/b786e2f1-dc8e-411e-b13b-84b26ec50d41)\n\n![ezgif com-optimize (5)](https://github.com/lancedb/yoloexplorer/assets/15766192/38d42a38-810e-48f3-89ea-1ccf304a1047)\n\n\u003c/details\u003e\n\n\u003cdetails open\u003e\nTry an example Colab notebook \u003ca href=\"https://colab.research.google.com/github/lancedb/yoloexplorer/blob/main/examples/intro.ipynb\"\u003e\u003cimg 
src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"\u003e\u003c/a\u003e\n\n\u003csummary\u003eColab / Notebook\u003c/summary\u003e\n\u003cimg src=\"./yoloexplorer/assets/docs/intro.gif\" height=75% width=75% /\u003e\n\u003c/details\u003e\n\n### Installation\n```\npip install yoloexplorer\n```\nInstall from the source branch\n```\npip install git+https://github.com/lancedb/yoloexplorer.git\n```\nPypi installation coming soon\n\n## Quickstart\nYOLOExplorer can be used to rapidly generate new versions of CV datasets trainable on [Ultralytics YOLO, SAM, FAST-SAM, RT-DETR](https://github.com/ultralytics/ultralytics) and more models.\n\nStart exploring your datasets in 2 simple steps\n* Select a supported dataset or bring your own. All Ultralytics YOLO datasets are currently supported.\n```python\nfrom yoloexplorer import Explorer\n\ncoco_exp = Explorer(\"coco128.yaml\")\n```\n* Build the LanceDB table to allow querying\n```python\ncoco_exp.build_embeddings()\ncoco_exp.dash() # Launch the GUI dashboard\n```\n\u003cdetails open\u003e\n\u003csummary\u003e \u003cb\u003e Querying Basics \u003c/b\u003e \u003c/summary\u003e\n\nYou can get the schema of your dataset once the table is built\n```\nschema = coco_exp.table.schema\n```\nYou can use this schema to run queries\n\n\u003cb\u003eSQL query\u003c/b\u003e\u003cbr/\u003e\nLet's try this query and plot 4 results: select instances that contain one or more 'person' and 'cat'\n```python\ndf = coco_exp.sql(\"SELECT * from 'table' WHERE labels like '%person%' and labels LIKE '%cat%'\")\ncoco_exp.plot_imgs(ids=df[\"id\"][0:4].to_list())\n```\nResult\n\n\u003cimg src=\"./yoloexplorer/assets/docs/plotting.png\" height=50% width=50% /\u003e\u003cbr/\u003e\nThe above is equivalent to plotting directly with a query:\n```python\ncoco_exp.plot_imgs(query=query, n=4) # query is the SQL string used above\n```\n\n\u003cb\u003eQuerying by similarity\u003c/b\u003e\u003cbr/\u003e\nNow let's say your model confuses certain classes (cat \u0026 dog, for 
example), so you want to find images similar to the ones above to investigate.\n\nThe id of the first image in this case was 117\n```python\nimgs, ids = coco_exp.get_similar_imgs(117, n=6) # accepts ids/idx, Path, or img blob\ncoco_exp.plot_imgs(ids)\n```\n\u003cimg src=\"./yoloexplorer/assets/docs/sim_plotting.png\" height=50% width=50% /\u003e\u003cbr/\u003e\nThe above is equivalent to directly calling `plot_similar_imgs`\n```python\ncoco_exp.plot_similar_imgs(117, n=6)\n```\nNOTE: You can also pass any image file for similarity search, even one that is not in the dataset\n\n\n\u003cb\u003eSimilarity Search with SQL Filter (Coming Soon)\u003c/b\u003e\u003cbr/\u003e\nSoon you'll be able to have finer control over the queries by pre-filtering your table\n```\ncoco_exp.get_similar_imgs(..., query=\"WHERE labels LIKE '%motorbike%'\")\ncoco_exp.plot_similar_imgs(query=\"WHERE labels LIKE '%motorbike%'\")\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e \u003cb\u003ePlotting\u003c/b\u003e\u003c/summary\u003e\n\n| Visualization Method | Description | Arguments |\n|---|---|---|\n| `plot_imgs(ids, query, n=10)` | Plots the given `ids` or the result of the SQL query. One of the 2 must be provided. | `ids`: A list of image IDs. `query`: A SQL query. `n`: The number of images to plot. |\n| `plot_similar_imgs(img/idx, n=10)` | Plots the top `n` images most similar to the given img. Accepts an img idx from the dataset, a Path to an img, or an encoded/binary img | `img/idx`: The image to plot similar images for. `n`: The number of similar images to plot. |\n| `plot_similarity_index(top_k=0.01, sim_thres=0.90, reduce=False, sorted=False)` | Plots the similarity index of the dataset. This gives a measure of how similar an img is when compared to all the imgs of the dataset. | `top_k`: The percentage of images to keep for the similarity index. `sim_thres`: The similarity threshold. `reduce`: Whether to reduce the dimensionality of the embeddings before calculating the index. 
`sorted`: Whether to sort the similarity index. |\n\n**Additional Details**\n\n* The `plot_imgs` method can be used to visualize a subset of images from the dataset. The `ids` argument can be a list of image IDs, or a SQL query that returns a list of image IDs. The `n` argument specifies the number of images to plot.\n* The `plot_similar_imgs` method can be used to visualize the top `n` similar images to a given image. The `img/idx` argument can be the index of the image in the dataset, the path to the image file, or the encoded/binary representation of the image.\n* The `plot_similarity_index` method can be used to visualize the similarity index of the dataset. The similarity index is a measure of how similar each image is to all the other images in the dataset. The `top_k` argument specifies the percentage of images to keep for the similarity index. The `sim_thres` argument specifies the similarity threshold. The `reduce` argument specifies whether to reduce the dimensionality of embeddings before calculating the index. The `sorted` argument specifies whether to sort the similarity index.\n\n\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e \u003cb\u003eAdd, remove, merge parts of datasets, persist new Datasets, and start training!\u003c/b\u003e\u003c/summary\u003e\nOnce you've found the right images that you'd like to add or remove, you can simply add/remove them from your dataset and generate the updated version.\n\n\u003cb\u003eRemoving data\u003c/b\u003e\u003cbr/\u003e\nYou can simply remove images by passing a list of `ids` from the table.\n```\ncoco_exp.remove_imgs([100,120,300..n]) # Removes images at the given ids.\n```\n\n\u003cb\u003eAdding data\u003c/b\u003e\u003cbr/\u003e\nFor adding data from another dataset, you need an explorer object of that dataset with embeddings built. 
You can then pass that object along with the ids of the imgs that you'd like to add from that dataset.\n```\ncoco_exp.add_imgs(exp, idxs) # exp: Explorer of the other dataset, idxs: ids of the imgs to add\n```\nNote: You can use SQL querying and/or similarity searches to get the desired ids from the datasets.\n\n\u003cb\u003ePersisting the Table: Create new dataset and start training\u003c/b\u003e\u003cbr/\u003e\nAfter making the desired changes, you can persist the table to create the new dataset.\n```\ncoco_exp.persist()\n```\nThis creates a new dataset and outputs the training command that you can simply paste in your terminal to train a new model!\n\n\u003cb\u003eResetting the Table\u003c/b\u003e\u003cbr/\u003e\nYou can reset the table to its original or last persisted state (whichever is more recent)\n```\ncoco_exp.reset()\n```\n\u003c/details\u003e\n\n\u003cdetails\u003e\n\u003csummary\u003e\u003cb\u003e(Advanced querying) Getting insights from the similarity index\u003c/b\u003e\u003c/summary\u003e\nThe `plot_similarity_index` method can be used to visualize the similarity index of the dataset. The similarity index is a measure of how similar each image is to all the other images in the dataset.\nLet's plot the similarity index of the VOC dataset, keeping all the default settings\n\n```python\nvoc_exp.plot_similarity_index()\n```\n\n\u003cimg src=\"./yoloexplorer/assets/docs/sim_index.png\" height=50% width=50%\u003e\u003cbr/\u003e\nYou can also get the similarity index as a NumPy array to perform advanced queries.\n\n```python\nsim = voc_exp.get_similarity_index()\n```\nNow you can combine the similarity index with other querying options discussed above to create even more powerful queries. Here's an example:\n\n\"Let's say you've created a list of candidates you wish to remove from the dataset. Now, you want to keep only the candidates whose similarity index is greater than 250, i.e., remove only the images that are 90% (`sim_thres`) or more similar to more than 250 images in the dataset.\n\"\n```python\nimport numpy as np\n\nids = [...] 
# filtered ids list\nsimilar_ids = np.where(sim \u003e 250)[0] # indices of imgs with similarity index above 250\nfinal_ids = np.intersect1d(ids, similar_ids) # intersect both arrays\n\nvoc_exp.remove_imgs(final_ids)\n```\n\u003c/details\u003e\n\n\u003ch3\u003eComing Soon\u003c/h3\u003e\n\n\u003cb\u003ePre-filtering\u003c/b\u003e\n* Allow adding filters to searches.\n* Finer control over the embedding search space\n\nPre-filtering will enable powerful queries like: \"Show me images similar to \u003cIMAGE\u003e and include only ones that contain one or more (or exactly one) person, 2 cars and 1 horse\" \u003cbr/\u003e\n\n* \u003cb\u003eAutomatically find potential duplicate images\u003c/b\u003e\n\n* \u003cb\u003eBetter embedding plotting and analytics insights \u003c/b\u003e\n\n* \u003cb\u003eBetter dashboard for visualizing imgs \u003c/b\u003e\n\u003cbr/\u003e\n\nNotes:\n* The API will have some minor changes when going from dev to a minor release\n* For all practical purposes the ids are the same as the row numbers and are reset after every addition or removal\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flancedb%2Fyoloexplorer","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flancedb%2Fyoloexplorer","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flancedb%2Fyoloexplorer/lists"}