{"id":46289003,"url":"https://github.com/smartscanapp/smartscan-lib","last_synced_at":"2026-04-12T21:14:12.117Z","repository":{"id":285006157,"uuid":"956772816","full_name":"smartscanapp/smartscan-lib","owner":"smartscanapp","description":"Python library that provides tools for ML inference, indexing, semantic search , classification and efficient batch processing.","archived":false,"fork":false,"pushed_at":"2026-03-02T07:12:43.000Z","size":920,"stargazers_count":8,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-03-04T11:06:46.135Z","etag":null,"topics":["cli","file-management","linux","ml","onnx","onnxruntime","systemd","vector-embeddings"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/smartscanapp.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-03-28T20:49:40.000Z","updated_at":"2026-03-02T07:12:44.000Z","dependencies_parsed_at":"2025-03-28T22:22:53.765Z","dependency_job_id":"00420b57-65b5-447a-a35f-e82cd16ad3a0","html_url":"https://github.com/smartscanapp/smartscan-lib","commit_stats":null,"previous_names":["dev-diaries41/smartscan-cli","dev-diaries41/smartscan-py","smartscanapp/smartscan-server","smartscanapp/smartscan-lib"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/smartscanapp/smartscan-lib","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sma
rtscanapp%2Fsmartscan-lib","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smartscanapp%2Fsmartscan-lib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smartscanapp%2Fsmartscan-lib/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smartscanapp%2Fsmartscan-lib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/smartscanapp","download_url":"https://codeload.github.com/smartscanapp/smartscan-lib/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/smartscanapp%2Fsmartscan-lib/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30186778,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-06T14:42:24.748Z","status":"ssl_error","status_checked_at":"2026-03-06T14:42:14.925Z","response_time":250,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cli","file-management","linux","ml","onnx","onnxruntime","systemd","vector-embeddings"],"created_at":"2026-03-04T07:35:56.281Z","updated_at":"2026-04-12T21:14:12.110Z","avatar_url":"https://github.com/smartscanapp.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# SmartScan Python Library\n\nPython library providing tools for ML inference, embeddings, 
indexing, semantic search, clustering, few-shot classification, and efficient batch processing. This library powers the SmartScan Server used by the Desktop App.\n\n---\n\n## Supported Embedding Providers\n\nAll of the models below are quantized.\n\n### Image\n\n* CLIP ViT-B-32\n* DINOv2 Small\n* Inception ResNet V2 (facial recognition)\n\n### Text\n\n* CLIP ViT-B-32\n* all-MiniLM-L6-v2\n* all-distilroberta-v1\n\n---\n\n## Installation\n\n### Prerequisites\n\n* Python 3.10+\n\n```bash\npip install git+https://github.com/smartscanapp/smartscan-lib.git\n```\n\n---\n\n## Quick Start\n\n### Embeddings\n\n#### Embed images\n\n```python\nfrom smartscan.models.model_manager import ModelManager\nfrom PIL import Image\n\nmm = ModelManager() # optionally pass root directory path for models\nimage_embedder = mm.get_image_embedder(\"clip-vit-b-32-image\")\n# or\nimage_embedder = mm.get_image_embedder(\"dinov2-small\")\n\nimage_embedder.init()\n\nimage_embedder.embed(Image.open(\"image.jpg\"))\nimage_embedder.embed_batch([\n    Image.open(\"image1.jpg\"),\n    Image.open(\"image2.jpg\")\n])\n```\n\n#### Embed text\n\n```python\nfrom smartscan.models.model_manager import ModelManager\nmm = ModelManager() # optionally pass root directory path for models\n\ntext_embedder = mm.get_text_embedder(\"all-minilm-l6-v2\")\ntext_embedder.init()\n\ntext_embedder.embed(\"text to embed\")\ntext_embedder.embed_batch([\"text1\", \"text2\", \"text3\"])\n```\n\n---\n\n### Indexing\n\nIndexers are implemented using the `BatchProcessor` abstraction. 
Default indexers are provided for common data types.\nAll indexers optionally accept a `ProcessorListener` for progress and batch callbacks.\n\n#### Images\n\n```python\nfrom smartscan.indexer import ImageIndexer\nfrom smartscan.models.model_manager import ModelManager\n\nimage_urls = [...]\nimage_paths = [...]\n\nmm = ModelManager()\nimage_embedder = mm.get_image_embedder(\"dinov2-small\")\nimage_embedder.init()\n\nindexer = ImageIndexer(\n    image_encoder=image_embedder,\n    listener=listener  # optional\n)\n\nawait indexer.run(image_urls)\nawait indexer.run(image_paths)\n```\n\n#### Videos\n\n```python\nfrom smartscan.indexer import VideoIndexer\nfrom smartscan.models.model_manager import ModelManager\n\nvideo_urls = [...]\nvideo_paths = [...]\n\nmm = ModelManager()\nimage_embedder = mm.get_image_embedder(\"dinov2-small\")\nimage_embedder.init()\n\nindexer = VideoIndexer(\n    image_encoder=image_embedder,\n    listener=listener  # optional\n)\n\nawait indexer.run(video_urls)\nawait indexer.run(video_paths)\n```\n\n#### Documents\n\n```python\nfrom smartscan.indexer import DocIndexer\nfrom smartscan.models.model_manager import ModelManager\n\ndoc_paths = [...]\n\nmm = ModelManager()\ntext_embedder = mm.get_text_embedder(\"all-minilm-l6-v2\")\ntext_embedder.init()\n\nindexer = DocIndexer(\n    text_encoder=text_embedder,\n    listener=listener  # optional\n)\n\nawait indexer.run(doc_paths)\n```\n\n---\n\n### Clustering\n\nIncrementally groups embeddings into clusters based on similarity. 
Supports existing clusters, adaptive thresholds, and optional auto-merging.\n\n```python\nfrom smartscan.cluster import IncrementalClusterer\n\nclusterer = IncrementalClusterer(\n    default_threshold=initial_threshold,\n    merge_threshold=auto_merge_threshold,\n    existing_assignments=existing_assignments,\n    existing_clusters=existing_clusters,\n)\n\nresult = clusterer.cluster(ids, embeddings)\n```\n\n---\n\n### Few-Shot Classification\n\nAssigns a label to an embedding by comparing it against pre-labelled cluster centroids.\nSupports batch processing and an optional `ProcessorListener`.\n\n#### Single item\n\n```python\nfrom smartscan.classify.fewshot import few_shot_classify\n\nresult = few_shot_classify(\n    item=item_embedding,\n    labelled_clusters=clusters,\n    sim_factor=1.0\n)\n\nprint(result.label, result.similarity)\n```\n\n#### Batch processing\n\n```python\nfrom smartscan.classify.fewshot import FewShotClassifier\n\nclassifier = FewShotClassifier(\n    labelled_clusters=clusters,\n    listener=listener,  # optional\n    sim_factor=1.0,\n    batch_size=32\n)\n\nawait classifier.run(item_embeddings)\n```\n\n---\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmartscanapp%2Fsmartscan-lib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsmartscanapp%2Fsmartscan-lib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmartscanapp%2Fsmartscan-lib/lists"}