{"id":20548656,"url":"https://github.com/smilyorg/photofield-ai","last_synced_at":"2025-04-14T10:53:18.145Z","repository":{"id":75930379,"uuid":"547960886","full_name":"SmilyOrg/photofield-ai","owner":"SmilyOrg","description":"Experimental machine learning API supporting Photofield.","archived":false,"fork":false,"pushed_at":"2024-01-07T00:03:08.000Z","size":1493,"stargazers_count":22,"open_issues_count":0,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-03-28T00:07:35.325Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SmilyOrg.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2022-10-08T16:48:01.000Z","updated_at":"2025-02-25T09:55:27.000Z","dependencies_parsed_at":"2024-01-07T01:50:21.392Z","dependency_job_id":"085630a8-ffb3-491a-b68a-5b820c1af97a","html_url":"https://github.com/SmilyOrg/photofield-ai","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SmilyOrg%2Fphotofield-ai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SmilyOrg%2Fphotofield-ai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SmilyOrg%2Fphotofield-ai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SmilyOrg%2Fphotofield-ai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SmilyOrg","download_url":"https://codeload.github.com/SmilyOrg/photofield
-ai/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248868774,"owners_count":21174756,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-16T02:14:11.148Z","updated_at":"2025-04-14T10:53:18.122Z","avatar_url":"https://github.com/SmilyOrg.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- HEADER --\u003e\n\u003cbr /\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/othneildrew/Best-README-Template\"\u003e\n    \u003cimg src=\"assets/android-chrome-192x192.png\" alt=\"Logo\" width=\"80\" height=\"80\"\u003e\n  \u003c/a\u003e\n\n  \u003ch3 align=\"center\"\u003ePhotofield AI\u003c/h3\u003e\n\n  \u003cp align=\"center\"\u003e\n    Experimental machine learning API supporting \u003ca href=\"https://github.com/SmilyOrg/photofield\"\u003ePhotofield\u003c/a\u003e.\n    \u003cbr /\u003e\n    \u003cbr /\u003e\n    \u003ca href=\"https://github.com/SmilyOrg/photofield-ai/issues\"\u003eReport Bug\u003c/a\u003e\n    ·\n    \u003ca href=\"https://github.com/SmilyOrg/photofield-ai/issues\"\u003eRequest Feature\u003c/a\u003e\n  \u003c/p\u003e\n\u003c/p\u003e\n\n\n\n\u003c!-- TABLE OF CONTENTS --\u003e\n\u003cdetails open=\"open\"\u003e\n  \u003csummary\u003eTable of Contents\u003c/summary\u003e\n  \u003col\u003e\n    \u003cli\u003e\n      \u003ca href=\"#about\"\u003eAbout\u003c/a\u003e\n      \u003cul\u003e\n        \u003cli\u003e\u003ca href=\"#features\"\u003eFeatures\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca 
href=\"#limitations\"\u003eLimitations\u003c/a\u003e\u003c/li\u003e\n        \u003cli\u003e\u003ca href=\"#built-with\"\u003eBuilt With\u003c/a\u003e\u003c/li\u003e\n      \u003c/ul\u003e\n    \u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#getting-started\"\u003eGetting Started\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#usage\"\u003eUsage\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#configuration\"\u003eConfiguration\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#development-setup\"\u003eDevelopment Setup\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#contributing\"\u003eContributing\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#license\"\u003eLicense\u003c/a\u003e\u003c/li\u003e\n    \u003cli\u003e\u003ca href=\"#acknowledgements\"\u003eAcknowledgements\u003c/a\u003e\u003c/li\u003e\n  \u003c/ol\u003e\n\u003c/details\u003e\n\n\n\n## About\n\nPhotofield AI is a companion to [Photofield] providing AI features. It's a\nseparate REST API service both to keep the main app slim and because AI features\nare currently easier to implement in Python than in Go. The API currently\nexposes the [OpenAI CLIP] image and text embedding functionality.\n\n### Features\n\nReturns [OpenAI CLIP] image and text embeddings that you can then compare with\n[Cosine similarity] for use in semantic image search. Image embedding runs at up\nto ~20 requests/sec on an i7-5820K CPU and up to ~200 requests/sec using a\nGeForce GTX 1070 Ti. Resource utilization with a GPU is low, so I imagine there\nare bottlenecks in some parts of the system, but 200 requests/sec seems like\nplenty as is.\n\n### Limitations\n\nThe current REST API is tied pretty closely to [Photofield]. The machine\nlearning model itself also has some limitations and bias, as reported by\nOpenAI:\n\n_CLIP and our analysis of it have a number of limitations. 
CLIP currently\nstruggles with respect to certain tasks such as fine grained classification and\ncounting objects. CLIP also poses issues with regards to fairness and bias which\nwe discuss in the paper and briefly in the next section._\n\nSee more on model use in the [CLIP: Model Use] section of the model card from OpenAI.\n\n### Built With\n\n* [Python]\n* [FastAPI] - REST API framework\n* [ONNX Runtime] - machine learning inference\n* [CLIP Variants] - CLIP converted to ONNX (by yours truly)\n* [+ more Python libraries](pyproject.toml)\n\n## Getting Started\n\n### Docker\n\n`docker run -it -p 8081:8081 ghcr.io/smilyorg/photofield-ai:latest`\n\nThe `clip-vit-base-patch32-(visual|textual)-float16` models are currently\nbundled for a good out-of-the-box experience.\n\nThe Docker image is currently CPU-only as I'm currently unable to test Docker\nGPU support (help wanted).\n\nConnect it with [photofield] by adding the following snippet to its\n`configuration.yaml`:\n\n```yaml\nai:\n  # photofield-ai API server URL\n  host: http://localhost:8081\n```\n\n### From Source\n\n#### Prerequisites\n\n1. [Python]\n2. [Poetry]\n\n#### Setup\n\n1. [Download the\n   source](https://github.com/SmilyOrg/photofield-ai/archive/refs/heads/main.zip)\n   or clone the Git repository\n2. In the source directory you downloaded, run `poetry install` to install the\n   required dependencies. You can also run `poetry install --without gpu` to\n   skip installing GPU dependencies if you want to run it on CPU only (it is\n   also a smaller install).\n3. After [Poetry] installs all the required dependencies, the server should be\n   ready to run.\n\n#### Run\n\nRun the server with `poetry run python main.py`. 
If you don't specify any model\nfiles, it should first download the default models and then start listening for\nrequests.\n\n```\n❯ poetry run python main.py\nAvailable providers: TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider\nUsing providers: TensorrtExecutionProvider, CUDAExecutionProvider, CPUExecutionProvider\nLoading visual model: models/clip-vit-base-patch32-visual-float16.onnx\nLoading textual model: models/clip-vit-base-patch32-textual-float16.onnx\n2022-10-08 14:28:35.9706571 [W:onnxruntime:Default, tensorrt_execution_provider.h:60 onnxruntime::TensorrtLogger::log] [2022-10-08 13:28:35 WARNING] external\\onnx-tensorrt\\onnx2trt_utils.cpp:369: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.\n\nVisual inference ready, input size 224, type tensor(float16)\nTextual inference ready, input size 77, type tensor(int32)\nListening on 0.0.0.0:8081\n```\n\nIf you start it with GPU support (the default), it may take some time to start up. The `WARNING` above is expected with the TensorRT runtime; it seems to work fine regardless.\n\n# Usage\n\nSome request/response examples are listed below. If you use the neat [REST Client] extension for VS Code, you can even execute them directly when you open the README 😎. 
See [examples.http](examples.http) for more.\n\n`{{api}}` refers to the root URL of the API, the following defines it for the [REST Client] extension.\n\n```http\n@api = http://localhost:8081\n```\n\n## Embed Text\n\nThe `/text-embeddings` endpoint accepts a list of text strings that are\nconverted to embeddings by the textual model.\n\n### Request\n\n```http\nPOST {{api}}/text-embeddings HTTP/1.1\nContent-Type: application/json\n\n{\n    \"texts\": [\"hawk\"]\n}\n```\n\n### Response\n\n```json\n{\n  \"texts\": [\n    {\n      \"text\": \"hawk\",\n      \"embedding_f16_b64\": \"/7SvLO0xxqbgs4urVqq/vT21ozEAsGcu060XMxY0WyJbOBczIrJ0rkM4UB+Mrag007PhMuspxzTLshAxEjDgtXOsOzbrMRYwii2UKwS1IzNlsKUum6posPIwTS4KqG60arLbJgWu/K8hNaW1ry1QMx0hPpposuox2jMSsCixL6KNJQCw/LGttB4xwyoxM72QUa14NrsyMapespypirRHuHW177F5MNI0JiGfH+CxsbATMSqzIa94Mpy+eDcONnWvki27s4AqK7MlNsKgnjUpJ6Cy8y5snAqtTrB4JASxHCr4NF4wa6lMtHS2PK3HL1CtUjLKtXcuyTQlMQY0h7JZqU+1bTMpryYwArD7RSoonrR4rFEugTVaNssw8bBKKsyxzbKTLT22Jy8Rscy1BCmvsTSwT654pwkrii8gLkK116kDpvCwJ63frxYl9K+2sCk02DQ+LdS0C6dSrH0uOrUqL3e3vqa7sA4snjVgM48xeicYMMEoyy6AMAsjB69ft7OiDzC1MLKwtKbVIpqn6KjatmIsrjD8rY2xWa7MKW0fXLePuTWpdza/MYi5aB3atZivSqyYqC4kI60pKda0qbHbqLM126IFNA4xtrXKojyvDCynNGQioi4pLU01sp5/tcmoyasdODmuN7StLBawsqZKuYSsULSgtZE18i9SOLy5a7JksecrrS2dLGwx5q4etdm1rrQELLovxbY+prI0I7dLuXoxH66JKEkwszQitCSyia1cMfgwNK3lqDg0XzHlNrooTbhos9KyOR6GMYM1gbC8KyiwM7MoM+Oo7a7RLZ4mSx2wpSKy2qccMgEyFTS1tYMc+EWqNgayvzIfMrKrlDg3mV44GTBDKDg42jIOLBewtChhLNbAerFTNGywyDP3NjkxnTCgr8kyQrPoKLytxCo/tKYsQrDoJJSx2K7jNO8pjLL8KpirojVoMxWtV6msMjcyYDIqsIypYDSvNSO2ZK4+rg21EKE3owC0ALMSMwMu2KeKtXE0rbLPpsW20yUiLWuv/bUdsVuoKDYtKBSv9K4FJ560arXxNVs1RKwCsHCnkjhktam1rTGUn6yvdimYOKUHRDbHsHU38TdGKhE1UTKsLuYqxyVGq5UwaglCtEu0HiVasgg5ubQdrrUzLCw2Ji8yCa3ksnYpYzJlsF4y3beDKlaxFrRLnX+0W6IsOHO1TaE2s4gwKrEUNMWv5yFhsRosAzTVNtoxJrZKMCIoaavTM5UvMyyvLfM0D6zTtc60cK7Hlm62/iubLZgyODibrOgwN62auL0opzPiMvcvsK3KMpmtEbNIta+lYSysOM+vEDPqMOIzJKkOtCsq6bS3MqM3XixvsEC0Ni2HrbC1Bay4rg==\",\n      \"embedding_inv_norm_f16_uint16\": 
11810\n    }\n  ]\n}\n```\n\n* `embedding_f16_b64` - the embedding that comes out of the machine learning\n  model. It's a base64-encoded list of 512 (or more, depending on the model)\n  2-byte float16 values. You can compare this embedding to any other text or\n  image embedding via [cosine similarity] (normalized dot product) to get the\n  semantic similarity between them.\n\n* `embedding_inv_norm_f16_uint16` - the inverse of the [Euclidean / L2 norm][norm]\n  of the embedding (vector length). The norm is inverted, converted to a 2-byte\n  float16 value and then written out as an integer uint16 value. Using this\n  precomputed value comes in handy while computing the [cosine similarity] for\n  semantic image search, as you can skip computing it for each image embedding.\n\n## Embed Images\n\nThe `/image-embeddings` endpoint accepts multiple multipart form image uploads\nand computes the embedding for each using the visual model.\n\n```http\n@image = heavy-industry.jpg\n```\n\n### Request\n\n```http\nPOST {{api}}/image-embeddings\nContent-Type: multipart/form-data; boundary=------------------------23f534be8db8eca0\n\n--------------------------23f534be8db8eca0\nContent-Disposition: form-data; name=\"image\"; filename=\"heavy-industry.jpg\"\nContent-Type: image/jpeg\n\n\u003c {{image}}\n\n--------------------------23f534be8db8eca0--\n```\n\n### Response\n\n```json\n{\n  \"images\": [\n    {\n      \"field\": \"image\",\n      \"filename\": \"heavy-industry.jpg\",\n      \"embedding_f16_b64\": 
\"VykzsNCu7CQ2NLCqkjQjspWtlDeXrmm4z6htMZ6z7ayNO2M2xbVrqjU0gK3uN3c0hZiRMDY6jDTXm2GkUrZUrXarLzZINGonQ7ScuI64QTk+tH64LbhrNOU2BLU6tfYyjrT1t5QtYKgOMfi3b6lxty2437JWNA+x1TceMtQtNbBVqXegt6ttMMUtXLFaseo0H7BEt+U1bzF+tPGs0iAmsPAtCDFus/m0m64SNo8ujTDDJoIodi/vMY3Fa5meLMy3/LECOUWv/7QXIcOwO7o2K+03nLIev0A1YR1KNBkwnrbptuszlDWcrrkyOrP4Mdk7dzCGrk8tmTciMvWyti5iNGieNTMyMyWsJrmdOoAulyzGOHEvoLOYrE8nvy9TtQsym7HGJDS4MDVcsRsuyR2CN/wo7bIHODc4HDvPNWgiCy+kqkSkmrdtsuuyVDXhu+41r7FWLKc44a1rsFSx8bOJMnywRjLrNt4xNyYSpmgoLrSnsLA0KDUbq1gp7rSCLZkg+bExOIq01rAaqtuxM6yqtB4wELQ4sqC2UrYSrCe1WqzdtGOsEzbZqYE0oqBZMSo087VBNBO6UqorNuo0iLiktLO5gjYtNz+237LLomupHTgTMWkrQRyDPC80WjFQtCEwxjqaME23rytRr0E7Oy+SMma6+7B9ILwhaTjJM2Mwkjc+tKOwWT+SN5k2JKsAMu89LK4ttxqty7QVL7OuobUkolikITQ3trutbbHSGEK1cjQ5MwMszq5Itqq01LBzPFE0abfSJwe0XDB1t9M2orNytEg2QClbNW00Frmer9uz5qIVLPO0+CRxLUk1rTIOMBUwmzqrNlg3pjOGrz824bhutI40RT0stMS1yLK9LjKtwLAvuHevtzAlOioz/LEVuJa2CLgDMbY2PycplR00VLGPHvEws7gTMTMuxyh2NmI1srE6qFMse6yYtKm1QDUcs6Esz7jmsM2zUa8ltA4t/p0buGM2f6pHpw+127S7I4cwKa+1sMGhRbKiNWi06ytKtqCkh7Y7NoqxdCIJM+i3wLSYs+uvUTTPtFU1mSC4tE83UjMdMJitpzUZM7Qz8hn4tiMsbbJjK+m6BTDUsmc0+7MqLiI23DS2LO8mUCi+rB00zS1JrZKvKLDkM1S0mjCEOMqy3zeZKWk2mbD0uZWwDjXbpzU0/zoeMhS56qi5NIywXbilMi619TaCrrO3rawoqoOyOTHarBolarJFMGQySrf/sVWn5Tm0LXqvHLeCsssuVixJOF2uN5DMJi4vL7SSNGwqeLOEtHmtnThJOSowWjWsMEQ4bbiVK+qsmzNFuvwoM7amtjeqC7VWJh847rgvpIE0OToiuDSzE6R8M8U4wbQ1sA==\",\n      \"embedding_inv_norm_f16_uint16\": 11922\n    }\n  ]\n}\n```\n\n## Configuration\n\nYou can configure the app via environment variables.\n\n| Environment variable name | Default value | Purpose |\n| --- | --- | --- |\n| `PHOTOFIELD_AI_HOST` | `0.0.0.0` | The host the server will listen on. |\n| `PHOTOFIELD_AI_PORT` | `8081` | The port the server will listen on. 
|\n| `PHOTOFIELD_AI_MODELS_DIR` | `models/` | The directory models will be downloaded to if a URL is provided. |\n| `PHOTOFIELD_AI_VISUAL_MODEL` | `https://huggingface.co/mlunar/clip-variants/resolve/main/modelclip-vit-base-patch32-visual-float16.onnx` | URL or local file path to the visual ONNX CLIP model to use for image embedding. If a URL is provided, the model will first be downloaded to `PHOTOFIELD_AI_MODELS_DIR` if it doesn't exist there already. If a local path is provided, the model will be used as is. |\n| `PHOTOFIELD_AI_TEXTUAL_MODEL` | `https://huggingface.co/mlunar/clip-variants/resolve/main/modelclip-vit-base-patch32-textual-float16.onnx` | Same as `PHOTOFIELD_AI_VISUAL_MODEL`, but for the textual model used for text embedding. |\n| `PHOTOFIELD_AI_RUNTIME` | `all` | `all` enables all available ONNX Runtime providers, making use of any GPU or other accelerator device if you have the right [ONNX Runtime] prerequisites installed. `cpu` selects CPU-only execution, which is faster to start up and develop with, but is usually going to be ~10x slower than a GPU at inference. `cpu` is a shortcut for `PHOTOFIELD_AI_PROVIDERS=CPUExecutionProvider`. |\n| `PHOTOFIELD_AI_PROVIDERS` | unset | If `PHOTOFIELD_AI_RUNTIME` is not set, you can use this to specify the ONNX providers you would like to use directly, as a comma-delimited list. For example: `CUDAExecutionProvider,CPUExecutionProvider`. |\n\n### Models\n\nFor `PHOTOFIELD_AI_VISUAL_MODEL` and `PHOTOFIELD_AI_TEXTUAL_MODEL` you can use\nany model from [clip-variants models].\n\nThe bigger models are likely to be better; however, it probably depends on your\nuse-case. 
The different model types most likely won't be compatible with each\nother; however, combining different data types might work fine.\n\nNote that the `qint8` models don't seem to work right now, so use `quint8` ones\ninstead.\n\n## Development Setup\n\n### Prerequisites\n\n* [Python]\n* [Poetry] - for dependency management\n* [just] - to run common commands conveniently\n* sh-like shell (e.g. sh, bash, busybox) - required by `just`\n\n**[Scoop] (Windows)**: `scoop install busybox just`\n\n### Installation\n\n1. Clone the repo\n   ```sh\n   git clone https://github.com/smilyorg/photofield-ai.git\n   ```\n2. Install Python dependencies\n   ```sh\n   poetry install\n   ```\n\n### Running\n\n* `poetry shell` to enter the virtual environment and `just watch` the source\n  files and auto-reload the server\n* or `just run` the server\n\n## Contributing\n\nPull requests are welcome. For major changes, please open an issue first to\ndiscuss what you would like to change.\n\n## License\n\nDistributed under the MIT License. 
See [LICENSE](LICENSE) for more information.\n\n## Acknowledgements\n* [OpenAI CLIP] for the research and machine learning model weights used here\n* [Hugging Face](https://huggingface.co/) for hosting the ONNX models\n* [CLIP-as-service by Jina](https://github.com/jina-ai/clip-as-service) as a big inspiration for this project\n* [openai-clip-js by josephrocca](https://github.com/josephrocca/openai-clip-js) on how to convert CLIP to ONNX\n* [CLIP-ONNX by Lednik7](https://github.com/Lednik7/CLIP-ONNX) on more CLIP with ONNX example code\n* [Exporting a Model from PyTorch to ONNX and running it using ONNX Runtime - PyTorch](https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html)\n* [imgbeddings by minimaxir](https://github.com/minimaxir/imgbeddings) for a similar image-focused CLIP ONNX implementation\n* [Best-README-Template](https://github.com/othneildrew/Best-README-Template)\n* [readme.so](https://readme.so/)\n\n\n[Photofield]: https://github.com/SmilyOrg/photofield\n[OpenAI CLIP]: https://github.com/openai/CLIP/\n[CLIP: Model Use]: https://github.com/openai/CLIP/blob/main/model-card.md#model-use\n[Cosine similarity]: https://en.wikipedia.org/wiki/Cosine_similarity\n[norm]: https://en.wikipedia.org/wiki/Norm_(mathematics)#Euclidean_norm\n\n[Python]: https://www.python.org/\n[Git]: https://git-scm.com/downloads\n[Poetry]: https://python-poetry.org/docs/#installation\n[FastAPI]: https://fastapi.tiangolo.com/\n[ONNX Runtime]: https://onnxruntime.ai/\n[CLIP Variants]: https://huggingface.co/mlunar/clip-variants\n[clip-variants models]: https://huggingface.co/mlunar/clip-variants/tree/main/models\n[REST Client]: https://marketplace.visualstudio.com/items?itemName=humao.rest-client\n\n[Configuration]: #configuration\n\n[open an issue]: https://github.com/SmilyOrg/photofield-ai/issues\n[Getting Started]: #getting-started\n\n[Scoop]: https://scoop.sh/\n[just]: https://github.com/casey/just\n[watchexec]: 
https://github.com/watchexec/watchexec\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmilyorg%2Fphotofield-ai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsmilyorg%2Fphotofield-ai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsmilyorg%2Fphotofield-ai/lists"}