An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with vision

A curated list of projects in awesome lists tagged with vision.

https://github.com/bvlc/caffe

Caffe: a fast open framework for deep learning.

deep-learning machine-learning vision

Last synced: 13 May 2025

https://github.com/BVLC/caffe

Caffe: a fast open framework for deep learning.

deep-learning machine-learning vision

Last synced: 14 Mar 2025

https://github.com/xtls/xray-core

Xray, Penetrates Everything. Also the best v2ray-core. Where the magic happens. An open platform for various uses.

anticensorship dns network proxy reality shadowsocks socks5 tls trojan tunnel utls vision vless vmess vpn wireguard xhttp xray xtls xudp

Last synced: 09 Sep 2025

https://github.com/XTLS/Xray-core

Xray, Penetrates Everything. Also the best v2ray-core. Where the magic happens.

anticensorship dns network proxy reality shadowsocks socks5 tls trojan tunnel utls vision vless vmess vpn wireguard xhttp xray xtls xudp

Last synced: 20 Mar 2025

https://github.com/danny-avila/librechat

Enhanced ChatGPT Clone: Features Agents, DeepSeek, Anthropic, AWS, OpenAI, Assistants API, Azure, Groq, o1, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, Code Interpreter, langchain, DALL-E-3, OpenAPI Actions, Functions, Secure Multi-User Auth, Presets, open-source for self-hosting. Active project.

ai anthropic artifacts assistant-api aws azure chatgpt chatgpt-clone claude clone dall-e-3 deepseek gemini google librechat o1 openai plugins vision webui

Last synced: 09 Sep 2025

https://github.com/bytedance/UI-TARS-desktop

The Open-Source Multimodal AI Agent Stack: Connecting Cutting-Edge AI Models and Agent Infra

agent agent-tars browser-use computer-use gui-agent gui-operator mcp mcp-server multimodal tars ui-tars vision vlm

Last synced: 06 Oct 2025

https://github.com/danny-avila/LibreChat

Enhanced ChatGPT Clone: Features Anthropic, AWS, OpenAI, Assistants API, Azure, Groq, o1, GPT-4o, Mistral, OpenRouter, Vertex AI, Gemini, Artifacts, AI model switching, message search, langchain, DALL-E-3, ChatGPT Plugins, OpenAI Functions, Secure Multi-User System, Presets, completely open-source for self-hosting. Actively in public development.

ai anthropic artifacts assistant-api aws azure chatgpt chatgpt-clone claude clone dall-e-3 gemini google librechat o1 openai plugins search vision webui

Last synced: 20 Mar 2025

https://github.com/mediar-ai/screenpipe

AI app store powered by 24/7 desktop history. open source | 100% local | dev friendly | 24/7 screen, mic recording

agents agi ai computer-vision llm machine-learning ml multimodal vision

Last synced: 13 May 2025

https://github.com/skyvern-ai/skyvern

Automate browser-based workflows with LLMs and Computer Vision

api automation browser browser-automation computer gpt llm playwright python rpa vision workflow

Last synced: 12 May 2025

https://github.com/bytedance/ui-tars-desktop

A GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.

agent browser-use computer-use electron gui-agents mcp mcp-server vision vite vlm

Last synced: 09 Sep 2025

https://github.com/Skyvern-AI/skyvern

Automate browser-based workflows with LLMs and Computer Vision

api automation browser browser-automation computer gpt llm playwright python rpa vision workflow

Last synced: 07 Apr 2025

https://github.com/Skyvern-AI/Skyvern

Automate browser-based workflows with LLMs and Computer Vision

api automation browser browser-automation computer gpt llm playwright python rpa vision workflow

Last synced: 09 Mar 2025

https://github.com/dooy/chatgpt-web-midjourney-proxy

A single UI for ChatGPT web, Midjourney, GPTs, Suno, Luma, Runway, Viggle, Flux, Ideogram, Realtime, Pika, and Udio; supports Web, PWA, Linux, Windows, and macOS.

chatgpt-ui claude-3 flux gpts gpts-ui gptstore ideogram kling luma midjourney midjourney-ui pika realtime runway suno udio viggle vision whisper-ui

Last synced: 23 Apr 2025

https://github.com/TEN-framework/ten-framework

The world’s first real-time, distributed, cloud-edge collaborative multimodal AI Agent Framework that simultaneously supports C/C++/Go/Python/JS/TS

agents ai audio-video cloud-edge-computing cpp cross-platform go golang javascript llm low-latency multimodal package-management python realtime rust typescript vision voice-assistant

Last synced: 03 May 2025

https://github.com/ten-framework/ten-agent

TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking, and is fully compatible with platforms like Dify and Coze.

agent ai asr cpp gemini golang gpt-4 gpt-4o llm low-latency multimodal nextjs14 openai python rag real-time realtime tts vision voice-assistant

Last synced: 31 Mar 2025

https://github.com/TEN-framework/TEN-Agent

TEN Agent is a conversational voice AI agent powered by TEN, integrating Deepseek, Gemini, OpenAI, RTC, and hardware like ESP32. It enables realtime AI capabilities like seeing, hearing, and speaking, and is fully compatible with platforms like Dify and Coze.

agent ai asr cpp gemini golang gpt-4 gpt-4o llm low-latency multimodal nextjs14 openai python rag real-time realtime tts vision voice-assistant

Last synced: 08 Mar 2025

https://github.com/artemnovichkov/iOS-11-by-Examples

👨🏻‍💻 Examples of new iOS 11 APIs

arkit core-nfc coreml ios11 swift vision xcode9

Last synced: 12 May 2025

https://github.com/artemnovichkov/ios-11-by-examples

👨🏻‍💻 Examples of new iOS 11 APIs

arkit core-nfc coreml ios11 swift vision xcode9

Last synced: 15 May 2025

https://github.com/autorope/donkeycar

Open source hardware and software platform to build a small scale self driving car.

cv2 donkeycar jetson-nano keras python raspberry-pi self-driving-car tensorflow vision

Last synced: 29 Apr 2025

https://github.com/sightmachine/SimpleCV

The Open Source Framework for Machine Vision

computer-vision cv image-processing python vision visionprocessing

Last synced: 18 Mar 2025

https://github.com/googlecloudplatform/java-docs-samples

Java and Kotlin Code samples used on cloud.google.com

appengine auth automl cdn java kotlin samples translate video vision

Last synced: 13 May 2025

https://github.com/roatienza/deep-learning-experiments

Videos, notes and experiments to understand deep learning

artificial-intelligence deep-learning deep-learning-tutorial nlp pytorch speech vision

Last synced: 14 May 2025

https://github.com/roatienza/Deep-Learning-Experiments

Videos, notes and experiments to understand deep learning

artificial-intelligence deep-learning deep-learning-tutorial nlp pytorch speech vision

Last synced: 27 Mar 2025

https://github.com/jenly1314/mlkit

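🌝 MLKit is a powerful, easy-to-use toolkit. With ML Kit you can easily implement text recognition, barcode recognition, image labeling, face detection, object detection, and more.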

android barcode barcode-scanning camerax face-detection image-labeling machine-learning machine-learning-library mlkit object-detection object-recognition ocr pose-detection qrcode recognition segmentation-selfie text-recognition vision

Last synced: 14 May 2025

https://github.com/kevingong2013/chineseidcardocr

[Deprecated] 🇨🇳 Optical recognition of Chinese second-generation ID cards

cnn coreml deep-learning ios11 machine-learning swift vision xcode

Last synced: 13 Apr 2025

https://github.com/KevinGong2013/ChineseIDCardOCR

[Deprecated] 🇨🇳 Optical recognition of Chinese second-generation ID cards

cnn coreml deep-learning ios11 machine-learning swift vision xcode

Last synced: 23 Apr 2025

https://github.com/aheze/OpenFind

An app to find text in real life.

app camera find hacktoberfest ios ocr photos realm swift swiftui uikit vision

Last synced: 27 Mar 2025

https://github.com/aheze/openfind

An app to find text in real life.

app camera find hacktoberfest ios ocr photos realm swift swiftui uikit vision

Last synced: 12 Apr 2025

https://github.com/lucidrains/mlp-mixer-pytorch

An All-MLP solution for Vision, from Google AI

deep-learning vision

Last synced: 15 May 2025

https://github.com/andyzeng/visual-pushing-grasping

Train robotic agents to learn to plan pushing and grasping actions for manipulation with deep reinforcement learning.

3d artificial-intelligence computer-vision deep-learning deep-reinforcement-learning grasping manipulation pushing robotics vision

Last synced: 16 May 2025

https://github.com/aravisproject/aravis

A vision library for genicam based cameras

c camera genicam gige glib gobject gobject-introspection gstreamer gtk3 meson usb3 video vision

Last synced: 12 Apr 2025

https://github.com/AravisProject/aravis

A vision library for genicam based cameras

c camera genicam gige glib gobject gobject-introspection gstreamer gtk3 meson usb3 video vision

Last synced: 02 Apr 2025

https://github.com/anupamchugh/iowncode

A curated collection of iOS, ML, AR resources sprinkled with some UI additions

alamofire arkit computer-vision coreml coremltools ios keras ml-kit natural-language-processing nlp realitykit swift swiftui vision vision-framework

Last synced: 22 Jul 2025

https://github.com/andyzeng/3dmatch-toolbox

3DMatch - a 3D ConvNet-based local geometric descriptor for aligning 3D meshes and point clouds.

3d 3d-deep-learning 3dmatch artificial-intelligence computer-vision deep-learning geometry-processing point-cloud rgbd vision

Last synced: 16 May 2025

https://github.com/Celebrandil/CudaSift

A CUDA implementation of SIFT for NVidia GPUs (1.2 ms on a GTX 1060)

cuda gpu nvidia sift vision

Last synced: 04 May 2025

https://github.com/evilgix/evil

Optical Character Recognition in Swift for iOS and macOS. Optical recognition of bank cards, ID cards, and house numbers.

cnn-model keras machine-learning ocr swift4 vision

Last synced: 05 Apr 2025

https://github.com/evilgix/Evil

Optical Character Recognition in Swift for iOS and macOS. Optical recognition of bank cards, ID cards, and house numbers.

cnn-model keras machine-learning ocr swift4 vision

Last synced: 15 May 2025

https://github.com/mostafasadeghi97/design2code

Convert any web design screenshot to clean HTML/CSS code

ai code-generation coding-assistant design-to-code gpt4 openai vision

Last synced: 03 Apr 2025

https://github.com/google-research/ravens

Train robotic agents to learn pick and place with deep learning for vision-based manipulation in PyBullet. Transporter Nets, CoRL 2020.

artificial-intelligence computer-vision deep-learning imitation-learning manipulation openai-gym pick-and-place pybullet rearrangement reinforcement-learning robotics tensorflow transporter-nets vision

Last synced: 16 May 2025

https://github.com/anki/vector-python-sdk

Anki Vector Python SDK

ai anki robot robotics vector vision

Last synced: 12 Apr 2025

https://github.com/robotlocomotion/pytorch-dense-correspondence

Code for "Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation"

3d artificial-intelligence computer-vision deep-learning manipulation pytorch robotics self-supervised-learning vision

Last synced: 04 Apr 2025

https://github.com/RobotLocomotion/pytorch-dense-correspondence

Code for "Dense Object Nets: Learning Dense Visual Object Descriptors By and For Robotic Manipulation"

3d artificial-intelligence computer-vision deep-learning manipulation pytorch robotics self-supervised-learning vision

Last synced: 07 May 2025

https://github.com/davidbau/rewriting

Rewriting a Deep Generative Model, ECCV 2020 (oral). Interactive tool to directly edit the rules of a GAN to synthesize scenes with objects added, removed, or altered. Change StyleGANv2 to make extravagant eyebrows, or horses wearing hats.

deep-learning gans graphics hci machine-learning research vision

Last synced: 04 Apr 2025

https://github.com/rowanz/neural-motifs

Code for Neural Motifs: Scene Graph Parsing with Global Context (CVPR 2018)

pytorch scene-graph vision visual-genome

Last synced: 02 Apr 2025

https://github.com/hrnet/hrformer

[NeurIPS 2021] This is an official implementation of our paper "HRFormer: High-Resolution Transformer for Dense Prediction".

classification hrnet pose-estimation segmentation transformer vision

Last synced: 05 Apr 2025

https://github.com/HRNet/HRFormer

[NeurIPS 2021] This is an official implementation of our paper "HRFormer: High-Resolution Transformer for Dense Prediction".

classification hrnet pose-estimation segmentation transformer vision

Last synced: 12 May 2025

https://github.com/KimDarren/FaceCropper

✂️ Crop faces inside your image with the iOS 11 Vision API.

face face-detection face-recognition ios ios11 swift vision vision-api

Last synced: 02 Aug 2025

https://github.com/ictnlp/llava-mini

LLaVA-Mini is a unified large multimodal model (LMM) that can support the understanding of images, high-resolution images, and videos in an efficient manner.

efficient gpt4o gpt4v large-language-models large-multimodal-models llama llava multimodal multimodal-large-language-models video vision vision-language-model visual-instruction-tuning

Last synced: 16 May 2025

https://github.com/myndex/sapc-apca

APCA (Accessible Perceptual Contrast Algorithm) is a new method for predicting contrast for use in emerging web standards (WCAG 3) for determining readability contrast. APCA is derived from the SAPC (S-LUV Advanced Predictive Color), which is an accessibility-oriented color appearance model designed for self-illuminated displays.

accessibility apca cieluv color color-contrast color-contrast-checker color-models color-theory colorimetry contrast contrast-calculator css luminance readability srgb vision wcag wcag-contrast web

Last synced: 16 May 2025

https://github.com/rowanz/r2c

Recognition to Cognition Networks (code for the model in "From Recognition to Cognition: Visual Commonsense Reasoning", CVPR 2019)

commonsense reasoning vcr vision visual visual-commonsense-reasoning

Last synced: 13 Apr 2025

https://github.com/tomrunia/opticalflow_visualization

Python optical flow visualization following Baker et al. (ICCV 2007) as used by the MPI-Sintel challenge

iccv motion opencv optical-flow python vision visualization

Last synced: 09 Apr 2025

https://github.com/zihangjiang/tokenlabeling

PyTorch implementation of "All Tokens Matter: Token Labeling for Training Better Vision Transformers"

imagenet lv-vit pytorch segmentation transformer vision

Last synced: 04 Apr 2025

https://github.com/zihangJiang/TokenLabeling

PyTorch implementation of "All Tokens Matter: Token Labeling for Training Better Vision Transformers"

imagenet lv-vit pytorch segmentation transformer vision

Last synced: 05 May 2025

https://github.com/Myndex/SAPC-APCA

APCA (Accessible Perceptual Contrast Algorithm) is a new method for predicting contrast for use in emerging web standards (WCAG 3) for determining readability contrast. APCA is derived from the SAPC (S-LUV Advanced Predictive Color), which is an accessibility-oriented color appearance model designed for self-illuminated displays.

accessibility apca cieluv color color-contrast color-contrast-checker color-models color-theory colorimetry contrast contrast-calculator css luminance readability srgb vision wcag wcag-contrast web

Last synced: 07 May 2025

https://github.com/AprilRobotics/apriltag_ros

A ROS wrapper of the AprilTag 3 visual fiducial detector

apriltags fiducial-markers ros vision wrapper

Last synced: 05 May 2025

https://github.com/aprilrobotics/apriltag_ros

A ROS wrapper of the AprilTag 3 visual fiducial detector

apriltags fiducial-markers ros vision wrapper

Last synced: 12 Apr 2025

https://github.com/WPIRoboticsProjects/GRIP

Program for rapidly developing computer vision applications

camera computer-vision first-frc first-robotics-competition firstrobotics opencv robotics vision wpi

Last synced: 11 May 2025

https://github.com/wpiroboticsprojects/grip

Program for rapidly developing computer vision applications

camera computer-vision first-frc first-robotics-competition firstrobotics opencv robotics vision wpi

Last synced: 05 Apr 2025

https://github.com/photonvision/photonvision

PhotonVision is the free, fast, and easy-to-use computer vision solution for the FIRST Robotics Competition.

computer-vision frc java opencv vision vision-processing wpilib

Last synced: 16 Dec 2025

https://github.com/cocoa-ai/FacesVisionDemo

👀 iOS11 demo application for age and gender classification of facial images.

coreml coreml-models emotion-recognition facial-recognition gender-classification ios machine-learning swift swift4 vision

Last synced: 11 May 2025

https://github.com/cocoa-ai/facesvisiondemo

👀 iOS11 demo application for age and gender classification of facial images.

coreml coreml-models emotion-recognition facial-recognition gender-classification ios machine-learning swift swift4 vision

Last synced: 04 Aug 2025

https://github.com/andyzeng/arc-robot-vision

MIT-Princeton Vision Toolbox for Robotic Pick-and-Place at the Amazon Robotics Challenge 2017 - Robotic Grasping and One-shot Recognition of Novel Objects with Deep Learning.

3d amazon-robotics-challenge artificial-intelligence computer-vision deep-learning grasping manipulation mit-princeton rgbd vision

Last synced: 09 Apr 2025

https://github.com/andyzeng/apc-vision-toolbox

MIT-Princeton Vision Toolbox for the Amazon Picking Challenge 2016 - RGB-D ConvNet-based object segmentation and 6D object pose estimation.

3d amazon-picking-challenge artificial-intelligence computer-vision deep-learning marvin mit-princeton rgbd ros segmentation vision

Last synced: 09 Apr 2025

https://github.com/Feghal/ImageDetect

✂️ Detect and crop faces, barcodes, and text in images with the iOS 11 Vision API.

barcode detector face face-detection face-recognition ios ios11 recognition swift vision vision-api

Last synced: 06 Aug 2025

https://github.com/cheind/dest

🐼 One Millisecond Deformable Shape Tracking Library (DEST)

face-alignment face-detector machine-learning vision

Last synced: 19 Jun 2025

https://github.com/harishdeivanayagam/rowfill

Open-source unstructured data (PDFs, Images, Audiofiles) processing platform built for knowledge workers

document document-extraction document-parsing image-ocr langgraph llama llm nextjs ocr ocr-javascript ollama openai pdf pdfs unstructured unstructured-data vision vision-api

Last synced: 13 Apr 2025

https://github.com/Olney1/ChatGPT-OpenAI-Smart-Speaker

This AI Smart Speaker uses speech recognition, TTS (text-to-speech), and STT (speech-to-text) to enable voice and vision-driven conversations, with additional web search capabilities via OpenAI and Langchain agents.

agents ai artificial-intelligence chatgpt gpt-4 langchain langsmith openai smarthome smartspeaker speech-recognition speech-to-text tavily text-to-speech vision vision-and-language webscraping

Last synced: 07 Apr 2025

https://github.com/ori-mrg/robotcar-dataset-sdk

Software Development Kit for the Oxford Robotcar Dataset

datasets learning matlab python robotics vision website

Last synced: 06 Apr 2025

https://github.com/olney1/chatgpt-openai-smart-speaker

This AI Smart Speaker uses speech recognition, TTS (text-to-speech), and STT (speech-to-text) to enable voice and vision-driven conversations, with additional web search capabilities via OpenAI and Langchain agents.

agents ai artificial-intelligence chatgpt gpt-4 langchain langsmith openai smarthome smartspeaker speech-recognition speech-to-text tavily text-to-speech vision vision-and-language webscraping

Last synced: 03 Oct 2025

https://github.com/DroidsOnRoids/VisionFaceDetection

An example of using the Vision framework for face landmark detection in iOS 11

ios11 landmark-detection landmarks vision vision-framework xcode9

Last synced: 22 Feb 2025

https://github.com/gabeur/mmt

Multi-Modal Transformer for Video Retrieval

fusion language multimodal nlp video vision

Last synced: 12 May 2025

https://github.com/fcakyon/craft-text-detector

Packaged, PyTorch-based, easy-to-use, cross-platform version of the CRAFT text detector

actions anaconda computer-vision craft deep-learning document hacktoberfest linux macos neural-network ocr pypi python pytorch text text-detection vision windows workflow

Last synced: 02 Apr 2025

https://github.com/rishikksh20/FNet-pytorch

Unofficial implementation of Google's FNet: Mixing Tokens with Fourier Transforms

feedforward-neural-network fnet fourier-transform image-classification language-model text text-classification transformer vision

Last synced: 08 May 2025

https://github.com/georgegach/flowiz

Converts optical flow files to images and optionally compiles them into a video. A flow viewer GUI is also available. Check out the mockup right from GitHub Pages:

converter flo flow image middlebury optical python video vision visualisation visualization

Last synced: 04 Apr 2025

https://github.com/aangelopoulos/conformal_classification

Wrapper for a PyTorch classifier which allows it to output prediction sets. The sets are theoretically guaranteed to contain the true class with high probability (via conformal prediction).

artificial-intelligence classification classifier computer-vision conformal conformal-prediction deep-neural-networks distribution-free imagenet machine-learning neural-networks nonparametric nonparametric-statistics prediction-sets pytorch statistics uncertainty uncertainty-quantification vision

Last synced: 04 Apr 2025

https://github.com/zsajjad/react-native-text-detector

Text detector for images in React Native, using Firebase ML Kit on Android and Tesseract on iOS

core-ml firebase-mlkit react-native tesseract-ios tesseract-ocr text-detection vision

Last synced: 13 May 2025