Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/jookie/captionai
Software as a Service (SaaS) platform leveraging the capabilities of GPT (Generative Pre-trained Transformer) technology. Upload an image to generate metadata/descriptions and a response on the validity of the picture.
ml replicate upstash vercel-deployment
Last synced: 2 days ago
- Host: GitHub
- URL: https://github.com/jookie/captionai
- Owner: jookie
- License: mit
- Created: 2023-12-11T01:55:53.000Z (11 months ago)
- Default Branch: main
- Last Pushed: 2024-03-14T21:57:11.000Z (8 months ago)
- Last Synced: 2024-10-12T20:32:35.958Z (about 1 month ago)
- Topics: ml, replicate, upstash, vercel-deployment
- Language: TypeScript
- Homepage: https://jookie.github.io/captionai/
- Size: 60.1 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
![](assets/preview.png)
PhotoGPT
[ENGLISH](/README) | [HEBREW](/Hebrew) | [TECHNOLOGY](/TECHNICAL)
A platform leveraging the capabilities of OpenCV and pretrained models built on various GPT technologies.
Software as a Service (SaaS)
PhotoChat is a Software as a Service (SaaS) platform that integrates OpenCV's powerful computer vision capabilities with pretrained models utilizing various GPT technologies.
It facilitates advanced functionalities such as face detection and recognition using OpenCV[[2](https://www.linkedin.com/pulse/building-advanced-face-detection-recognition-system-beoec)],
while also leveraging GPT technology for other tasks such as natural language understanding and generation[[1](https://platform.openai.com/docs/models)].
This platform offers a comprehensive solution for applications requiring sophisticated image analysis and natural language processing capabilities.

## 🌐 Sources
1. [Models - OpenAI API](https://platform.openai.com/docs/models)
2. [Building an Advanced Face Detection and Recognition System](https://www.linkedin.com/pulse/building-advanced-face-detection-recognition-system-beoec)

## Introduction
Create a private ChatGPT website with one click for free using Vercel, supporting both **text** and **image generation** conversations. Powered by the OpenAI API (GPT-4/3.5) and Vercel.

## Features
- ⚡ Deploy quickly and for free using Vercel
- 💬 Text conversation with the ability to switch models and set context length
- 🎨 Image generation conversations support the `DALL-E` and `Midjourney` models and allow adjustment of image size and count
- 🌈 Multiple preset prompts added to customize AI behavior
- 🌏 Switch between various languages, currently supporting Simplified Chinese and English
- 💭 Local chat history saved with search, import and export functionality

## 🗒️ Abstract
PhotoChat is a Software as a Service (SaaS) platform that harnesses the capabilities of OpenCV and GPT technology to enhance image understanding and interaction.

1. **Metadata/Description Generation:** Utilizing GPT technology, PhotoChat generates comprehensive metadata and descriptions for uploaded images, improving content understanding and discoverability[[1](https://medium.com/voxel51/tunnel-vision-in-computer-vision-can-chatgpt-see-e6ef037c535)].
2. **Validity Assessment of Pictures:** With GPT's capabilities, the platform assesses the validity of uploaded images, identifying authenticity, manipulation, or misleading content. This aids users in making informed decisions based on image credibility[[4](https://stackoverflow.com/questions/50276424/cant-keep-image-exif-data-when-editing-it-with-opencv-in-python)].
3. **Description Generation and Conversational Interaction:** PhotoChat enables users to generate human-like text and content, as well as answer questions in a conversational manner, leveraging GPT technology[[1](https://medium.com/voxel51/tunnel-vision-in-computer-vision-can-chatgpt-see-e6ef037c535)]. Users can upload images to receive metadata/descriptions and text responses regarding image validity.
4. **Text Response to Picture Validity:** By analyzing uploaded images using GPT technology, PhotoChat provides insights into the image's content, context, and potential authenticity, assisting users in making informed decisions about image credibility and relevance[[1](https://medium.com/voxel51/tunnel-vision-in-computer-vision-can-chatgpt-see-e6ef037c535)].
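To make this concrete, here is a minimal TypeScript sketch of how metadata/description generation and a validity note could be requested from a vision-capable GPT model via the OpenAI API. The model name, prompt, and function name are illustrative assumptions, not this project's actual implementation.

```typescript
// Hedged sketch: ask a vision-capable GPT model to describe an image and
// flag possible manipulation. Assumes OPENAI_API_KEY is set in the environment.
export async function describeImage(imageUrl: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o", // assumption: any vision-capable model would do
      messages: [
        {
          role: "user",
          content: [
            {
              type: "text",
              text: "Generate metadata and a short description for this image, and note any signs that it may be manipulated or misleading.",
            },
            { type: "image_url", image_url: { url: imageUrl } },
          ],
        },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content; // generated description + validity note
}
```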
**OpenCV (Open Source Computer Vision Library):**
OpenCV is a go-to tool for developers working on computer vision projects[[4](https://viso.ai/computer-vision/opencv/)]. The library is closely tied to AI and ML technologies and to emerging areas such as AR/VR and autonomous systems[[2](https://medium.com/@lotfi-habbiche/the-future-of-opencv-emerging-trends-and-technologies-27f2133df4b9)]. Additionally, its integration with deep learning frameworks like TensorFlow enhances its capabilities and keeps it at the forefront of innovation[[5](https://www.linkedin.com/pulse/unveiling-power-opencv-exploring-its-applications-world-dheeraj-kumar)].

1. **Virtual Reality (VR):** VR creates a fully immersive digital environment that isolates users from the real world. Users typically wear VR headsets and headphones to experience simulations generated by computers. This technology aims to provide an entirely synthetic experience, often used for gaming, simulations, training, and entertainment purposes[[5](https://www.forbes.com/sites/ariannajohnson/2023/06/02/augmented-reality-ar-vs-virtual-reality-vr-whats-the-difference-and-how-do-they-work/)].
2. **Augmented Reality (AR):** AR overlays digital information or objects onto the real world, enhancing the user's perception of reality. Unlike VR, AR does not replace the real world but supplements it with virtual elements. AR can be experienced through devices like smartphones, tablets, or AR glasses, allowing users to interact with virtual content while still being aware of their physical surroundings. Applications of AR include navigation, education, retail, healthcare, and more [6].
# Train Models with CVAT
## 🗒️ Answer
CVAT (Computer Vision Annotation Tool) is primarily used for annotating images and videos to create datasets for training computer vision models. While it doesn't directly train models, it provides the annotations needed for model training. Once you've annotated your data using CVAT, you can use machine learning frameworks such as TensorFlow, PyTorch, or OpenCV to train your computer vision models. These frameworks offer a wide range of pre-trained models and tools for training custom models on annotated data exported from CVAT.
# Explain How to Train Models with CVAT
CVAT (Computer Vision Annotation Tool) primarily focuses on annotating images and videos, but it can also facilitate model training through the following steps:
1. **Annotation**: Utilize CVAT to annotate images and videos with bounding boxes, polygons, keypoints, and other annotation types. These annotations serve as ground truth labels for training data [[2](https://blog.roboflow.com/cvat/)].
2. **Data Preparation**: Once annotations are complete, export the annotated data in formats compatible with popular deep learning frameworks such as TensorFlow or PyTorch. This step involves organizing the data into training, validation, and testing sets [[4](https://www.v7labs.com/blog/cvat-guide)].
3. **Model Training**: Use the annotated datasets prepared in CVAT to train your computer vision models. You can employ a variety of deep learning techniques such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs) depending on the task at hand [[6](https://medium.com/@eng.fadishaar/automating-object-annotation-in-cvat-using-a-custom-yolov5-model-cfd36fb40a97)].
4. **Model Evaluation**: After training, evaluate the performance of your models using metrics such as precision, recall, and F1 score. CVAT can assist in visualizing model predictions alongside ground truth annotations for qualitative analysis [[6](https://medium.com/@eng.fadishaar/automating-object-annotation-in-cvat-using-a-custom-yolov5-model-cfd36fb40a97)].
5. **Iteration and Improvement**: Iterate on the training process by analyzing model performance, identifying areas for improvement, and refining annotations if necessary. CVAT's flexible annotation capabilities allow for iterative model training to achieve optimal results [[6](https://medium.com/@eng.fadishaar/automating-object-annotation-in-cvat-using-a-custom-yolov5-model-cfd36fb40a97)].
By leveraging CVAT for annotation and data preparation, you can streamline the model training pipeline, leading to more efficient and accurate computer vision solutions.
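To illustrate the data-preparation step, the following TypeScript sketch assumes the annotations were exported from CVAT in COCO 1.0 JSON format; the file name and split ratios are arbitrary examples.

```typescript
import { readFileSync, writeFileSync } from "fs";

// Assumption: "annotations.json" is a COCO 1.0 export from CVAT with
// top-level "images" and "annotations" arrays.
const coco = JSON.parse(readFileSync("annotations.json", "utf8"));

// Quick-and-dirty shuffle, then an 80/10/10 split of the images.
const images = [...coco.images].sort(() => Math.random() - 0.5);
const nTrain = Math.floor(images.length * 0.8);
const nVal = Math.floor(images.length * 0.1);

const splits: Record<string, any[]> = {
  train: images.slice(0, nTrain),
  val: images.slice(nTrain, nTrain + nVal),
  test: images.slice(nTrain + nVal),
};

for (const [name, imgs] of Object.entries(splits)) {
  const ids = new Set(imgs.map((img: any) => img.id));
  const subset = {
    ...coco,
    images: imgs,
    // keep only annotations whose image_id belongs to this split
    annotations: coco.annotations.filter((a: any) => ids.has(a.image_id)),
  };
  writeFileSync(`annotations_${name}.json`, JSON.stringify(subset));
}
```

The resulting per-split JSON files can then be fed to whichever training framework you choose.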
## 🌐 Sources
1. [CVAT](https://www.cvat.ai/)
2. [How to Use the CVAT Annotation Tool [2023]](https://blog.roboflow.com/cvat/)
3. [CVAT: Annotation Tool for Computer Vision [2023 Tutorial]](https://www.v7labs.com/blog/cvat-guide)
4. [Automating Object Annotation in CVAT using a Custom ...](https://medium.com/@eng.fadishaar/automating-object-annotation-in-cvat-using-a-custom-yolov5-model-cfd36fb40a97)

## How it works
This project was developed using [this template](https://github.com/Nutlope/restorePhotos/tree/1c5c8ac4f52a08f68a3091d3b21be8a65aef71f2). It uses an ML model from Salesforce called [BLIP](https://github.com/salesforce/BLIP), hosted on [Replicate](https://replicate.com/), to convert images into text. The application lets you upload any photo, sends it through this ML model via a Next.js API route, and returns your caption.
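A condensed sketch of what such a Next.js API route could look like is shown below; the actual handler in this repository may differ, and the environment variable names and the model version value are placeholders to be taken from your `.env` and from the BLIP model page on Replicate.

```typescript
import type { NextApiRequest, NextApiResponse } from "next";

// Sketch of a Next.js API route that sends an uploaded image URL to the BLIP
// captioning model on Replicate and returns the generated caption.
export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  const { imageUrl } = req.body;

  // Start a prediction. BLIP_VERSION is a placeholder: copy the current
  // version hash from the model page on replicate.com.
  const start = await fetch("https://api.replicate.com/v1/predictions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Token ${process.env.REPLICATE_API_KEY}`,
    },
    body: JSON.stringify({
      version: process.env.BLIP_VERSION,
      input: { image: imageUrl },
    }),
  });
  const prediction = await start.json();

  // Poll until the prediction finishes, then return its output.
  let result = prediction;
  while (result.status !== "succeeded" && result.status !== "failed") {
    await new Promise((r) => setTimeout(r, 1000));
    const poll = await fetch(
      `https://api.replicate.com/v1/predictions/${prediction.id}`,
      { headers: { Authorization: `Token ${process.env.REPLICATE_API_KEY}` } }
    );
    result = await poll.json();
  }

  // The caption text; its exact shape depends on the model's output schema.
  res.status(200).json({ caption: result.output });
}
```

Polling keeps the route simple; a production version would add error handling and a timeout.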
## Running Locally
### Cloning the repository to the local machine.
```bash
git clone https://github.com/jookie/captionai.git
```

### Creating an account on Replicate to get an API key.
1. Go to [Replicate](https://replicate.com/) to make an account.
2. Click on your profile picture in the top right corner, and click on "Dashboard".
3. Click on "Account" in the navbar; there you can find your API token. Copy it.

### Storing the API key in a `.env` file.
Create a `.env` file in the root directory of the project and store your API key in it, as shown in the `.example.env` file.
If you'd also like to do rate limiting, create an account on Upstash, create a Redis database, and populate the two environment variables in `.env` as well. If you don't want to do rate limiting, you don't need to make any changes.
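For reference, a typical rate-limiting helper built on those two variables might look like the following sketch using the `@upstash/ratelimit` and `@upstash/redis` packages; the limit of 5 requests per day and the function name are arbitrary examples rather than this project's exact configuration.

```typescript
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

// Redis.fromEnv() reads UPSTASH_REDIS_REST_URL and UPSTASH_REDIS_REST_TOKEN,
// the two variables added to .env.
const ratelimit = new Ratelimit({
  redis: Redis.fromEnv(),
  limiter: Ratelimit.slidingWindow(5, "1440 m"), // example: 5 requests per day
  analytics: true,
});

// Call inside an API route with some caller identifier (typically the IP).
export async function checkLimit(identifier: string): Promise<boolean> {
  const { success } = await ratelimit.limit(identifier);
  return success; // false => respond with HTTP 429
}
```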
### Installing the dependencies.
```bash
npm install
```

### Running the application.
Then, run the application in the command line and it will be available at `http://localhost:3000`.
```bash
npm run dev
```
## Deploy

When deploying on Vercel, also include the environment variables.
## Powered by
This example is powered by the following services:
- [Replicate](https://replicate.com) (AI API)
- [Upload](https://upload.io) (storage)
- [Upstash Redis](https://docs.upstash.com/redis) (Rate Limiting)
- [Vercel](https://vercel.com) (hosting, serverless functions, analytics)
- [convex1](https://stack.convex.dev/full-stack-chatgpt-app)
- [convex2](https://dashboard.convex.dev/t/jookie)

## OpenCV Overview
1. **Core Functionality (core):** Defines basic data structures like Mat and fundamental functions for other modules[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].
2. **Image Processing (imgproc):** Includes linear/non-linear image filtering, geometrical transformations, color space conversion, histograms, etc[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].
3. **Video Analysis (video):** Covers motion estimation, background subtraction, and object tracking[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].
4. **Camera Calibration and 3D Reconstruction (calib3d):** Provides tools for camera calibration, stereo vision, and 3D reconstruction[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].
5. **2D Features Framework (features2d):** Offers feature detection, descriptors, and matching algorithms[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].
6. **Object Detection (objdetect):** Detects predefined objects like faces, eyes, cars, etc[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].
7. **High-level GUI (highgui):** Provides an interface for simple user interactions[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].
8. **Video I/O (videoio):** Facilitates video capturing and codec management[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].
9. **Additional Modules:** Includes FLANN, Google Test wrappers, Python bindings, and more[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].
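As a small illustration of the core, imgproc, and high-level GUI modules listed above, the sketch below assumes OpenCV's official JavaScript build (opencv.js) has been loaded on the page; the element ids and thresholds are placeholders.

```typescript
// Assumes opencv.js is loaded via a <script> tag, which exposes a global `cv`.
declare const cv: any;

function detectEdges(imageElementId: string, outputCanvasId: string): void {
  // core: read the image into a cv.Mat (RGBA by default in opencv.js)
  const src = cv.imread(imageElementId);
  const gray = new cv.Mat();
  const edges = new cv.Mat();

  // imgproc: color space conversion, then Canny edge detection
  cv.cvtColor(src, gray, cv.COLOR_RGBA2GRAY, 0);
  cv.Canny(gray, edges, 50, 150);

  // highgui (JS variant): render the result into a <canvas> element
  cv.imshow(outputCanvasId, edges);

  // Mats are WASM-allocated and must be freed explicitly
  src.delete(); gray.delete(); edges.delete();
}
```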
## 🌐 Sources
- [aws.amazon.com - What is GPT AI?](https://aws.amazon.com/what-is/gpt/)
- [arxiv.org - GPT (Generative Pre-trained Transformer)](https://arxiv.org/pdf/2305.10435)
- [taskus.com - What are Generative Pre-trained Transformers (GPT)?](https://www.taskus.com/insights/what-is-gpt/)
- [linkedin.com - Generative Pre-trained Transformer (GPT) for Enterprise](https://www.linkedin.com/pulse/generative-pre-trained-transformer-gpt-enterprise-akheleash-raghuram)
- [mdpi.com - Generative Pre-Trained Transformer (GPT) in Research](https://www.mdpi.com/2078-2489/15/2/99)
- [leewayhertz.com - How to build a generative AI solution?](https://www.leewayhertz.com/how-to-build-a-generative-ai-solution/)
- [opencv.org - OpenCV: Introduction](https://docs.opencv.org/4.x/d1/dfb/intro.html)
## 🌐 Sources
1. [Medium - Tunnel vision in computer vision: can ChatGPT see?](https://medium.com/voxel51/tunnel-vision-in-computer-vision-can-chatgpt-see-e6ef037c535)
2. [Stack Overflow - Can't keep image exif data when editing it with opencv in python](https://stackoverflow.com/questions/50276424/cant-keep-image-exif-data-when-editing-it-with-opencv-in-python)
3. [OpenCV - Open Computer Vision Library](https://www.opencv.ai/blog/getting-the-hang-of-opencvs-inner-workings-with-chatgpt)
4. [Forbes - Augmented Reality (AR) Vs. Virtual Reality (VR)](https://www.forbes.com/sites/ariannajohnson/2023/06/02/augmented-reality-ar-vs-virtual-reality-vr-whats-the-difference-and-how-do-they-work/)
5. [TeamViewer - Augmented Reality (AR) vs Virtual Reality (VR)](https://www.teamviewer.com/en-us/augmented-reality-ar-vs-virtual-reality-vr/)