https://github.com/jookie/photodrone

Streaming ChatGPT via the OpenAI Node library to generate text information from your image
Host: GitHub
URL: https://github.com/jookie/photodrone
Owner: jookie
Created: 2024-03-16T12:38:24.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-05-26T13:55:56.000Z (about 2 years ago)
Last Synced: 2025-01-15T14:41:04.785Z (over 1 year ago)
Topics: openai-api, opencv
Language: TypeScript
Homepage: https://photo-drone.vercel.app
Size: 222 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README-MAIN.md

README

          

 ![Example](assets/preview.png)

# Photo Chat GPT

[ENGLISH](/README) | [HEBREW](/doc/READ_ME/README_DEMO) | [TECHNOLOGY](/doc/READ_ME/README_TECH)



A platform that combines the computer vision capabilities of OpenCV with pretrained models and various GPT technologies.
 




# Software as a Service (SaaS)

## 🗒️ Answer

1. Software as a Service (SaaS) is a cloud computing model in which software applications are hosted and provided to users over the internet, typically on a subscription basis [[1](https://www.techtarget.com/searchcloudcomputing/definition/Software-as-a-Service)].

2. Users access SaaS applications through web browsers, mobile apps, or thin clients, eliminating the need for on-premises software installation and maintenance [[2](https://www.ibm.com/topics/saas)]. 

## 🌐 Sources

- [techtarget.com - What is SaaS (Software as a Service)? Everything You ...](https://www.techtarget.com/searchcloudcomputing/definition/Software-as-a-Service)

- [ibm.com - What Is Software as a Service (SaaS)?](https://www.ibm.com/topics/saas)

The platform integrates OpenCV's computer vision capabilities with pretrained models built on various GPT technologies, combining the two to enhance image understanding and interaction.

It supports functionality such as face detection and recognition using OpenCV [[2](https://www.linkedin.com/pulse/building-advanced-face-detection-recognition-system-beoec)], while GPT technology handles tasks such as natural language understanding and generation [[1](https://platform.openai.com/docs/models)].

This makes the platform suited to applications that require both image analysis and natural language processing.

## 🌐 Sources

1. [Models - OpenAI API](https://platform.openai.com/docs/models)

2. [Building an Advanced Face Detection and Recognition System](https://www.linkedin.com/pulse/building-advanced-face-detection-recognition-system-beoec)

## Features

- ⚡ Deploy quickly and for free using Vercel

- 💬 Text conversation with the ability to switch models and set context length

- 🎨 Image generation conversation supports the `DALL-E` and `Midjourney` models. It also allows for the adjustment of image size and count.

- 🌈 Multiple preset prompts added to customize AI behavior

- 🌏 Switch between languages, currently supporting Hebrew and English

- 💭 Local chat history saved with search, import and export functionality

1. **Metadata/Description Generation**: The SaaS platform utilizes GPT technology to generate comprehensive metadata and descriptions for uploaded images, enhancing content understanding and discoverability [[1](https://aws.amazon.com/what-is/gpt/)].

2. **Validity Assessment of Pictures**: By leveraging GPT's capabilities, the platform analyzes uploaded images to assess their validity, providing insights into whether the content is authentic, manipulated, or potentially misleading. This assessment aids users in making informed decisions based on the credibility of the visual content [[4](https://www.linkedin.com/pulse/generative-pre-trained-transformer-gpt-enterprise-akheleash-raghuram)].

1. **Description**: This SaaS platform harnesses the power of GPT technology, enabling users to generate human-like text and content, as well as answer questions in a conversational manner[[1](https://aws.amazon.com/what-is/gpt/)]. Users can upload images to generate metadata/descriptions and receive text responses regarding the validity of the picture.

   

2. **Text Response to Picture Validity**: By uploading an image, the platform employs GPT technology to analyze and assess its validity. The generated text response provides insights into the image's content, context, and potential authenticity, helping users make informed decisions about the picture's credibility and relevance.

3. **OpenCV (Open Source Computer Vision Library)**: It is a go-to tool for developers working on computer vision projects[[4](https://viso.ai/computer-vision/opencv/)]. The library's future is closely tied to AI and ML technologies, AR/VR, and autonomous systems[[2](https://medium.com/@lotfi-habbiche/the-future-of-opencv-emerging-trends-and-technologies-27f2133df4b9)]. Additionally, its integration with deep learning frameworks like TensorFlow enhances its capabilities and keeps it at the forefront of innovation[[5](https://www.linkedin.com/pulse/unveiling-power-opencv-exploring-its-applications-world-dheeraj-kumar)].

1. **Virtual Reality (VR):** VR creates a fully immersive digital environment that isolates users from the real world. Users typically wear VR headsets and headphones to experience simulations generated by computers. This technology aims to provide an entirely synthetic experience, often used for gaming, simulations, training, and entertainment purposes[[5](https://www.forbes.com/sites/ariannajohnson/2023/06/02/augmented-reality-ar-vs-virtual-reality-vr-whats-the-difference-and-how-do-they-work/)].

2. **Augmented Reality (AR):** AR overlays digital information or objects onto the real world, enhancing the user's perception of reality. Unlike VR, AR does not replace the real world but supplements it with virtual elements. AR can be experienced through devices like smartphones, tablets, or AR glasses, allowing users to interact with virtual content while still being aware of their physical surroundings. Applications of AR include navigation, education, retail, healthcare, and more[[6]

# How to Train Models with CVAT

CVAT (Computer Vision Annotation Tool) primarily focuses on annotating images and videos. It facilitates model training through the following steps:

1. **Annotation**: Utilize CVAT to annotate images and videos with bounding boxes, polygons, keypoints, and other annotation types. These annotations serve as ground truth labels for training data [[2](https://blog.roboflow.com/cvat/)].

2. **Data Preparation**: Once annotations are complete, export the annotated data in formats compatible with popular deep learning frameworks such as TensorFlow or PyTorch. This step involves organizing the data into training, validation, and testing sets [[3](https://www.v7labs.com/blog/cvat-guide)].

3. **Model Training**: Use the annotated datasets prepared in CVAT to train your computer vision models. You can employ a variety of deep learning techniques such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs) depending on the task at hand [[4](https://medium.com/@eng.fadishaar/automating-object-annotation-in-cvat-using-a-custom-yolov5-model-cfd36fb40a97)].

4. **Model Evaluation**: After training, evaluate the performance of your models using metrics such as precision, recall, and F1 score. CVAT can assist in visualizing model predictions alongside ground truth annotations for qualitative analysis [[4](https://medium.com/@eng.fadishaar/automating-object-annotation-in-cvat-using-a-custom-yolov5-model-cfd36fb40a97)].

5. **Iteration and Improvement**: Iterate on the training process by analyzing model performance, identifying areas for improvement, and refining annotations if necessary. CVAT's flexible annotation capabilities allow for iterative model training to achieve optimal results [[4](https://medium.com/@eng.fadishaar/automating-object-annotation-in-cvat-using-a-custom-yolov5-model-cfd36fb40a97)].

By leveraging CVAT for annotation and data preparation, you can streamline the model training pipeline, leading to more efficient and accurate computer vision solutions.
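As a rough illustration of the data preparation and evaluation steps above, the sketch below splits a list of annotated samples into training, validation, and test sets and computes precision, recall, and F1 from raw counts. It is independent of CVAT's export formats; the `splitDataset` and `prf1` names and the default 80/10/10 ratio are assumptions chosen for illustration.

```typescript
// Hypothetical helpers; not part of CVAT or this repository.

/** Split samples, in their given order, into train/validation/test partitions. */
function splitDataset<T>(
  samples: T[],
  trainRatio = 0.8,
  valRatio = 0.1,
): { train: T[]; val: T[]; test: T[] } {
  const trainEnd = Math.floor(samples.length * trainRatio);
  const valEnd = trainEnd + Math.floor(samples.length * valRatio);
  return {
    train: samples.slice(0, trainEnd),
    val: samples.slice(trainEnd, valEnd),
    test: samples.slice(valEnd), // whatever remains becomes the test set
  };
}

/** Precision, recall, and F1 from true-positive, false-positive, and false-negative counts. */
function prf1(tp: number, fp: number, fn: number) {
  const precision = tp + fp === 0 ? 0 : tp / (tp + fp);
  const recall = tp + fn === 0 ? 0 : tp / (tp + fn);
  const f1 =
    precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}
```

In practice the samples would be shuffled before splitting so that each partition is representative of the whole dataset.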

## 🌐 Sources

1. [CVAT](https://www.cvat.ai/)

2. [How to Use the CVAT Annotation Tool [2023]](https://blog.roboflow.com/cvat/)

3. [CVAT: Annotation Tool for Computer Vision [2023 Tutorial]](https://www.v7labs.com/blog/cvat-guide)

4. [Automating Object Annotation in CVAT using a Custom ...](https://medium.com/@eng.fadishaar/automating-object-annotation-in-cvat-using-a-custom-yolov5-model-cfd36fb40a97)

## How it works

This project was built from [this template](https://github.com/Nutlope/restorePhotos/tree/1c5c8ac4f52a08f68a3091d3b21be8a65aef71f2).

It uses an ML model from Salesforce called [BLIP](https://github.com/salesforce/BLIP), hosted on [Replicate](https://replicate.com/), to convert images into text. You can upload any photo; the application sends it to the model through a Next.js API route and returns the generated caption.
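The flow described above can be sketched against Replicate's HTTP predictions endpoint: start a prediction, then poll its status URL until it finishes. This is a minimal sketch, not the repository's actual API route. The function names, the injected `fetchFn`, the `image` input field, and the model version hash (left as a parameter) are assumptions.

```typescript
// Sketch only: the token, model version hash, and input field name are placeholders.
type Json = Record<string, any>;
type FetchLike = (
  url: string,
  init?: { method?: string; headers?: Record<string, string>; body?: string },
) => Promise<{ json(): Promise<Json> }>;

/** Build the request that starts a prediction on Replicate. */
function buildPredictionRequest(token: string, version: string, imageUrl: string) {
  return {
    url: "https://api.replicate.com/v1/predictions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify({ version, input: { image: imageUrl } }),
    },
  };
}

/** Start a prediction, then poll until it reaches a terminal status. */
async function captionImage(
  fetchFn: FetchLike,
  token: string,
  version: string,
  imageUrl: string,
  pollMs = 1000,
): Promise<string> {
  const { url, init } = buildPredictionRequest(token, version, imageUrl);
  let prediction = await (await fetchFn(url, init)).json();
  const terminal = new Set(["succeeded", "failed", "canceled"]);
  while (!terminal.has(prediction.status)) {
    if (!prediction.urls?.get) throw new Error("Unexpected response from Replicate");
    await new Promise((resolve) => setTimeout(resolve, pollMs));
    const poll = await fetchFn(prediction.urls.get, {
      headers: { Authorization: `Bearer ${token}` },
    });
    prediction = await poll.json();
  }
  if (prediction.status !== "succeeded") {
    throw new Error(`Prediction ended with status ${prediction.status}`);
  }
  return String(prediction.output);
}
```

Passing the fetch function in as an argument keeps the sketch testable without network access; in a Next.js API route the global `fetch` would normally be supplied.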

## Running Locally

### Cloning the repository to the local machine.

```bash
git clone https://github.com/jookie/photodrone.git
```

### Creating an account on Replicate to get an API key.

1. Go to [Replicate](https://replicate.com/) to make an account.

2. Click on your profile picture in the top right corner, and click on "Dashboard".

3. Click on "Account" in the navbar. Here you can find your API token; copy it.

### Storing API key in .env file.

Create a `.env` file in the root directory of the project and store your API key in it, as shown in the `.example.env` file.

If you'd also like to do rate limiting, create an account on Upstash, create a Redis database, and populate the two environment variables in `.env` as well. If you don't want to do rate limiting, you don't need to make any changes.
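A small sketch of how such a configuration could be validated at startup, with rate limiting enabled only when both Upstash values are present. The variable names `REPLICATE_API_KEY`, `UPSTASH_REDIS_REST_URL`, and `UPSTASH_REDIS_REST_TOKEN` and the `loadConfig` helper are assumptions; check the repository's example env file for the names it actually uses.

```typescript
// Variable names below are assumptions, not confirmed by this README.
interface AppConfig {
  replicateApiKey: string;
  rateLimit?: { redisUrl: string; redisToken: string };
}

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const replicateApiKey = env.REPLICATE_API_KEY;
  if (!replicateApiKey) {
    throw new Error("REPLICATE_API_KEY is missing from .env");
  }
  const redisUrl = env.UPSTASH_REDIS_REST_URL;
  const redisToken = env.UPSTASH_REDIS_REST_TOKEN;
  // Rate limiting is optional: enable it only when both Upstash values are set.
  if (redisUrl && redisToken) {
    return { replicateApiKey, rateLimit: { redisUrl, redisToken } };
  }
  return { replicateApiKey };
}
```

In a Next.js API route this would typically be called as `loadConfig(process.env)`.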

### Installing the dependencies.

```bash
npm install
```

### Running the application.

Then, run the application in the command line and it will be available at `http://localhost:3000`.

```bash
npm run dev
```

 

## Deploy

When deploying on Vercel, also set the environment variables.

## Powered by

This example is powered by the following services:

- [Replicate](https://replicate.com) (AI API)

- [Upload](https://upload.io) (storage)

- [Upstash Redis](https://docs.upstash.com/redis) (Rate Limiting)

- [Vercel](https://vercel.com) (hosting, serverless functions, analytics)

- [Convex](https://stack.convex.dev/full-stack-chatgpt-app) (backend for the chat demo)

- [Convex dashboard](https://dashboard.convex.dev/t/jookie)

## OpenCV Overview:

1. **Core Functionality (core):** Defines basic data structures like Mat and fundamental functions for other modules[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].

2. **Image Processing (imgproc):** Includes linear/non-linear image filtering, geometrical transformations, color space conversion, histograms, etc[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].

3. **Video Analysis (video):** Covers motion estimation, background subtraction, and object tracking[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].

4. **Camera Calibration and 3D Reconstruction (calib3d):** Provides tools for camera calibration, stereo vision, and 3D reconstruction[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].

5. **2D Features Framework (features2d):** Offers feature detection, descriptors, and matching algorithms[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].

6. **Object Detection (objdetect):** Detects predefined objects like faces, eyes, cars, etc[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].

7. **High-level GUI (highgui):** Provides an interface for simple user interactions[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].

8. **Video I/O (videoio):** Facilitates video capturing and codec management[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].

9. **Additional Modules:** Includes FLANN, Google Test wrappers, Python bindings, and more[[4](https://docs.opencv.org/4.x/d1/dfb/intro.html)].
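To make the `imgproc` entry more concrete, the sketch below hand-computes an RGB-to-grayscale conversion in plain TypeScript using the standard luma weights (0.299, 0.587, 0.114). It does not call OpenCV; the `toGrayscale` function and the flat RGB array layout are assumptions chosen for illustration.

```typescript
/** Convert a flat [r, g, b, r, g, b, ...] pixel array to one gray value per pixel. */
function toGrayscale(rgb: number[]): number[] {
  if (rgb.length % 3 !== 0) {
    throw new Error("expected 3 channels per pixel");
  }
  const gray: number[] = [];
  for (let i = 0; i < rgb.length; i += 3) {
    // Weighted sum of the red, green, and blue channels.
    const y = 0.299 * rgb[i] + 0.587 * rgb[i + 1] + 0.114 * rgb[i + 2];
    gray.push(Math.round(y));
  }
  return gray;
}
```

In a real pipeline the library's own color conversion would be used instead; the point here is only to show what that kind of operation computes per pixel.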


## 🌐 Sources

- [aws.amazon.com - What is GPT AI?](https://aws.amazon.com/what-is/gpt/)

- [arxiv.org - GPT (Generative Pre-trained Transformer)](https://arxiv.org/pdf/2305.10435)

- [taskus.com - What are Generative Pre-trained Transformers (GPT)?](https://www.taskus.com/insights/what-is-gpt/)

- [linkedin.com - Generative Pre-trained Transformer (GPT) for Enterprise](https://www.linkedin.com/pulse/generative-pre-trained-transformer-gpt-enterprise-akheleash-raghuram)

- [mdpi.com - Generative Pre-Trained Transformer (GPT) in Research](https://www.mdpi.com/2078-2489/15/2/99)

- [leewayhertz.com - How to build a generative AI solution?](https://www.leewayhertz.com/how-to-build-a-generative-ai-solution/)

1. [opencv.org - OpenCV: Introduction](https://docs.opencv.org/4.x/d1/dfb/intro.html)

1. **Metadata/Description Generation:** Utilizing GPT technology, PhotoChat generates comprehensive metadata and descriptions for uploaded images, improving content understanding and discoverability[[1](https://medium.com/voxel51/tunnel-vision-in-computer-vision-can-chatgpt-see-e6ef037c535)].

2. **Validity Assessment of Pictures:** With GPT's capabilities, the platform assesses the validity of uploaded images, identifying authenticity, manipulation, or misleading content. This aids users in making informed decisions based on image credibility[[4](https://stackoverflow.com/questions/50276424/cant-keep-image-exif-data-when-editing-it-with-opencv-in-python)].

3. **Description Generation and Conversational Interaction:** PhotoChat enables users to generate human-like text and content, as well as answer questions in a conversational manner, leveraging GPT technology[[1](https://medium.com/voxel51/tunnel-vision-in-computer-vision-can-chatgpt-see-e6ef037c535)]. Users can upload images to receive metadata/descriptions and text responses regarding image validity.

4. **Text Response to Picture Validity:** By analyzing uploaded images using GPT technology, PhotoChat provides insights into the image's content, context, and potential authenticity, assisting users in making informed decisions about image credibility and relevance[[1](https://medium.com/voxel51/tunnel-vision-in-computer-vision-can-chatgpt-see-e6ef037c535)].

## 🌐 Sources

1. [Medium - Tunnel vision in computer vision: can ChatGPT see?](https://medium.com/voxel51/tunnel-vision-in-computer-vision-can-chatgpt-see-e6ef037c535)

2. [Stack Overflow - Can't keep image exif data when editing it with opencv in python](https://stackoverflow.com/questions/50276424/cant-keep-image-exif-data-when-editing-it-with-opencv-in-python)

3. [OpenCV - Open Computer Vision Library](https://www.opencv.ai/blog/getting-the-hang-of-opencvs-inner-workings-with-chatgpt)

4. [Forbes - Augmented Reality (AR) Vs. Virtual Reality (VR)](https://www.forbes.com/sites/ariannajohnson/2023/06/02/augmented-reality-ar-vs-virtual-reality-vr-whats-the-difference-and-how-do-they-work/)

5. [TeamViewer - Augmented Reality (AR) vs Virtual Reality (VR)](https://www.teamviewer.com/en-us/augmented-reality-ar-vs-virtual-reality-vr/)


# ChatGPT Convex demo

This example app demonstrates how to use the [OpenAI ChatGPT API](https://platform.openai.com/docs/guides/chat) with [Convex](https://convex.dev) to implement a chat. Convex stores the messages and runs server-side functions to interact with OpenAI.

![Example](./example.png)

Features:

- You can chat and get responses from the ChatGPT API.

- You can start new threads to reset your conversation with ChatGPT.

- You can specify what the chat identity is, and change it mid-thread.

- You can make new identities.

- Inputs are checked for offensive content using the Moderation API.

This uses [Convex actions](https://docs.convex.dev/using/actions) to make requests to OpenAI's API.
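A minimal sketch of how an identity and a thread could be turned into a Chat Completions request body. The `buildChatRequest` helper, the `ThreadMessage` shape, and the default model name are assumptions for illustration, not code from the demo; in the demo, logic like this would run inside a Convex action.

```typescript
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string;
}

// Hypothetical shape of a stored thread message.
interface ThreadMessage {
  author: "user" | "assistant";
  body: string;
}

function buildChatRequest(
  identityInstructions: string,
  thread: ThreadMessage[],
  model = "gpt-3.5-turbo",
) {
  const messages: ChatMessage[] = [
    // The identity becomes the system message that steers every reply.
    { role: "system", content: identityInstructions },
    ...thread.map((m): ChatMessage => ({ role: m.author, content: m.body })),
  ];
  return { model, messages };
}
```

The returned object would be sent as the JSON body of a `POST` to `https://api.openai.com/v1/chat/completions`, with the `OPENAI_API_KEY` value in an `Authorization: Bearer` header. Because the identity is passed in on every call, changing it mid-thread only affects later replies.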

## Running the App

Run:

```
npm install
npm run dev
```

This will create and configure a Convex project if you don't already have one.

### OpenAI API Setup

Create a free account on openai.com and create your [OpenAI API secret key](https://platform.openai.com/account/api-keys), and set it as an [environment variable](https://docs.convex.dev/using/environment-variables) with the name `OPENAI_API_KEY` via the [Convex dashboard](https://dashboard.convex.dev/).

Then visit [localhost:3000](http://localhost:3000).

## Identities

You can add identities to talk to. I added:

**Rubber Duck**

> You are curious and respond with helpful one-sentence questions.

**Supportive Friend**

> You are a supportive and curious best friend who validates feelings and experiences and will give advice only when asked for it. You give short responses and ask questions to learn more.

**CS Coach**

> You are a highly technically trained coach with expertise in technology and best practices for developing software. Respond with concise, precise messages and ask clarifying questions when things are unclear.

## Training NVIDIA Merlin

NVIDIA Merlin is a library that provides end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production. It accelerates training on NVIDIA GPUs and enables data scientists, machine learning engineers, and researchers to build high-performing apps.

# What is Convex?

[Convex](https://convex.dev) is a hosted backend platform with a built-in database that lets you write your [database schema](https://docs.convex.dev/database/schemas) and [server functions](https://docs.convex.dev/functions) in [TypeScript](https://docs.convex.dev/typescript). Server-side database [queries](https://docs.convex.dev/functions/query-functions) automatically [cache](https://docs.convex.dev/functions/query-functions#caching--reactivity) and [subscribe](https://docs.convex.dev/client/react#reactivity) to data, powering a [realtime `useQuery` hook](https://docs.convex.dev/client/react#fetching-data) in our [React client](https://docs.convex.dev/client/react). There are also [Python](https://docs.convex.dev/client/python), [Rust](https://docs.convex.dev/client/rust), [ReactNative](https://docs.convex.dev/client/react-native), and [Node](https://docs.convex.dev/client/javascript) clients, as well as a straightforward [HTTP API](https://github.com/get-convex/convex-js/blob/main/src/browser/http_client.ts#L40).

The database supports [NoSQL-style documents](https://docs.convex.dev/database/document-storage) with [relationships](https://docs.convex.dev/database/document-ids) and [custom indexes](https://docs.convex.dev/database/indexes/) (including on fields in nested objects).

The [`query`](https://docs.convex.dev/functions/query-functions) and [`mutation`](https://docs.convex.dev/functions/mutation-functions) server functions have transactional, low latency access to the database and leverage our [`v8` runtime](https://docs.convex.dev/functions/runtimes) with [determinism guardrails](https://docs.convex.dev/functions/runtimes#using-randomness-and-time-in-queries-and-mutations) to provide the strongest ACID guarantees on the market: immediate consistency, serializable isolation, and automatic conflict resolution via [optimistic multi-version concurrency control](https://docs.convex.dev/database/advanced/occ) (OCC / MVCC).
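The idea behind optimistic concurrency control can be sketched in a few lines: a transaction notes the version of the document it read, and its write is applied only if that version is unchanged at commit time; otherwise the work is retried. The in-memory `VersionedStore` below is a simplified, single-version teaching model with hypothetical names, not Convex's implementation or API.

```typescript
interface Versioned<T> {
  value: T;
  version: number;
}

class VersionedStore<T> {
  private docs = new Map<string, Versioned<T>>();

  /** Return a snapshot of the document, or undefined if it does not exist. */
  read(id: string): Versioned<T> | undefined {
    const doc = this.docs.get(id);
    return doc ? { ...doc } : undefined;
  }

  /** Commit succeeds only if the document still has the version that was read. */
  commit(id: string, expectedVersion: number, value: T): boolean {
    const current = this.docs.get(id)?.version ?? 0;
    if (current !== expectedVersion) return false; // conflict: another writer got there first
    this.docs.set(id, { value, version: current + 1 });
    return true;
  }
}

/** Read, compute, and commit; start over when a conflict is detected. */
function update<T>(
  store: VersionedStore<T>,
  id: string,
  initial: T,
  fn: (value: T) => T,
  maxRetries = 5,
): T {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const doc = store.read(id);
    const next = fn(doc ? doc.value : initial);
    if (store.commit(id, doc ? doc.version : 0, next)) return next;
  }
  throw new Error("too many conflicts");
}
```

Retrying is only safe when the function passed to `update` is deterministic and free of side effects, which is one way to read the determinism guardrails mentioned above.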

The [`action` server functions](https://docs.convex.dev/functions/actions) have access to external APIs and enable other side-effects and non-determinism in either our [optimized `v8` runtime](https://docs.convex.dev/functions/runtimes) or a more [flexible `node` runtime](https://docs.convex.dev/functions/runtimes#nodejs-runtime).

Functions can run in the background via [scheduling](https://docs.convex.dev/scheduling/scheduled-functions) and [cron jobs](https://docs.convex.dev/scheduling/cron-jobs).

Development is cloud-first, with [hot reloads for server function](https://docs.convex.dev/cli#run-the-convex-dev-server) editing via the [CLI](https://docs.convex.dev/cli). There is a [dashboard UI](https://docs.convex.dev/dashboard) to [browse and edit data](https://docs.convex.dev/dashboard/deployments/data), [edit environment variables](https://docs.convex.dev/production/environment-variables), [view logs](https://docs.convex.dev/dashboard/deployments/logs), [run server functions](https://docs.convex.dev/dashboard/deployments/functions), and more.

There are built-in features for [reactive pagination](https://docs.convex.dev/database/pagination), [file storage](https://docs.convex.dev/file-storage), [reactive search](https://docs.convex.dev/text-search), [https endpoints](https://docs.convex.dev/functions/http-actions) (for webhooks), [streaming import/export](https://docs.convex.dev/database/import-export/), and [runtime data validation](https://docs.convex.dev/database/schemas#validators) for [function arguments](https://docs.convex.dev/functions/args-validation) and [database data](https://docs.convex.dev/database/schemas#schema-validation).

Everything scales automatically, and it’s [free to start](https://www.convex.dev/plans).