https://github.com/umaarov/screensight-ai

A desktop tool to instantly capture any part of your screen and ask questions about it using Google Gemini's multimodal capabilities.

desktop-app gemini-api pillow pynput python screenshot-tool tkinter


# ScreenSight AI

ScreenSight AI is a lightweight desktop application that lets you capture any portion of your screen and get AI-powered analysis of it from Google's Gemini model. Press a hotkey, select a region, and ask your question.

![ScreenSight AI Demo](/assets/asset.gif)

---

## Features

- **Hotkey Activation**: Instantly trigger the capture mode with a global hotkey (**Ctrl + Shift + F9**).
- **Region Selection**: A simple, intuitive overlay to select exactly what you want to analyze.
- **Interactive Analysis**: An interactive window displays your screenshot and allows you to ask questions.
- **Powered by Gemini**: Sends your screenshot and question to Google's multimodal Gemini model for analysis.
- **Lightweight & Cross-Platform**: Built with Python and Tkinter for minimal overhead.
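
To make the region-selection feature concrete, here is a minimal sketch of how a click-and-drag gesture could be turned into a capture box. It is an illustration only: `Box`, `normalize_region`, `is_usable`, and the 5-pixel threshold are hypothetical names and values, not taken from the project's source.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Bounding box in screen pixels: (left, top, right, bottom)."""
    left: int
    top: int
    right: int
    bottom: int


def normalize_region(start: Tuple[int, int], end: Tuple[int, int]) -> Box:
    """Turn a drag from `start` to `end` into a box with left <= right and top <= bottom."""
    (x0, y0), (x1, y1) = start, end
    return Box(min(x0, x1), min(y0, y1), max(x0, x1), max(y0, y1))


def is_usable(box: Box, min_size: int = 5) -> bool:
    """Reject accidental clicks: both sides must span at least `min_size` pixels."""
    width = box.right - box.left
    height = box.bottom - box.top
    return width >= min_size and height >= min_size


if __name__ == "__main__":
    # Dragging up and to the left still yields a well-formed box.
    box = normalize_region((400, 300), (150, 120))
    print(box)
    print(is_usable(box))
```

A tuple in this order matches the `bbox` argument of Pillow's `ImageGrab.grab`, so a box like this could be passed to it directly. Whether the app captures the screen that way is an assumption.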

---

## How It Works

1. **Run the App**: The application runs quietly in the background, listening for the hotkey.
2. **Press the Hotkey**: Press `Ctrl + Shift + F9` to bring up a semi-transparent overlay.
3. **Select a Region**: Click and drag your mouse to draw a box around the area of interest on your screen.
4. **Ask a Question**: Once you release the mouse, an analysis window appears with your screenshot. Type your question into the input box.
5. **Get Answers**: The AI analyzes the image and your question, providing a detailed response in the text area.
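
The last step can be sketched as building one multimodal request that carries both the question and the screenshot. This README does not show how the app calls Gemini, and it may well use Google's Python SDK instead. The sketch below assumes the public REST `generateContent` body shape, and `build_request_body` and the model name are placeholders chosen for illustration.

```python
import base64
import json

# Assumed endpoint pattern; check the current Gemini API documentation.
API_URL_TEMPLATE = (
    "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"
)


def build_request_body(question: str, png_bytes: bytes) -> dict:
    """Build a generateContent body holding one text part and one inline PNG."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": question},
                    {"inline_data": {"mime_type": "image/png", "data": encoded}},
                ]
            }
        ]
    }


if __name__ == "__main__":
    placeholder = b"\x89PNG\r\n\x1a\n"  # PNG signature only, not a real image
    body = build_request_body("What does this error mean?", placeholder)
    # The model name below is a placeholder assumption.
    print(API_URL_TEMPLATE.format(model="gemini-1.5-flash"))
    print(json.dumps(body, indent=2))
```

The sketch stops short of sending the request, so it runs without a network connection or an API key.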

---

## Installation & Setup

Follow these steps to get ScreenSight AI running on your machine.

### 1. Prerequisites

- Python 3.8+
- A Google Gemini API Key. You can get one from [Google AI Studio](https://aistudio.google.com/app/apikey).

### 2. Clone the Repository

```bash
git clone https://github.com/umaarov/screensight-ai.git
cd screensight-ai
```

### 3. Set Up a Virtual Environment

It's highly recommended to use a virtual environment to manage dependencies.

```bash
# For Windows
python -m venv venv
venv\Scripts\activate

# For macOS & Linux
python3 -m venv venv
source venv/bin/activate
```

### 4. Install Dependencies

Install the required Python packages from the `requirements.txt` file.

```bash
pip install -r requirements.txt
```

### 5. Configure Your API Key

The application loads your Gemini API key from a `.env` file, which keeps the key out of the source code.

1. In the project's root directory, create a new file named `.env`.
2. Open the `.env` file and add your API key in the following format:

```
# .env file
GEMINI_API_KEY="YOUR_GEMINI_API_KEY_HERE"
```

**Important**: Never commit your `.env` file or hardcode your API key directly in the source code. The `.gitignore` file in this repository is already configured to prevent this file from being tracked by Git.
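
For illustration, the key lookup could work roughly like the sketch below, which uses only the standard library. The project may rely on a package such as `python-dotenv` instead. `read_env_file` and `get_api_key` are hypothetical helpers, not functions from this repository.

```python
import os
from pathlib import Path


def read_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path: str = ".env") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key and Path(path).is_file():
        key = read_env_file(path).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to your .env file.")
    return key
```

Failing early with a clear message when the key is missing is usually easier to debug than an authentication error from the API later on.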

---

## Usage

Once the setup is complete, you can run the application from your terminal.

```bash
python main.py
```

You will see a confirmation message in your terminal: `ScreenSight AI is running. Press Ctrl + Shift + F9 to select a region.`

The app now runs in the background. You can press the hotkey combination at any time, in any application, to start a capture.

### Customization

You can easily change the hotkey combination and the UI styling by editing the values in the `src/config.py` file.
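
The contents of `src/config.py` are not shown here, so the following is only a sketch of how a human-readable hotkey setting could be converted into the string notation that `pynput` uses for hotkeys. `to_pynput_hotkey` is a hypothetical helper, and the project may store its hotkey in a different form.

```python
def to_pynput_hotkey(combo: str) -> str:
    """Convert 'Ctrl + Shift + F9' into pynput's '<ctrl>+<shift>+<f9>' notation."""
    tokens = [t.strip().lower() for t in combo.split("+") if t.strip()]
    if not tokens:
        raise ValueError("hotkey must contain at least one key")
    parts = []
    for token in tokens:
        # pynput writes named keys in angle brackets and single characters bare.
        parts.append(token if len(token) == 1 else f"<{token}>")
    return "+".join(parts)


if __name__ == "__main__":
    print(to_pynput_hotkey("Ctrl + Shift + F9"))
    print(to_pynput_hotkey("ctrl+alt+h"))
```

A string in this form can be used as a key in the mapping passed to `pynput.keyboard.GlobalHotKeys`, with the capture callback as its value.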

---

## Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".

1. **Fork** the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a **Pull Request**

---

## License

This project is distributed under the MIT License. See the `LICENSE` file for more information.