https://github.com/umaarov/screensight-ai
A desktop tool to instantly capture any part of your screen and ask questions about it using Google Gemini's multimodal capabilities.
desktop-app gemini-api pillow pynput python screenshot-tool tkinter
Last synced: 10 months ago
- Host: GitHub
- URL: https://github.com/umaarov/screensight-ai
- Owner: umaarov
- License: mit
- Created: 2025-07-14T12:15:26.000Z (12 months ago)
- Default Branch: master
- Last Pushed: 2025-07-14T12:52:04.000Z (12 months ago)
- Last Synced: 2025-09-03T02:40:03.742Z (10 months ago)
- Topics: desktop-app, gemini-api, pillow, pynput, python, screenshot-tool, tkinter
- Language: Python
- Homepage:
- Size: 24.1 MB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# ScreenSight AI
ScreenSight AI is a lightweight desktop application that captures any portion of your screen and analyzes it with Google's Gemini model. Press a hotkey, select a region, and ask your question.

---
## Features
- **Hotkey Activation**: Instantly trigger the capture mode with a global hotkey (**Ctrl + Shift + F9**).
- **Region Selection**: A simple, intuitive overlay to select exactly what you want to analyze.
- **Interactive Analysis**: An interactive window displays your screenshot and allows you to ask questions.
- **Powered by Gemini**: Leverages the powerful multimodal capabilities of Google's Gemini Vision model.
- **Lightweight & Cross-Platform**: Built with Python and Tkinter for minimal overhead.
---
## How It Works
1. **Run the App**: The application runs quietly in the background, listening for the hotkey.
2. **Press the Hotkey**: Press `Ctrl + Shift + F9` to bring up a semi-transparent overlay.
3. **Select a Region**: Click and drag your mouse to draw a box around the area of interest on your screen.
4. **Ask a Question**: Once you release the mouse, an analysis window appears with your screenshot. Type your question into the input box.
5. **Get Answers**: The AI analyzes the image and your question, providing a detailed response in the text area.
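Step 3 turns a click-and-drag gesture into a capture rectangle. The sketch below shows one way that could be done, assuming a Pillow-style `(left, top, right, bottom)` bounding box. The function name `normalize_region` and the `min_size` parameter are hypothetical and not taken from the repository.

```python
from typing import Optional, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]


def normalize_region(start: Point, end: Point, min_size: int = 1) -> Optional[Box]:
    """Return (left, top, right, bottom) for a drag, or None if it is too small.

    The user may drag in any direction, so the press and release points
    are sorted per axis instead of assuming the press is the top-left corner.
    """
    left, right = sorted((start[0], end[0]))
    top, bottom = sorted((start[1], end[1]))
    if right - left < min_size or bottom - top < min_size:
        return None  # accidental click or zero-area drag
    return (left, top, right, bottom)
```

A box in this form is what Pillow's `ImageGrab.grab(bbox=...)` expects, although whether the project captures the screen that way is an assumption.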
---
## Installation & Setup
Follow these steps to get ScreenSight AI running on your machine.
### 1. Prerequisites
- Python 3.8+
- A Google Gemini API Key. You can get one from [Google AI Studio](https://aistudio.google.com/app/apikey).
### 2. Clone the Repository
```bash
git clone https://github.com/umaarov/screensight-ai.git
cd screensight-ai
```
### 3. Set Up a Virtual Environment
It's highly recommended to use a virtual environment to manage dependencies.
```bash
# For Windows
python -m venv venv
venv\Scripts\activate
# For macOS & Linux
python3 -m venv venv
source venv/bin/activate
```
### 4. Install Dependencies
Install the required Python packages from the `requirements.txt` file.
```bash
pip install -r requirements.txt
```
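To confirm the install worked, a quick import check can help. This is a minimal sketch: the module names below are inferred from the repository topics (`tkinter`, `pillow`, `pynput`), not from `requirements.txt`, and the helper `missing_modules` is hypothetical.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Pillow is imported as "PIL"; tkinter ships with most Python installers.
    missing = missing_modules(["tkinter", "PIL", "pynput"])
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

On some Linux distributions `tkinter` is packaged separately from Python, so it can show up as missing even after `pip install` succeeds.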
### 5. Configure Your API Key
The application loads your Gemini API key from a `.env` file, which keeps the key out of the source code.
1. In the project's root directory, create a new file named `.env`.
2. Open the `.env` file and add your API key in the following format:
```
# .env file
GEMINI_API_KEY="YOUR_GEMINI_API_KEY_HERE"
```
**Important**: Never commit your `.env` file or hardcode your API key directly in the source code. The `.gitignore` file in this repository is already configured to prevent this file from being tracked by Git.
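For illustration, reading a key in this format needs very little code. The sketch below uses only the standard library. The project may use a package such as `python-dotenv` instead, and the helpers `parse_env_text` and `get_api_key` are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in GEMINI_API_KEY="..."
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_api_key(path: str = ".env") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY", "")
    env_file = Path(path)
    if not key and env_file.exists():
        key = parse_env_text(env_file.read_text(encoding="utf-8")).get("GEMINI_API_KEY", "")
    if not key or key == "YOUR_GEMINI_API_KEY_HERE":
        raise RuntimeError("GEMINI_API_KEY is not set; add it to your .env file")
    return key
```

Failing early with a clear message is a design choice here: a missing or placeholder key is easier to diagnose at startup than as an API error after the first capture.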
---
## Usage
Once the setup is complete, you can run the application from your terminal.
```bash
python main.py
```
You will see a confirmation message in your terminal: `ScreenSight AI is running. Press Ctrl + Shift + F9 to select a region.`
The app now runs in the background. You can press the hotkey combination at any time, in any application, to start a capture.
### Customization
You can easily change the hotkey combination and the UI styling by editing the values in the `src/config.py` file.
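As a rough idea of what a hotkey setting could translate to, the sketch below converts a human-readable combination into the string form accepted by pynput's `GlobalHotKeys`. The actual contents of `src/config.py` are not shown here, so `to_pynput_hotkey` and `listen` are hypothetical names, and the approach is an assumption about how such a setting might be wired up.

```python
from typing import Callable


def to_pynput_hotkey(combo: str) -> str:
    """Convert 'Ctrl + Shift + F9' into '<ctrl>+<shift>+<f9>'."""
    parts = []
    for token in combo.split("+"):
        name = token.strip().lower()
        if not name:
            raise ValueError(f"Empty key in hotkey: {combo!r}")
        # pynput writes named keys as <name> and single characters bare.
        parts.append(name if len(name) == 1 else f"<{name}>")
    return "+".join(parts)


def listen(combo: str, on_activate: Callable[[], None]) -> None:
    """Block until interrupted, calling on_activate when the hotkey fires."""
    from pynput import keyboard  # third-party; imported lazily

    with keyboard.GlobalHotKeys({to_pynput_hotkey(combo): on_activate}) as hotkeys:
        hotkeys.join()
```

For example, `listen("Ctrl + Shift + F9", start_capture)` would register the default combination, where `start_capture` stands in for whatever callback opens the overlay.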
---
## Contributing
Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement".
1. **Fork** the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
3. Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the Branch (`git push origin feature/AmazingFeature`)
5. Open a **Pull Request**
---
## License
This project is distributed under the MIT License. See the `LICENSE` file for more information.