https://github.com/huggingface/screenenv

A powerful Python library for creating and managing isolated desktop environments using Docker containers.
https://github.com/huggingface/screenenv
Last synced: 3 months ago
JSON representation
A powerful Python library for creating and managing isolated desktop environments using Docker containers.
Host: GitHub
URL: https://github.com/huggingface/screenenv
Owner: huggingface
License: mit
Created: 2025-07-07T07:50:47.000Z (6 months ago)
Default Branch: main
Last Pushed: 2025-09-08T23:38:50.000Z (4 months ago)
Last Synced: 2025-09-25T13:44:33.966Z (4 months ago)
Language: Python
Homepage:
Size: 8.83 MB
Stars: 388
Watchers: 6
Forks: 40
Open Issues: 4
Metadata Files:
- Readme: README.md
Awesome Lists containing this project

README

          # ScreenEnv

A powerful Python library for creating and managing isolated desktop environments using Docker containers. ScreenEnv provides a sandboxed Ubuntu desktop environment with XFCE4 that you can programmatically control for GUI automation, testing, and development.

## Features

- 🖥️ **Isolated Desktop Environment**: Full Ubuntu desktop with XFCE4 running in Docker

- 🎮 **GUI Automation**: Complete mouse and keyboard control

- 🌐 **Web Automation**: Built-in browser automation with Playwright

- 📹 **Screen Recording**: Capture video recordings of all actions

- 📸 **Screenshot Capabilities**: Desktop and browser screenshots

- 🖱️ **Mouse Control**: Click, drag, scroll, and mouse movement

- ⌨️ **Keyboard Input**: Text typing and key combinations

- 🪟 **Window Management**: Launch, activate, and close applications

- 📁 **File Operations**: Upload, download, and file management

- 🐚 **Terminal Access**: Execute commands and capture output

- 🤖 **MCP Server Support**: Model Context Protocol integration for AI/LLM automation

- 🐳 **Docker Ready**: Pre-built Docker image with all dependencies

## Quick Start

### Installation

1. **Clone the repository**:

   ```bash

   git clone https://github.com/huggingface/screenenv

   cd screenenv

   ```

2. **Install the package** (choose one):

   **latest release:**

   ```bash

   pip install screenenv

   # or

   uv pip install screenenv

   ```

   **from source:**

   ```bash

   pip install .

   # or

   uv sync

   ```

### Basic Usage

```python

from screenenv import Sandbox

# Create a sandbox environment

sandbox = Sandbox()

try:

    # Launch a terminal

    sandbox.launch("xfce4-terminal")

    # Type some text

    sandbox.write("echo 'Hello from ScreenEnv!'")

    sandbox.press("Enter")

    # Take a screenshot

    screenshot = sandbox.screenshot()

    with open("screenshot.png", "wb") as f:

        f.write(screenshot)

finally:

    # Clean up

    sandbox.close()

```

> For usage, see the source code in `examples/sandbox_demo.py`

## MCP Server Support

ScreenEnv includes full support for the Model Context Protocol (MCP), enabling seamless integration with AI/LLM systems for desktop automation.

### What is MCP?

The Model Context Protocol (MCP) is a standard for AI assistants to interact with external tools and data sources. ScreenEnv's MCP server provides desktop automation capabilities that can be used by any MCP-compatible AI system.

### MCP Server Features

- **30+ Automation Tools**: Complete desktop control via MCP

- **Streamable HTTP Transport**: Efficient communication protocol

### Starting the MCP Server

```python

from screenenv import MCPRemoteServer

# Start MCP server

server = MCPRemoteServer()

print(f"MCP Server URL: {server.server_url}")

print(f"Server Configuration: {server.mcp_server_json}")

```

### MCP Client Usage

```python

import asyncio

from mcp import ClientSession

from mcp.client.streamable_http import streamablehttp_client

from screenenv import MCPRemoteServer

async def mcp_automation():

    # Start MCP server

    server = MCPRemoteServer(headless=False)

    try:

        # Connect to MCP server

        async with streamablehttp_client(server.server_url) as (

            read_stream, write_stream, _

        ):

            async with ClientSession(read_stream, write_stream) as session:

                await session.initialize()

                # Launch terminal

                await session.call_tool("launch", {

                    "application": "xfce4-terminal",

                    "wait_for_window": True

                })

                # Type commands

                await session.call_tool("write", {"text": "echo 'Hello MCP!'"})

                await session.call_tool("press", {"key": ["Enter"]})

                # Take screenshot

                response = await session.call_tool("screenshot", {})

                screenshot_base64 = response.content[0].data

                screenshot_bytes = base64.b64decode(screenshot_base64)

                image = Image.open(io.BytesIO(screenshot_bytes))

                image.save("screenshot.png")

                ...

                print("MCP automation completed!")

    finally:

        server.close()

# Run the automation

asyncio.run(mcp_automation())

```

### Available MCP Tools

#### System Operations

- `execute_command` - Execute shell commands

- `get_platform` - Get system platform information

- `get_screen_size` - Get screen dimensions

- `get_desktop_path` - Get desktop directory path

- `get_directory_tree` - List directory contents

- `get_file` - Get file contents

- `download_file` - Download file from URL

- `start_recording` - Start screen recording

- `end_recording` - End screen recording

#### Application Management

- `wait` - Wait for specified milliseconds

- `open` - Open file or URL

- `launch` - Launch application

- `get_current_window_id` - Get current window ID

- `get_application_windows` - Get windows for application

- `get_window_name` - Get window name/title

- `get_window_size` - Get window size

- `activate_window` - Activate window

- `close_window` - Close window

- `get_terminal_output` - Get terminal output

#### GUI Automation

- `screenshot` - Take screenshot

- `left_click` - Left click at coordinates

- `double_click` - Double click at coordinates

- `right_click` - Right click at coordinates

- `middle_click` - Middle click at coordinates

- `scroll` - Scroll mouse wheel

- `move_mouse` - Move mouse to coordinates

- `mouse_press` - Press mouse button

- `mouse_release` - Release mouse button

- `get_cursor_position` - Get cursor position

- `write` - Type text

- `press` - Press keys

- `drag` - Drag mouse from one position to another

### MCP Server Configuration

```python

# Advanced MCP server configuration

server = MCPRemoteServer(

    os_type="Ubuntu",

    provider_type="docker",

    headless=True,

    resolution=(1920, 1080),

    disk_size="32G",

    ram_size="4G",

    cpu_cores="4",

    session_password="your_password",

    stream_server=True,

    dpi=96,

    timeout=1000

)

```

## Sandbox Instantiation

### Basic Configuration

```python

from screenenv import Sandbox

# Minimal configuration

sandbox = Sandbox()

# With custom settings

sandbox = Sandbox(

    os_type="Ubuntu",           # Currently only Ubuntu is supported

    provider_type="docker",     # Currently only Docker is supported

    headless=True,              # Run without VNC viewer

    resolution=(1920, 1080),

    disk_size="32G",

    ram_size="4G",

    cpu_cores="4",

    session_password="your_password",

    stream_server=True,

    dpi=96,

    timeout=1000

)

```

## Core Features

### Mouse Control

```python

# Click operations

sandbox.left_click(x=100, y=200)

sandbox.right_click(x=300, y=400)

sandbox.double_click(x=500, y=600)

# Mouse movement

sandbox.move_mouse(x=800, y=900)

# Drag and drop

sandbox.drag(fr=(100, 100), to=(200, 200))

# Scrolling

sandbox.scroll(direction="down", amount=3)

sandbox.mouse_release(button="left")

sandbox.mouse_press(button="left")

sandbox.mouse_release(button="left")

```

### Keyboard Input

```python

# Type text

sandbox.write("Hello, World!", delay_in_ms=50)

# Key combinations

sandbox.press(["Ctrl", "C"])  # Copy

sandbox.press(["Ctrl", "V"])  # Paste

sandbox.press(["Alt", "Tab"]) # Switch windows

sandbox.press("Enter")        # Single key

```

### Application Management

```python

# Launch applications

sandbox.launch("xfce4-terminal")

sandbox.launch("libreoffice --writer")

sandbox.open("https://www.google.com")

# Window management

windows = sandbox.get_application_windows("xfce4-terminal")

window_id = windows[0]

sandbox.activate_window(window_id)

window_id = sandbox.get_current_window_id() # get the current activate window id.

sandbox.window_size(window_id)

sandbox.get_window_title(window_id)

sandbox.close_window(window_id)

```

### File Operations

```python

# Upload files to sandbox

sandbox.upload_file_to_remote("local_file.txt", "/home/user/remote_file.txt")

# Download files from sandbox

sandbox.download_file_from_remote("/home/user/remote_file.txt", "local_file.txt")

# Download from URL

sandbox.download_url_file_to_remote("https://example.com/file.txt", "/home/user/file.txt")

```

### Screenshots and Recording

```python

# Start recording

sandbox.start_recording()

# Take screenshots

desktop_screenshot = sandbox.desktop_screenshot()

# Stop recording and save it locally to a file 'demo.mp4'

sandbox.end_recording("demo.mp4")

```

### Terminal Operations

```python

# Execute commands

response = sandbox.execute_command("ls -la")

print(response.output)

# Python commands

response = sandbox.execute_python_command("print('Hello')", ["os"])

print(response.output)

# Get terminal output

output = sandbox.get_terminal_output() # Only if a desktop terminal application is running. To get command output, use execute_command() instead.

```

## Examples

### Complete GUI Automation Demo

```python

from screenenv import Sandbox

import time

def demo_automation():

    sandbox = Sandbox(headless=False)

    try:

        # Launch terminal

        sandbox.launch("xfce4-terminal")

        time.sleep(2)

        # Type commands

        sandbox.write("echo 'Starting automation demo'")

        sandbox.press("Enter")

        # Open web browser

        sandbox.open("https://www.python.org")

        time.sleep(3)

        # Take screenshot

        screenshot = sandbox.screenshot()

        with open("demo_screenshot.png", "wb") as f:

            f.write(screenshot)

    finally:

        sandbox.close()

if __name__ == "__main__":

    demo_automation()

```

### Web Automation with Playwright

```python

from screenenv import Sandbox

def web_automation():

    sandbox = Sandbox(headless=True)

    try:

        # Open website

        sandbox.open("https://www.example.com")

        # Take browser screenshot

        screenshot = sandbox.playwright_screenshot(full_page=True)

        with open("web_screenshot.png", "wb") as f:

            f.write(screenshot)

        playwright_browser = sandbox.playwright_browser()

    finally:

        sandbox.close()

```

## MCP Server Demo

```bash

python -m examples.mcp_server_demo # or sudo -E python -m examples.mcp_server_demo if not in docker group

```

## Sandbox Demo

```bash

python -m examples.sandbox_demo # or sudo -E python -m examples.sandbox_demo if not in docker group

```

## Desktop Agent Demo

```bash

python -m examples.desktop_agent # or sudo -E python -m examples.desktop_agent if not in docker group

```

## System Requirements

- **Docker**: Must be installed and running

- **Python**: 3.10 or higher

- **Playwright**: For web automation features

- **Memory**: At least 4GB RAM recommended

## Sandbox Image

The sandbox uses a custom Ubuntu 22.04 Docker image with:

- XFCE4 desktop environment

- VNC server for remote access

- Google Chrome/Chromium browser

- LibreOffice suite

- Python development tools

- MCP server support
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/huggingface/screenenv

Awesome Lists containing this project

README