{"id":24251344,"url":"https://github.com/mdgrey33/pyvisionai","last_synced_at":"2025-10-23T02:42:18.804Z","repository":{"id":264936797,"uuid":"894697799","full_name":"MDGrey33/pyvisionai","owner":"MDGrey33","description":"The PyVisionAI Official Repo","archived":false,"fork":false,"pushed_at":"2025-03-06T13:00:46.000Z","size":10416,"stargazers_count":98,"open_issues_count":4,"forks_count":10,"subscribers_count":5,"default_branch":"main","last_synced_at":"2025-04-06T02:04:25.441Z","etag":null,"topics":["claude-3-5-sonnet","computer","llama","localllm","ocr","ollama","open-source","openai","python","vision","vision-models","vlm"],"latest_commit_sha":null,"homepage":"https://pyvisionai.com/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MDGrey33.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-11-26T20:28:11.000Z","updated_at":"2025-03-28T19:43:22.000Z","dependencies_parsed_at":"2025-02-22T22:32:05.920Z","dependency_job_id":null,"html_url":"https://github.com/MDGrey33/pyvisionai","commit_stats":null,"previous_names":["mdgrey33/file_extractor","mdgrey33/content-extractor-with-vision","mdgrey33/pyvisionai"],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MDGrey33%2Fpyvisionai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MDGrey33%2Fpyvisionai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MDGrey33%2Fpyvisionai/releases","manifests_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MDGrey33%2Fpyvisionai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MDGrey33","download_url":"https://codeload.github.com/MDGrey33/pyvisionai/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247423513,"owners_count":20936626,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["claude-3-5-sonnet","computer","llama","localllm","ocr","ollama","open-source","openai","python","vision","vision-models","vlm"],"created_at":"2025-01-15T02:50:50.902Z","updated_at":"2025-10-23T02:42:18.797Z","avatar_url":"https://github.com/MDGrey33.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# PyVisionAI\n# Content Extractor and Image Description with Vision LLM\n\nExtract and describe content from documents using Vision Language Models.\n\n## Repository\n\nhttps://github.com/MDGrey33/pyvisionai\n\n## Requirements\n\n- Python 3.8 or higher\n- Operating system: Windows, macOS, or Linux\n- Disk space: At least 1GB free space (more if using local Llama model)\n\n## Features\n\n- Extract text and images from PDF, DOCX, PPTX, and HTML files\n- Capture interactive HTML pages as images with full rendering\n- Describe images using:\n  - Cloud-based models (OpenAI GPT-4 Vision, Anthropic Claude Vision)\n  - Local models (Ollama's Llama Vision)\n- Save extracted text and image descriptions in markdown format\n- Support for both CLI and library usage\n- Multiple extraction methods for different use cases\n- Detailed 
logging with timestamps for all operations\n- Customizable image description prompts\n\n## Installation\n\nFor macOS users, you can install using Homebrew:\n```bash\nbrew tap mdgrey33/pyvisionai\nbrew install pyvisionai\n```\nFor more details and configuration options, see the [Homebrew tap repository](https://github.com/roland/homebrew-pyvisionai).\n\n1. **Install System Dependencies**\n   ```bash\n   # macOS (using Homebrew)\n   brew install --cask libreoffice  # Required for DOCX/PPTX processing\n   brew install poppler             # Required for PDF processing\n   pip install playwright          # Required for HTML processing\n   playwright install              # Install browser dependencies\n\n   # Ubuntu/Debian\n   sudo apt-get update\n   sudo apt-get install -y libreoffice  # Required for DOCX/PPTX processing\n   sudo apt-get install -y poppler-utils # Required for PDF processing\n   pip install playwright               # Required for HTML processing\n   playwright install                   # Install browser dependencies\n\n   # Windows\n   # Download and install:\n   # - LibreOffice: https://www.libreoffice.org/download/download/\n   # - Poppler: http://blog.alivate.com.au/poppler-windows/\n   # Add poppler's bin directory to your system PATH\n   pip install playwright\n   playwright install\n   ```\n\n2. **Install PyVisionAI**\n   ```bash\n   # Using pip\n   pip install pyvisionai\n\n   # Using poetry (will automatically install playwright as a dependency)\n   poetry add pyvisionai\n   poetry run playwright install  # Install browser dependencies\n   ```\n\n## Directory Structure\n\nBy default, PyVisionAI uses the following directory structure:\n```\ncontent/\n├── source/      # Default input directory for files to process\n├── extracted/   # Default output directory for processed files\n└── log/         # Directory for log files and benchmarks\n```\n\nThese directories are created automatically when needed, but you can:\n1. 
Create them manually:\n   ```bash\n   mkdir -p content/source content/extracted content/log\n   ```\n2. Override them with custom paths:\n   ```bash\n   # Specify custom input and output directories\n   file-extract -t pdf -s /path/to/inputs -o /path/to/outputs\n\n   # Process a single file with custom output\n   file-extract -t pdf -s ~/documents/file.pdf -o ~/results\n   ```\n\nNote: While the default directories provide an organized structure, you're free to use any directory layout that suits your needs by specifying custom paths with the `-s` (source) and `-o` (output) options.\n\n## Setup for Image Description\n\nFor cloud image description (default, recommended):\n```bash\n# Set OpenAI API key (for GPT-4 Vision)\nexport OPENAI_API_KEY='your-openai-key'\n\n# Or set Anthropic API key (for Claude Vision)\nexport ANTHROPIC_API_KEY='your-anthropic-key'\n```\n\nFor local image description (optional):\n```bash\n# Install Ollama\n# macOS\nbrew install ollama\n\n# Linux\ncurl -fsSL https://ollama.com/install.sh | sh\n\n# Windows\n# Download from https://ollama.com/download/windows\n\n# Start Ollama server\nollama serve\n\n# Pull the required model\nollama pull llama3.2-vision\n\n# Verify installation\nollama list  # Should show llama3.2-vision\ncurl http://localhost:11434/api/tags  # Should return JSON response\n```\n\nNote: The local Llama model:\n- Runs entirely on your machine\n- Requires no API key\n- Requires about 8GB of disk space\n- Needs 16GB+ RAM for optimal performance\n- May be slower than cloud models but offers privacy\n\n## Features\n\n- Extract text and images from PDF, DOCX, PPTX, and HTML files\n- Capture interactive HTML pages as images with full rendering\n- Describe images using:\n  - Cloud-based models (OpenAI GPT-4 Vision, Anthropic Claude Vision)\n  - Local models (Ollama's Llama Vision)\n- Save extracted text and image descriptions in markdown format\n- Support for both CLI and library usage\n- Multiple extraction methods for different use 
cases\n- Detailed logging with timestamps for all operations\n\n## Usage\n\n### Command Line Interface\n\n1. **Extract Content from Files**\n   ```bash\n   # Process a single file (using default page-as-image method)\n   file-extract -t pdf -s path/to/file.pdf -o output_dir\n   file-extract -t docx -s path/to/file.docx -o output_dir\n   file-extract -t pptx -s path/to/file.pptx -o output_dir\n   file-extract -t html -s path/to/file.html -o output_dir\n\n   # Process with specific model\n   file-extract -t pdf -s input.pdf -o output_dir -m claude\n   file-extract -t pdf -s input.pdf -o output_dir -m gpt4\n   file-extract -t pdf -s input.pdf -o output_dir -m llama\n\n   # Process with specific extractor\n   file-extract -t pdf -s input.pdf -o output_dir -e text_and_images\n\n   # Process all files in a directory\n   file-extract -t pdf -s input_dir -o output_dir\n\n   # Example with custom prompt\n   file-extract -t pdf -s document.pdf -o output_dir -p \"Extract the exact text as present in the image and write one sentence about each visual in the image\"\n   ```\n\n   **Note:** The custom prompt for file extraction affects the content of the output document. With the `page_as_image` method, the prompt should instruct the model to both extract text and describe visuals. Variations are acceptable as long as they cover both tasks. Avoid prompts like \"What's the color of this picture?\", which are unlikely to yield the desired results.\n\n2. 
**Describe Images**\n   ```bash\n   # Using GPT-4 Vision (default)\n   describe-image -i path/to/image.jpg\n\n   # Using Claude Vision (with --model parameter)\n   describe-image -i path/to/image.jpg -m claude -k your-anthropic-key\n\n   # Using local Llama model (with --model parameter)\n   describe-image -i path/to/image.jpg -m llama\n\n   # Using custom prompt\n   describe-image -i image.jpg -p \"List the main colors in this image\"\n\n   # Using legacy --use-case parameter (deprecated, use --model instead)\n   describe-image -i path/to/image.jpg -u claude -k your-anthropic-key\n\n   # Additional options\n   describe-image -i image.jpg -v  # Verbose output\n   ```\n\n   **Note:** The `-u/--use-case` parameter is deprecated but maintained for backward compatibility. Please use `-m/--model` instead.\n\n### Library Usage\n\n```python\nfrom pyvisionai import (\n    create_extractor,\n    describe_image_openai,\n    describe_image_claude,\n    describe_image_ollama\n)\n\n# 1. Extract content from files\n# Using GPT-4 Vision (default)\nextractor = create_extractor(\"pdf\")\noutput_path = extractor.extract(\"input.pdf\", \"output_dir\")\n\n# Using Claude Vision\nextractor = create_extractor(\"pdf\", model=\"claude\")\noutput_path = extractor.extract(\"input.pdf\", \"output_dir\")\n\n# Using specific extraction method\nextractor = create_extractor(\"pdf\", extractor_type=\"text_and_images\")\noutput_path = extractor.extract(\"input.pdf\", \"output_dir\")\n\n# 2. 
Describe images\n# Using GPT-4 Vision\ndescription = describe_image_openai(\n    \"image.jpg\",\n    model=\"gpt-4o-mini\",  # default\n    api_key=\"your-openai-key\",  # optional if set in environment\n    max_tokens=300,  # default\n    prompt=\"Describe this image focusing on colors and textures\"  # optional\n)\n\n# Using Claude Vision\ndescription = describe_image_claude(\n    \"image.jpg\",\n    api_key=\"your-anthropic-key\",  # optional if set in environment\n    prompt=\"Describe this image focusing on colors and textures\"  # optional\n)\n\n# Using local Llama model\ndescription = describe_image_ollama(\n    \"image.jpg\",\n    model=\"llama3.2-vision\",  # default\n    prompt=\"List the main objects in this image\"  # optional\n)\n```\n\n## Logging\n\nThe application maintains detailed logs of all operations:\n- By default, logs are stored in `content/log/` with timestamp-based filenames\n- Each run creates a new log file: `pyvisionai_YYYYMMDD_HHMMSS.log`\n- Logs include:\n  - Timestamp for each operation\n  - Processing steps and their status\n  - Error messages and warnings\n  - Extraction method used\n  - Input and output file paths\n\n## Environment Variables\n\n```bash\n# Required for OpenAI Vision (if using GPT-4)\nexport OPENAI_API_KEY='your-openai-key'\n\n# Required for Claude Vision (if using Claude)\nexport ANTHROPIC_API_KEY='your-anthropic-key'\n\n# Optional: Ollama host (if using local description)\nexport OLLAMA_HOST='http://localhost:11434'\n```\n\n## Performance Optimization\n\n1. **Memory Management**\n   - Use `text_and_images` method for large documents\n   - Process files in smaller batches\n   - Monitor memory usage during batch processing\n   - Clean up temporary files regularly\n\n2. **Processing Speed**\n   - Cloud models (GPT-4, Claude) are generally faster than local models\n   - Use parallel processing for batch operations\n   - Consider SSD storage for better I/O performance\n   - Optimize image sizes before processing\n\n3. 
**API Usage**\n   - Implement proper rate limiting\n   - Use appropriate retry mechanisms\n   - Cache results when possible\n   - Monitor API quotas and usage\n\n## License\n\nThis project is licensed under the [Apache License 2.0](LICENSE).\n\n## Command Parameters\n\n### `file-extract` Command\n```bash\nfile-extract [-h] -t TYPE -s SOURCE -o OUTPUT [-e EXTRACTOR] [-m MODEL] [-k API_KEY] [-v]\n\nRequired Arguments:\n  -t, --type TYPE         File type to process (pdf, docx, pptx, html)\n  -s, --source SOURCE     Source file or directory path\n  -o, --output OUTPUT     Output directory path\n\nOptional Arguments:\n  -h, --help             Show help message and exit\n  -e, --extractor TYPE   Extraction method:\n                         - page_as_image: Convert pages to images (default)\n                         - text_and_images: Extract text and images separately\n                         - hybrid: ⚠️ EXPERIMENTAL - NOT RECOMMENDED (see HYBRID_METHOD_DECISION.md)\n                         Note: HTML only supports page_as_image\n  -m, --model MODEL      Vision model for image description:\n                         - gpt4: GPT-4 Vision (default)\n                         - claude: Claude Vision\n                         - llama: Local Llama model\n  -k, --api-key KEY      API key (required for GPT-4 and Claude)\n  -v, --verbose          Enable verbose logging\n  -p, --prompt TEXT      Custom prompt for image description\n```\n\n### `describe-image` Command\n```bash\ndescribe-image [-h] -s SOURCE [-m MODEL] [-k API_KEY] [-v] [-p PROMPT]\n\nRequired Arguments:\n  -s, --source SOURCE   Path to the image file to describe\n\nOptional Arguments:\n  -h, --help            Show help message and exit\n  -m, --model MODEL     Model to use for description:\n                        - gpt4: GPT-4 Vision (default)\n                        - claude: Claude Vision\n                        - llama: Local Llama model\n  -k, --api-key KEY     API key (required for GPT-4 and Claude)\n  
-v, --verbose         Enable verbose logging\n  -p, --prompt TEXT     Custom prompt for image description\n\nNote: For backward compatibility, you can also use -i/--image instead of -s/--source.\n      The -u/--use-case parameter is deprecated. Please use -m/--model instead.\n```\n\n### File Extraction Examples\n```bash\n# Basic usage with defaults (page_as_image method, GPT-4 Vision)\nfile-extract -t pdf -s document.pdf -o output_dir\nfile-extract -t html -s webpage.html -o output_dir  # HTML always uses page_as_image\n\n# Specify extraction method (not applicable for HTML)\nfile-extract -t docx -s document.docx -o output_dir -e text_and_images\n\n# Use hybrid extraction (EXPERIMENTAL - NOT RECOMMENDED due to performance/accuracy issues)\n# file-extract -t pdf -s document.pdf -o output_dir -e hybrid\n\n# Use local Llama model for image description\nfile-extract -t pptx -s slides.pptx -o output_dir -m llama\n\n# Process all PDFs in a directory with verbose logging\nfile-extract -t pdf -s input_dir -o output_dir -v\n\n# Use custom OpenAI API key\nfile-extract -t pdf -s document.pdf -o output_dir -k \"your-api-key\"\n\n# Use custom prompt for image descriptions\nfile-extract -t pdf -s document.pdf -o output_dir -p \"Focus on text content and layout\"\n```\n\n### Image Description Examples\n```bash\n# Basic usage with defaults (GPT-4 Vision)\ndescribe-image -s photo.jpg\ndescribe-image -i photo.jpg  # Legacy parameter, still supported\n\n# Using specific models\ndescribe-image -s photo.jpg -m claude -k your-anthropic-key\ndescribe-image -s photo.jpg -m llama\ndescribe-image -i photo.jpg -m gpt4  # Legacy parameter style\n\n# Using custom prompt\ndescribe-image -s photo.jpg -p \"List the main colors and their proportions\"\n\n# Customize token limit\ndescribe-image -s photo.jpg -t 500\n\n# Enable verbose logging\ndescribe-image -s photo.jpg -v\n\n# Use custom OpenAI API key\ndescribe-image -s photo.jpg -k \"your-api-key\"\n\n# Combine options\ndescribe-image -s 
photo.jpg -m llama -p \"Describe the lighting and shadows\" -v\n```\n\n## Custom Prompts\n\nPyVisionAI supports custom prompts for both file extraction and image description. Custom prompts allow you to control how content is extracted and described.\n\n### Using Custom Prompts\n\n1. **CLI Usage**\n   ```bash\n   # File extraction with custom prompt\n   file-extract -t pdf -s document.pdf -o output_dir -p \"Extract all text verbatim and describe any diagrams or images in detail\"\n\n   # Image description with custom prompt\n   describe-image -i image.jpg -p \"List the main colors and describe the layout of elements\"\n   ```\n\n2. **Library Usage**\n   ```python\n   # File extraction with custom prompt\n   extractor = create_extractor(\n       \"pdf\",\n       extractor_type=\"page_as_image\",\n       prompt=\"Extract all text exactly as it appears and provide detailed descriptions of any charts or diagrams\"\n   )\n   output_path = extractor.extract(\"input.pdf\", \"output_dir\")\n\n   # Image description with custom prompt\n   description = describe_image_openai(\n       \"image.jpg\",\n       prompt=\"Focus on spatial relationships between objects and any text content\"\n   )\n   ```\n\n3. **Environment Variable**\n   ```bash\n   # Set default prompt via environment variable\n   export FILE_EXTRACTOR_PROMPT=\"Extract text and describe visual elements with emphasis on layout\"\n   ```\n\n### Writing Effective Prompts\n\n1. **For Page-as-Image Method**\n   - Include instructions for both text extraction and visual description since the entire page is processed as an image\n   - Example: \"Extract the exact text as it appears on the page and describe any images, diagrams, or visual elements in detail\"\n\n2. 
**For Text-and-Images Method**\n   - Focus only on image description since text is extracted separately\n   - The model only sees the images, not the text content\n   - Example: \"Describe the visual content, focusing on what the image represents and any visual elements it contains\"\n\n3. **For Image Description**\n   - Be specific about what aspects to focus on\n   - Example: \"Describe the main elements, their arrangement, and any text visible in the image\"\n\nNote: For page-as-image method, prompts must include both text extraction and visual description instructions as the entire page is processed as an image. For text-and-images method, prompts should focus solely on image description as text is handled separately.\n\n## Contributing\n\nWe welcome contributions to PyVisionAI! Whether you're fixing bugs, improving documentation, or proposing new features, your help is appreciated.\n\nPlease read our [Contributing Guidelines](CONTRIBUTING.md) for detailed information on:\n- Setting up your development environment\n- Code style and standards\n- Testing requirements\n- Pull request process\n- Documentation guidelines\n\n### Quick Start for Contributors\n\n1. Fork and clone the repository\n2. Install development dependencies:\n   ```bash\n   pip install poetry\n   poetry install\n   ```\n3. Install pre-commit hooks:\n   ```bash\n   poetry run pre-commit install\n   ```\n4. Make your changes\n5. Run tests:\n   ```bash\n   poetry run pytest\n   ```\n6. 
Submit a pull request\n\nFor more detailed instructions, see [CONTRIBUTING.md](CONTRIBUTING.md).\n\n## Docker Support\n\nPyVisionAI includes Docker support for easy deployment of the FastAPI server.\n\n### Building the Docker Image\n\n```bash\n# Build the Docker image\ndocker build -t pyvisionai-app .\n```\n\n### Running with Docker Compose (Recommended)\n\nThe easiest way to run PyVisionAI in Docker is using Docker Compose:\n\n```bash\n# Start the container\ndocker compose up -d\n\n# The API will be available at http://localhost:8001\n# (mapped from container port 8000 to host port 8001)\n\n# Stop the container\ndocker compose down\n```\n\nThe `docker-compose.yml` file automatically:\n- Maps port 8001 on your host to port 8000 in the container\n- Passes your `OPENAI_API_KEY` environment variable to the container\n- Names the container `pyvisionai-container`\n- Configures automatic restart unless stopped\n\n### Running with Docker (Manual)\n\nIf you prefer to use Docker directly:\n\n```bash\n# Run the container\ndocker run -d \\\n  -p 8001:8000 \\\n  -e OPENAI_API_KEY=\"${OPENAI_API_KEY}\" \\\n  --name pyvisionai-container \\\n  pyvisionai-app\n\n# Stop and remove the container\ndocker stop pyvisionai-container\ndocker rm pyvisionai-container\n```\n\n### Testing the Docker Container\n\nOnce running, test the API with:\n\n```bash\n# Test image description endpoint\ncurl -X POST \"http://127.0.0.1:8001/api/v1/describe/openai\" \\\n  -F \"file=@content/test/source/test.jpeg\" \\\n  -F \"model=gpt-4o\" \\\n  -F \"prompt=Describe this image in detail\"\n\n# View API documentation\nopen http://localhost:8001/docs\n```\n\n### Environment Variables\n\nThe Docker container supports the following environment variables:\n- `OPENAI_API_KEY`: Required for OpenAI GPT-4 Vision\n- `ANTHROPIC_API_KEY`: Required for Claude Vision\n- `OLLAMA_HOST`: Optional, for connecting to Ollama server\n\n### Notes\n\n- The container runs on port 8000 internally, mapped to 8001 on your host\n- 
Ensure your API keys are set in your environment before running\n- The image includes all necessary dependencies for processing PDFs, DOCX, PPTX, and HTML files\n- For production use, consider adding health checks and resource limits\n\n## Unified Server Management\n\nPyVisionAI provides a unified command to manage both the API server and MCP server:\n\n### Quick Start\n\n```bash\n# Start both servers (API on port 8001, MCP on port 8002)\n./run_servers.sh\n\n# Or using Python directly\npython scripts/run_pyvisionai.py\n```\n\n### Server Management Commands\n\n```bash\n# Start both servers (default)\n./run_servers.sh\n\n# Start only the API server\n./run_servers.sh --api\n\n# Start only the MCP server\n./run_servers.sh --mcp\n\n# Check server status\n./run_servers.sh --status\n\n# Stop all servers\n./run_servers.sh --stop\n\n# Build images and start servers\n./run_servers.sh --build\n\n# Use custom ports\n./run_servers.sh --api-port 8080 --mcp-port 8090\n```\n\n### What Each Server Provides\n\n1. **API Server (Port 8001)**\n   - REST API endpoints at `/api/v1/`\n   - Swagger UI at http://localhost:8001/docs\n   - Direct HTTP calls for image description and PDF extraction\n   - Suitable for programmatic access and web applications\n\n2. 
**MCP Server (Port 8002)**\n   - SSE endpoint at http://localhost:8002/sse\n   - Integration with Claude Desktop and Cursor\n   - Simplified tool interface for AI assistants\n   - Tools: `describe_image_with_openai`, `describe_image_with_ollama`, `describe_image_with_claude`, `extract_pdf_content`\n\n### MCP Configuration\n\nAfter starting the servers, add this to your Claude Desktop or Cursor configuration:\n\n```json\n{\n  \"mcpServers\": {\n    \"pyvisionai\": {\n      \"url\": \"http://localhost:8002/sse\"\n    }\n  }\n}\n```\n\n## MCP (Model Context Protocol) Support\n\nPyVisionAI can also run as an MCP server, exposing its image description capabilities as tools for AI assistants like Cursor.\n\n### What is MCP?\n\nMCP (Model Context Protocol) is a standard for exposing tools and resources to AI assistants. When running PyVisionAI as an MCP server, AI assistants can directly use the image description capabilities.\n\n### Running as MCP Server\n\n#### Quick Start with Docker\n\n```bash\n# Build and start the MCP server\ndocker compose -f docker-compose.mcp.yml up -d\n\n# The MCP server will be available at http://localhost:8002/sse\n```\n\n#### Configuring in Cursor\n\nAdd this to your `~/.cursor/mcp.json`:\n\n```json\n{\n  \"mcpServers\": {\n    \"pyvisionai\": {\n      \"url\": \"http://localhost:8002/sse\"\n    }\n  }\n}\n```\n\nThen restart Cursor to load the new MCP server.\n\n### Available MCP Tools\n\nThe MCP server exposes three tools:\n\n1. **describe_image_with_openai**\n   - Uses OpenAI GPT-4 Vision models\n   - Requires `OPENAI_API_KEY` environment variable\n   - Supports custom prompts and model selection\n\n2. **describe_image_with_ollama**\n   - Uses local Ollama vision models\n   - Requires Ollama running locally\n   - Good for privacy-sensitive applications\n\n3. 
**describe_image_with_claude**\n   - Uses Anthropic's Claude Vision\n   - Requires `ANTHROPIC_API_KEY` environment variable\n   - Excellent for detailed analysis\n\n### MCP Server Management\n\n```bash\n# View logs\ndocker compose -f docker-compose.mcp.yml logs\n\n# Stop the server\ndocker compose -f docker-compose.mcp.yml down\n\n# Restart the server\ndocker compose -f docker-compose.mcp.yml restart\n```\n\n### Using MCP Tools\n\nOnce configured, you can ask your AI assistant to describe images:\n- \"Use PyVisionAI to describe the image at /path/to/image.jpg\"\n- \"Analyze this screenshot using OpenAI vision\"\n- \"Describe this image using the local Ollama model\"\n\nThe tools accept both file paths and base64-encoded images.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmdgrey33%2Fpyvisionai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmdgrey33%2Fpyvisionai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmdgrey33%2Fpyvisionai/lists"}