{"id":31841578,"url":"https://github.com/dirvine/saorsa","last_synced_at":"2025-11-07T15:06:15.126Z","repository":{"id":266148326,"uuid":"897539170","full_name":"dirvine/saorsa","owner":"dirvine","description":"AI \u0026 Robotics Lab","archived":false,"fork":false,"pushed_at":"2025-09-02T14:22:42.000Z","size":14385,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-02T16:16:32.542Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"https://soarsa.vercel.app","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dirvine.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-12-02T20:01:19.000Z","updated_at":"2025-09-02T14:22:45.000Z","dependencies_parsed_at":"2024-12-03T13:47:48.439Z","dependency_job_id":"e4ca43bc-a629-4868-8594-c2eca2d8b4a0","html_url":"https://github.com/dirvine/saorsa","commit_stats":null,"previous_names":["dirvine/soarsa","dirvine/saorsa"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/dirvine/saorsa","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dirvine%2Fsaorsa","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dirvine%2Fsaorsa/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dirvine%2Fsaorsa/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dirvine%2Fsaorsa/man
ifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dirvine","download_url":"https://codeload.github.com/dirvine/saorsa/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dirvine%2Fsaorsa/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279010341,"owners_count":26084738,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-12T02:00:06.719Z","response_time":53,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-12T05:21:09.692Z","updated_at":"2025-10-12T05:21:13.494Z","avatar_url":"https://github.com/dirvine.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Saorsa: Advanced Voice-Controlled SO-101 Robot Arms\n\nA sophisticated natural language interface for controlling Hugging Face's SO-101 robot arms, featuring voice recognition, AI-powered command interpretation, and computer vision integration. 
Optimized for macOS with Apple Silicon.\n\n## Overview\n\nSaorsa provides three levels of robot control sophistication:\n\n- **Phase 1**: Core voice control with basic movement commands\n- **Phase 2**: AI-enhanced natural language processing with context awareness  \n- **Phase 3**: Multimodal interface combining voice commands with computer vision\n\n## Features\n\n### Core Capabilities\n- 🎤 **Natural Language Control**: Speak commands in plain English\n- 🤖 **Multi-Robot Support**: Control single or dual SO-101 robot arms\n- 🛡️ **Safety First**: Built-in workspace limits and emergency controls\n- ⚡ **Mac M3 Optimized**: Leverages Apple Silicon for efficient processing\n- 🔧 **Modular Architecture**: Extensible for custom commands and behaviors\n\n### AI \u0026 Vision Features\n- 🧠 **Local AI Models**: Advanced command interpretation using Hugging Face transformers\n- 👁️ **Computer Vision**: Real-time object detection and spatial reasoning\n- 🎯 **Spatial References**: \"Pick up the red block on the left\"\n- 📝 **Context Awareness**: Remember objects and locations across commands\n- 🖼️ **Visual Feedback**: Real-time overlays showing detected objects and robot status\n\n## Quick Start\n\n### 1. System Requirements\n\n#### Hardware\n- **Computer**: macOS with Apple Silicon (M1, M2, M3, or later)\n- **Memory**: 16GB RAM minimum (32GB recommended for advanced AI features)\n- **Storage**: 25GB free space for models and data\n- **Robot**: SO-101 Robot Arms (1-2 units)\n- **Audio**: Built-in microphone or external USB microphone\n- **Camera**: Built-in camera, USB camera, or iPhone (via Continuity Camera)\n\n#### Software\n- **OS**: macOS 13.0+ (Ventura or later)\n- **Python**: 3.11 or higher\n- **Homebrew**: For system dependencies\n\n### 2. 
Installation\n\n```bash\n# Clone the repository\ngit clone https://github.com/dirvine/saorsa.git\ncd saorsa\n\n# Run the automated installation script\n./scripts/install_mac.sh\n\n# Download AI models (optional for basic usage)\npython scripts/download_models.py --basic\n\n# For full AI features, download all models\npython scripts/download_models.py --all\n```\n\n### 3. Robot Setup\n\n```bash\n# Find your robot's serial port\nls /dev/tty.usbserial-*\n\n# Test robot connection (replace with your actual port)\npython src/main_mac.py test-robot --port /dev/tty.usbserial-FT1234\n\n# Calibrate the robot if needed\npython scripts/calibrate_robot.py /dev/tty.usbserial-FT1234\n```\n\n### 4. Test Systems\n\n```bash\n# Test audio input and speech recognition\npython src/main_mac.py test-audio\n\n# Test camera system (for Phase 3 features)\npython src/main_mac.py test-camera\n\n# Check overall system status\npython src/main_mac.py status\n```\n\n### 5. Launch Voice Control\n\nChoose your operation mode:\n\n```bash\n# Basic voice control (Phase 1)\npython src/main_mac.py run -l /dev/tty.usbserial-FT1234\n\n# AI-enhanced processing (Phase 2)\npython src/main_mac.py run -l /dev/tty.usbserial-FT1234 --mode ai\n\n# Full multimodal with vision (Phase 3)\npython src/main_mac.py run -l /dev/tty.usbserial-FT1234 --mode multimodal\n\n# Dual robot setup\npython src/main_mac.py run -l /dev/tty.usbserial-FT1234 -f /dev/tty.usbserial-FT5678 --mode multimodal\n```\n\n## Voice Commands\n\n### Basic Commands (Phase 1)\n\n#### Movement\n- \"move left\" / \"move right\"\n- \"move forward\" / \"move back\" / \"move backward\"\n- \"move up\" / \"move down\"\n- \"turn left\" / \"turn right\"\n\n#### Gripper Control\n- \"open gripper\" / \"open\"\n- \"close gripper\" / \"close\" / \"grab\"\n- \"release\"\n\n#### Position Commands\n- \"home\" / \"home position\"\n- \"ready\" / \"ready position\"\n\n#### Control\n- \"stop\"\n- \"halt\" / \"emergency\" / \"emergency stop\" / \"freeze\"\n\n### 
AI-Enhanced Commands (Phase 2)\n\n#### Object Manipulation\n- \"pick up the red block\"\n- \"put it over there\"\n- \"move it to the left side\"\n- \"place it on the table\"\n\n#### Task-Level Commands\n- \"stack the blocks\"\n- \"organize the objects by color\"\n- \"arrange the items neatly\"\n- \"sort everything by size\"\n\n#### Context-Aware References\n- \"move that to the center\" (refers to previously mentioned object)\n- \"put this next to the blue cup\" (spatial relationships)\n- \"stack it on top of that\" (object references)\n\n### Multimodal Commands (Phase 3)\n\n#### Spatial References with Vision\n- \"pick up the object on the left\"\n- \"grab the largest item\"\n- \"move the cup next to the bottle\"\n- \"stack the blocks by size\"\n\n#### Visual Confirmation\n- \"show me what you can see\"\n- \"identify the objects on the table\"\n- \"point to the red object\"\n\n## AI Model Configuration\n\n### Model Selection\n\nThe system supports multiple AI models for different use cases:\n\n#### Language Models (Phase 2)\n\n**Lightweight Models (8GB+ RAM)**:\n```yaml\n# In configs/default.yaml\nai_models:\n  primary: \"HuggingFaceTB/SmolLM2-1.7B-Instruct\"  # Fast, efficient\n  fallback: \"microsoft/DialoGPT-medium\"           # Conversation\n```\n\n**Balanced Models (16GB+ RAM)**:\n```yaml\nai_models:\n  primary: \"Qwen/Qwen2.5-3B-Instruct\"            # Good balance\n  fallback: \"microsoft/Phi-3.5-mini-instruct\"     # Reliable\n```\n\n**High-Performance Models (32GB+ RAM)**:\n```yaml\nai_models:\n  primary: \"Qwen/Qwen2.5-14B-Instruct\"           # Best quality\n  fallback: \"meta-llama/Llama-3.2-3B-Instruct\"   # Backup\n```\n\n#### Vision Models (Phase 3)\n\n**Object Detection Models**:\n```yaml\nvision_models:\n  object_detection: \"facebook/detr-resnet-50\"     # Default, reliable\n  # alternatives:\n  # object_detection: \"facebook/detr-resnet-101\"  # Higher accuracy\n  # object_detection: \"microsoft/table-transformer-object-detection\"  # 
Specialized\n```\n\n**YOLO Models** (if ultralytics installed):\n```yaml\nvision_models:\n  object_detection: \"yolov8n\"  # Fastest\n  # object_detection: \"yolov8s\"  # Balanced\n  # object_detection: \"yolov8m\"  # More accurate\n  # object_detection: \"yolov8l\"  # Best accuracy\n```\n\n### Custom Model Configuration\n\nEdit `configs/mac_m3.yaml` for your specific setup:\n\n```yaml\n# Model optimization settings\nmodel_optimization:\n  enable_mps: true              # Use Metal Performance Shaders\n  enable_torch_compile: true    # PyTorch 2.0 optimizations\n  fp16_inference: true          # Half precision for speed\n  batch_size: 1                 # Adjust based on available memory\n\n# AI processing settings  \nai_processing:\n  confidence_threshold: 0.7     # Command confidence threshold\n  context_window: 10           # Number of previous commands to remember\n  enable_context_tracking: true # Track objects and locations\n  \n# Vision processing settings\nvision_processing:\n  detection_confidence: 0.6     # Object detection threshold\n  detection_fps: 10            # Frames per second for detection\n  max_detections: 100          # Maximum objects per frame\n  enable_tracking: true        # Track objects across frames\n```\n\n### Downloading Specific Models\n\n```bash\n# Download specific model sets\npython scripts/download_models.py --language-only      # Just language models\npython scripts/download_models.py --vision-only        # Just vision models\npython scripts/download_models.py --lightweight        # Lightweight models only\n\n# Download specific models\npython scripts/download_models.py --model \"Qwen/Qwen2.5-3B-Instruct\"\npython scripts/download_models.py --model \"facebook/detr-resnet-101\"\n\n# Check what's downloaded\npython scripts/download_models.py --check\npython scripts/download_models.py --list-available\n```\n\n## Demo and Testing\n\n### AI Capabilities Demo\n\n```bash\n# Test AI command processing\npython src/main_mac.py demo-ai\n\n# 
Test with specific models\npython src/main_mac.py demo-ai --model \"Qwen/Qwen2.5-3B-Instruct\"\n```\n\n### Vision System Demo\n\n```bash\n# Test computer vision capabilities\npython src/main_mac.py demo-vision\n\n# Test camera access and permissions\npython src/main_mac.py test-camera\n```\n\n### Integration Testing\n\n```bash\n# Test multimodal integration (voice + vision)\npython src/main_mac.py run --mode multimodal --demo\n\n# Run system diagnostics\npython src/main_mac.py status --verbose\n```\n\n## Advanced Configuration\n\n### Audio Settings\n\n```yaml\n# In configs/default.yaml\naudio:\n  sample_rate: 16000\n  chunk_size: 1024\n  whisper_model: \"base\"           # tiny, base, small, medium, large\n  vad_threshold: 0.5              # Voice activity detection\n  silence_timeout: 2.0            # Seconds of silence to end recording\n  phrase_timeout: 3.0             # Maximum phrase length\n```\n\n### Robot Control Settings\n\n```yaml\nrobot_control:\n  movement_speed: 50              # Joint movement speed (0-100)\n  acceleration: 30                # Movement acceleration\n  gripper_force: 50               # Gripper closing force\n  position_tolerance: 5           # Position accuracy in degrees\n  \nsafety:\n  workspace_bounds:               # Workspace limits in degrees\n    joint1: [-150, 150]\n    joint2: [-90, 90]\n    joint3: [-90, 90]\n    joint4: [-180, 180]\n    joint5: [-90, 90]\n    joint6: [-180, 180]\n  emergency_stop_on_error: true\n  max_temperature: 70             # Maximum motor temperature (°C)\n```\n\n### Vision System Settings\n\n```yaml\nvision:\n  camera:\n    resolution: [1280, 720]       # Camera resolution\n    fps: 30                       # Camera framerate\n    enable_continuity_camera: true # iPhone camera support\n    \n  detection:\n    confidence_threshold: 0.6     # Detection confidence\n    nms_threshold: 0.5           # Non-maximum suppression\n    max_detections: 100          # Max objects per frame\n    \n  display:\n    
show_detections: true        # Show detection overlays\n    show_robot_status: true      # Show robot information\n    show_workspace_bounds: true  # Show workspace limits\n    overlay_alpha: 0.7          # Overlay transparency\n```\n\n## Troubleshooting\n\n### Installation Issues\n\n#### PyTorch Installation\n```bash\n# Reinstall PyTorch to get the correct build for Apple Silicon\npip uninstall torch torchvision torchaudio\npip install torch torchvision torchaudio\n\n# Verify MPS availability\npython -c \"import torch; print('MPS available:', torch.backends.mps.is_available())\"\n```\n\n#### Model Download Issues\n```bash\n# Clear model cache and retry\nrm -rf ~/.cache/huggingface/\npython scripts/download_models.py --basic\n\n# Download to a specific cache directory\nexport HF_HOME=/path/to/large/storage\npython scripts/download_models.py --all\n```\n\n### Runtime Issues\n\n#### Audio Problems\n```bash\n# Check microphone permissions\n# System Settings \u003e Privacy \u0026 Security \u003e Microphone\n\n# Test audio devices\npython -c \"\nimport sounddevice as sd\nprint('Audio devices:', sd.query_devices())\n\"\n\n# Test Whisper directly\npython -c \"\nimport whisper\nmodel = whisper.load_model('base')\nprint('Whisper model loaded successfully')\n\"\n```\n\n#### Robot Connection Issues\n```bash\n# Check USB serial permissions\nls -la /dev/tty.usbserial-*\n\n# Test with a different baud rate\npython scripts/test_robot_connection.py --port /dev/tty.usbserial-FT1234 --baud 1000000\n\n# Check motor health\npython scripts/diagnose_motors.py /dev/tty.usbserial-FT1234\n```\n\n#### Vision System Issues\n```bash\n# Check camera permissions\n# System Settings \u003e Privacy \u0026 Security \u003e Camera\n\n# Test camera access\npython -c \"\nfrom src.mac_camera_handler import MacCameraHandler\nhandler = MacCameraHandler()\nprint('Camera access:', handler.test_camera_access())\n\"\n\n# Test object detection\npython src/object_detector.py  # Runs built-in demo\n```\n\n#### Performance 
Issues\n```bash\n# Monitor system resources\npython scripts/monitor_performance.py\n\n# Check model memory usage\npython -c \"\nimport torch\nprint(f'GPU memory: {torch.mps.current_allocated_memory()/1024**3:.2f} GB')\n\"\n\n# Optimize for lower memory usage\n# Edit configs/mac_m3.yaml:\n# model_optimization:\n#   fp16_inference: true\n#   low_memory_mode: true\n```\n\n### Common Error Messages\n\n#### \"MPS not available\"\n- Update to macOS 12.3+ and PyTorch 1.12+\n- Ensure you're using an Apple Silicon Mac\n\n#### \"Model not found\"\n- Run `python scripts/download_models.py --check`\n- Download missing models with the appropriate `download_models.py` command\n\n#### \"Camera permission denied\"\n- Enable camera access in System Settings \u003e Privacy \u0026 Security \u003e Camera\n\n#### \"Robot connection timeout\"\n- Check the USB cable and robot power\n- Verify the correct serial port with `ls /dev/tty.usbserial-*`\n- Try a different baud rate in the robot configuration\n\n## Development\n\n### Project Structure\n\n```\nsaorsa/\n├── src/                          # Main source code\n│   ├── main_mac.py              # Application entry point\n│   ├── mac_audio_handler.py     # Voice recognition\n│   ├── robot_controller_m3.py   # Robot control\n│   ├── ai_command_processor.py  # AI command interpretation\n│   ├── mac_camera_handler.py    # Camera integration\n│   ├── object_detector.py       # Computer vision\n│   ├── visual_feedback.py       # Visual overlays\n│   ├── multimodal_interface.py  # Voice + vision integration\n│   ├── context_manager.py       # Context awareness\n│   ├── model_manager.py         # AI model management\n│   ├── mps_optimizer.py         # Apple Silicon optimization\n│   └── utils/                   # Utility modules\n├── configs/                     # Configuration files\n├── scripts/                     # Setup and utility scripts\n├── models/                      # Downloaded AI models\n├── logs/                        # Application logs\n└── docs/                      
  # Documentation\n```\n\n### Running Tests\n\n```bash\n# Unit tests\npython -m pytest tests/\n\n# Component tests\npython scripts/test_audio.py\npython scripts/test_robot.py\npython scripts/test_vision.py\n\n# Integration tests\npython scripts/test_multimodal.py\n\n# Performance benchmarks\npython scripts/benchmark_models.py\n```\n\n### Adding Custom Commands\n\n1. **Basic Commands**: Edit `src/main_mac.py` in `CommandProcessor.basic_commands`\n\n2. **AI Commands**: Train/fine-tune models or add examples in `configs/ai_examples.yaml`\n\n3. **Multimodal Commands**: Extend `src/multimodal_interface.py` spatial reference resolution\n\n### Configuration Management\n\nAll configurations are in YAML format under `configs/`:\n\n- `default.yaml` - Base configuration for all systems\n- `mac_m3.yaml` - Apple Silicon optimizations  \n- `robot_configs.yaml` - Robot-specific settings\n- `voice_commands.yaml` - Voice command mappings\n- `ai_models.yaml` - AI model configurations\n\n## Contributing\n\n1. Fork the repository\n2. Create a feature branch\n3. Add tests for new functionality\n4. Ensure all tests pass\n5. 
Submit a pull request\n\n### Code Style\n\n- Follow PEP 8 formatting\n- Use type hints for all functions\n- Add docstrings for public methods\n- Include unit tests for new features\n\n## License\n\nApache 2.0 - See [LICENSE](LICENSE) for details.\n\n## Changelog\n\n### Version 1.3.0 (Current)\n- ✅ Phase 3: Computer vision and multimodal interface\n- ✅ Real-time object detection with local models\n- ✅ Spatial reference resolution (\"pick up the object on the left\")\n- ✅ Visual feedback with detection overlays\n- ✅ iPhone Continuity Camera support\n- ✅ Enhanced context management with visual data\n\n### Version 1.2.0\n- ✅ Phase 2: AI integration with local Hugging Face models\n- ✅ Context-aware command processing\n- ✅ Advanced natural language understanding\n- ✅ Mac M3 MPS optimization for AI models\n\n### Version 1.1.0\n- ✅ Phase 1: Core voice control and robot integration\n- ✅ OpenAI Whisper speech recognition\n- ✅ SO-101 robot arm control\n- ✅ Safety monitoring and emergency stops\n\n## Documentation\n\n- [Hardware Setup Guide](docs/hardware_setup.md)\n- [Software Installation Guide](docs/software_setup.md)\n- [Voice Commands Reference](docs/voice_commands.md)\n- [AI Model Guide](docs/ai_models.md)\n- [Vision System Guide](docs/vision_setup.md)\n- [API Documentation](docs/api_reference.md)\n- [Troubleshooting Guide](docs/troubleshooting.md)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdirvine%2Fsaorsa","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdirvine%2Fsaorsa","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdirvine%2Fsaorsa/lists"}