{"id":28958814,"url":"https://github.com/togethercomputer/open-data-scientist","last_synced_at":"2025-06-23T23:32:36.698Z","repository":{"id":299662170,"uuid":"996892761","full_name":"togethercomputer/open-data-scientist","owner":"togethercomputer","description":"Open AI data scientist agent that automates complex data analysis tasks using the ReAct framework. Execute Python code locally or in the cloud, upload datasets, and generate detailed analytical reports with minimal setup.","archived":false,"fork":false,"pushed_at":"2025-06-17T17:06:34.000Z","size":11390,"stargazers_count":48,"open_issues_count":0,"forks_count":7,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-06-17T17:36:07.819Z","etag":null,"topics":["agents","ai","data-science","llms"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/togethercomputer.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-05T16:13:09.000Z","updated_at":"2025-06-17T17:06:38.000Z","dependencies_parsed_at":"2025-06-17T17:49:26.110Z","dependency_job_id":null,"html_url":"https://github.com/togethercomputer/open-data-scientist","commit_stats":null,"previous_names":["togethercomputer/open-data-scientist"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/togethercomputer/open-data-scientist","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/togethercomputer%2Fopen-data-scientist","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/
togethercomputer%2Fopen-data-scientist/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/togethercomputer%2Fopen-data-scientist/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/togethercomputer%2Fopen-data-scientist/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/togethercomputer","download_url":"https://codeload.github.com/togethercomputer/open-data-scientist/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/togethercomputer%2Fopen-data-scientist/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261575526,"owners_count":23179534,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agents","ai","data-science","llms"],"created_at":"2025-06-23T23:31:22.276Z","updated_at":"2025-06-23T23:32:36.693Z","avatar_url":"https://github.com/togethercomputer.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Together Open Data Scientist\n\nAn AI-powered data analysis assistant that follows the ReAct (Reasoning + Acting) framework to perform comprehensive data science tasks. The agent can execute Python code either locally via Docker or in the cloud using [Together Code Interpreter (TCI)](https://www.together.ai/code-interpreter).\n\n## ⚠️ Experimental Software Notice\n\n**This is an experimental tool** powered by large language models. 
Please be aware of the following limitations:\n\n- **AI-Generated Code**: All analysis and code are generated by AI and may contain errors, bugs, or suboptimal approaches\n- **No Guarantee of Accuracy**: Results should be carefully reviewed and validated before making important decisions\n- **Learning Tool**: Best suited for exploration, learning, and initial analysis rather than production use\n- **Human Oversight Required**: Always verify outputs, especially for critical business or research applications\n- **Evolving Technology**: Capabilities and reliability may vary as the underlying models are updated\n\n## 🚀 Quick Start\n\n### Install Together Open Data Scientist from PyPI\n   ```bash\n   pip install open-data-scientist\n   ```\n### Run Together Open Data Scientist from the command line with TCI\n   ```bash\n   # Export your Together API key\n   export TOGETHER_API_KEY=\"your-api-key-here\"\n\n   # Run the agent\n   open-data-scientist --executor tci --write-report\n   ```\n\n## 📖 Example Output\n\nOur Open Data Scientist can perform comprehensive data analysis and generate detailed reports. 
Below is an example of a complete analysis report for molecular solubility prediction (see [the example](examples/solubility_prediction/)):\n\n### Report Example\n![Solubility Prediction Report](examples/solubility_prediction/screenshots/report_title.png)\n\n![Analysis Results](examples/solubility_prediction/screenshots/report_result.png)\n\n## 🤖 Install from Source\n\n### Prerequisites\n\n- Python 3.12 or higher\n- [uv](https://docs.astral.sh/uv/) - Fast Python package manager\n- Together AI API key (get one at [together.ai](https://together.ai))\n- Docker and Docker Compose (for local execution mode)\n\n### Installation\n\n#### Clone the repository:\n   ```bash\n   git clone https://github.com/togethercomputer/open-data-scientist.git\n   cd open-data-scientist\n   ```\n\n#### Install the package:\n   ```bash\n   # Install uv (faster alternative to pip)\n   curl -LsSf https://astral.sh/uv/install.sh | sh\n\n   # Create and activate virtual environment\n   uv venv --python=3.12\n   source .venv/bin/activate\n   uv pip install -e .\n   ```\n\n#### Set up your API key:\n   ```bash\n   export TOGETHER_API_KEY=\"your-api-key-here\"\n   ```\n\n#### Docker Mode Setup (optional; needed only when using Docker for code execution)\n\n⚠️ Important: Docker mode has session isolation limitations and security considerations for local development. (1) Session isolation: While user variables are isolated between sessions, module modifications and global state changes affect all sessions. (2) Host directory access: The container has read-write access to specific host directories. (3) Best for: Single-user local development and data analysis workflows. For detailed technical information, security warnings, and setup instructions, see the [Interpreter README](interpreter/README.md).\n\n1. Start the Docker services:\n   ```bash\n   cd interpreter\n   docker-compose up --build -d\n   ```\n\n2. Stop the services:\n   ```bash\n   docker-compose down\n   ```\n\n#### Usage\n\n1. 
Command Line Interface (CLI): The easiest way to get started is using the command line interface\n\n```bash\n# Basic usage with local Docker execution\nopen-data-scientist\n\n# Use cloud execution with TCI\nopen-data-scientist --executor tci\n\n# Specify a custom model and more iterations\nopen-data-scientist --model \"deepseek-ai/DeepSeek-V3\" --iterations 15\n\n# Use specific data directory\nopen-data-scientist --data-dir /path/to/your/data\n\n# Combine options\nopen-data-scientist --executor tci --model \"meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo\" --iterations 20 --data-dir ./my_data\n```\n\nCLI Options\n\n| Option | Short | Description | Default |\n|--------|-------|-------------|---------|\n| `--model` | `-m` | Language model to use | `deepseek-ai/DeepSeek-V3` |\n| `--iterations` | `-i` | Maximum reasoning iterations | `20` |\n| `--executor` | `-e` | Execution mode: `tci` or `internal` | `internal` |\n| `--data-dir` | `-d` | Data directory to upload | Current directory (with confirmation) |\n| `--session-id` | `-s` | Reuse existing session ID | Auto-generated |\n| `--help` | `-h` | Show help message | - |\n\n\n2. 
Python API: For programmatic usage, you can also use the Python API directly\n\n```python\nfrom open_data_scientist.codeagent import ReActDataScienceAgent\n\n# Cloud execution with TCI\nagent = ReActDataScienceAgent(\n    executor=\"tci\",\n    data_dir=\"path/to/your/data\",  # Optional: auto-upload files\n    max_iterations=10\n)\n\n# Local execution with Docker\nagent = ReActDataScienceAgent(\n    executor=\"internal\", \n    data_dir=\"path/to/your/data\",  # Optional: auto-upload files\n    max_iterations=10\n)\n\nresult = agent.run(\"Explore the uploaded CSV files and create summary statistics\")\n```\n\n## 🎯 Execution Modes\n\nThe ReAct agent supports two execution modes for running Python code:\n\n| Feature | TCI (Together Code Interpreter) | Docker/Internal |\n|---------|--------------------------------|-----------------|\n| **Execution Location** | ☁️ Cloud-based (Together AI) | 🏠 Local Docker container |\n| **Setup Required** | API key only | Docker + docker-compose |\n| **File Handling** | ☁️ Files uploaded to cloud | 🏠 Files stay local |\n| **Session Persistence** | ✅ Managed by Together | ✅ Local session management |\n| **Session Isolation** | ✅ Independent isolated sessions | ⚠️ Limited isolation (see below) |\n| **Concurrent Usage** | ✅ Multiple users/processes safely | ⚠️ File conflicts possible |\n| **Dependencies** | Pre-installed environment | Custom Docker environment |\n| **Plot Saving** | ✅ Can save created plots to disk | ❌ Plots not saved to disk |\n\n## ⚠️ Important Privacy Warning\n\n**TCI Mode**: Using TCI will upload your files to Together AI's cloud servers. Only use this mode if you're comfortable with your data being processed in the cloud.\n\n**Docker Mode**: All code execution and file processing happens locally in your Docker container. 
For detailed technical information, security warnings, and setup instructions, see the [Interpreter README](interpreter/README.md)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftogethercomputer%2Fopen-data-scientist","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftogethercomputer%2Fopen-data-scientist","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftogethercomputer%2Fopen-data-scientist/lists"}