https://github.com/oeway/conda-env-executor
A Python module for executing Python scripts in a specified conda environment
- Host: GitHub
- URL: https://github.com/oeway/conda-env-executor
- Owner: oeway
- License: mit
- Created: 2025-02-24T09:09:05.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-07-15T20:31:11.000Z (11 months ago)
- Last Synced: 2025-07-16T18:40:33.046Z (11 months ago)
- Language: Python
- Size: 88.9 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Conda Environment Executor
A robust Python package for executing code in isolated conda environments with efficient data passing and multiple environment creation options.
## Features
- **Isolated Execution**: Run Python code in isolated conda environments
- **Multiple Environment Sources**: Support for conda-pack files, YAML specifications, dictionaries, and temporary environments
- **Efficient Data Passing**: Optimized data transfer between environments using JSON serialization with NumPy support
- **Environment Caching**: Automatic caching of environments for faster subsequent executions
- **Async Support**: Built-in support for async/await patterns through Hypha service integration
- **Job Management**: Submit, monitor, cancel, and retrieve results from jobs via job queue system
- **Type Safety**: Full type safety with proper error handling and result objects
- **Comprehensive Testing**: Extensive test coverage
## Installation
```bash
pip install https://github.com/oeway/conda-env-executor/archive/refs/heads/main.zip
```
## Quick Start
### 1. Temporary Environments (Simplest)
The easiest way to get started is creating temporary environments on-the-fly:
```python
from conda_env_executor import CondaEnvExecutor

# Create a temporary environment with specific packages
executor = CondaEnvExecutor.create_temp_env(
    packages=['python=3.11', 'numpy', 'pandas'],
    channels=['conda-forge']
)

# Define code to run - must include an 'execute' function
code = '''
import numpy as np
import pandas as pd

def execute(data):
    df = pd.DataFrame(data)
    return {
        'mean': df['values'].mean(),
        'shape': df.shape,
        'description': df.describe().to_dict()
    }
'''

# Execute with input data
input_data = {"values": [1, 2, 3, 4, 5]}
with executor:
    result = executor.execute(code, input_data)
    if result.success:
        print(result.result)
    else:
        print(f"Error: {result.error}")
```
### 2. YAML Environment Specifications
For reproducible environments, use YAML specifications:
```yaml
# environment.yml
name: data-analysis
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11
  - numpy>=1.20
  - pandas>=1.3
  - scikit-learn
  - matplotlib
  - pip
  - pip:
    - some-pip-package
```
```python
from conda_env_executor import CondaEnvExecutor
# Method 1: Direct path
executor = CondaEnvExecutor("environment.yml")
# Method 2: Using class method
executor = CondaEnvExecutor.from_yaml("environment.yml")
# Execute code
with executor:
    result = executor.execute(code, input_data)
```
### 3. Dictionary Specifications
Define environments programmatically:
```python
from conda_env_executor import CondaEnvExecutor

# Define environment as dictionary
env_spec = {
    "name": "ml-env",
    "channels": ["conda-forge", "pytorch"],
    "dependencies": [
        "python=3.11",
        "numpy",
        "pandas",
        "pytorch",
        "scikit-learn",
        {"pip": ["transformers", "datasets"]}
    ]
}
executor = CondaEnvExecutor(env_spec)
with executor:
    result = executor.execute(code, input_data)
```
## Advanced Usage Patterns
### 4. Conda-Pack Files (Production Deployments)
For production environments or when you need to share exact environments across machines:
```python
from conda_env_executor import CondaEnvExecutor
# Use a pre-built conda-pack file
executor = CondaEnvExecutor("myenv.tar.gz")
with executor:
    result = executor.execute(code, input_data)
```
To create conda-pack files:
```bash
# Create environment
conda create -n myenv python=3.11 numpy pandas scikit-learn
conda activate myenv
# Package the environment
conda install conda-pack
conda pack -n myenv -o myenv.tar.gz
```
### 5. Data Handling Patterns
The executor handles various data types automatically:
```python
import numpy as np
from conda_env_executor import CondaEnvExecutor

executor = CondaEnvExecutor.create_temp_env(['python=3.11', 'numpy'])

# NumPy arrays are automatically serialized/deserialized
data = np.random.rand(1000, 10)
code = """
import numpy as np

def execute(data):
    # data is automatically converted back to numpy array
    return {
        'mean': float(data.mean()),
        'shape': list(data.shape),
        'std': float(data.std())
    }
"""
with executor:
    result = executor.execute(code, data)
    # Values vary with the random input, e.g. {'mean': 0.5, 'shape': [1000, 10], 'std': 0.29}
    print(result.result)
```
### 6. Complex Dependencies
Handle pip packages and mixed dependencies:
```python
from conda_env_executor import CondaEnvExecutor

# Complex environment with conda and pip packages
executor = CondaEnvExecutor.create_temp_env(
    packages=[
        'python=3.11',
        'numpy',
        'pandas',
        'matplotlib',
        {'pip': [
            'transformers>=4.20.0',
            'datasets',
            'torch-audio'
        ]}
    ],
    channels=['conda-forge', 'pytorch']
)
code = """
import pandas as pd
from transformers import pipeline

def execute(texts):
    # Use Hugging Face transformers
    classifier = pipeline("sentiment-analysis")
    results = classifier(texts)
    # Convert to pandas for analysis
    df = pd.DataFrame(results)
    return df.to_dict('records')
"""
with executor:
    result = executor.execute(code, ["I love this!", "This is terrible"])
```
### 7. Environment Reuse and Caching
Environments are automatically cached for performance:
```python
# First execution - environment is created
executor1 = CondaEnvExecutor.create_temp_env(['python=3.11', 'numpy'])
with executor1:
    result1 = executor1.execute(code, data)  # Slow first time

# Second execution - environment is reused from cache
executor2 = CondaEnvExecutor.create_temp_env(['python=3.11', 'numpy'])
with executor2:
    result2 = executor2.execute(code, data)  # Fast subsequent times
```
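The README does not describe how cached environments are identified. One plausible approach, shown here purely as an assumption, is to derive a stable key from the package and channel lists so that two identical specifications map to the same cached environment. The `env_cache_key` helper is hypothetical and is not part of the package's API.

```python
import hashlib
import json

def env_cache_key(packages, channels=None):
    """Return a stable hash for an environment spec (hypothetical helper)."""
    spec = {
        # Channel order sets conda's channel priority, so it is kept as given.
        "channels": list(channels or []),
        # Package order is treated as irrelevant here, so entries are sorted.
        "packages": sorted(json.dumps(p, sort_keys=True) for p in packages),
    }
    canonical = json.dumps(spec, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

key_a = env_cache_key(["python=3.11", "numpy"], ["conda-forge"])
key_b = env_cache_key(["numpy", "python=3.11"], ["conda-forge"])
key_c = env_cache_key(["python=3.11", "pandas"], ["conda-forge"])
print(key_a == key_b)  # True: same packages, different order
print(key_a == key_c)  # False: different packages
```

Hashing a canonical JSON form means the key is independent of how the caller happened to order the package list, which is one way to get the cache hits illustrated above.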
### 8. Error Handling and Debugging
Comprehensive error handling with timing information:
```python
from conda_env_executor import CondaEnvExecutor

executor = CondaEnvExecutor.create_temp_env(['python=3.11'])

# Code with an error
bad_code = """
def execute(data):
    return undefined_variable  # This will cause an error
"""
with executor:
    result = executor.execute(bad_code, {"test": "data"})
    if result.success:
        print(f"Result: {result.result}")
    else:
        print(f"Execution failed: {result.error}")
        print(f"Stdout: {result.stdout}")
        print(f"Stderr: {result.stderr}")
    # Timing information
    if result.timing:
        print(f"Environment setup: {result.timing.env_setup_time:.2f}s")
        print(f"Code execution: {result.timing.execution_time:.2f}s")
        print(f"Total time: {result.timing.total_time:.2f}s")
```
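Because failures are reported through the result object rather than raised, callers that prefer exceptions may want a small wrapper. The sketch below is an assumption about how such a wrapper could look: `unwrap` and `ExecutionFailed` are hypothetical names, and the `SimpleNamespace` objects only stand in for real results with the same `success`, `result`, `error`, and `stderr` fields.

```python
from types import SimpleNamespace

class ExecutionFailed(RuntimeError):
    """Raised when a result object reports failure (hypothetical exception)."""

def unwrap(result):
    """Return result.result, or raise with the error and stderr attached."""
    if result.success:
        return result.result
    details = result.error or "unknown error"
    if result.stderr:
        details += f"\n--- stderr ---\n{result.stderr}"
    raise ExecutionFailed(details)

# Stand-in objects with the same fields as a result, for illustration only.
ok = SimpleNamespace(success=True, result={"sum": 15}, error=None, stderr=None)
bad = SimpleNamespace(
    success=False,
    result=None,
    error="NameError: name 'undefined_variable' is not defined",
    stderr="",
)

print(unwrap(ok))
try:
    unwrap(bad)
except ExecutionFailed as exc:
    print(f"failed: {exc}")
```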
### 9. Multiple Executions with Same Environment
Reuse the same environment for multiple code executions:
```python
executor = CondaEnvExecutor.create_temp_env(['python=3.11', 'numpy', 'pandas'])

# Execute multiple different pieces of code
codes = [
    "def execute(data): import numpy as np; return np.mean(data)",
    "def execute(data): import pandas as pd; return pd.Series(data).describe().to_dict()",
    "def execute(data): return {'sum': sum(data), 'len': len(data)}"
]
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
with executor:  # Environment is set up once
    for i, code in enumerate(codes):
        result = executor.execute(code, data)
        print(f"Result {i+1}: {result.result}")
```
## Complete API Reference
### Core Classes
#### CondaEnvExecutor
Main class for executing code in conda environments.
**Constructor Options:**
```python
# From conda-pack file
CondaEnvExecutor("path/to/env.tar.gz")

# From YAML file
CondaEnvExecutor("environment.yml")

# From dictionary
CondaEnvExecutor({
    "name": "myenv",
    "channels": ["conda-forge"],
    "dependencies": ["python=3.11", "numpy"]
})

# From EnvSpec object
CondaEnvExecutor(EnvSpec(...))
```
**Class Methods:**
```python
# Create temporary environment
CondaEnvExecutor.create_temp_env(
    packages=["python=3.11", "numpy"],
    channels=["conda-forge"]
)

# Create from YAML file
CondaEnvExecutor.from_yaml("environment.yml")
```
**Instance Methods:**
```python
# Execute code
result = executor.execute(code, input_data=None)

# Manual cleanup
executor.cleanup()

# Context manager (automatic cleanup)
with executor:
    result = executor.execute(code, input_data)
```
#### ExecutionResult
Container for execution results:
```python
@dataclass
class ExecutionResult:
    success: bool                        # Whether execution succeeded
    result: Optional[Any] = None         # The returned result
    error: Optional[str] = None          # Error message if failed
    stdout: Optional[str] = None         # Standard output
    stderr: Optional[str] = None         # Standard error
    timing: Optional[TimingInfo] = None  # Timing information
```
#### TimingInfo
Timing information for execution:
```python
@dataclass
class TimingInfo:
    env_setup_time: float  # Time to set up environment
    execution_time: float  # Time to execute code
    total_time: float      # Total execution time
```
### Dependency Specification Formats
#### List Format
```python
packages = [
    "python=3.11",           # Specific version
    "numpy>=1.20",           # Version constraint
    "pandas",                # Latest version
    {"pip": ["requests"]},   # Pip packages
    {"pip": [                # Multiple pip packages
        "transformers>=4.20.0",
        "datasets"
    ]}
]
```
#### Dictionary Format (environment.yml style)
```python
env_spec = {
    "name": "myenv",
    "channels": ["conda-forge", "pytorch"],
    "dependencies": [
        "python=3.11",
        "numpy",
        {"pip": ["requests", "beautifulsoup4"]}
    ]
}
```
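The two formats carry the same information, so one can be derived from the other. The sketch below shows one way to turn a list-format specification into an environment.yml-style dictionary. The `to_env_spec` helper is hypothetical and is not part of the package; merging several `{"pip": [...]}` entries into one section and adding `pip` to the conda dependencies are assumptions based on how conda environment files are normally written.

```python
def to_env_spec(name, packages, channels=None):
    """Build an environment.yml-style dict from a package list (hypothetical helper)."""
    conda_deps, pip_deps = [], []
    for item in packages:
        if isinstance(item, dict):
            # {"pip": [...]} entries are merged into a single pip section.
            pip_deps.extend(item.get("pip", []))
        else:
            conda_deps.append(item)
    if pip_deps:
        # conda warns when a pip section is present but pip itself is not listed.
        has_pip = any(
            dep == "pip" or dep.startswith(("pip=", "pip>", "pip<"))
            for dep in conda_deps
        )
        if not has_pip:
            conda_deps.append("pip")
        conda_deps.append({"pip": pip_deps})
    return {
        "name": name,
        "channels": list(channels or []),
        "dependencies": conda_deps,
    }

spec = to_env_spec(
    "myenv",
    ["python=3.11", "numpy", {"pip": ["requests"]}, {"pip": ["datasets"]}],
    channels=["conda-forge"],
)
print(spec["dependencies"])
```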
### Code Requirements
Your code must define an `execute` function:
```python
def execute(input_data):
    """
    This function will be called by the executor.

    Args:
        input_data: The data passed to executor.execute()
                    Can be None if no input_data provided

    Returns:
        Any JSON-serializable object
    """
    # Your code here
    return result
```
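To make the contract concrete, the sketch below loads a code string in the current process, looks up `execute`, and calls it with the input data. This is an illustration of the entry-point convention only: the package itself runs the code inside a separate conda environment, and `run_user_code` is a hypothetical helper, not its implementation.

```python
def run_user_code(code, input_data=None):
    """Execute a code string and call its `execute` function (illustrative only)."""
    namespace = {}
    # Compile with a recognizable filename so tracebacks point at the user code.
    exec(compile(code, "<user_code>", "exec"), namespace)
    entry = namespace.get("execute")
    if not callable(entry):
        raise ValueError("code must define a callable named 'execute'")
    return entry(input_data)

print(run_user_code("def execute(data): return data * 2", 21))  # 42
```

Code that does not define `execute` fails early with a `ValueError` in this sketch, which mirrors the requirement stated above.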
## Job Queue System (Hypha Service)
For async execution and job management, use the Hypha service:
### Starting the Service
```bash
python -m conda_env_executor.hypha_service \
    --workspace YOUR_WORKSPACE \
    --server-url https://hypha.aicell.io \
    --token YOUR_TOKEN
```
### Using the Service
```python
import asyncio
from hypha_rpc import connect_to_server

async def main():
    server = await connect_to_server({
        "server_url": "https://hypha.aicell.io",
        "token": "your_token",
        "workspace": "your_workspace"
    })
    service = await server.get_service("conda-executor-service-id")

    # Submit a job
    result = await service.submit_job(
        code="def execute(data): return data * 2",
        input_data=21,
        dependencies=["python=3.11", "numpy"]
    )
    job_id = result["job_id"]

    # Wait for completion
    final_result = await service.wait_for_result(job_id)
    print(final_result)  # 42

asyncio.run(main())
```
## Development
### Setup
```bash
# Clone the repository
git clone https://github.com/oeway/conda-env-executor.git
cd conda-env-executor
# Create virtual environment
python -m venv venv
source venv/bin/activate # or `venv\Scripts\activate` on Windows
# Install development dependencies
pip install -e ".[dev,test]"
```
### Running Tests
```bash
# Run all tests
pytest
# Run with coverage
pytest --cov=conda_env_executor
# Run specific test file
pytest tests/test_executor.py -v
```
### Code Quality
```bash
# Format code
black conda_env_executor tests
# Run linter
ruff check .
# Run type checker
mypy conda_env_executor
```
## Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Job Queue Examples
Example client usage for job management:
```bash
# Submit a job
python examples/job_queue_client.py --workspace YOUR_WORKSPACE --service-id SERVICE_ID \
    --code-file examples/sample_job.py \
    --input-data examples/sample_input.json \
    --dependencies "python=3.11,numpy,pandas,matplotlib"

# List jobs
python examples/job_queue_client.py --workspace YOUR_WORKSPACE --service-id SERVICE_ID --list-jobs

# Check job status and wait for completion
python examples/job_queue_client.py --workspace YOUR_WORKSPACE --service-id SERVICE_ID \
    --job-id JOB_ID --wait
```
## Conda-Pack Tutorial
[Conda-pack](https://conda.github.io/conda-pack/) is a tool for creating relocatable conda environments. This is useful for deploying code in an isolated environment, copying environments to a different location or machine, or for archiving environments.
### 1. Install conda-pack
First, you need to install conda-pack:
```bash
# Install in your base environment
pip install conda-pack
# Or install in a specific environment
conda install -c conda-forge conda-pack
```
### 2. Create and Set Up Your Environment
Create a conda environment with the packages you need:
```bash
# Create a new environment
conda create -n myenv python=3.11 numpy pandas scikit-learn matplotlib
# Activate the environment
conda activate myenv
# Install any additional packages
pip install some-package
```
### 3. Package Your Environment
Once your environment is set up with all required packages, use conda-pack to create a portable archive:
```bash
# Basic packaging (from outside the environment)
conda pack -n myenv -o myenv.tar.gz
# Or if you're inside the environment
conda pack -o myenv.tar.gz
# For more verbose output
conda pack -n myenv -o myenv.tar.gz --verbose
```
### 4. Using the Packed Environment
Use the packed environment with conda-env-executor:
```python
from conda_env_executor import CondaEnvExecutor
# Create an executor using the packed environment
executor = CondaEnvExecutor("myenv.tar.gz")
with executor:
    result = executor.execute(code, input_data)
```
### 5. Troubleshooting Conda Packs
If you encounter issues with your packed environment:
- Make sure all dependencies are properly installed in the original environment
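- Try packaging with `--ignore-editable-packages` if the environment contains packages installed in editable (development) mode
- Use `--ignore-missing-files` if conda-pack reports files that are listed in the package metadata but missing from the environment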
- For compatibility across different systems, pack from a similar OS/architecture as the target system
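When debugging a packed environment, it can help to unpack the archive by hand and inspect it, following the manual workflow from the conda-pack documentation (extract the archive, then run the bundled `conda-unpack` script to fix hard-coded prefixes). The sketch below is a rough Python equivalent for Unix-style archives. The `unpack_env` helper is hypothetical, and this is not a description of how conda-env-executor itself unpacks archives.

```python
import subprocess
import tarfile
from pathlib import Path

def unpack_env(archive, target_dir):
    """Extract a conda-pack archive and run conda-unpack if it is present."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # Only extract archives you trust: tar files can contain unsafe paths.
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(target)
    # Assumes the Unix layout, where executables live under bin/.
    env_python = target / "bin" / "python"
    unpack_script = target / "bin" / "conda-unpack"
    if env_python.exists() and unpack_script.exists():
        # Run the script with the environment's own interpreter.
        subprocess.run([str(env_python), str(unpack_script)], check=True)
    return target
```

Calling `unpack_env("myenv.tar.gz", "myenv_unpacked")` should leave an environment whose `bin/python` can be run directly to check that imports work on the target machine.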
## Requirements
- Python >=3.10
- pyyaml >=6.0
- psutil >=5.9.0
- conda-pack >=0.7.0
## Acknowledgments
This project incorporates ideas and code from:
- [conda-execute](https://github.com/conda-tools/conda-execute) (BSD 3-Clause License)
- [conda-pack](https://github.com/conda/conda-pack) (BSD 3-Clause License)