Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Use LitData with MinIO
https://github.com/bhimrazy/litdata-with-minio
- Host: GitHub
- URL: https://github.com/bhimrazy/litdata-with-minio
- Owner: bhimrazy
- License: mit
- Created: 2024-06-15T06:11:05.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-09-14T13:29:34.000Z (2 months ago)
- Last Synced: 2024-09-14T23:29:16.859Z (2 months ago)
- Topics: data, docker, docker-compose, litdata, minio, streaming
- Language: Python
- Homepage:
- Size: 17.6 KB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Use LitData with MinIO
LitData enables efficient data optimization and distributed training across cloud storage environments, and supports diverse data types such as images, text, and video. MinIO is a high-performance, S3-compatible object store built for large-scale AI/ML, data lakes, and databases. Together they provide streamlined, scalable data handling for modern applications.
## Prerequisites
1. **Start MinIO Server**:

   Start a MinIO server using Docker Compose with the provided configuration (a minimal sketch of such a file appears after this list):

   ```bash
   docker-compose up -d
   ```

   Access MinIO at [http://localhost:9001](http://localhost:9001) with the default credentials:

   - Username: `MINIO_ROOT_USER`
   - Password: `MINIO_ROOT_PASSWORD`

2. **Install Required Packages**:

   Install the Python dependencies listed in `requirements.txt`:

   ```bash
   pip install -r requirements.txt
   ```
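The Docker Compose file itself ships with the repository; as a point of reference only, a minimal MinIO service definition might look like the following sketch (image, ports, volume name, and credentials are illustrative assumptions, not the repository's exact file):

```yaml
# docker-compose.yml (illustrative sketch, not the repository's file)
services:
  minio:
    image: minio/minio
    # Serve /data over the S3 API and expose the web console on :9001.
    command: server /data --console-address ":9001"
    ports:
      - "9000:9000"   # S3 API endpoint used by LitData / the AWS CLI
      - "9001:9001"   # MinIO web console
    environment:
      MINIO_ROOT_USER: MINIO_ROOT_USER
      MINIO_ROOT_PASSWORD: MINIO_ROOT_PASSWORD
    volumes:
      - minio-data:/data

volumes:
  minio-data:
```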
## Usage

### Step 1: Set Up AWS Configuration
You can configure AWS credentials for MinIO access either via environment variables or via `~/.aws/{credentials,config}` files.
**Using Environment Variables:**
```bash
export AWS_ACCESS_KEY_ID=access_key
export AWS_SECRET_ACCESS_KEY=secret_key
export AWS_ENDPOINT_URL=http://localhost:9000
```

**Using `~/.aws/{credentials,config}` Files:**
```bash
mkdir -p ~/.aws && \
cat <<EOL >> ~/.aws/credentials
[default]
aws_access_key_id = access_key
aws_secret_access_key = secret_key
EOL

cat <<EOL >> ~/.aws/config
[default]
region = us-east-1
output = json
endpoint_url = http://localhost:9000
EOL
```
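Either way, you can sanity-check the configuration before moving on. The snippet below is a hypothetical helper (not part of this repository) that assumes the credentials and endpoint above; listing buckets fails fast if anything is misconfigured:

```python
# verify_minio.py - hypothetical helper, not part of this repository.
import boto3

# boto3 reads AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY from the
# environment or from ~/.aws/credentials; point the client at MinIO.
s3 = boto3.client("s3", endpoint_url="http://localhost:9000")

# A simple round-trip: fails fast on bad credentials or a bad endpoint.
print([bucket["Name"] for bucket in s3.list_buckets()["Buckets"]])
```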
### Step 2: Prepare Data

Prepare your data using the Python script `prepare_data.py`:
```bash
python prepare_data.py
```
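The repository's `prepare_data.py` is not reproduced here; as a rough illustration of what such a script does, a LitData optimization script typically calls `litdata.optimize` to turn raw samples into a chunked, streamable dataset. The sample function, inputs, and chunk size below are assumptions for the sketch:

```python
# Illustrative sketch of a LitData optimization script; the repository's
# prepare_data.py may differ in its samples and parameters.
from litdata import optimize


def make_sample(index: int) -> dict:
    # Stand-in for real work: loading an image, tokenizing text, etc.
    return {"index": index, "value": index ** 2}


if __name__ == "__main__":
    optimize(
        fn=make_sample,                     # applied to every input item
        inputs=list(range(1_000)),          # items to optimize
        output_dir="my_optimized_dataset",  # local output, uploaded in Step 3
        chunk_bytes="64MB",                 # target size per chunk
    )
```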
### Step 3: Upload Data to MinIO

Ensure the bucket exists, creating it if necessary, then upload your data:
```bash
# Create the bucket if it does not exist
aws s3 mb s3://my-bucket

# List buckets to verify
aws s3 ls

# Upload data to the bucket
aws s3 cp --recursive my_optimized_dataset s3://my-bucket/my_optimized_dataset
```
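Note that these commands rely on the `endpoint_url` set in `~/.aws/config` in Step 1. If your AWS CLI version does not honor that setting, you can pass the endpoint explicitly on each command:

```bash
# Explicit endpoint override for AWS CLI versions that ignore
# endpoint_url in ~/.aws/config
aws s3 ls --endpoint-url http://localhost:9000
```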
### Step 4: Use StreamingDataset

Use `streaming_dataset.py` to consume the data as a streaming dataset:
```bash
python streaming_dataset.py
```
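Again, the repository's `streaming_dataset.py` is the authoritative version; a minimal sketch of streaming the optimized dataset back from MinIO with LitData's `StreamingDataset` and `StreamingDataLoader` might look like this (bucket path and batch size are assumptions):

```python
# Illustrative sketch; the repository's streaming_dataset.py may differ.
from litdata import StreamingDataset, StreamingDataLoader

# The s3:// URI resolves against MinIO because AWS_ENDPOINT_URL (or the
# endpoint_url in ~/.aws/config) points at http://localhost:9000.
dataset = StreamingDataset("s3://my-bucket/my_optimized_dataset")
loader = StreamingDataLoader(dataset, batch_size=64)

for batch in loader:
    # Each batch contains the samples written by prepare_data.py.
    print(batch)
    break
```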
## Conclusion

This example illustrates how to integrate LitData with MinIO for efficient data management. The same approach applies to any other S3-compatible object store.
## References
- [LitData](https://github.com/Lightning-AI/litdata)
- [MinIO Docker Compose Configuration](https://github.com/minio/minio/blob/master/docs/orchestration/docker-compose/docker-compose.yaml)

## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
## Authors
- [Bhimraj Yadav](https://github.com/bhimrazy)
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.