Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
Use LitData with MinIO
https://github.com/bhimrazy/litdata-with-minio
- Host: GitHub
- URL: https://github.com/bhimrazy/litdata-with-minio
- Owner: bhimrazy
- License: mit
- Created: 2024-06-15T06:11:05.000Z (8 months ago)
- Default Branch: main
- Last Pushed: 2024-09-14T13:29:34.000Z (5 months ago)
- Last Synced: 2024-09-14T23:29:16.859Z (5 months ago)
- Topics: data, docker, docker-compose, litdata, minio, streaming
- Language: Python
- Homepage:
- Size: 17.6 KB
- Stars: 6
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Use LitData with MinIO
LitData enables efficient data optimization and distributed training across cloud storage environments, and supports diverse data types such as images, text, and video. MinIO is a high-performance, S3-compatible object store built for large-scale AI/ML, data lakes, and databases. Together they provide streamlined, scalable data handling for modern applications.
## Prerequisites
1. **Start MinIO Server**:

   Start a MinIO server using Docker Compose with the provided configuration:

   ```bash
   docker-compose up -d
   ```

   Access MinIO at [http://localhost:9001](http://localhost:9001) with the default credentials:

   - Username: `MINIO_ROOT_USER`
   - Password: `MINIO_ROOT_PASSWORD`

2. **Install Required Packages**:

   Install the Python dependencies listed in `requirements.txt`:

   ```bash
   pip install -r requirements.txt
   ```

## Usage
### Step 1: Setup AWS Configuration
You can configure AWS credentials for MinIO access either via environment variables or by creating a `~/.aws/{credentials,config}` file.
**Using Environment Variables:**
```bash
export AWS_ACCESS_KEY_ID=access_key
export AWS_SECRET_ACCESS_KEY=secret_key
export AWS_ENDPOINT_URL=http://localhost:9000
```

**Using `~/.aws/{credentials,config}` File:**
```bash
mkdir -p ~/.aws && \
cat <<EOL >> ~/.aws/credentials
[default]
aws_access_key_id = access_key
aws_secret_access_key = secret_key
EOL

cat <<EOL >> ~/.aws/config
[default]
region = us-east-1
output = json
endpoint_url = http://localhost:9000
EOL
```

### Step 2: Prepare Data
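Before preparing data, it can help to sanity-check the configuration written in Step 1. A stdlib-only sketch that parses the same INI layout the commands above produce (shown here from a string so it runs anywhere; point `parser.read()` at `~/.aws/config` to check the real file):

```python
# Sanity-check the AWS config layout written in Step 1.
# Parses the INI structure from a string so the sketch runs anywhere;
# use parser.read(os.path.expanduser("~/.aws/config")) for the real file.
import configparser

config_text = """
[default]
region = us-east-1
output = json
endpoint_url = http://localhost:9000
"""

parser = configparser.ConfigParser()
parser.read_string(config_text)

endpoint = parser["default"]["endpoint_url"]
print(endpoint)  # http://localhost:9000
```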
Prepare your data using the Python script `prepare_data.py`:
```bash
python prepare_data.py
```

### Step 3: Upload Data to MinIO
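Before uploading, a quick check that Step 2 actually produced output can save a round trip. A stdlib-only sketch (in practice, point it at `my_optimized_dataset`, the directory uploaded below; the demo builds a throwaway directory so it runs anywhere):

```python
# Summarize a local dataset directory before uploading it:
# returns (file_count, total_bytes). Point it at "my_optimized_dataset"
# in practice; the demo below uses a throwaway directory.
import os
import tempfile

def summarize_dataset(path):
    total_files, total_bytes = 0, 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            total_files += 1
            total_bytes += os.path.getsize(os.path.join(root, name))
    return total_files, total_bytes

# Demo on a temporary directory with two small files.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(2):
        with open(os.path.join(tmp, f"chunk-{i}.bin"), "wb") as f:
            f.write(b"\x00" * 10)
    files, size = summarize_dataset(tmp)
    print(files, size)  # 2 20
```

An empty result here usually means `prepare_data.py` wrote to a different output directory than the one being uploaded.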
Ensure the bucket exists or create it if necessary, then upload your data:
```bash
# Create the bucket if it does not exist
aws s3 mb s3://my-bucket

# List buckets to verify
aws s3 ls

# Upload data to the bucket
aws s3 cp --recursive my_optimized_dataset s3://my-bucket/my_optimized_dataset
```

### Step 4: Use StreamingDataset
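To confirm the Step 3 upload before streaming from it, listing the bucket from Python is a quick check. A sketch assuming `boto3` is available (it is not in this repo's stated dependencies, so that is an assumption) and the MinIO endpoint/credentials from Step 1; the import and the network call are guarded so the file also loads without them:

```python
# Sketch: list the objects uploaded in Step 3 to verify they arrived.
# Assumes boto3 (an assumption, not a stated dependency) and the MinIO
# endpoint/credentials configured in Step 1.
import os

try:
    import boto3  # optional; guarded so the sketch loads without it
except ImportError:
    boto3 = None

def list_dataset_objects(bucket="my-bucket", prefix="my_optimized_dataset/"):
    """Return the object keys under `prefix` in `bucket`."""
    s3 = boto3.client(
        "s3",
        endpoint_url=os.environ.get("AWS_ENDPOINT_URL", "http://localhost:9000"),
    )
    response = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]

# Network call only runs when explicitly requested.
if boto3 is not None and os.environ.get("RUN_S3_CHECK"):
    print(list_dataset_objects())
```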
Utilize `streaming_dataset.py` to work with data as a streaming dataset:
```bash
python streaming_dataset.py
```

## Conclusion
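As a recap of Step 4, `streaming_dataset.py` might look roughly like this. This is a hypothetical sketch assuming litdata's documented `StreamingDataset`/`StreamingDataLoader` API, not the repo's actual script; iterating requires a reachable bucket, so execution is guarded behind an environment flag:

```python
# Hypothetical sketch of streaming_dataset.py: stream the optimized
# dataset from MinIO instead of downloading it upfront. Assumes
# litdata's StreamingDataset/StreamingDataLoader API.
import os

try:
    from litdata import StreamingDataset, StreamingDataLoader
except ImportError:  # guarded so the sketch loads without litdata
    StreamingDataset = StreamingDataLoader = None

def build_loader(uri="s3://my-bucket/my_optimized_dataset", batch_size=32):
    """Build a loader that streams chunks from the bucket on demand."""
    dataset = StreamingDataset(input_dir=uri)
    return StreamingDataLoader(dataset, batch_size=batch_size)

# Only iterate when litdata is present and a bucket is reachable.
if StreamingDataset is not None and os.environ.get("RUN_STREAMING"):
    for batch in build_loader():
        print(batch)
        break
```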
This example illustrates how to integrate LitData with MinIO for efficient data management. The same approach applies to any other S3-compatible object store.
## References
- [Litdata](https://github.com/Lightning-AI/litdata)
- [MinIO Docker Compose Configuration](https://github.com/minio/minio/blob/master/docs/orchestration/docker-compose/docker-compose.yaml)

## Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
## Authors
- [Bhimraj Yadav](https://github.com/bhimrazy)
## License
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.