https://github.com/malondaclement/datalake
DataLake project 💾
- Host: GitHub
- URL: https://github.com/malondaclement/datalake
- Owner: MalondaClement
- License: mit
- Created: 2022-05-27T14:09:56.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-11-06T13:35:28.000Z (about 3 years ago)
- Last Synced: 2025-10-13T22:09:27.518Z (3 months ago)
- Topics: datalake, mysql, python3
- Language: Python
- Homepage: https://github.com/MalondaClement/DataLake/wiki
- Size: 72.3 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# DataLake
 
-----
## 1. Database schema

**Fig. 1** - _Database Schema_
## 2. How to use this project
### 2.1 Initialize the database
```
python3 main.py init
```
### 2.2 Insert data into the database
```
python3 main.py insert
```
#### 2.2.1 Insert a classification dataset
Dataset tree:
* root
  * images
    * label_1
      * image1.jpg
      * image2.jpg
      * ...
    * label_2
      * image1.jpg
      * image2.jpg
      * ...
    * label_3
      * ...
  * labels.csv
`labels.csv` columns: `image`, `label`.
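As a rough illustration of how this layout could be read back before insertion, here is a minimal Python sketch. It assumes (the README does not say) that the `image` column holds a bare file name and that each file sits in `images/<label>/`; `load_classification_labels` is a hypothetical helper, not part of the project.

```python
import csv
from pathlib import Path


def load_classification_labels(root):
    """Read root/labels.csv and return (image_path, label) pairs.

    Hypothetical helper. Assumption: the `image` column is a file name and
    the file is stored in root/images/<label>/<image>.
    """
    root = Path(root)
    pairs = []
    with open(root / "labels.csv", newline="") as f:
        for row in csv.DictReader(f):
            image_path = root / "images" / row["label"] / row["image"]
            # Fail early if the CSV references an image that is not on disk.
            if not image_path.is_file():
                raise FileNotFoundError(image_path)
            pairs.append((image_path, row["label"]))
    return pairs
```

Checking that every referenced file exists up front keeps a half-valid dataset from being partially inserted.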
#### 2.2.2 Insert a detection dataset
##### 2.2.2.a XML format
Dataset tree:
* root
  * images
    * image1.jpg
    * image2.jpg
    * ...
  * labels
    * label1.xml
    * label2.xml
    * ...
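```xml
<annotation>
    <filename>000005.jpg</filename>
    <size>
        <width>500</width>
        <height>375</height>
        <depth>3</depth>
    </size>
    <object>
        <name>chair</name>
        <bndbox>
            <xmin>263</xmin>
            <ymin>211</ymin>
            <xmax>324</xmax>
            <ymax>339</ymax>
        </bndbox>
    </object>
    <object>
        <name>chair</name>
        <bndbox>
            <xmin>165</xmin>
            <ymin>264</ymin>
            <xmax>253</xmax>
            <ymax>372</ymax>
        </bndbox>
    </object>
    <object>
        <name>chair</name>
        <bndbox>
            <xmin>5</xmin>
            <ymin>244</ymin>
            <xmax>67</xmax>
            <ymax>374</ymax>
        </bndbox>
    </object>
    <object>
        <name>chair</name>
        <bndbox>
            <xmin>241</xmin>
            <ymin>194</ymin>
            <xmax>295</xmax>
            <ymax>299</ymax>
        </bndbox>
    </object>
    <object>
        <name>chair</name>
        <bndbox>
            <xmin>277</xmin>
            <ymin>186</ymin>
            <xmax>312</xmax>
            <ymax>220</ymax>
        </bndbox>
    </object>
</annotation>
```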
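The sample above appears to follow the Pascal VOC annotation layout. Assuming the label files use the VOC tag names (`filename`, `object`, `name`, `bndbox`, `xmin`, `ymin`, `xmax`, `ymax`), a short sketch with the standard library's `xml.etree.ElementTree` can flatten one file into rows with the same columns as the CSV format in 2.2.2.b. `voc_xml_to_rows` is a hypothetical name; the project's own parser may work differently.

```python
import xml.etree.ElementTree as ET

# Assumed VOC tag names for the four box coordinates.
BOX_TAGS = ("xmin", "ymin", "xmax", "ymax")


def voc_xml_to_rows(xml_text):
    """Turn one XML label file into (image, label, xmin, ymin, xmax, ymax) rows."""
    root = ET.fromstring(xml_text)
    image = root.findtext("filename")
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        # Parse through float first in case a file stores fractional coordinates.
        coords = tuple(int(float(box.findtext(tag))) for tag in BOX_TAGS)
        rows.append((image, obj.findtext("name")) + coords)
    return rows
```

For a file on disk, passing `Path("labels/label1.xml").read_text()` would return one tuple per `<object>` element.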
##### 2.2.2.b CSV format
Dataset tree:
* root
  * images
    * image1.jpg
    * image2.jpg
    * ...
  * labels.csv
`labels.csv` columns: `image`, `label`, `xmin`, `ymin`, `xmax`, `ymax`.
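With these columns each row carries a single box, so several rows can share the same `image`. A minimal sketch, assuming the coordinates are pixel values, that groups the boxes per image and rejects inverted boxes; `load_detection_labels` is a hypothetical helper, not the project's loader.

```python
import csv
from collections import defaultdict

BOX_COLUMNS = ("xmin", "ymin", "xmax", "ymax")


def load_detection_labels(csv_path):
    """Return {image: [(label, xmin, ymin, xmax, ymax), ...]} from a labels.csv."""
    boxes = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            xmin, ymin, xmax, ymax = (int(float(row[c])) for c in BOX_COLUMNS)
            # Reject degenerate or inverted boxes before they reach the database.
            if xmin >= xmax or ymin >= ymax:
                raise ValueError(f"invalid box for {row['image']}: {row}")
            boxes[row["image"]].append((row["label"], xmin, ymin, xmax, ymax))
    return dict(boxes)
```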
### 2.3 Create a new dataset from the data in the database
```
python3 main.py create
```
The new dataset is described by a JSON file:
```json
{
    "type": "[classif|detection|segmentation]",
    "path": "/path/to/the/root/directory",
    "classes": {
        "label_1": ["other_label_1", "other_label_1"],
        "label_2": [],
        "label_3": ["other_label_3"]
    }
}
```
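A small sketch for checking a description file before running `create`. It only validates the three keys shown in the example and treats each `classes` entry as a list of label names; how the project actually interprets those lists is not stated here, so `load_description` is a hypothetical helper.

```python
import json

# The three values listed as alternatives for "type" in the example above.
ALLOWED_TYPES = {"classif", "detection", "segmentation"}


def load_description(text):
    """Parse a dataset description and check its shape (hypothetical helper)."""
    desc = json.loads(text)
    if desc.get("type") not in ALLOWED_TYPES:
        raise ValueError(f'"type" must be one of {sorted(ALLOWED_TYPES)}')
    if not isinstance(desc.get("path"), str):
        raise ValueError('"path" must be a string')
    classes = desc.get("classes")
    # Every key must map to a (possibly empty) list of label names.
    if not isinstance(classes, dict) or not all(
        isinstance(v, list) and all(isinstance(x, str) for x in v)
        for v in classes.values()
    ):
        raise ValueError('"classes" must map each label to a list of label names')
    return desc
```

Malformed JSON is also reported as a `ValueError`, because `json.JSONDecodeError` subclasses it.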
### 2.4 List label names or datasets
```
python3 main.py list
```
### 2.5 Clear the whole database
```
python3 main.py clear
```
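The README documents five subcommands for `main.py`. The following hypothetical sketch shows one way such an entry point could be wired with `argparse`; the handler is a placeholder because the README does not describe what each command does internally, and the real `main.py` may be structured differently.

```python
import argparse

# Subcommands and short descriptions taken from the sections above.
SUBCOMMANDS = [
    ("init", "initialize the database"),
    ("insert", "insert a dataset into the database"),
    ("create", "create a new dataset from the database"),
    ("list", "list label names or datasets"),
    ("clear", "clear the whole database"),
]


def run(command):
    """Placeholder handler; a real implementation would talk to MySQL here."""
    return command


def main(argv=None):
    parser = argparse.ArgumentParser(prog="main.py", description="DataLake CLI (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)
    for name, help_text in SUBCOMMANDS:
        sub.add_parser(name, help=help_text)
    args = parser.parse_args(argv)
    return run(args.command)
```

Here `main(["list"])` dispatches the `list` subcommand, while an unknown or missing subcommand makes `argparse` print an error and raise `SystemExit`.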