https://github.com/itdxer/dslib
Useful tools for Data Scientist
- Host: GitHub
- URL: https://github.com/itdxer/dslib
- Owner: itdxer
- License: MIT
- Created: 2016-09-22T17:27:14.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2016-09-22T17:35:28.000Z (over 8 years ago)
- Last Synced: 2023-08-03T03:55:15.007Z (almost 2 years ago)
- Language: Python
- Size: 4.88 KB
- Stars: 4
- Watchers: 3
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Useful tools for Data Scientist
## Installation
```bash
$ pip install dslib
```

## Logging
```python
>>> import time
>>> from dslib.logs import get_logger, logtime
>>>
>>> logger = get_logger()
>>> logger.info("Basic logging message")
[INFO :22/09/2016 20:32:20] Basic logging message
>>>
>>> with logtime("Logging sleep function"):
... time.sleep(5)
...
[INFO :22/09/2016 20:32:22] [start:001] Start Logging sleep function
[INFO :22/09/2016 20:32:27] [finish:001] Finish Logging sleep function (took 5.005 sec)
```
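The timestamped `[INFO :…]` lines above come from an ordinary `logging` setup. As a rough illustration of the idea behind `logtime`, a timing context manager can be put together with `contextlib` and the standard `logging` module. The sketch below is an assumption about the mechanism, not dslib's actual code, and the helper names `get_basic_logger` and `logtime_sketch` are made up for this example (it also does not reproduce the `[start:001]`/`[finish:001]` counters):

```python
import logging
import time
from contextlib import contextmanager


def get_basic_logger(name="dslib-sketch"):
    # Hypothetical helper: a logger whose format resembles the
    # "[INFO :22/09/2016 20:32:20]" lines shown above.
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(
            "[%(levelname)s :%(asctime)s] %(message)s",
            datefmt="%d/%m/%Y %H:%M:%S",
        ))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger


@contextmanager
def logtime_sketch(message, logger=None):
    # Log when the wrapped block starts and how long it took to finish.
    logger = logger or get_basic_logger()
    logger.info("Start {}".format(message))
    start = time.time()
    try:
        yield
    finally:
        logger.info("Finish {} (took {:.3f} sec)".format(
            message, time.time() - start))
```

Wrapping `time.sleep(5)` in `with logtime_sketch("Logging sleep function"):` would then print start and finish lines similar to the ones shown above.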

## Checkpoints

```python
from sklearn import datasets, linear_model, preprocessing

from dslib.logs import get_logger
from dslib.checkpoint import Checkpoint

logger = get_logger()


class ClassifyData(Checkpoint):
    def step_1(self, outputs):
        logger.info("Loading dataset")
        iris_dataset = datasets.load_iris()

        logger.info("Applying standard scaler")
        scaler = preprocessing.StandardScaler()
        data = scaler.fit_transform(iris_dataset.data)

        return scaler, data, iris_dataset.target

    def step_2(self, outputs):
        _, data, target = outputs['step_1']

        logger.info("Training model")
        logreg = linear_model.LogisticRegression()
        logreg.fit(data, target)

        return logreg


if __name__ == '__main__':
    logger.info("> Run classifier for the first time")
    classify_data = ClassifyData(
        name='classify-data',
        checkpoint_folder='.checkpoint',
        version=1
    )
    classify_data.run()

    logger.info("> Run classifier for the second time")
    classify_data.run(start_from=2)

    logger.info("> Load outputs")
    outputs = classify_data.load_outputs()
    logger.info("> Found outputs for {} steps".format(len(outputs)))
```
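Each step's return value is stored under its step name (`step_2` above reads `outputs['step_1']`), so, assuming `load_outputs()` returns the same step-keyed dictionary, the objects saved by the two steps could presumably be retrieved like this (an assumption for illustration, not documented behaviour):

```python
# Continuing from the script above; assumes load_outputs() returns a dict
# keyed by step name, mirroring the outputs['step_1'] access inside step_2.
outputs = classify_data.load_outputs()

scaler, data, target = outputs['step_1']  # everything returned by step_1
logreg = outputs['step_2']                # the fitted LogisticRegression

print(logreg.predict(data[:5]))           # predictions for the first rows
```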

Output of the script:

```
[INFO :22/09/2016 20:33:42] > Run classifier for the first time
[INFO :22/09/2016 20:33:42] Checkpoint: #1
[INFO :22/09/2016 20:33:42] Loading dataset
[INFO :22/09/2016 20:33:42] Applying standard scaler
[INFO :22/09/2016 20:33:42] Saving checkpoint into file: classify-data-v1-step1.pkl
[INFO :22/09/2016 20:33:42] Checkpoint: #2
[INFO :22/09/2016 20:33:42] Training model
[INFO :22/09/2016 20:33:42] Saving checkpoint into file: classify-data-v1-step2.pkl
[INFO :22/09/2016 20:33:42] > Run classifier for the second time
[INFO :22/09/2016 20:33:42] Checkpoint: #1
[INFO :22/09/2016 20:33:42] Loading checkpoint from file: classify-data-v1-step1.pkl
[INFO :22/09/2016 20:33:42] Checkpoint: #2
[INFO :22/09/2016 20:33:42] Training model
[INFO :22/09/2016 20:33:42] Saving checkpoint into file: classify-data-v1-step2.pkl
[INFO :22/09/2016 20:33:42] > Load outputs
[INFO :22/09/2016 20:33:42] > Found outputs for 2 steps
```
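
For intuition, the sketch below shows one way a step-based checkpoint runner like the one used above could work: `step_N` methods are discovered in numeric order, each result is pickled to a `<name>-v<version>-step<N>.pkl` file (matching the filenames in the log output), and a later run can reload earlier steps instead of recomputing them. This is only an illustration of the idea, not dslib's actual `Checkpoint` implementation; the class name `CheckpointSketch` is made up here.

```python
import os
import pickle


class CheckpointSketch(object):
    """Toy step runner that pickles each step's result to disk."""

    def __init__(self, name, checkpoint_folder, version=1):
        self.name = name
        self.checkpoint_folder = checkpoint_folder
        self.version = version
        os.makedirs(checkpoint_folder, exist_ok=True)

    def _path(self, number):
        filename = "{}-v{}-step{}.pkl".format(self.name, self.version, number)
        return os.path.join(self.checkpoint_folder, filename)

    def _steps(self):
        # Yield (number, method) pairs for step_1, step_2, ... in order.
        number = 1
        while hasattr(self, "step_{}".format(number)):
            yield number, getattr(self, "step_{}".format(number))
            number += 1

    def run(self, start_from=1):
        outputs = {}
        for number, method in self._steps():
            key = "step_{}".format(number)
            path = self._path(number)
            if number < start_from and os.path.exists(path):
                # Skip recomputation: reuse the pickled result of this step.
                with open(path, "rb") as f:
                    outputs[key] = pickle.load(f)
            else:
                outputs[key] = method(outputs)
                with open(path, "wb") as f:
                    pickle.dump(outputs[key], f)
        return outputs

    def load_outputs(self):
        # Collect whatever step results are already saved on disk.
        outputs = {}
        for number, _ in self._steps():
            path = self._path(number)
            if os.path.exists(path):
                with open(path, "rb") as f:
                    outputs["step_{}".format(number)] = pickle.load(f)
        return outputs
```

A subclass would then define `step_1`, `step_2`, and so on, exactly as `ClassifyData` does above.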