# spinesUtils

*Accelerate your Python development workflow*

## Overview

**spinesUtils** is a powerful library that provides ready-to-use features and utilities for Python development to shorten the project development cycle. Our goal is to help developers focus on solving their core problems instead of reimplementing common functionality.

## Features

- [x] **Logging functionality** - High-performance logging tools with zero learning curve
- [x] **Type checking and parameter validation** - Robust validation decorators
- [x] **CSV file reading acceleration** - Performance-optimized data loading
- [x] **Imbalanced data classifiers** - Specialized ML tools for imbalanced datasets
- [x] **Pandas DataFrame data compression** - Memory optimization for large datasets
- [x] **DataFrame insight tools** - Quick data analysis and visualization
- [x] **Large data train-test splitting** - Efficient data partitioning for ML pipelines
- [x] **Intuitive timer** - Feature-rich yet easy-to-use precision timer

This library is currently undergoing rapid iteration. If you encounter any issues with its functionalities, feel free to [raise an issue](https://github.com/BirchKwok/spinesUtils/issues).

## Installation

You can install spinesUtils from PyPI:

```bash
pip install spinesUtils
```

## Usage Examples

### Logger

The Logger class provides convenient logging while avoiding handler conflicts with Python's built-in logging module.

```python
# Load the Logger class (also exported under the alias FastLogger)
from spinesUtils.logging import Logger

# Create a logger named "MyLogger" with no file handler.
# You can pass a file path via `fp`; if it is None, logs are not written to a file.
logger = Logger(name="MyLogger", fp=None, level="DEBUG")

logger.log("This is an info log emitted by the log function.", level='INFO')
logger.debug("This is a debug message.")
logger.info("This is an info message.")
logger.warning("This is a warning message.")
logger.error("This is an error message.")
logger.critical("This is a critical message.")
```

#### Performance Comparison

FastLogger vs Python's standard logging library (1 million messages, 20 threads):

| Metric | Standard logging | FastLogger | Improvement |
|--------|-----------------|------------|-------------|
| Total time (seconds) | 17.73 | 0.82 | 21.58x faster |
| Messages per second | 56,389 | 1,216,862 | 21.58x higher |
| Write speed (MB/s) | 6.94 | 14.04 | 2.02x faster |
| Average message size (bytes) | 129.00 | 12.10 | 10.66x smaller |
| Total log file size (MB) | 123.02 | 11.54 | 10.66x smaller |

*Test environment: MacBook Pro (Apple Silicon M1 Pro, 32GB RAM)*
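If you want to reproduce a throughput figure like the ones above, a minimal sketch using only the standard library is shown below. The function name `benchmark_std_logging` is hypothetical, and the numbers you get will vary with hardware, handler configuration, and message size:

```python
import logging
import time

def benchmark_std_logging(n_messages=10_000, path="bench.log"):
    """Time the standard logging module writing n_messages to a file
    and return the throughput in messages per second."""
    logger = logging.getLogger("bench")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(path, mode="w")
    logger.addHandler(handler)

    start = time.perf_counter()
    for i in range(n_messages):
        logger.info("message %d", i)
    handler.close()
    logger.removeHandler(handler)
    elapsed = time.perf_counter() - start

    return n_messages / elapsed

print(f"{benchmark_std_logging():,.0f} msg/s")
```

Running the same loop against FastLogger gives you a like-for-like comparison on your own machine.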

### Type Checking and Parameter Validation

Ensure your functions receive the correct input types and values:

```python
from spinesUtils.asserts import ParameterTypeAssert, ParameterValuesAssert, generate_function_kwargs

# Check parameter types
@ParameterTypeAssert({
    'a': (int, float),
    'b': (int, float)
})
def add(a, b):
    return a + b

# Check parameter values
@ParameterValuesAssert({
    'a': lambda x: x > 0,
    'b': lambda x: x > 0
})
def divide(a, b):
    return a / b

# Build the keyword arguments that a call to `add` would bind
params = generate_function_kwargs(add, a=1, b=2)
```
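To see how a decorator in this style works under the hood, here is a minimal stdlib-only sketch of the technique. The name `type_assert` is hypothetical and this is not the library's actual implementation:

```python
import inspect
from functools import wraps

def type_assert(expected):
    """Minimal parameter-type-checking decorator: `expected` maps
    parameter names to a type or tuple of types, and a TypeError is
    raised before the wrapped function runs if any argument mismatches."""
    def decorator(func):
        sig = inspect.signature(func)

        @wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            for name, value in bound.arguments.items():
                if name in expected and not isinstance(value, expected[name]):
                    raise TypeError(
                        f"{func.__name__}(): parameter {name!r} expects "
                        f"{expected[name]}, got {type(value).__name__}"
                    )
            return func(*args, **kwargs)
        return wrapper
    return decorator

@type_assert({'a': (int, float), 'b': (int, float)})
def add(a, b):
    return a + b

print(add(1, 2))   # 3
# add('1', 2) would raise TypeError
```

Binding arguments via `inspect.signature` means positional and keyword calls are both validated against the same parameter names.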

### CSV Reading Acceleration

Read large CSV files efficiently:

```python
from spinesUtils import read_csv

df = read_csv(
    fp='/path/to/your/file.csv',
    sep=',',                  # same as pandas read_csv's sep
    turbo_method='polars',    # backend used to speed up load time
    chunk_size=None,          # can be an integer when using the pandas backend
    transform2low_mem=True,   # compress dtypes to save memory
    verbose=False
)
```
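The `chunk_size` idea — processing a file too large for memory in fixed-size pieces — can be sketched with the standard library alone. The helper name `iter_csv_chunks` is hypothetical, not part of spinesUtils:

```python
import csv

def iter_csv_chunks(fp, chunk_size=10_000, sep=","):
    """Yield lists of at most chunk_size row-dicts each, so a CSV file
    larger than memory can be processed incrementally."""
    with open(fp, newline="") as f:
        reader = csv.DictReader(f, delimiter=sep)
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) >= chunk_size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk

# Usage: process each chunk independently
# for chunk in iter_csv_chunks("/path/to/your/file.csv", chunk_size=100_000):
#     ...
```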

### Classifiers for Imbalanced Data

Handle imbalanced datasets effectively:

```python
from spinesUtils.models import MultiClassBalanceClassifier
from sklearn.ensemble import RandomForestClassifier

classifier = MultiClassBalanceClassifier(
    base_estimator=RandomForestClassifier(n_estimators=100),
    n_classes=3,
    random_state=0,
    verbose=0
)

# Fit and predict as you would with any scikit-learn estimator
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
```
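One common technique behind balanced classifiers is random oversampling: duplicating minority-class samples until every class matches the largest one. The sketch below illustrates that idea with the standard library only; `oversample` is a hypothetical helper, not the algorithm MultiClassBalanceClassifier actually uses:

```python
import random
from collections import Counter

def oversample(X, y, seed=0):
    """Randomly duplicate minority-class samples until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        idx = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(idx)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out

X = [[0], [1], [2], [3], [4], [5]]
y = [0, 0, 0, 0, 1, 2]
X_b, y_b = oversample(X, y)
print(Counter(y_b))   # each class now has 4 samples
```

Oversampling keeps all original data but can overfit duplicated points; class weighting is the usual alternative.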

### DataFrame Data Compression

Optimize memory usage for large DataFrames:

```python
from spinesUtils import transform_dtypes_low_mem

# Compress a single DataFrame
transform_dtypes_low_mem(df, verbose=True, inplace=True)

# Batch compress multiple DataFrames
from spinesUtils import transform_batch_dtypes_low_mem
transform_batch_dtypes_low_mem([df1, df2, df3, df4], verbose=True, inplace=True)
```
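The core idea of this kind of compression is downcasting each column to the narrowest dtype that can hold its value range. A minimal sketch of that decision for signed integers, with a hypothetical helper name (not the library's code):

```python
def smallest_int_dtype(lo, hi):
    """Return the name of the narrowest signed integer dtype that can
    represent every value in [lo, hi]."""
    for name, bits in (("int8", 8), ("int16", 16), ("int32", 32), ("int64", 64)):
        if -(2 ** (bits - 1)) <= lo and hi < 2 ** (bits - 1):
            return name
    raise ValueError("range too wide for a 64-bit integer")

print(smallest_int_dtype(0, 200))      # int16
print(smallest_int_dtype(-100, 100))   # int8
```

Applied columnwise, an int64 column whose values fit in int8 shrinks to one eighth of its original memory footprint.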

### DataFrame Insight Tools

Quickly analyze your data:

```python
from spinesUtils import df_preview, classify_samples_dist

# Get comprehensive DataFrame insights
df_insight = df_preview(df)
```

### Data Splitting Utilities

Efficiently split large datasets:

```python
from spinesUtils import train_test_split_bigdata, train_test_split_bigdata_df
from spinesUtils.feature_tools import get_x_cols

# Return numpy arrays
X_train, X_valid, X_test, y_train, y_valid, y_test = train_test_split_bigdata(
    df=df,
    x_cols=get_x_cols(df, y_col='target_column'),
    y_col='target_column',
    shuffle=True,
    return_valid=True,
    train_size=0.8,
    valid_size=0.5
)

# Return pandas DataFrames
train_df, valid_df, test_df = train_test_split_bigdata_df(
    df=df,
    x_cols=get_x_cols(df, y_col='target_column'),
    y_col='target_column',
    shuffle=True,
    return_valid=True,
    train_size=0.8,
    valid_size=0.5
)
```
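The three-way split semantics above (`train_size` taken first, then `valid_size` splitting the remainder) can be sketched with index slicing and the standard library. `split_indices` is a hypothetical helper, not the library's implementation:

```python
import random

def split_indices(n, train_size=0.8, valid_size=0.5, shuffle=True, seed=0):
    """Return (train, valid, test) index lists: train_size of all rows
    go to train, and the remainder is split by valid_size."""
    idx = list(range(n))
    if shuffle:
        random.Random(seed).shuffle(idx)
    n_train = int(n * train_size)
    rest = idx[n_train:]
    n_valid = int(len(rest) * valid_size)
    return idx[:n_train], rest[:n_valid], rest[n_valid:]

train, valid, test = split_indices(100)
print(len(train), len(valid), len(test))   # 80 10 10
```

Working with index lists rather than copies of the data is what keeps this approach cheap for large datasets.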

### Timer Utility

Time your code execution simply:

```python
from spinesUtils.timer import Timer

# As a context manager
with Timer().session() as t:
    # Your code here
    t.sleep(1)
    print(f"Step 1 time: {t.last_timestamp_diff():.2f}s")

    # Mark a middle point
    t.middle_point()

    # More code
    t.sleep(2)
    print(f"Step 2 time: {t.last_timestamp_diff():.2f}s")

print(f"Total time: {t.total_elapsed_time():.2f}s")

# Or use it manually
timer = Timer()
timer.start()
# Your code here
timer.end()
print(f"Elapsed: {timer.total_elapsed_time():.2f}s")
```