Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/iboud0/scikit_learn_clone
Custom implementation of various machine learning algorithms and utilities inspired by Scikit-Learn
- Host: GitHub
- URL: https://github.com/iboud0/scikit_learn_clone
- Owner: iboud0
- Created: 2024-04-08T03:02:29.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-06-16T23:34:21.000Z (7 months ago)
- Last Synced: 2024-12-13T08:38:25.360Z (about 1 month ago)
- Language: Python
- Homepage: https://pypi.org/project/sktlearn-clone/
- Size: 601 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
### README.md
# Scikit Learn Clone
Scikit Learn Clone is a custom implementation of various machine learning algorithms and utilities inspired by Scikit-Learn. This project is designed to provide a comprehensive set of tools for data preprocessing, model selection, evaluation, and supervised learning.
## Project Structure
The project is organized into several modules, each focusing on a specific aspect of machine learning. Below is the detailed structure:
```
scikit_learn_clone/
├── data/
├── ensemble_methods/
│   ├── __init__.py
│   ├── adaBoost.py
│   ├── bagging.py
│   ├── gradient_boosting.py
│   ├── randomForest.py
│   └── stacking.py
├── metrics_model_evaluation/
│   ├── __init__.py
│   ├── accuracy.py
│   ├── confusion_matrix.py
│   ├── f1_score.py
│   ├── mean_absolute_error.py
│   ├── mean_squared_error.py
│   ├── precision.py
│   ├── r2_score.py
│   └── recall.py
├── model_selection/
│   ├── __init__.py
│   ├── cross_validation.py
│   ├── grid_search.py
│   ├── kfold.py
│   ├── make_scorer.py
│   ├── param_grid.py
│   └── train_test_split.py
├── preprocessing/
│   ├── __init__.py
│   ├── impute_missing_values_mean.py
│   ├── impute_missing_values_median.py
│   ├── impute_missing_values_mode.py
│   ├── labelencoder.py
│   ├── normalize_features.py
│   ├── numerical_categorical_variable.py
│   ├── onehotencoder.py
│   ├── outliers.py
│   ├── scale_features_min_max.py
│   ├── scale_features_standard.py
│   └── select_features.py
├── supervised_learning/
│   ├── __init__.py
│   ├── DecisionTree.py
│   ├── knn.py
│   ├── LinearRegression.py
│   ├── LogisticRegression.py
│   ├── NaiveBayes.py
│   └── NeuralNetwork.py
├── testing/
├── utilities/
│   ├── __init__.py
│   ├── Estimator.py
│   ├── MetaEstimator.py
│   ├── ModelSelector.py
│   ├── Pipeline.py
│   ├── Predictor.py
│   └── Transformer.py
├── .gitignore
├── README.md
└── setup.py
```
## Installation

To install the package, use pip:
```bash
pip install sktlearn-clone
```

## Usage
Here are some examples of how to use the various modules in this package.
### Example: Decision Tree Classifier
```python
from scikit_learn_clone.supervised_learning.DecisionTree import DecisionTreeClassifier
from scikit_learn_clone.model_selection.train_test_split import train_test_split
from scikit_learn_clone.metrics_model_evaluation.accuracy import accuracy_score

# Sample dataset
X = [[0, 0], [1, 1]]
y = [0, 1]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

# Initialize and train the classifier
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

# Predict and evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

### Example: K-Fold Cross-Validation
```python
from scikit_learn_clone.model_selection.kfold import KFold
from scikit_learn_clone.supervised_learning.LinearRegression import LinearRegression

# Sample dataset
X = [[i] for i in range(10)]
y = [2 * i for i in range(10)]

# Initialize KFold and the model
kf = KFold(n_splits=5)
model = LinearRegression()

# Perform K-Fold Cross-Validation
for train_index, test_index in kf.split(X):
    X_train, X_test = [X[i] for i in train_index], [X[i] for i in test_index]
    y_train, y_test = [y[i] for i in train_index], [y[i] for i in test_index]
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    print(f"Fold results: {predictions}")
```

## Features
### Ensemble Methods
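For intuition about the ensemble methods listed below, bagging can be sketched from scratch in a few lines (a conceptual illustration only, not this package's API): train each model on a bootstrap sample of the data, then combine their predictions by majority vote.

```python
import random

def bootstrap_sample(X, y, rng):
    """Draw a bootstrap sample (sampling with replacement) of the same size as the data."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def majority_vote(predictions):
    """Combine per-model predictions by majority vote, sample by sample."""
    combined = []
    for sample_preds in zip(*predictions):
        combined.append(max(set(sample_preds), key=sample_preds.count))
    return combined

def fit_1nn(X_train, y_train):
    """Toy base learner: predict the label of the nearest training point."""
    def predict(X):
        out = []
        for x in X:
            dists = [sum((a - b) ** 2 for a, b in zip(x, xt)) for xt in X_train]
            out.append(y_train[dists.index(min(dists))])
        return out
    return predict

rng = random.Random(0)
X = [[0.0], [0.1], [0.9], [1.0]]
y = [0, 0, 1, 1]

# Bagging: fit each base learner on its own bootstrap sample, then vote.
models = []
for _ in range(5):
    Xb, yb = bootstrap_sample(X, y, rng)
    models.append(fit_1nn(Xb, yb))

preds = majority_vote([m(X) for m in models])
print(preds)
```

Random Forest follows the same pattern with decision trees as base learners; AdaBoost and Gradient Boosting instead fit learners sequentially, each correcting the errors of the previous ones.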
- **AdaBoost**: Adaptive Boosting algorithm.
- **Bagging**: Bootstrap Aggregating algorithm.
- **Gradient Boosting**: Gradient Boosting algorithm.
- **Random Forest**: Ensemble of Decision Trees.
- **Stacking**: Stacked generalization.

### Metrics and Model Evaluation
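As a reference for what the metrics listed below compute, here is a from-scratch sketch of accuracy, precision, recall, and F1 for binary classification (conceptual only; the package's own implementations live in `metrics_model_evaluation/`):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Of the samples predicted positive, the fraction that truly are."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    predicted_pos = sum(p == positive for p in y_pred)
    return tp / predicted_pos if predicted_pos else 0.0

def recall(y_true, y_pred, positive=1):
    """Of the truly positive samples, the fraction predicted positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return tp / actual_pos if actual_pos else 0.0

def f1(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall."""
    p, r = precision(y_true, y_pred, positive), recall(y_true, y_pred, positive)
    return 2 * p * r / (p + r) if p + r else 0.0

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred))   # 3 of 5 correct -> 0.6
print(precision(y_true, y_pred))  # tp=2, predicted positives=3 -> 2/3
print(recall(y_true, y_pred))     # tp=2, actual positives=3 -> 2/3
```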
- **Accuracy**: Classification accuracy.
- **Confusion Matrix**: Performance measurement for classification.
- **F1 Score**: Harmonic mean of precision and recall.
- **Mean Absolute Error**: Regression metric.
- **Mean Squared Error**: Regression metric.
- **Precision**: Classification precision.
- **R2 Score**: Coefficient of determination.
- **Recall**: Classification recall.

### Model Selection
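The K-Fold iterator listed below can be sketched independently of this package (a conceptual illustration, not the `kfold.py` implementation): partition the sample indices into k consecutive folds, holding out one fold as the test set per split.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_indices, test_indices) for k consecutive folds."""
    # Distribute any remainder across the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

for train, test in kfold_indices(10, 5):
    print(train, test)
```

Cross-validation then fits the model on each train split and scores it on the corresponding test fold, averaging the k scores.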
- **Cross-Validation**: Split the dataset into k consecutive folds.
- **Grid Search**: Exhaustive search over specified parameter values.
- **K-Fold**: K-Fold cross-validation iterator.
- **Make Scorer**: Convert metrics into callables.
- **Param Grid**: Define the parameter grid for search.
- **Train-Test Split**: Split arrays or matrices into random train and test subsets.

### Preprocessing
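To make the imputation and scaling steps listed below concrete, here is a minimal from-scratch sketch for a single numeric column (conceptual only, not this package's function signatures):

```python
def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def scale_min_max(column):
    """Rescale values linearly onto the [0, 1] range."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

col = [1.0, None, 3.0, 5.0]
filled = impute_mean(col)    # mean of 1, 3, 5 is 3.0
print(filled)                # [1.0, 3.0, 3.0, 5.0]
print(scale_min_max(filled)) # [0.0, 0.5, 0.5, 1.0]
```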
- **Imputation**: Handle missing values.
  - Mean, Median, and Mode imputation.
- **Label Encoding**: Encode categorical features as an integer array.
- **Normalization**: Scale input vectors individually to unit norm.
- **One-Hot Encoding**: Encode categorical integer features as a one-hot numeric array.
- **Outlier Detection**: Identify and handle outliers in the data.
- **Feature Scaling**: Standardize features by removing the mean and scaling to unit variance.
  - Min-Max scaling.
- **Feature Selection**: Select features based on importance or correlation.

### Supervised Learning
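The k-Nearest Neighbors idea listed below, for example, fits in a few lines (a conceptual illustration, not the `knn.py` implementation): predict the majority label among the k training points closest to the query.

```python
def knn_predict(X_train, y_train, x, k=3):
    """Predict the majority label among the k nearest training points (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(xt, x)), label)
        for xt, label in zip(X_train, y_train)
    )
    nearest = [label for _, label in dists[:k]]
    return max(set(nearest), key=nearest.count)

X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, [0.5, 0.5]))  # nearest neighbors are class 0
print(knn_predict(X_train, y_train, [5.5, 5.5]))  # nearest neighbors are class 1
```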
- **Decision Tree**: Decision Tree classifier.
- **k-Nearest Neighbors**: k-Nearest Neighbors algorithm.
- **Linear Regression**: Linear Regression algorithm.
- **Logistic Regression**: Logistic Regression algorithm.
- **Naive Bayes**: Naive Bayes algorithm.
- **Neural Network**: Neural Network algorithm.

## Contributing
Contributions are welcome! Please follow these steps:
1. Fork the repository.
2. Create a new branch (`git checkout -b feature/your-feature`).
3. Make your changes.
4. Commit your changes (`git commit -m 'Add some feature'`).
5. Push to the branch (`git push origin feature/your-feature`).
6. Open a pull request.

## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.