Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/iboud0/scikit_learn_clone

Custom implementation of various machine learning algorithms and utilities inspired by Scikit-Learn
https://github.com/iboud0/scikit_learn_clone

Last synced: 7 days ago
JSON representation

Custom implementation of various machine learning algorithms and utilities inspired by Scikit-Learn

Awesome Lists containing this project

README

        

### README.md

# Scikit Learn Clone

Scikit Learn Clone is a custom implementation of various machine learning algorithms and utilities inspired by Scikit-Learn. This project is designed to provide a comprehensive set of tools for data preprocessing, model selection, evaluation, and supervised learning.

## Project Structure

The project is organized into several modules, each focusing on a specific aspect of machine learning. Below is the detailed structure:

```
scikit_learn_clone/
├── data/
├── ensemble_methods/
│ ├── __init__.py
│ ├── adaBoost.py
│ ├── bagging.py
│ ├── gradient_boosting.py
│ ├── randomForest.py
│ ├── stacking.py
├── metrics_model_evaluation/
│ ├── __init__.py
│ ├── accuracy.py
│ ├── confusion_matrix.py
│ ├── f1_score.py
│ ├── mean_absolute_error.py
│ ├── mean_squared_error.py
│ ├── precision.py
│ ├── r2_score.py
│ ├── recall.py
├── model_selection/
│ ├── __init__.py
│ ├── cross_validation.py
│ ├── grid_search.py
│ ├── kfold.py
│ ├── make_scorer.py
│ ├── param_grid.py
│ ├── train_test_split.py
├── preprocessing/
│ ├── __init__.py
│ ├── impute_missing_values_mean.py
│ ├── impute_missing_values_median.py
│ ├── impute_missing_values_mode.py
│ ├── labelencoder.py
│ ├── normalize_features.py
│ ├── numerical_categorical_variable.py
│ ├── onehotencoder.py
│ ├── outliers.py
│ ├── scale_features_min_max.py
│ ├── scale_features_standard.py
│ ├── select_features.py
├── supervised_learning/
│ ├── __init__.py
│ ├── DecisionTree.py
│ ├── knn.py
│ ├── LinearRegression.py
│ ├── LogisticRegression.py
│ ├── NaiveBayes.py
│ ├── NeuralNetwork.py
├── testing/
├── utilities/
│ ├── __init__.py
│ ├── Estimator.py
│ ├── MetaEstimator.py
│ ├── ModelSelector.py
│ ├── Pipeline.py
│ ├── Predictor.py
│ ├── Transformer.py
├── .gitignore
├── README.md
└── setup.py

```
## Installation

To install the package, use pip:

```bash
pip install sktlearn-clone
```

## Usage

Here are some examples of how to use the various modules in this package.

### Example: Decision Tree Classifier

```python
from scikit_learn_clone.supervised_learning.DecisionTree import DecisionTreeClassifier
from scikit_learn_clone.model_selection.train_test_split import train_test_split
from scikit_learn_clone.metrics_model_evaluation.accuracy import accuracy_score

# Sample dataset
X = [[0, 0], [1, 1]]
y = [0, 1]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

# Initialize and train the classifier
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

# Predict and evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

### Example: K-Fold Cross-Validation

```python
from scikit_learn_clone.model_selection.kfold import KFold
from scikit_learn_clone.supervised_learning.LinearRegression import LinearRegression

# Sample dataset
X = [[i] for i in range(10)]
y = [2 * i for i in range(10)]

# Initialize KFold
kf = KFold(n_splits=5)

# Initialize model
model = LinearRegression()

# Perform K-Fold Cross-Validation
for train_index, test_index in kf.split(X):
X_train, X_test = [X[i] for i in train_index], [X[i] for i in test_index]
y_train, y_test = [y[i] for i in train_index], [y[i] for i in test_index]
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(f"Fold results: {predictions}")
```

## Features

### Ensemble Methods

- **AdaBoost**: Adaptive Boosting algorithm.
- **Bagging**: Bootstrap Aggregating algorithm.
- **Gradient Boosting**: Gradient Boosting algorithm.
- **Random Forest**: Ensemble of Decision Trees.
- **Stacking**: Stacked generalization.

### Metrics and Model Evaluation

- **Accuracy**: Classification accuracy.
- **Confusion Matrix**: Performance measurement for classification.
- **F1 Score**: Harmonic mean of precision and recall.
- **Mean Absolute Error**: Regression metric.
- **Mean Squared Error**: Regression metric.
- **Precision**: Classification precision.
- **R2 Score**: Coefficient of determination.
- **Recall**: Classification recall.

### Model Selection

- **Cross-Validation**: Split the dataset into k consecutive folds.
- **Grid Search**: Exhaustive search over specified parameter values.
- **K-Fold**: K-Fold cross-validation iterator.
- **Make Scorer**: Convert metrics into callables.
- **Param Grid**: Define the parameter grid for search.
- **Train-Test Split**: Split arrays or matrices into random train and test subsets.

### Preprocessing

- **Imputation**: Handle missing values.
- Mean, Median, Mode imputation.
- **Label Encoding**: Encode categorical features as an integer array.
- **Normalization**: Scale input vectors individually to unit norm.
- **One-Hot Encoding**: Encode categorical integer features as a one-hot numeric array.
- **Outlier Detection**: Identify and handle outliers in the data.
- **Feature Scaling**: Standardize features by removing the mean and scaling to unit variance.
- Min-Max scaling.
- **Feature Selection**: Select features based on importance or correlation.

### Supervised Learning

- **Decision Tree**: Decision Tree classifier.
- **k-Nearest Neighbors**: k-Nearest Neighbors algorithm.
- **Linear Regression**: Linear Regression algorithm.
- **Logistic Regression**: Logistic Regression algorithm.
- **Naive Bayes**: Naive Bayes algorithm.
- **Neural Network**: Neural Network algorithm.

## Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository.
2. Create a new branch (`git checkout -b feature/your-feature`).
3. Make your changes.
4. Commit your changes (`git commit -m 'Add some feature'`).
5. Push to the branch (`git push origin feature/your-feature`).
6. Open a pull request.

## License

This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.
```

This revised version ensures the project structure is properly separated and formatted. This will improve readability and help users quickly find relevant information.