Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/iboud0/scikit_learn_clone
Custom implementation of various machine learning algorithms and utilities inspired by Scikit-Learn
- Host: GitHub
- URL: https://github.com/iboud0/scikit_learn_clone
- Owner: iboud0
- Created: 2024-04-08T03:02:29.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2024-06-16T23:34:21.000Z (7 months ago)
- Last Synced: 2024-12-13T08:38:25.360Z (about 1 month ago)
- Language: Python
- Homepage: https://pypi.org/project/sktlearn-clone/
- Size: 601 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Metadata Files:
  - Readme: README.md
### README.md
# Scikit Learn Clone
Scikit Learn Clone is a custom implementation of various machine learning algorithms and utilities inspired by Scikit-Learn. This project is designed to provide a comprehensive set of tools for data preprocessing, model selection, evaluation, and supervised learning.
## Project Structure
The project is organized into several modules, each focusing on a specific aspect of machine learning. Below is the detailed structure:
```
scikit_learn_clone/
├── data/
├── ensemble_methods/
│   ├── __init__.py
│   ├── adaBoost.py
│   ├── bagging.py
│   ├── gradient_boosting.py
│   ├── randomForest.py
│   └── stacking.py
├── metrics_model_evaluation/
│   ├── __init__.py
│   ├── accuracy.py
│   ├── confusion_matrix.py
│   ├── f1_score.py
│   ├── mean_absolute_error.py
│   ├── mean_squared_error.py
│   ├── precision.py
│   ├── r2_score.py
│   └── recall.py
├── model_selection/
│   ├── __init__.py
│   ├── cross_validation.py
│   ├── grid_search.py
│   ├── kfold.py
│   ├── make_scorer.py
│   ├── param_grid.py
│   └── train_test_split.py
├── preprocessing/
│   ├── __init__.py
│   ├── impute_missing_values_mean.py
│   ├── impute_missing_values_median.py
│   ├── impute_missing_values_mode.py
│   ├── labelencoder.py
│   ├── normalize_features.py
│   ├── numerical_categorical_variable.py
│   ├── onehotencoder.py
│   ├── outliers.py
│   ├── scale_features_min_max.py
│   ├── scale_features_standard.py
│   └── select_features.py
├── supervised_learning/
│   ├── __init__.py
│   ├── DecisionTree.py
│   ├── knn.py
│   ├── LinearRegression.py
│   ├── LogisticRegression.py
│   ├── NaiveBayes.py
│   └── NeuralNetwork.py
├── testing/
├── utilities/
│   ├── __init__.py
│   ├── Estimator.py
│   ├── MetaEstimator.py
│   ├── ModelSelector.py
│   ├── Pipeline.py
│   ├── Predictor.py
│   └── Transformer.py
├── .gitignore
├── README.md
└── setup.py
```
## Installation

To install the package, use pip:
```bash
pip install sktlearn-clone
```

## Usage
Here are some examples of how to use the various modules in this package.
### Example: Decision Tree Classifier
```python
from scikit_learn_clone.supervised_learning.DecisionTree import DecisionTreeClassifier
from scikit_learn_clone.model_selection.train_test_split import train_test_split
from scikit_learn_clone.metrics_model_evaluation.accuracy import accuracy_score

# Sample dataset
X = [[0, 0], [1, 1]]
y = [0, 1]

# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5)

# Initialize and train the classifier
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

# Predict and evaluate
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
```

### Example: K-Fold Cross-Validation
```python
from scikit_learn_clone.model_selection.kfold import KFold
from scikit_learn_clone.supervised_learning.LinearRegression import LinearRegression

# Sample dataset
X = [[i] for i in range(10)]
y = [2 * i for i in range(10)]

# Initialize KFold and the model
kf = KFold(n_splits=5)
model = LinearRegression()

# Perform K-Fold Cross-Validation
for train_index, test_index in kf.split(X):
    X_train, X_test = [X[i] for i in train_index], [X[i] for i in test_index]
    y_train, y_test = [y[i] for i in train_index], [y[i] for i in test_index]
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    print(f"Fold results: {predictions}")
```

## Features
### Ensemble Methods
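For intuition about the ensemble methods listed below, bagging can be sketched from scratch in a few lines (a conceptual illustration only, not this package's API): train each model on a bootstrap sample of the data, then combine their predictions by majority vote.

```python
import random

def bootstrap_sample(X, y, rng):
    """Draw a bootstrap sample (sampling with replacement) of the same size as the data."""
    idx = [rng.randrange(len(X)) for _ in range(len(X))]
    return [X[i] for i in idx], [y[i] for i in idx]

def majority_vote(predictions):
    """Combine per-model predictions by majority vote, sample by sample."""
    combined = []
    for sample_preds in zip(*predictions):
        combined.append(max(set(sample_preds), key=sample_preds.count))
    return combined

def fit_1nn(X_train, y_train):
    """Toy base learner: predict the label of the nearest training point."""
    def predict(X):
        out = []
        for x in X:
            dists = [sum((a - b) ** 2 for a, b in zip(x, xt)) for xt in X_train]
            out.append(y_train[dists.index(min(dists))])
        return out
    return predict

rng = random.Random(0)
X = [[0.0], [0.1], [0.9], [1.0]]
y = [0, 0, 1, 1]

# Bagging: fit each base learner on its own bootstrap sample, then vote.
models = []
for _ in range(5):
    Xb, yb = bootstrap_sample(X, y, rng)
    models.append(fit_1nn(Xb, yb))

preds = majority_vote([m(X) for m in models])
print(preds)
```

Random Forest follows the same pattern with decision trees as base learners; AdaBoost and Gradient Boosting instead fit learners sequentially, each correcting the errors of the previous ones.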
- **AdaBoost**: Adaptive Boosting algorithm.
- **Bagging**: Bootstrap Aggregating algorithm.
- **Gradient Boosting**: Gradient Boosting algorithm.
- **Random Forest**: Ensemble of Decision Trees.
- **Stacking**: Stacked generalization.

### Metrics and Model Evaluation
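As a reference for what the metrics listed below compute, here is a from-scratch sketch of accuracy, precision, recall, and F1 for binary classification (conceptual only; the package's own implementations live in `metrics_model_evaluation/`):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Of the samples predicted positive, the fraction that truly are."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    predicted_pos = sum(p == positive for p in y_pred)
    return tp / predicted_pos if predicted_pos else 0.0

def recall(y_true, y_pred, positive=1):
    """Of the truly positive samples, the fraction predicted positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == positive for t in y_true)
    return tp / actual_pos if actual_pos else 0.0

def f1(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall."""
    p, r = precision(y_true, y_pred, positive), recall(y_true, y_pred, positive)
    return 2 * p * r / (p + r) if p + r else 0.0

y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred))   # 3 of 5 correct -> 0.6
print(precision(y_true, y_pred))  # tp=2, predicted positives=3 -> 2/3
print(recall(y_true, y_pred))     # tp=2, actual positives=3 -> 2/3
```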
- **Accuracy**: Classification accuracy.
- **Confusion Matrix**: Performance measurement for classification.
- **F1 Score**: Harmonic mean of precision and recall.
- **Mean Absolute Error**: Regression metric.
- **Mean Squared Error**: Regression metric.
- **Precision**: Classification precision.
- **R2 Score**: Coefficient of determination.
- **Recall**: Classification recall.

### Model Selection
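The K-Fold iterator listed below can be sketched independently of this package (a conceptual illustration, not the `kfold.py` implementation): partition the sample indices into k consecutive folds, holding out one fold as the test set per split.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_indices, test_indices) for k consecutive folds."""
    # Distribute any remainder across the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

for train, test in kfold_indices(10, 5):
    print(train, test)
```

Cross-validation then fits the model on each train split and scores it on the corresponding test fold, averaging the k scores.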
- **Cross-Validation**: Split the dataset into k consecutive folds.
- **Grid Search**: Exhaustive search over specified parameter values.
- **K-Fold**: K-Fold cross-validation iterator.
- **Make Scorer**: Convert metrics into callables.
- **Param Grid**: Define the parameter grid for search.
- **Train-Test Split**: Split arrays or matrices into random train and test subsets.

### Preprocessing
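To make the imputation and scaling steps listed below concrete, here is a minimal from-scratch sketch for a single numeric column (conceptual only, not this package's function signatures):

```python
def impute_mean(column):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in column]

def scale_min_max(column):
    """Rescale values linearly onto the [0, 1] range."""
    lo, hi = min(column), max(column)
    return [(v - lo) / (hi - lo) for v in column]

col = [1.0, None, 3.0, 5.0]
filled = impute_mean(col)    # mean of 1, 3, 5 is 3.0
print(filled)                # [1.0, 3.0, 3.0, 5.0]
print(scale_min_max(filled)) # [0.0, 0.5, 0.5, 1.0]
```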
- **Imputation**: Handle missing values.
  - Mean, Median, and Mode imputation.
- **Label Encoding**: Encode categorical features as an integer array.
- **Normalization**: Scale input vectors individually to unit norm.
- **One-Hot Encoding**: Encode categorical integer features as a one-hot numeric array.
- **Outlier Detection**: Identify and handle outliers in the data.
- **Feature Scaling**: Standardize features by removing the mean and scaling to unit variance.
  - Min-Max scaling.
- **Feature Selection**: Select features based on importance or correlation.

### Supervised Learning
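The k-Nearest Neighbors idea listed below, for example, fits in a few lines (a conceptual illustration, not the `knn.py` implementation): predict the majority label among the k training points closest to the query.

```python
def knn_predict(X_train, y_train, x, k=3):
    """Predict the majority label among the k nearest training points (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(xt, x)), label)
        for xt, label in zip(X_train, y_train)
    )
    nearest = [label for _, label in dists[:k]]
    return max(set(nearest), key=nearest.count)

X_train = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
y_train = [0, 0, 0, 1, 1, 1]
print(knn_predict(X_train, y_train, [0.5, 0.5]))  # nearest neighbors are class 0
print(knn_predict(X_train, y_train, [5.5, 5.5]))  # nearest neighbors are class 1
```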
- **Decision Tree**: Decision Tree classifier.
- **k-Nearest Neighbors**: k-Nearest Neighbors algorithm.
- **Linear Regression**: Linear Regression algorithm.
- **Logistic Regression**: Logistic Regression algorithm.
- **Naive Bayes**: Naive Bayes algorithm.
- **Neural Network**: Neural Network algorithm.

## Contributing
Contributions are welcome! Please follow these steps:
1. Fork the repository.
2. Create a new branch (`git checkout -b feature/your-feature`).
3. Make your changes.
4. Commit your changes (`git commit -m 'Add some feature'`).
5. Push to the branch (`git push origin feature/your-feature`).
6. Open a pull request.

## License
This project is licensed under the MIT License. See the [LICENSE](LICENSE) file for details.