{"id":34090418,"url":"https://github.com/blitzml/blitzml","last_synced_at":"2026-03-11T12:17:18.030Z","repository":{"id":62235737,"uuid":"559043065","full_name":"blitzml/blitzml","owner":"blitzml","description":"Automate machine learning pipelines rapidly","archived":false,"fork":false,"pushed_at":"2023-08-25T12:14:23.000Z","size":2534,"stargazers_count":5,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-07-03T17:37:51.035Z","etag":null,"topics":["automation","classification","clustering","low-code","machine-learning","python","regression","time-series"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/blitzml.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-10-28T22:37:23.000Z","updated_at":"2023-07-10T11:26:38.000Z","dependencies_parsed_at":"2023-01-21T22:04:25.735Z","dependency_job_id":null,"html_url":"https://github.com/blitzml/blitzml","commit_stats":null,"previous_names":["ahmedmohamed25/blitzml"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/blitzml/blitzml","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blitzml%2Fblitzml","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blitzml%2Fblitzml/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blitzml%2Fblitzml/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blitzml%2Fblitzml/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/blitzml","download_url":"https://codeload.github.com/blitzml
/blitzml/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/blitzml%2Fblitzml/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":27729718,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-12-14T02:00:11.348Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","classification","clustering","low-code","machine-learning","python","regression","time-series"],"created_at":"2025-12-14T14:18:52.007Z","updated_at":"2026-03-11T12:17:18.015Z","avatar_url":"https://github.com/blitzml.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n\u003cdiv align=\"center\"\u003e\n\u003cimg src=\"auxiliary/docs/logo.png\" alt=\"BlitzML\" width=\"400\"/\u003e\n\n### **Automate machine learning pipelines rapidly**\n\n\n\u003cdiv align=\"left\"\u003e\n\n- [Install BlitzML](#install-blitzml)\n- [Classification](#classification)\n- [Regression](#regression)\n- [Time-Series](#time-series)\n- [Clustering](#clustering)\n\n\n\n# Install BlitzML  \n\n\n```bash\npip install blitzml\n```\n\n\n# Classification\n\n```python\nfrom blitzml.tabular import Classification\nimport pandas as pd\n\n# prepare your dataframes\ntrain_df = pd.read_csv(\"auxiliary/datasets/banknote/train.csv\")\ntest_df = 
pd.read_csv(\"auxiliary/datasets/banknote/test.csv\")\n\n# create the pipeline\nauto = Classification(train_df, test_df, algorithm = 'RF', n_estimators = 50)\n\n# perform the entire process\nauto.run()\n\n# We can get their values using:\npred_df = auto.pred_df\nmetrics_dict = auto.metrics_dict\n\nprint(pred_df.head())\nprint(metrics_dict)\n```\n\n\n## Available Classifiers\n\n- Random Forest 'RF' \n- LinearDiscriminantAnalysis 'LDA' \n- Support Vector Classifier 'SVC' \n- KNeighborsClassifier 'KNN' \n- GaussianNB 'GNB' \n- LogisticRegression 'LR'\n- AdaBoostClassifier 'AB'\n- GradientBoostingClassifier 'GB'\n- DecisionTreeClassifier 'DT'\n- MLPClassifier 'MLP'\n\n\n## **Parameters**\n**classifier**  \noptions: {'RF','LDA','SVC','KNN','GNB','LR','AB','GB','DT','MLP', 'auto', 'custom'}, default = 'RF'  \n`auto: selects the best scoring classifier based on f1-score`  \n`custom: enables providing a custom classifier through *file_path* and *class_name* parameters`  \n**file_path**  \nwhen using 'custom' classifier, pass the path of the file containing the custom class, default = 'none'  \n**class_name**  \nwhen using 'custom' classifier, pass the class name through this parameter, default = 'none'  \n**feature_selection**  \noptions: {'correlation', 'importance', 'none'}, default = 'none'  \n`correlation: use feature columns with the highest correlation with the target`  \n`importance: use feature columns that are important for the model to predict the target`  \n`none: use all feature columns`  \n**validation_percentage**  \nvalue determining the validation split percentage (value from 0 to 1), default = 0.1  \n**average_type**  \nwhen performing multiclass classification, provide the average type for the resulting metrics, default = 'macro'  \n**cross_validation_k_folds**  \nnumber of k-folds for cross validation, if 1 then no cv will be performed, default = 1  \n****kwargs**  \noptional parameters for the chosen classifier. 
you can find available parameters in the [sklearn docs](https://scikit-learn.org/stable/user_guide.html)  \n## **Attributes**  \n**train_df**  \nthe preprocessed train dataset (after running `Classification.preprocess()`)  \n**test_df**  \nthe preprocessed test dataset (after running `Classification.preprocess()`)  \n**model**  \nthe trained model (after running `Classification.train_the_model()`)  \n**pred_df**  \nthe prediction dataframe (test_df + predicted target) (after running `Classification.gen_pred_df(Classification.test_df)`)  \n**metrics_dict**  \nthe validation metrics (after running `Classification.gen_metrics_dict()`)  \n{  \n    \"accuracy\": acc,  \n    \"f1\": f1,  \n    \"precision\": pre,  \n    \"recall\": recall,  \n    \"hamming_loss\": h_loss,  \n    \"cross_validation_score\":cv_score, `returns None if cross_validation_k_folds==1`  \n}   \n## **Methods**  \n**run()**  \na shortcut that runs the entire process:  \n- preprocessing\n- model training  \n- prediction  \n- model evaluation  \n\n**accuracy_history()**  \naccuracy scores when varying the sampling size of the train_df (after running `Classification.train_the_model()`).  
\n*returns:*  \n{  \n    'x':train_df_sample_sizes,  \n    'y1':train_scores_mean,  \n    'y2':test_scores_mean,  \n    'title':title  \n}  \n**plot()**\n\nplots a line chart that visualizes the accuracy history\n\n\n# Regression  \n\n```python\nfrom blitzml.tabular import Regression\nimport pandas as pd\n\n# prepare your dataframes\ntrain_df = pd.read_csv(\"auxiliary/datasets/house prices/train.csv\")\ntest_df = pd.read_csv(\"auxiliary/datasets/house prices/test.csv\")\n\n# create the pipeline\nauto = Regression(train_df, test_df, algorithm = 'RF')\n\n# perform the entire process\nauto.run()\n\n# We can get their values using:\npred_df = auto.pred_df\nmetrics_dict = auto.metrics_dict\n\nprint(pred_df.head())\nprint(metrics_dict)\n```\n\n\n## Available Regressors\n\n- Random Forest 'RF'\n- Support Vector Regressor 'SVR'\n- KNeighborsRegressor 'KNN'\n- Lasso Regressor 'LSS'\n- LinearRegression 'LR'\n- Ridge Regressor 'RDG'\n- GaussianProcessRegressor 'GPR'\n- GradientBoostingRegressor 'GB'\n- DecisionTreeRegressor 'DT'\n- MLPRegressor 'MLP'\n\n## **Parameters**\n**regressor**  \noptions: {'RF','SVR','KNN','LSS','LR','RDG','GPR','GB','DT','MLP', 'auto', 'custom'}, default = 'RF'  \n`auto: selects the best scoring regressor based on r2 score`  \n`custom: enables providing a custom regressor through *file_path* and *class_name* parameters`  \n**file_path**  \nwhen using 'custom' regressor, pass the path of the file containing the custom class, default = 'none'  \n**class_name**  \nwhen using 'custom' regressor, pass the class name through this parameter, default = 'none'  \n**feature_selection**  \noptions: {'correlation', 'importance', 'none'}, default = 'none'  \n`correlation: use feature columns with the highest correlation with the target`  \n`importance: use feature columns that are important for the model to predict the target`  \n`none: use all feature columns`  \n**validation_percentage**  \nvalue determining the validation split percentage (value from 0 to 1), default 
= 0.1  \n**cross_validation_k_folds**  \nnumber of k-folds for cross validation, if 1 then no cv will be performed, default = 1  \n****kwargs**  \noptional parameters for the chosen regressor. you can find available parameters in the [sklearn docs](https://scikit-learn.org/stable/user_guide.html)  \n## **Attributes**  \n**train_df**  \nthe preprocessed train dataset (after running `Regression.preprocess()`)  \n**test_df**  \nthe preprocessed test dataset (after running `Regression.preprocess()`)   \n**model**  \nthe trained model (after running `Regression.train_the_model()`)  \n**pred_df**  \nthe prediction dataframe (test_df + predicted target) (after running `Regression.gen_pred_df(Regression.test_df)`)  \n**metrics_dict**  \nthe validation metrics (after running `Regression.gen_metrics_dict()`)  \n{  \n    \"r2_score\": r2,  \n    \"mean_squared_error\": mse,  \n    \"root_mean_squared_error\": rmse,  \n    \"mean_absolute_error\" : mae,  \n    \"cross_validation_score\":cv_score, `returns None if cross_validation_k_folds==1`  \n}  \n## **Methods**  \n\n\n**run()**  \na shortcut that runs the entire process:  \n- preprocessing\n- model training  \n- prediction  \n- model evaluation   \n\n**plot()**\n\nplots a line chart that visualizes the RMSE history\n\n**RMSE_history()**  \nRMSE scores when varying the sampling size of the train_df (after running `Regression.train_the_model()`).  \n*returns:*  \n{  \n    'x':train_df_sample_sizes,  \n    'y1':train_scores_mean,  \n    'y2':test_scores_mean,  \n    'title':title  \n}  \n# Time-Series\nTime series is a special case of regression, with some additional functions:\n- stationary test (IsStationary()). 
\n- convert to stationary.\n- reverse predicted.\n\nThe dataset must have a DateTime column, even if the dtype of this column is object.\n```python\nfrom blitzml.tabular import TimeSeries \nimport pandas as pd\n\n# prepare your dataframes\ntrain_df = pd.read_csv(\"train_dataset.csv\")\ntest_df = pd.read_csv(\"test_dataset.csv\")\n\n# create the pipeline\nauto = TimeSeries(train_df, test_df, algorithm = 'RF')\n\n# Perform the entire process:\nauto.run()\n\n# We can get their values using:\npred_df = auto.pred_df\nmetrics_dict = auto.metrics_dict\n\nprint(pred_df.head())\nprint(metrics_dict)\n```\n# Clustering \n\n```python\nfrom blitzml.unsupervised import Clustering\nimport pandas as pd\n\n# prepare your dataframe\ntrain_df = pd.read_csv(\"auxiliary/datasets/customer personality/train.csv\")\n\n# create the pipeline\nauto = Clustering(train_df, clustering_algorithm = 'KM')\n\n# first perform data preprocessing\nauto.preprocess()\n# second train the model\nauto.train_the_model()\n\n# After training the model we can generate:\nauto.gen_pred_df()\nauto.gen_metrics_dict()\n\n# We can get their values using:\nprint(auto.pred_df.head())\nprint(auto.metrics_dict)\n```\n\n\n## Available Clustering Algorithms \n\n- K-Means 'KM' \n- Affinity Propagation 'AP' \n- Agglomerative Clustering 'AC' \n- Mean Shift 'MS' \n- Spectral Clustering 'SC' \n- Birch 'Birch' \n- Bisecting K-Means 'BKM' \n- OPTICS 'OPTICS' \n- DBSCAN 'DBSCAN' \n\n## **Parameters** \n**clustering_algorithm**  \noptions: {\"KM\", \"AP\", \"AC\", \"MS\", \"SC\", \"Birch\", \"BKM\", \"OPTICS\", \"DBSCAN\", 'auto', 'custom'}, default = 'KM' \n`auto: selects the best scoring clustering algorithm based on silhouette score` \n`custom: enables providing a custom clustering algorithm through *file_path* and *class_name* parameters` \n**file_path** \nwhen using 'custom' clustering_algorithm, pass the path of the file containing the custom class, default = 'none'   \n**class_name**\nwhen using 'custom' 
clustering_algorithm, pass the class name through this parameter, default = 'none' \n**feature_selection** \noptions: {'importance', 'none'}, default = 'none' \n`importance: use feature columns that are important for the model to predict the target` \n`none: use all feature columns` \n****kwargs** \noptional parameters for the chosen clustering_algorithm. you can find available parameters in the [sklearn docs](https://scikit-learn.org/stable/user_guide.html) \n## **Attributes** \n**train_df** \nthe preprocessed train dataset (after running `Clustering.preprocess()`)  \n**model** \nthe trained model (after running `Clustering.train_the_model()`) \n**pred_df** \nthe prediction dataframe (test_df + predicted target) (after running `Clustering.gen_pred_df()`) \n**metrics_dict** \nthe validation metrics (after running `Clustering.gen_metrics_dict()`) \n{ \n    \"silhouette_score\": sil_score, \n    \"calinski_harabasz_score\": cal_har_score, \n    \"davies_bouldin_score\": dav_boul_score, \n    \"n_clusters\" : n \n} \n## **Methods** \n**preprocess()** \nperform preprocessing on train_df  \n**train_the_model()** \ntrain the chosen clustering algorithm on the train_df \n**clustering_visualization()** \n2-d visualization of the data points with their corresponding labels  (after doing dimensionality reduction using Principal Component Analysis). \n*returns:* \n{ \n    'principal_component_1':pc1, \n    'principal_component_2':pc2, \n    'cluster_labels':labels, \n    'title':title \n} \n**gen_pred_df()** \ngenerates the prediction dataframe and assigns it to the `pred_df` attribute \n**gen_metrics_dict()** \ngenerates the clustering metrics and assigns them to the `metrics_dict` attribute  \n**run()** \na shortcut that runs the following methods: \npreprocess() \ntrain_the_model() \ngen_pred_df() \ngen_metrics_dict() \n## Development  \n\n- Clone the repo  \n- run `pip install virtualenv`\n- run `python -m virtualenv venv`\n- run `. ./venv/bin/activate` on UNIX-based systems or `. 
./venv/Scripts/activate.ps1` if on Windows\n- run `pip install -r requirements.txt`\n- run `pre-commit install`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblitzml%2Fblitzml","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fblitzml%2Fblitzml","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fblitzml%2Fblitzml/lists"}