{"id":24522098,"url":"https://github.com/nickssilver/irismlpipeline","last_synced_at":"2025-03-15T13:11:23.815Z","repository":{"id":233346136,"uuid":"630180030","full_name":"nickssilver/IrisMLPipeline","owner":"nickssilver","description":"An example of a machine learning pipeline using the Iris dataset","archived":false,"fork":false,"pushed_at":"2023-04-19T21:14:42.000Z","size":10,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-01-22T03:17:43.401Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/nickssilver.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2023-04-19T20:52:27.000Z","updated_at":"2023-04-19T21:05:02.000Z","dependencies_parsed_at":"2024-04-15T22:22:18.534Z","dependency_job_id":"0e4f39ae-6a83-4d69-a6e1-1dffc7f13f09","html_url":"https://github.com/nickssilver/IrisMLPipeline","commit_stats":null,"previous_names":["nickssilver/irismlpipeline"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nickssilver%2FIrisMLPipeline","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nickssilver%2FIrisMLPipeline/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nickssilver%2FIrisMLPipeline/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/nickssilver%2FIrisMLPipeline/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/o
wners/nickssilver","download_url":"https://codeload.github.com/nickssilver/IrisMLPipeline/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243732296,"owners_count":20338839,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-22T03:17:52.842Z","updated_at":"2025-03-15T13:11:23.794Z","avatar_url":"https://github.com/nickssilver.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Machine Learning Pipeline Example\n\nThis repository demonstrates how to build a machine learning pipeline with scikit-learn and a Random Forest classifier. The pipeline includes data preprocessing, model training, and evaluation. The example draws on the following sources:\n\n   - [turing.com: Building an ML Pipeline in Python with scikit-learn](https://www.turing.com/kb/building-ml-pipeline-in-python-with-scikit-learn)\n   - [freecodecamp.org: Machine Learning Pipeline](https://www.freecodecamp.org/news/machine-learning-pipeline/)\n   - [towardsdatascience.com: Building a Machine Learning Pipeline](https://towardsdatascience.com/building-a-machine-learning-pipeline-3bba20c2352b)\n   - [analyticsvidhya.com: Build your first Machine Learning Pipeline using scikit-learn](https://www.analyticsvidhya.com/blog/2020/01/build-your-first-machine-learning-pipeline-using-scikit-learn/)\n\n## Getting Started\n\n   1. Clone this repository to your local machine.\n   2. 
Install the required packages:\n\npip install scikit-learn pandas numpy\n\n   3. Run the pipeline script:\n\npython src/pipeline.py\n\n## Pipeline Design\n\nThe pipeline is designed in three stages:\n\n   1. Data preprocessing: The dataset is cleaned by dropping unused columns, filling missing values, and encoding categorical features.\n   2. Model training: The preprocessed data is split into training and testing sets, and a Random Forest classifier is trained on the training set.\n   3. Model evaluation: The trained model is evaluated on the testing set to measure its performance.\n\n## Data Preprocessing\n\nThe data preprocessing stage includes the following steps:\n\n   - Dropping unused columns: df.drop(['record_id', 'casual', 'registered', 'datetime', 'temp'], axis=1, inplace=True)\n   - Creating pipelines for numerical and categorical features using Pipeline(steps=[('step name', transform function), …])\n   - Filling missing values with SimpleImputer\n   - Scaling numerical features with MinMaxScaler\n   - Encoding categorical features with OneHotEncoder(handle_unknown='ignore')\n\n## Model Training\n\nIn this stage, the preprocessed data is split into training and testing sets, and a Random Forest classifier is trained on the training set. The pipeline is built using Pipeline(steps=[('scaler', StandardScaler()), ('classifier', RandomForestClassifier())]), and the model is trained with the fit() method.\n\n## Model Evaluation\n\nThe trained model is evaluated on the testing set using accuracy_score and balanced_accuracy_score from scikit-learn's metrics module. The results are printed to the console.\n\n### Authors\n\n [Nicks M. 
Gitobu, Software Engineer](https://www.linkedin.com/in/nicholas-gitobu-973b081b9/)\n\n### License\n\nThis project is licensed under the **MIT License**.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnickssilver%2Firismlpipeline","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnickssilver%2Firismlpipeline","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnickssilver%2Firismlpipeline/lists"}