{"id":21674145,"url":"https://github.com/abhimishra91/hackathon-framework","last_synced_at":"2025-06-16T10:32:34.045Z","repository":{"id":129909046,"uuid":"267203616","full_name":"abhimishra91/hackathon-framework","owner":"abhimishra91","description":"Framework that will be used for Data Hackathon","archived":false,"fork":false,"pushed_at":"2021-01-21T06:05:00.000Z","size":63,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-25T09:42:46.003Z","etag":null,"topics":["data-science","framework","kaggle-competition"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/abhimishra91.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-05-27T02:41:13.000Z","updated_at":"2023-11-20T08:16:49.000Z","dependencies_parsed_at":"2023-03-29T00:19:30.797Z","dependency_job_id":null,"html_url":"https://github.com/abhimishra91/hackathon-framework","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhimishra91%2Fhackathon-framework","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhimishra91%2Fhackathon-framework/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhimishra91%2Fhackathon-framework/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/abhimishra91%2Fhackathon-framework/manifests","owner_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/abhimishra91","download_url":"https://codeload.github.com/abhimishra91/hackathon-framework/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244583162,"owners_count":20476233,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-science","framework","kaggle-competition"],"created_at":"2024-11-25T13:43:15.002Z","updated_at":"2025-03-20T08:44:05.674Z","avatar_url":"https://github.com/abhimishra91.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Hackathon Framework\n\nThe objective of this project is to enable quick experimentation in Data Analytics projects with minimal cookie-cutter programming.\nGetting rid of all the fit_transforms!\n\n---\n***NOTE***\n\n- This is a work in progress. 
The underlying modules are still under development.\n- As this project matures, scripts such as `train.py` and `predict.py` will change.\n- TODO\n    * Create modules for `tuning`, `stacking`\n    * Remove modules that are redundant\n\n\n---\n\nThe framework is designed to make the Data Science flow easier to perform, by encapsulating different techniques for each step within a single method.\nThere are classes for each of the steps listed below:\n\n - Feature Evaluation\n    * Report to give an intuition of the dataset\n \n - Feature Engineering\n    * Modules to perform feature transformations on categorical and numerical data.\n    * Various applicable techniques are built into these modules and are selected with an argument.\n \n - Feature Generation\n    * Module to create new features based on different techniques\n   \n - Cross Validation\n    * Stratified Folding for both Regression and Classification\n\n - Training\n    * Run multiple models using a single class.\n    * Evaluate and save the results in an organized manner\n\n - Tuning\n    * Hyper-parameter tuning of multiple models, based on json arguments for parameter values.\n \n - Prediction\n\n - Evaluating the model\n\n  \n  \n\n## Steps to use the framework\n\n  \n\n1. Clone the repo.\n2. Create 3 folders: `input`, `model` and `tune`.\n3. Save the training, testing and sample submission files in the `input` folder. \n4. The outputs generated from training, such as the trained model, encoders and oof_preds, will be saved in the `model` folder.\n5. The parameters for fine tuning the models should be saved in the `tune` folder.\n6. Update `config.py` to point to the correct paths for data, model and tuning.\n7. Update `dispatcher.py` with the model or models you want to run on your dataset.\n8. 
Use the sample notebook to learn how to use the framework once this initial configuration is complete.\n\n\n\n\n## Description of Files and their Purpose\n\n- `config.py`: Config file that holds the paths of all datasets and other standard configuration items, such as CSV file paths and the random seed.\n \n- `feature_eval.py`: This script and the class inside it are used to analyze the dataframe and its columns to get the following output:\n\t - min, max and unique values of each column\n\t - histogram/distribution of each column\n\t - correlation of columns using a heat map\n\t \n- `feature_gen.py`: Encapsulates the method to generate new features. Currently implements the `Polynomial features` method from sklearn.\n    Returns a Dataframe with the new features. \n\n- `feature_impute.py`: Encapsulates the method to impute blank values in a dataframe.\n    Currently, it supports 3 imputation methods:\n    - Simple Imputer\n    - Model Based Imputer: Extra Trees or KNN\n    - KNN based imputer\n\n    Returns the updated Dataframe.\n\n- `cross_validation.py`: This class performs cross validation on any dataframe based on the type of problem statement. It is used to create a cross validated dataset.\n\n- `categorical.py`: This class can be used for encoding categorical features in a given dataframe.\n\t- Inputs: Dataframe, Categorical Columns List, Type of Encoding\n\t- Output: Encoded Dataframe\n\t- Supported Encoding Techniques:\n\t    - Label Encoding\n\t    - Binary Encoding\n\t    - One Hot Encoding\n\n- `numerical.py`: This class can be used for scaling and transforming numerical features in a given dataframe.\n\t- Inputs: Dataframe, Numerical Columns List, Type of Encoding\n\t- Output: Encoded Dataframe, Transformer Object for later use. \n\t- Supported Techniques:\n\t    - Standard Scaler\n\t    - Min-Max Scaler\n\t    - Power Transformer\n\t    - Log Transformer\n\t\n- `metrics.py`: This class can be used to evaluate the results given the predictions and the actual values. 
\n\n- `dispatcher.py`: Python file with models and their parameters. It is designed to supply the models to `engine.py` for training on a given dataset.\n\n- `engine.py`: This script encapsulates the method to train and evaluate multiple models in a single run.\n    - Leverages `dispatcher.py` and `metrics.py` for models and metrics\n    - The results for each fold are also saved in the `model` folder as `oof_predictions.csv` for each model.\n    - **To Do**: Stacking module to support stacking of multiple models\n\n- **Scripts to be ignored for now**:\n    - `train.py`: For training\n    - `predict.py`: For prediction\n    - `tune.py`: For tuning hyper-parameters\n    - `create_folds.py`: To create a folded dataframe","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabhimishra91%2Fhackathon-framework","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabhimishra91%2Fhackathon-framework","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabhimishra91%2Fhackathon-framework/lists"}