{"id":24717886,"url":"https://github.com/ashithapallath/feature-engineering","last_synced_at":"2026-04-18T06:33:53.082Z","repository":{"id":220111254,"uuid":"750784081","full_name":"ashithapallath/Feature-Engineering","owner":"ashithapallath","description":"This repository contains a range of examples and techniques for feature engineering, aimed at improving dataset quality and boosting model performance. It covers essential methods such as Exploratory Data Analysis (EDA) and Interquartile Range (IQR) analysis for detecting and handling outliers. ","archived":false,"fork":false,"pushed_at":"2025-01-14T05:16:21.000Z","size":1052,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-05-29T18:34:55.180Z","etag":null,"topics":["exploratory-data-analysis","feature-engineering","iqr-method","matplotlib","numpy","outlier-detection","pandas","python","seaborn"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/ashithapallath.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-01-31T10:08:33.000Z","updated_at":"2025-01-14T07:18:49.000Z","dependencies_parsed_at":"2024-01-31T11:29:34.257Z","dependency_job_id":"a052a371-e4b4-48c1-9437-b41577ffba82","html_url":"https://github.com/ashithapallath/Feature-Engineering","commit_stats":null,"previous_names":["ashithapallath/feature-engineering"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/ashithapallath/Feature-Engineering","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashithapallath%2FFeature-Engineering","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashithapallath%2FFeature-Engineering/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashithapallath%2FFeature-Engineering/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashithapallath%2FFeature-Engineering/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/ashithapallath","download_url":"https://codeload.github.com/ashithapallath/Feature-Engineering/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/ashithapallath%2FFeature-Engineering/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31959880,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-18T00:39:45.007Z","status":"online","status_checked_at":"2026-04-18T02:00:07.018Z","response_time":103,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["exploratory-data-analysis","feature-engineering","iqr-method","matplotlib","numpy","outlier-detection","pandas","python","seaborn"],"created_at":"2025-01-27T10:12:43.998Z","updated_at":"2026-04-18T06:33:53.065Z","avatar_url":"https://github.com/ashithapallath.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n# Feature Engineering  \n\nThis repository contains examples and techniques for feature engineering, focusing on improving dataset quality and enhancing model performance. It covers critical aspects such as **Exploratory Data Analysis (EDA)** and **Interquartile Range (IQR) analysis** for outlier detection and handling.  \n\n\n\n## Features  \n\nThis repository includes:  \n- **Exploratory Data Analysis (EDA)**:  \n  - Understanding data distribution.  \n  - Summary statistics and visualizations.  \n  - Insights into data trends and anomalies.  \n- **Outlier Detection using IQR**:  \n  - Identification of outliers based on the interquartile range.  \n  - Strategies for outlier handling (e.g., capping, removal).  \n- **Feature Engineering Techniques**:  \n  - Handling missing values.  \n  - Data normalization and scaling.  \n  - Feature transformation and encoding.  \n\n\n\n## Prerequisites  \n\nEnsure you have the following installed:  \n- Python 3.8+  \n- Required libraries:  \n  - NumPy  \n  - Pandas  \n  - Matplotlib  \n  - Seaborn  \n\nInstall dependencies using:  \n```bash  \npip install numpy pandas matplotlib seaborn  \n```  \n\n\n\n## How to Use  \n\n1. Clone the repository:  \n   ```bash  \n   git clone https://github.com/ashithapallath/Feature-Engineering.git  \n   cd Feature-Engineering  \n   ```  \n\n2. Explore the Jupyter Notebooks (`*.ipynb`):  \n   - Notebooks include step-by-step explanations and implementations.  \n\n3. Run the notebooks using:  \n   ```bash  \n   jupyter notebook  \n   ```  \n\n4. Follow the instructions in each notebook to reproduce the analyses and techniques.  \n\n\n## Techniques Overview  \n\n### **Exploratory Data Analysis (EDA)**  \n- Summarizing data using:  \n  - Descriptive statistics (mean, median, standard deviation, etc.).  \n  - Data visualizations (histograms, box plots, scatter plots).  \n- Identifying patterns, trends, and anomalies in the data.  \n\n### **IQR-Based Outlier Detection**  \n- Calculation of the interquartile range:  \n  ```python  \n  Q1 = data['column'].quantile(0.25)  \n  Q3 = data['column'].quantile(0.75)  \n  IQR = Q3 - Q1  \n  lower_bound = Q1 - 1.5 * IQR  \n  upper_bound = Q3 + 1.5 * IQR  \n  outliers = data[(data['column'] \u003c lower_bound) | (data['column'] \u003e upper_bound)]  \n  ```  \n- Options for handling outliers:  \n  - Removing rows with outliers.  \n  - Capping values at lower and upper bounds.  \n---\n\n\n## Contribution  \n\nContributions are welcome!  \n1. Fork the repository.  \n2. Create a branch for your feature or fix.  \n3. Submit a pull request with a description of your changes.  \n\n\n\n## License  \n\nThis project is licensed under the MIT License.  \n\n\n\n## Acknowledgments  \n\nSpecial thanks to the open-source community for providing the tools and libraries that made this repository possible.  \n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashithapallath%2Ffeature-engineering","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fashithapallath%2Ffeature-engineering","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fashithapallath%2Ffeature-engineering/lists"}