Projects in Awesome Lists tagged with datapreprocessing
A curated list of projects in awesome lists tagged with datapreprocessing.
https://github.com/IngestAI/embedditor
⚡ GUI for editing LLM vector embeddings. No more blind chunking. Upload content in any file extension, join and split chunks, edit metadata and embedding tokens + remove stop-words and punctuation with one click, add images, and download in .veml to share it with your team.
datapreprocessing datascience embedding-vectors embeddings genai laravel llm markup-language ml nlp nltk php vector-database vector-search vectorization veml
Last synced: 28 Mar 2025
https://github.com/akarazniewicz/cocosplit
Simple tool to split COCO annotations into train/test datasets.
coco datapreprocessing deeplearning
Last synced: 25 Jun 2025
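As a rough illustration of what splitting COCO annotations into train/test sets involves, here is a minimal sketch using only the standard library. It is not cocosplit's actual implementation; the function name `split_coco`, its parameters, and the toy dataset are assumptions made for the example. It assumes the usual COCO layout with `images`, `annotations`, and `categories` keys.

```python
import random


def split_coco(coco, train_ratio=0.8, seed=0):
    """Split a COCO-style dict into (train, test) dicts, image by image.

    Hypothetical helper for illustration; not part of cocosplit.
    """
    images = list(coco["images"])
    random.Random(seed).shuffle(images)  # seeded so the split is reproducible
    cut = int(len(images) * train_ratio)
    parts = []
    for subset in (images[:cut], images[cut:]):
        ids = {img["id"] for img in subset}
        parts.append({
            "images": subset,
            # Keep only annotations whose image landed in this subset.
            "annotations": [a for a in coco["annotations"] if a["image_id"] in ids],
            # Categories are shared by both subsets.
            "categories": coco.get("categories", []),
        })
    return parts[0], parts[1]


# Hypothetical toy dataset: 10 images with one annotation each.
toy = {
    "images": [{"id": i} for i in range(10)],
    "annotations": [{"id": 100 + i, "image_id": i, "category_id": 1} for i in range(10)],
    "categories": [{"id": 1, "name": "object"}],
}
train, test = split_coco(toy, train_ratio=0.8, seed=42)
```

Splitting by image rather than by annotation keeps every annotation in the same subset as its image, which avoids leakage between the two sets.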
https://github.com/cereja-project/cereja
Cereja is a bundle of useful functions we don't want to rewrite and .. just pure fun!
array-manipulations colab console data-tools datapreprocessing file-converter freq freqitems hacktoberfest hacktoberfest2024 progress-bar progress-view python python-library python3 tfidf tokenizer utilities
Last synced: 06 Apr 2025
https://github.com/nafisalawalidris/fraud-detection-with-supervised-learning
This repository contains a basic fraud detection system utilising supervised learning techniques to identify potentially fraudulent credit card transactions. The project establishes a baseline model that addresses the challenges of credit card fraud in financial institutions.
datacollection datapreprocessing fastapi finance frauddetection machine-learning modeltraining python random-forest-classifier scikitlearn-machine-learning supervisedlearning
Last synced: 14 Jul 2025
https://github.com/karan-malik/prepdata
Automating the process of Data Preprocessing for Data Science
classification data dataanalysis dataframe datapreprocessing datascience machine-learning numpy pandas pip preprocessing pypi-package python python3 random-forest regress sklearn
Last synced: 13 Apr 2025
https://github.com/m-clark/tidyext
Extensions and extras for tidy processing.
datapreprocessing dplyr group-by head missing-data onehot-encoder prediction preprocessing r rounding sparse-matrix summary summary-statistics tail tidyr tidyverse
Last synced: 30 Apr 2025
https://github.com/omarsar/data_mining_lab
Material for Data Mining Lab Session (Fall Semester @ NTHU)
data datamining datapreprocessing datavisualization
Last synced: 14 Jul 2025
https://github.com/m-karthik-kumar/personalized-dietary-guidance-with-gen-ai
The algorithm utilizes Generative AI and Natural Language Processing (NLP) to analyze the nutritional content of packaged food products. The system considers personalized health conditions, such as allergies and dietary needs, to provide tailored recommendations, helping individuals make safer and more informed food choices.
datapreprocessing flask gemini-api generative-ai natural-language-processing nltk python
Last synced: 11 Apr 2025
https://github.com/faisalahmed21/cse422-artificial-intelligence
a-star-algorithm alpha-beta-pruning artificial-intelligence datapreprocessing feature-engineering featurescaling genetic-algorithm linear-regression logistic-regression machine-learning minmax-algorithm regression-analysis
Last synced: 15 Oct 2025
https://github.com/mohammedsaim-quadri/intrusion_detection-system
This project is an Intrusion Detection System (IDS) using machine learning (ML) and deep learning (DL) to detect network intrusions. It leverages the CICIDS2018 dataset to classify traffic as normal or malicious. Key features include data preprocessing, model training, hyperparameter tuning, and Docker containerization for scalable deployment.
bayesian-optimization cicids2018 cybersecurity datapreprocessing deep-learning docker hyperparameter-tuning intrusion-detection machinelearning neural-networks
Last synced: 09 Jul 2025
https://github.com/pavankethavath/car_dekho_car_price_prediction
A Streamlit web app utilizing Python, scikit-learn, and pandas for used car price prediction. Features data preprocessing (scaling, encoding), Random Forest model optimization with GridSearchCV, and interactive user input handling. Achieves high accuracy (R² score: 0.9028), showcasing skills in machine learning, data engineering, and deployment.
dataanalysis datacleaning datapreprocessing eda encoding feature-extraction feature-selection featureimportance fine-tuning machine-learning minmaxscaling normalization pandas pickle prediction-model python random-forest randomsearch-cv regression streamlit
Last synced: 23 Apr 2025
https://github.com/faizanmohd5/web-scraping-iphone-11-reviews
This is a web scraping project that extracts customer reviews for the iPhone 11 from Flipkart.com using Python and BeautifulSoup. The extracted data is saved in a CSV file for further analysis. Use it as a starting point for your own web scraping projects or for analyzing customer reviews of the iPhone 11.
beautifulsoup csv data-visualization dataanalysis dataextraction datainsights datamining datapreprocessing ecommerce-website ipython-notebook jupyter-notebook python reviews reviewscrapper webscraping
Last synced: 13 Jun 2025
https://github.com/hk151109/house-price-prediction_using-linear-regression
This repository contains the implementation of a Linear Regression model to predict house prices based on features such as square footage, number of bedrooms, and bathrooms.
datapreprocessing house-price-prediction linear-regression machine-learning
Last synced: 07 Oct 2025
https://github.com/addytrunks/machine-learning
A comprehensive repository documenting key machine learning algorithms, implementation details, and practical examples.
classfication clustering cnn datapreprocessing deep-learning ml neural-network object-detection python regression yolo
Last synced: 11 Aug 2025
https://github.com/sayanmondal2098/easytoken
Tokenizer is an independent, open-source natural language processing Python library that implements a tokenizer to create tokens from both sentences and paragraphs.
data-science datapreprocessing dataprocessing natural-language natural-language-processing nlp nlp-library nlp-machine-learning python-library python3 text-processing text-summarization token tokenizer
Last synced: 14 Dec 2025
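To make the idea of tokenizing both paragraphs and sentences concrete, the sketch below uses simple regular expressions. It does not reproduce easytoken's API; the function names `sentence_tokenize` and `word_tokenize` are hypothetical, and the rules are deliberately naive (abbreviations such as "e.g." would be split incorrectly).

```python
import re


def sentence_tokenize(paragraph):
    """Split a paragraph into sentences after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def word_tokenize(sentence):
    """Split a sentence into word tokens and single punctuation tokens."""
    # Words may contain one internal apostrophe ("Don't"); anything else that
    # is neither a word character nor whitespace becomes its own token.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", sentence)


sentences = sentence_tokenize("Hello world. How are you? Fine!")
tokens = word_tokenize("Don't stop, please.")
```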
https://github.com/mdalamin5/capstone-project-adaptive-tutoring-system-ai-based-all-experimental-resources
This project is an AI-powered algebra tutor using the Phi-3 Mini model. It provides personalized learning through interactive chat, adapting to the student's level and offering detailed step-by-step solutions. Built with Streamlit for an engaging educational experience.
datapreprocessing graph-database groq-api langgraph llama-index llm-finetuning multi-agent-systems openai-models phi-2 retrieval-augmented-generation streamlit vector-database
Last synced: 23 Aug 2025
https://github.com/bala-1409/foreign-exchange-rate-time-series-data-science-project
This project will use time series analysis to forecast the exchange rate between the euro and the US dollar. The project will use a variety of statistical techniques, such as ARIMA to model the data and forecast the exchange rate.
data-analysis data-science data-visualization datapreprocessing eda exploratory-data-analysis forecasting machine-learning-algorithms model modelfitting predictive-modeling python3 scikit-learn statsmodels time-series time-series-analysis
Last synced: 20 Jul 2025
https://github.com/divithraju/divith-aju-hadoop-pyspark-pipeline
This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipeline is designed to process large volumes of log files, providing meaningful insights into user behavior, system performance, and sales metrics.
apache-hadoop-framework apache-spark bigdata client data database dataengineering dataingestionframework datapreprocessing documentation ecommerce-platform hdfs pipeline project project-repository pyspark python3 software-engineering
Last synced: 06 Mar 2025
https://github.com/redayzarra/nlp_yelpreviews
This project covers the topic of natural language processing or NLP to classify user-generated text and determine their intent. The goal of this project is to build a model that can classify 10,000 Yelp reviews into either one-star or 5-star reviews. This project showcases a step-by-step implementation of the model as well as in-depth notes.
datapreprocessing dataprocessing machine-learning multinomial-naive-bayes naive-bayes naive-bayes-algorithm naive-bayes-classifier natural-language-processing nlp sentiment-analysis text-classification
Last synced: 22 Aug 2025
https://github.com/chandkund/spam-email-detection
This project focuses on detecting spam emails using a fine-tuned DistilBERT model, a lighter version of the BERT model. The model is trained to classify email text into two categories: spam (1) and not spam (0). The dataset consists of email texts labeled as either spam or non-spam.
data-visualization datapreprocessing matplotlib pandas python pytorch sklearn transformer
Last synced: 20 Jan 2026
https://github.com/batthulavinay/predictive-maintenance-for-industrial-equipment
This project focuses on Predictive Maintenance for industrial equipment using machine learning. The goal is to predict potential machine failures before they occur, enabling proactive maintenance and reducing downtime.
accuracy data-visualization datapreprocessing exploratory-data-analysis feature-engineering jupyter-notebook machine-learning-algorithms matplotlib modeldevelopment modelevaluation numpy pandas python recall scikit-learn seaborn
Last synced: 30 Dec 2025
https://github.com/MDalamin5/Capstone-Project-Adaptive-Tutoring-System-AI-Based-All-Experimental-Resources
This project is an AI-powered algebra tutor using the Phi-3 Mini model. It provides personalized learning through interactive chat, adapting to the student's level and offering detailed step-by-step solutions. Built with Streamlit for an engaging educational experience.
datapreprocessing graph-database groq-api langgraph llama-index llm-finetuning multi-agent-systems openai-models phi-2 retrieval-augmented-generation streamlit vector-database
Last synced: 14 Sep 2025
https://github.com/varun-khorgade/churnshield-customer-retention-predictor
Built an ML-based classification model to predict customer churn. Applied data preprocessing, feature engineering, and ensemble algorithms to improve prediction accuracy and help businesses implement retention strategies.
classification-algorithm datapreprocessing f1-score feature-engineering hyperparameter-tuning logistic-regression matplotlib model-evaluation numpy pandas python ran roc-auc scikit-learn seaborn xgboost
Last synced: 05 Oct 2025
https://github.com/yash-rewalia/stock-closing-price-prediction-using-regression
The ultimate business objective is to leverage the regression model to provide accurate predictions of the closing price of AMRN stock, enabling stakeholders to make well-informed investment decisions, manage risks effectively, optimize portfolios, build early warning systems that flag fraud cases, and align investment strategies with financial goals.
datapreprocessing eda hypothesis-testing machine-learning numpy pandas python random-forest regression regression-analysis statistics
Last synced: 20 Oct 2025
https://github.com/mdalamin5/machine-learning-2.0
Machine-Learning-2.0: A comprehensive repository documenting my journey to master ML from scratch. It includes core algorithms, advanced techniques, data preprocessing, feature engineering, and real-world projects. Follow my structured approach, inspired by "100 Days of ML," featuring Python implementations, tools, and insightful resources.
data-fetching-from-api datapreprocessing end-to-end-project feature-engineering gradient-descent-optimizers machine-learning-algorithms scikit-learn webscraping-data
Last synced: 25 Feb 2025
https://github.com/engrzulqarnain/scrapysub
ScrapySub is a Python library designed to recursively scrape website content, including subpages. It fetches the visible text from web pages and stores it in a structured format for easy access and analysis. This library is particularly useful for NLP and AI developers who need to gather large amounts of web content for their projects.
crawling datapreparation datapreprocessing python python-package scraper scraping-websites urllib3
Last synced: 30 Mar 2025
https://github.com/michael-insights/portfolio
This repository showcases my projects and skills in Data Analytics, Data Science, and Machine Learning. It includes hands-on work in data analysis, predictive modeling, and machine learning algorithms, aimed at solving real-world problems.
data-analytics data-science data-visualization datapreprocessing jupyter-notebooks machine-learning matplotlib numpy pandas predictive-modeling python scikit-learn sql
Last synced: 30 Dec 2025
https://github.com/mrfoxak/artificial-intelligence
This is All About AI & ML
airtificialintelligence data-science dataanalysis datapreprocessing datavisualization deep-learning feature-engineering feature-extraction feature-selection jyputer-notebook machine-learning machine-learning-algorithms natural-language-processing neural-network python
Last synced: 11 Oct 2025
https://github.com/ayushai/machine-learning-notes
Machine Learning Notes - A comprehensive repository featuring my handwritten notes and code files on machine learning. Explore topics like supervised and unsupervised learning, deep learning, and model evaluation. Perfect for students, professionals, and enthusiasts looking to deepen their understanding.
datapreprocessing datawrangling linear-regression logistic-regression machine-learning-algorithms
Last synced: 27 Mar 2025
https://github.com/kawai-senpai/ultraclean
UltraClean is a fast and efficient Python library for cleaning and preprocessing text data for AI/ML tasks and data processing.
aiml cleaner data-science datapreprocessing dataset spamdetection
Last synced: 06 May 2025
https://github.com/mdalamin5/cse499-project-adaptive-tutoring-system-ai-based
This project is an AI-powered algebra tutor using the Phi-3 Mini model. It provides personalized learning through interactive chat, adapting to the student's level and offering detailed step-by-step solutions. Built with Streamlit for an engaging educational experience.
datapreprocessing graph-database groq-api langgraph llama-index llm-finetuning multi-agent-systems openai-models phi-2 retrieval-augmented-generation streamlit vector-database
Last synced: 26 Mar 2025
https://github.com/lijesh010/ml_project_car_price_prediction_using_linearregression
This repository presents a data-driven exploration into predicting car prices using a machine learning model based on linear regression, aimed at aiding a Chinese automobile company's entry into the competitive US market.
car-price-prediction-with-machine-learning data-science datapreprocessing jupyter-notebook linear-regression machine-learning-algorithms python sklearn-library
Last synced: 02 Aug 2025
https://github.com/nikhilt1998/drivendata-dengai-predicting-disease-spread
DengAI: Disease spread prediction (DrivenData Challenge)
competition datapreprocessing drivendata machine-learning mean-absolute-error neural-network regression regression-models timeseries-analysis
Last synced: 20 Aug 2025
https://github.com/kool-cool/kool-cool-movie-recommendations-flask
The provided code snippet performs movie recommendation based on movie metadata using the TMDB Movie Metadata dataset from Kaggle.
countvectorizer datapreprocessing flask machine-learning machine-learning-algorithms moviemetadata movierecommendation movies-recommendation natural-language-processing nlp python reccomendersystem reccommendation tfidfvectorization webapp
Last synced: 31 Dec 2025
https://github.com/monzerdev/fake-news-detection
A robust fake news detection system leveraging machine learning models (SVM and Random Forest) to identify political misinformation. Includes preprocessing, training, and evaluation scripts with datasets available for download.
datapreprocessing fakenewsdetection linearsvc machinelearning naturalanguageprocessing nlp python randomforest svm textclassification
Last synced: 21 Jan 2026
https://github.com/batthulavinay/phone-usage-analysis
This project analyzes phone usage patterns in India and predicts the primary use of mobile devices based on various features. The notebook covers data preprocessing, exploratory data analysis (EDA), and model training using multiple classification algorithms.
accuracy-score adaboost classification-report confusion-matrix datapreprocessing datavisualization exploratory-data-analysis gradient-boosting imbalanced-data jupyter-notebook lightgbm logistic-regression matplotlib numpy pandas random-forest scikit-learn seaborn supportvectormachines xgboost
Last synced: 09 Nov 2025
https://github.com/bilalhameed248/diagnosis-effecting-patients-recovery-detection
A DNN Based Diagnosis Impact Detection Model. - Feb 2022 - Jun 2023
bert classification data-science datapipeline datapreprocessing deep-neural-networks dnn fine-tuning healthcare physiotherapy python transformer
Last synced: 12 Aug 2025
https://github.com/arjunraj79/custom_nids_with_ml
A simple Python script to check the strength of a password based on length, the inclusion of numbers, special characters, and upper/lower case letters.
datapreprocessing dos-attack feature-extraction intrusion-detection malware-analysis malware-detection ml-engineering modeltraining networktrafficanalysis portscanning realtime-detection virtualbox
Last synced: 14 Mar 2025
https://github.com/shoaib-akther-asif/country-wise-quality-of-life-overview
Data scraping with Selenium & visualizing the results through interactive dashboards in Tableau Public.
datanalytics datapreprocessing datascraping datavisualization python selenium-webdriver tableau-public
Last synced: 26 Feb 2025
https://github.com/munawar-code/car_price_predictor
This project is an ML-based car price prediction system. The model is built using Jupyter Notebook for training and evaluation, while a simple one-page website was developed using PyCharm to provide an interface for users to input car details and get price predictions.
datapreprocessing datavisualization exploratory-data-analysis feature-engineering flask-application html-css-javascript linear-regression machine-learning-algorithms matplotlib numpy pandas python scikitlearn-machine-learning
Last synced: 21 Jul 2025
https://github.com/amira921/eda-and-visualization-netflix-movies-and-tv-shows
EDA for Netflix Movies and TV Shows
datacleaning datamanipulation datapreprocessing datavisualization exploratory-data-analysis insights matplotlib numpy pandas python
Last synced: 12 Jul 2025
https://github.com/yassin522/text-classification-and-sequence-labelling
Arabic Text Classification and Sequence Labeling
arabic-nlp cnn data-visualization dataanalysis datapreprocessing jupyter-notebook lstm python sequence-labeling text-classification
Last synced: 07 Apr 2025
https://github.com/trilokida/data-pre-processing
Data preprocessing is a data mining technique that is used to transform the raw data into a useful and efficient format.
datapreprocessing feature-scaling feature-selection feature-transformation features function-transformer labelencoder normalization onehot-encoding ordinal-encoding outlier-detection outlier-removal outliers power-transformers sklearn standardization
Last synced: 06 Mar 2025
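Two of the transformations named in this entry's tags, normalization and standardization, can be sketched in plain Python as follows. This is an illustrative example, not code from the repository; the function names and the sample `ages` list are assumptions. Min-max scaling maps values into [0, 1], while standardization rescales them to zero mean and unit (population) standard deviation.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


ages = [20, 30, 40, 50, 60]  # hypothetical feature column
scaled = min_max_scale(ages)
zscores = standardize(ages)
```

In practice both transforms should be fitted on the training split only and then applied to the test split, so that test statistics do not leak into the model.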
https://github.com/zulfachafidz/titanic_explorer_predicting_survival_with_classification_using_knn_algorithm
Tracking Life Safety with the KNN Predictive Analysis Approach. Leveraging the Titanic Dataset, we apply classification analysis to predict the fate of passengers based on a variety of features.
algorithm algorithms data data-analysis data-mining data-science datamodeling datapreprocessing dataset knn-algorithm knn-classification machine-learning machine-learning-algorithms prediction-model
Last synced: 01 Sep 2025
https://github.com/nataliabeltranarg/traffic-accident-analysis
Analysis of accidents in the US completed as a final project for Computing for Data Science at BSE.
data-science dataanalysis datapreprocessing exploratory-data-analysis metrics random-forest-classifier unit-testing
Last synced: 01 Sep 2025
https://github.com/dineshkumarkotha/impact-of-alcohol-consumption-on-public-health
Impact of Alcohol Consumption on Public Health
analyzation data datapreprocessing datavisualization tableau
Last synced: 05 Jan 2026
https://github.com/boratechlife/tensorflow-questions-datasets
A TensorFlow questions dataset to help you practice machine learning and train models.
data datapreprocessing datasets machinelearning modeltrain questions tensorflow
Last synced: 23 Mar 2025
https://github.com/mdalamin5/data-science-machine-learning-basics
This repository is a comprehensive guide to Machine Learning algorithms, Python OOP, data preprocessing, and visualization using Pandas, NumPy, Seaborn, Scikit-learn, and more. It includes hands-on Jupyter notebooks, modular Python scripts, and a structured ML pipeline for training and evaluating models. 🚀
data-visualization datapreprocessing machine-learning-algorithms object-oriented-programming
Last synced: 24 Mar 2025
https://github.com/chrispsang/customerchurnanalysis
Predicting customer churn using a RandomForestClassifier with detailed EDA, model evaluation, and visualization. Includes a Tableau dashboard for interactive insights.
customerchurn data-analysis data-visualization datapreprocessing machine-learning python scikit-learn tableau
Last synced: 10 Jun 2025
https://github.com/benzerinsio/seattleweather-powerbi
📊 Interactive analysis of Seattle weather data with Power BI, using data preprocessed in SQLite to explore temperatures, precipitation, and weather types.
dashboard datapreprocessing datavisualization powerbi sql sqlite weatherdata
Last synced: 24 Mar 2025
https://github.com/gurpreet0022/crop-fertilizers-recommendation-system-using-ml-
This repository is a part of AICTE - Shell Internship on 'Green Skills using AI technologies' Cycle 3.
data datapreprocessing datavisualization jupyter-notebook machine-learning python
Last synced: 11 Jun 2025
https://github.com/codeofrahul/credit_card_financial_dashboard
Develop a comprehensive credit card weekly dashboard that provides real-time insights into key performance metrics and trends, enabling stakeholders to monitor and analyze credit card operations effectively.
creditcard database datacleaning datapreprocessing financial-analysis powerbidashboard problems-solving sql
Last synced: 25 Mar 2025
https://github.com/ddeepanshu-997/datascience-e-commerce-shopping-details-
In this project I apply data preprocessing techniques to clean the dataset, analyze it to find the shopping hot picks, and use data visualization libraries to surface trends and more, as a small contribution to the field of data science.
cleaning-data data data-science data-visualization dataframe datapreprocessing dataset libraries matplotlib-pyplot numpy pandas plots python visualization
Last synced: 28 Feb 2025
https://github.com/junaidsalim/machine_learning_a-z
This repository contains Python implementations of various machine learning models that I studied during the Machine Learning A-Z course.
association associative-learning data-science datapreprocessing jupyter-notebook machine-learning machinelearning machinelearning-python nlp python regression reinforcement-learning
Last synced: 04 Jul 2025
https://github.com/mhmmdrzkya2000/digitalskillfair38_data_science_2025
Titanic EDA - Explanatory Data Analysis. This repository is the result of a 1-week DigitalSkillFair38 Data Science training that I attended with Dibimbing.id, focused on the Explanatory Data Analysis (EDA) process applied to the Titanic dataset.
datacleaning datapreprocessing datavisualization explanatory-data-analysis python
Last synced: 23 Apr 2025
https://github.com/zcebeci/odetector
Outlier Detection Using Cluster Analysis
anomaly-detection cluster-analysis clustering clustering-methods data datapreparation datapreprocessing exception-handling fcm fraud-detection fuzzy-clustering novelty-detection outlier-detection outlier-removal outliers partitioning pcm r surprise-exploration
Last synced: 29 Oct 2025
https://github.com/mannasoumya/imputerapi
Data Imputer API in Python
api data-cleaning data-science datapreprocessing dataprocessing imputer machine-learning machine-learning-algorithms matrix python3
Last synced: 25 Mar 2025
https://github.com/ysayaovong/titanic-dataset-analysis
This project involves the analysis of the Titanic dataset to uncover key insights into the survival patterns based on various demographic and class attributes. The analysis leverages Python and its data visualization libraries to explore relationships and draw meaningful conclusions from the data.
dataanalysis datapreprocessing exploratorydataanalysis machinelearning pandas python scikitlearn seaborn titanicdataset
Last synced: 25 Mar 2025
https://github.com/analyticalnahid/data-preprocessing
Analyze your data by applying pre-processing techniques
dataanalysis datapreprocessing dataprocessing
Last synced: 05 Sep 2025
https://github.com/lourduradjou/employee-salary-predictor
A Linear Regression ML Model which gives employee salary based on their years of experience
datapreprocessing linear-regression machine-learning non-linear-regression regularized-linear-regression
Last synced: 13 Jul 2025
https://github.com/sameer6690/sexism_detection
This project focuses on the classification of sexist and non-sexist language using three machine learning models. The models are Logistic Regression, Support Vector Machine (SVM) and Neural Network which were used after performing preprocessing and feature extraction of the dataset.
datapreprocessing machine-learning-algorithms python
Last synced: 01 Mar 2025
https://github.com/mohamedezzeldeenhassanmohamed/data-mining-project
Data mining GUI project to predict laptop prices. I use most of the ML algorithms here.
data data-mining-assignments datamining-algorithms datapreprocessing decision-trees entropy gini k-means-clustering knn-classification laptop-dataset laptop-price-prediction linear-regression logistic-regression ml mlalgotithms naive-bayes-classifier pca python svm-classifier visualization
Last synced: 25 Mar 2025
https://github.com/shyamkumarnagilla/cat-and-dog-image-classifier
It is an image classification model to distinguish between images of cats and dogs using data science techniques in Python.
datapreprocessing evaluation modelconstruction prediction training
Last synced: 18 Mar 2025
https://github.com/adithivs/prodigyy_ds_03
data data-visualization datapreprocessing decision-tree-classifier
Last synced: 07 Oct 2025
https://github.com/safwan2003/randomforest_heart_disease_prediction
A machine learning project using Random Forest Classifier to predict heart disease. Includes data preprocessing (with binning), feature selection, and model evaluation.
binning data data-science datapipeline datapreprocessing datavisaulization deep-learning machine-learning python random-forest-classifier streamlit
Last synced: 08 Oct 2025
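Binning, mentioned in this entry's description, turns a continuous feature into a small set of labeled ranges. The sketch below shows one common way to do it with the standard library; the bin edges, labels, and the `bin_value` helper are hypothetical and are not taken from the repository.

```python
from bisect import bisect_right

# Hypothetical age bins: each edge is the inclusive lower bound of the next bin.
AGE_EDGES = [40, 55, 70]
AGE_LABELS = ["<40", "40-54", "55-69", "70+"]


def bin_value(value, edges=AGE_EDGES, labels=AGE_LABELS):
    """Return the label of the bin that contains value."""
    # bisect_right counts how many edges are <= value, which is the bin index.
    return labels[bisect_right(edges, value)]


binned = [bin_value(age) for age in [29, 40, 54, 55, 63, 77]]
```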
https://github.com/djdhairya/whatsapp-chat-analysis
WhatsApp chat analysis is a multidimensional process that delves into the content, structure, and dynamics of conversations within the platform. It provides valuable insights for personal reflection, organizational decision-making, and improving communication strategies.
data data-science dataanalytics datapreprocessing machine-learning ml
Last synced: 08 Oct 2025
https://github.com/kevinxaviour/laliga-prediction
An end-to-end La Liga goals-scored and goals-conceded prediction using a regression model.
datapreprocessing datascience elasticbeanstalk fastapi machine-learning rds-mysql selenium tableau webscra
Last synced: 10 Oct 2025
https://github.com/batthulavinay/house-price-prediction
This project aims to analyze and predict house prices based on various features such as location, size, and amenities. The dataset is processed and explored using Python, and machine learning models are applied to generate accurate price predictions.
datacleaning datapreprocessing exploratory-data-analysis feature-engineering linear-regression modelevaluation performance-metrics random-forest xgboost
Last synced: 14 Oct 2025
https://github.com/bibek36/dialogue-summarization-with-generative-ai
Welcome to the Dialogue Summarization with Generative AI project! In this project, your main goal is to perform dialogue summarization using cutting-edge language models and investigate how different input techniques impact the quality of generated summaries.
datapreprocessing genai jupyter-notebook large-language-models machine-learning prompt-engineering python
Last synced: 18 Oct 2025
https://github.com/islam-hady9/python-scripts-collection
Welcome to the Python Scripts Collection repository! This repository contains a wide range of Python scripts, each designed to perform a specific function. Whether you are automating tasks, processing data, or exploring various Python functionalities, these scripts are here to assist you. Feel free to browse, use, and contribute to the collection.
automation datapreprocessing programming python pythonscripts scripting task-automation utilities
Last synced: 21 Feb 2025
https://github.com/zahit2121/machine-learning-templates-for-data-cleaning-data-visualization-data-preprocessing-model-training
This repository offers templates for machine learning tasks, focusing on data cleaning, visualization, preprocessing, and model training. Each template provides clear steps and code snippets to streamline your workflow and improve project efficiency.
accuracy-score data-visualization datacleaning datapreprocessing modeltraining python sckiit-learn supervised-learning unsupervised-learning
Last synced: 30 Apr 2025
https://github.com/burhanahmed1/cryptosynth
Bitcoin Sentiment Forecast is a Multimodal approach to Bitcoin price forecasting using NLP and Time Series Analysis
datafusion datapreprocessing eda explainable-ai featureengineering machinelearning multimodal-deep-learning pca predictive-modeling
Last synced: 28 Oct 2025
https://github.com/jigyasag18/multiple-disease-detection-app
This repository contains the implementation of a Multiple Disease Detection System, which employs advanced machine learning techniques for early detection and prediction of prevalent diseases, including diabetes, heart disease, and Parkinson's disease. The system utilizes a variety of patient health metrics such as demographics and medical history.
data datapreprocessing machine-learning machine-learning-algorithms machinelearningmodel prediction python streamlit streamlit-webapp
Last synced: 03 Mar 2025
https://github.com/swethajoseph/crime-pattern-analysis-project
Analysis and visualization of open-source police data from two areas, Leicestershire Street and Northumbria Street to derive data-driven insights
apachespark datamanipulation datapreprocessing datavisualization exploratory-data-analysis jupyter-notebook pyspark python sql-query
Last synced: 09 Jul 2025
https://github.com/ayushai/machine-learning
A comprehensive collection of my practice exercises and handwritten notes designed to help you prepare for a Machine Learning Engineer role. This repository covers essential ML concepts, algorithms, and techniques, providing both theoretical insights and practical coding examples.
datapreprocessing datawrangling feature-engineering machine-learning-algorithms
Last synced: 27 Mar 2025
https://github.com/ayushai/nextera-supplies-a-business-analytical-case-study
This project demonstrates an end-to-end data analytics workflow for NextEra Supplies, combining MySQL for database management, Power BI for dynamic visualization, and Python for data exploration and preprocessing. Advanced deep learning techniques (LSTM) are used for accurate sales forecasting, providing actionable insights to drive strategic decisions.
data-visualization datapreprocessing deeplearning mysql-database powerbi python3 timeseries-forecasting
Last synced: 27 Mar 2025
https://github.com/praveendecode/retail-revenue-forecasting
Designed an end-to-end ML model pipeline, spanning data collection to inference, that forecasts department-wide sales while accounting for holiday markdown effects.
azure collection data datapreprocessing docker exploratory-data-analysis feature-engineering featureimportance model modelbuilding modeldeployment modelselction python report tableau
Last synced: 04 Apr 2025
https://github.com/chaitanyak77/churn_prediction_ann
This repository contains a comprehensive Python script for building and training an Artificial Neural Network (ANN) model to predict customer churn using customer data. The code includes extensive data preprocessing steps, such as scaling, one-hot encoding, and visualization, to ensure the model's accuracy and effectiveness.
artificial-neural-network datapreprocessing deep-learning-algorithms visualizations
Last synced: 10 Jul 2025
https://github.com/sanjay-ar/payment_default_prediction
Payment Default Prediction System is a machine learning pipeline that forecasts the likelihood of payment defaults using historical transaction data and client profiles. Designed to support risk assessment in finance, the system uses classification models to flag high-risk clients for early intervention.
crossvalidation datapreprocessing feature-engineering logistic-regression supervised-machine-learning
Last synced: 19 Jun 2025
https://github.com/shriyaak/machinelearning.studyjournal.1
This repository contains my study and practice of key machine learning concepts, including:
association-rule-learning classification-algorithm clustering-algorithm datapreprocessing regression-models
Last synced: 21 Jun 2025
https://github.com/redayzarra/machine-learning
These are my notes, lessons, models, and code for topics on machine learning. Topics include everything from data pre-processing to logistic regression intuition, and more!
data-preprocessing data-science datapreprocessing linear-models linear-regression linear-regression-models linear-regression-python machine-learning machine-learning-algorithms simple-linear-regression
Last synced: 10 Sep 2025
https://github.com/yadavkaushal/datascience-e-commerce-shopping-details
This project analyzes customer purchase data, including details such as location, company, credit card usage, browser info, job roles, and purchase price. It explores patterns in payment methods, spending behavior, and online transactions. Using Pandas, Matplotlib, and Seaborn, we clean, analyze, and visualize key trends to derive actionable insights.
data datacleaning dataframe datapreprocessing dataset libraries matplotlib numpy pandas plots visulaization
Last synced: 24 Dec 2025
https://github.com/batthulavinay/basic-linear-regression
This project demonstrates Basic Linear Regression using Python. The notebook includes dataset loading, exploratory data analysis, model training, evaluation, and visualization of results.
data-visualization datapreprocessing exploratory-data-analysis linear-regression matplotlib modelevaluation pandas-library
Last synced: 13 Apr 2025
https://github.com/batthulavinay/genz_datingapp-eda-and-ml
This project focuses on analyzing data from a GenZ Dating App to uncover insights, trends, and predictive models. The analysis is conducted using Python in a Jupyter Notebook environment.
classification datacleaning datapreprocessing datavisualization exploratory-data-analysis jupyter-notebook matplotlib numpy pandas regression-models scikit-learn seaborn
Last synced: 13 Apr 2025
https://github.com/batthulavinay/ev-population
This repository contains a Jupyter Notebook focused on analyzing Electric Vehicle (EV) population data. The notebook includes data visualizations, exploratory analysis, and key insights.
data-science datacleaning datapreprocessing datavisualization jupyter-notebook matplotlib numpy pandas seaborn
Last synced: 13 Apr 2025
https://github.com/batthulavinay/divorce-status-prediction-eda-and-ml
This project focuses on analyzing data related to divorce status to uncover insights, trends, and predictive models. The analysis is conducted using Python in a Jupyter Notebook environment.
datacleaning datapreprocessing exploratory-data-analysis knn-regression logistic-regression predictive-analytics random-forest xgboost-regression
Last synced: 13 Apr 2025
https://github.com/djdhairya/parkinson-s-disease-detection
datapreprocessing machine-learning modeling numpy pandas scikit-learn svm
Last synced: 24 Feb 2025
https://github.com/raghavendranhp/covid19_vaccine_sentiment_analysis
Explore sentiments in COVID-19 vaccine tweets using NLP. Analyze trends, visualize opinions, and uncover public perceptions.
cumulativedensity datapreprocessing eda kerneldensityestimation nlp-machine-learning nltk sentiment-analysis vader-sentiment-analysis wordcloud
Last synced: 24 Feb 2025
https://github.com/farzeennimran/ai_recipe_generator
A simple AI recipe generator using ML and DL models 🍔🍨🍷
artificial-intelligence data-science datapreprocessing deep-learning feature-selection generator keras machine-learning natural-language-processing nlp-machine-learning python recipe-generator recipes sklearn tensorflow text-generation webscraping-beautifulsoup webscraping-selenium
Last synced: 13 Sep 2025
https://github.com/shimul-zahan/all-practices-tukitaki
This is a repository for all my practice tasks and for learning new things. The environment is already set up, so there is no need to set up a new project or environment each time.
data data-science datapreprocessing deep-learning machine-learning neural-network practice python visualization
Last synced: 12 Jan 2026
https://github.com/sarahloree/project-3--credit-card-user-churn-prediction
This is the third project I completed as part of the Advanced Machine Learning module of my post-graduate certification in AI/Machine Learning from the University of Texas' McCombs School of Business.
bagging bagging-classifier boosting boosting-classifier cross-validation datapreprocessing eda exploratory-data-analysis hyperparameter-optimization hyperparameter-tuning random-forest random-forest-classifier sampling smote
Last synced: 24 Feb 2025
https://github.com/iv4n-ga6l/functional-dataprocessing-pipeline
A functional data processing pipeline that accepts an input file, allows specifying both input and output formats, applies specified transformations, and produces a resulting output file.
csv data datapreprocessing excel json pandas parquet pipeline python
Last synced: 24 Dec 2025
https://github.com/compcode1/depression-screening-ml
The goal of this project was to build a machine learning model capable of accurately predicting depression in a population where the incidence rate is 15%.
auc datapreprocessing f1-score hyperparameter-tuning threshold xgboost-classifier
Last synced: 18 Mar 2025
https://github.com/bala-1409/milk-production-time-series-forecasting-datascience-project
This project uses time series forecasting to predict future milk production. The data used is monthly milk production data from January 1962 to December 1975. An ARIMA (autoregressive integrated moving average) model is used to forecast milk production. The model is evaluated using various metrics.
acf adf arima-model data-analysis data-science data-visualization datapreprocessing eda exploratory-data-analysis forecasting machine-learning-algorithms pacf python python3 sarimax-model seasonality seasonality-analysis time-series time-series-forecasting trends
Last synced: 22 Mar 2025
https://github.com/priyanshu7639/data_visualization_dashboard
An interactive data visualization tool that combines traditional plotting capabilities with modern AI assistance. It allows users to create and modify visualizations through natural language commands, making data exploration accessible to users of all skill levels.
business-analytics data-analysis data-engineering data-exploration data-science data-visualization datapreprocessing datascience interactive-visualizations matplotlib plotly plotting python research-tool streamlit
Last synced: 24 Feb 2025
https://github.com/muthukumar0908/airbnb-analysis
This project aims to analyze Airbnb data: it performs data cleaning and preparation, uses MySQL, and develops interactive geospatial visualizations of availability patterns and location-based trends with the help of a Streamlit app.
datapreprocessing eda mongodb pythonscripting streamlit visualization
Last synced: 30 Mar 2025