An open API service indexing awesome lists of open source software.

Projects in Awesome Lists tagged with datapreprocessing

A curated list of projects in awesome lists tagged with datapreprocessing.

https://github.com/IngestAI/embedditor

⚡ GUI for editing LLM vector embeddings. No more blind chunking. Upload content in any file extension, join and split chunks, edit metadata and embedding tokens + remove stop-words and punctuation with one click, add images, and download in .veml to share it with your team.

datapreprocessing datascience embedding-vectors embeddings genai laravel llm markup-language ml nlp nltk php vector-database vector-search vectorization veml

Last synced: 28 Mar 2025

https://github.com/akarazniewicz/cocosplit

Simple tool to split COCO annotations into train/test datasets.

coco datapreprocessing deeplearning

Last synced: 25 Jun 2025
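
A train/test split of COCO annotations like the one described above can be sketched as follows. This is a minimal illustration, not cocosplit's actual code: the function name `split_coco`, the `train_ratio` and `seed` parameters, and the toy data are all assumptions. It relies only on the standard COCO top-level keys `images`, `annotations`, and `categories`.

```python
import random


def split_coco(coco, train_ratio=0.8, seed=0):
    """Split a COCO-style dict into (train, test) dicts, image by image.

    Hypothetical helper for illustration; not taken from the cocosplit repo.
    """
    images = list(coco["images"])
    # Shuffle with a local RNG so the split is reproducible.
    random.Random(seed).shuffle(images)
    cut = int(len(images) * train_ratio)

    parts = []
    for subset in (images[:cut], images[cut:]):
        ids = {img["id"] for img in subset}
        parts.append({
            "images": subset,
            # Keep only annotations that belong to images in this subset.
            "annotations": [a for a in coco["annotations"] if a["image_id"] in ids],
            # Categories are shared by both subsets.
            "categories": coco["categories"],
        })
    return parts[0], parts[1]


# Toy dataset: 10 images, 25 annotations spread across them.
coco = {
    "images": [{"id": i, "file_name": f"{i}.jpg"} for i in range(10)],
    "annotations": [{"id": i, "image_id": i % 10, "category_id": 1} for i in range(25)],
    "categories": [{"id": 1, "name": "object"}],
}
train, test = split_coco(coco, train_ratio=0.8, seed=42)
print(len(train["images"]), len(test["images"]))  # 8 2
```

Splitting by image rather than by annotation keeps every annotation in the same subset as its image, which is usually what a detection training pipeline expects.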

https://github.com/nafisalawalidris/fraud-detection-with-supervised-learning

This repository contains a basic fraud detection system utilising supervised learning techniques to identify potentially fraudulent credit card transactions. The project establishes a baseline model that addresses the challenges of credit card fraud in financial institutions.

datacollection datapreprocessing fastapi finance frauddetection machine-learning modeltraining python random-forest-classifier scikitlearn-machine-learning supervisedlearning

Last synced: 14 Jul 2025

https://github.com/omarsar/data_mining_lab

Material for Data Mining Lab Session (Fall Semester @ NTHU)

data datamining datapreprocessing datavisualization

Last synced: 14 Jul 2025

https://github.com/m-karthik-kumar/personalized-dietary-guidance-with-gen-ai

The algorithm utilizes Generative AI and Natural Language Processing (NLP) to analyze the nutritional content of packaged food products. The system considers personalized health conditions, such as allergies and dietary needs, to provide tailored recommendations, helping individuals make safer and more informed food choices.

datapreprocessing flask gemini-api generative-ai natural-language-processing nltk python

Last synced: 11 Apr 2025

https://github.com/mohammedsaim-quadri/intrusion_detection-system

This project is an Intrusion Detection System (IDS) using machine learning (ML) and deep learning (DL) to detect network intrusions. It leverages the CICIDS2018 dataset to classify traffic as normal or malicious. Key features include data preprocessing, model training, hyperparameter tuning, and Docker containerization for scalable deployment.

bayesian-optimization cicids2018 cybersecurity datapreprocessing deep-learning docker hyperparameter-tuning intrusion-detection machinelearning neural-networks

Last synced: 09 Jul 2025

https://github.com/pavankethavath/car_dekho_car_price_prediction

A Streamlit web app utilizing Python, scikit-learn, and pandas for used car price prediction. Features data preprocessing (scaling, encoding), Random Forest model optimization with GridSearchCV, and interactive user input handling. Achieves high accuracy (R² score: 0.9028), showcasing skills in machine learning, data engineering, and deployment.

dataanalysis datacleaning datapreprocessing eda encoding feature-extraction feature-selection featureimportance fine-tuning machine-learning minmaxscaling normalization pandas pickle prediction-model python random-forest randomsearch-cv regression streamlit

Last synced: 23 Apr 2025

https://github.com/faizanmohd5/web-scraping-iphone-11-reviews

This is a web scraping project that extracts customer reviews for the iPhone 11 from Flipkart.com using Python and BeautifulSoup. The extracted data is saved in a CSV file for further analysis. Use it as a starting point for your own web scraping projects or for analyzing customer reviews of the iPhone 11.

beautifulsoup csv data-visualization dataanalysis dataextraction datainsights datamining datapreprocessing ecommerce-website ipython-notebook jupyter-notebook python reviews reviewscrapper webscraping

Last synced: 13 Jun 2025

https://github.com/hk151109/house-price-prediction_using-linear-regression

This repository contains the implementation of a Linear Regression model to predict house prices based on features such as square footage, number of bedrooms, and bathrooms.

datapreprocessing house-price-prediction linear-regression machine-learning

Last synced: 07 Oct 2025

https://github.com/addytrunks/machine-learning

A comprehensive repository documenting key machine learning algorithms, implementation details, and practical examples.

classfication clustering cnn datapreprocessing deep-learning ml neural-network object-detection python regression yolo

Last synced: 11 Aug 2025

https://github.com/sayanmondal2098/easytoken

Tokenizer is an independent open-source natural language processing Python library that implements a tokenizer to create tokens from both sentences and paragraphs.

data-science datapreprocessing dataprocessing natural-language natural-language-processing nlp nlp-library nlp-machine-learning python-library python3 text-processing text-summarization token tokenizer

Last synced: 14 Dec 2025

https://github.com/mdalamin5/capstone-project-adaptive-tutoring-system-ai-based-all-experimental-resources

This project is an AI-powered algebra tutor using the Phi-3 Mini model. It provides personalized learning through interactive chat, adapting to the student's level and offering detailed step-by-step solutions. Built with Streamlit for an engaging educational experience.

datapreprocessing graph-database groq-api langgraph llama-index llm-finetuning multi-agent-systems openai-models phi-2 retrieval-augmented-generation streamlit vector-database

Last synced: 23 Aug 2025

https://github.com/bala-1409/foreign-exchange-rate-time-series-data-science-project

This project uses time series analysis to forecast the exchange rate between the euro and the US dollar. It applies a variety of statistical techniques, such as ARIMA, to model the data and forecast the exchange rate.

data-analysis data-science data-visualization datapreprocessing eda exploratory-data-analysis forecasting machine-learning-algorithms model modelfitting predictive-modeling python3 scikit-learn statsmodels time-series time-series-analysis

Last synced: 20 Jul 2025

https://github.com/divithraju/divith-aju-hadoop-pyspark-pipeline

This project demonstrates the creation of a scalable data processing pipeline for handling and analyzing log data from a hypothetical e-commerce platform. Leveraging Hadoop and PySpark, the pipeline is designed to process large volumes of log files, providing meaningful insights into user behavior, system performance, and sales metrics.

apache-hadoop-framework apache-spark bigdata client data database dataengineering dataingestionframework datapreprocessing documentation ecommerce-platform hdfs pipeline project project-repository pyspark python3 software-engineering

Last synced: 06 Mar 2025

https://github.com/redayzarra/nlp_yelpreviews

This project covers natural language processing (NLP) to classify user-generated text and determine its intent. The goal of this project is to build a model that can classify 10,000 Yelp reviews into either one-star or five-star reviews. This project showcases a step-by-step implementation of the model as well as in-depth notes.

datapreprocessing dataprocessing machine-learning multinomial-naive-bayes naive-bayes naive-bayes-algorithm naive-bayes-classifier natural-language-processing nlp sentiment-analysis text-classification

Last synced: 22 Aug 2025

https://github.com/chandkund/spam-email-detection

This project focuses on detecting spam emails using a fine-tuned DistilBERT model, a lighter version of the BERT model. The model is trained to classify email text into two categories: spam (1) and not spam (0). The dataset consists of email texts labeled as either spam or non-spam.

data-visualization datapreprocessing matplotlib pandas python pytorch sklearn transformer

Last synced: 20 Jan 2026

https://github.com/batthulavinay/predictive-maintenance-for-industrial-equipment

This project focuses on Predictive Maintenance for industrial equipment using machine learning. The goal is to predict potential machine failures before they occur, enabling proactive maintenance and reducing downtime.

accuracy data-visualization datapreprocessing exploratory-data-analysis feature-engineering jupyter-notebook machine-learning-algorithms matplotlib modeldevelopment modelevaluation numpy pandas python recall scikit-learn seaborn

Last synced: 30 Dec 2025

https://github.com/MDalamin5/Capstone-Project-Adaptive-Tutoring-System-AI-Based-All-Experimental-Resources

This project is an AI-powered algebra tutor using the Phi-3 Mini model. It provides personalized learning through interactive chat, adapting to the student's level and offering detailed step-by-step solutions. Built with Streamlit for an engaging educational experience.

datapreprocessing graph-database groq-api langgraph llama-index llm-finetuning multi-agent-systems openai-models phi-2 retrieval-augmented-generation streamlit vector-database

Last synced: 14 Sep 2025

https://github.com/varun-khorgade/churnshield-customer-retention-predictor

Built an ML-based classification model to predict customer churn. Applied data preprocessing, feature engineering, and ensemble algorithms to improve prediction accuracy and help businesses implement retention strategies.

classification-algorithm datapreprocessing f1-score feature-engineering hyperparameter-tuning logistic-regression matplotlib model-evaluation numpy pandas python ran roc-auc scikit-learn seaborn xgboost

Last synced: 05 Oct 2025

https://github.com/yash-rewalia/stock-closing-price-prediction-using-regression

The ultimate business objective is to leverage the regression model to provide accurate predictions of the closing price of AMRN stock, enabling stakeholders to make well-informed investment decisions, manage risks effectively, optimize portfolios, set up early warning systems that flag fraud cases, and align investment strategies with financial goals.

datapreprocessing eda hypothesis-testing machine-learning numpy pandas python random-forest regression regression-analysis statistics

Last synced: 20 Oct 2025

https://github.com/mdalamin5/machine-learning-2.0

Machine-Learning-2.0: A comprehensive repository documenting my journey to master ML from scratch. It includes core algorithms, advanced techniques, data preprocessing, feature engineering, and real-world projects. Follow my structured approach, inspired by "100 Days of ML," featuring Python implementations, tools, and insightful resources.

data-fetching-from-api datapreprocessing end-to-end-project feature-engineering gradient-descent-optimizers machine-learning-algorithms scikit-learn webscraping-data

Last synced: 25 Feb 2025

https://github.com/engrzulqarnain/scrapysub

ScrapySub is a Python library designed to recursively scrape website content, including subpages. It fetches the visible text from web pages and stores it in a structured format for easy access and analysis. This library is particularly useful for NLP and AI developers who need to gather large amounts of web content for their projects.

crawling datapreparation datapreprocessing python python-package scraper scraping-websites urllib3

Last synced: 30 Mar 2025

https://github.com/michael-insights/portfolio

This repository showcases my projects and skills in Data Analytics, Data Science, and Machine Learning. It includes hands-on work in data analysis, predictive modeling, and machine learning algorithms, aimed at solving real-world problems.

data-analytics data-science data-visualization datapreprocessing jupyter-notebooks machine-learning matplotlib numpy pandas predictive-modeling python scikit-learn sql

Last synced: 30 Dec 2025

https://github.com/ayushai/machine-learning-notes

Machine Learning Notes - A comprehensive repository featuring my handwritten notes and code files on machine learning. Explore topics like supervised and unsupervised learning, deep learning, and model evaluation. Perfect for students, professionals, and enthusiasts looking to deepen their understanding.

datapreprocessing datawrangling linear-regression logistic-regression machine-learning-algorithms

Last synced: 27 Mar 2025

https://github.com/kawai-senpai/ultraclean

UltraClean is a fast and efficient Python library for cleaning and preprocessing text data for AI/ML tasks and data processing.

aiml cleaner data-science datapreprocessing dataset spamdetection

Last synced: 06 May 2025

https://github.com/mdalamin5/cse499-project-adaptive-tutoring-system-ai-based

This project is an AI-powered algebra tutor using the Phi-3 Mini model. It provides personalized learning through interactive chat, adapting to the student's level and offering detailed step-by-step solutions. Built with Streamlit for an engaging educational experience.

datapreprocessing graph-database groq-api langgraph llama-index llm-finetuning multi-agent-systems openai-models phi-2 retrieval-augmented-generation streamlit vector-database

Last synced: 26 Mar 2025

https://github.com/lijesh010/ml_project_car_price_prediction_using_linearregression

This repository presents a data-driven exploration into predicting car prices using a machine learning model based on linear regression, aimed at aiding a Chinese automobile company's entry into the competitive US market.

car-price-prediction-with-machine-learning data-science datapreprocessing jupyter-notebook linear-regression machine-learning-algorithms python sklearn-library

Last synced: 02 Aug 2025

https://github.com/monzerdev/fake-news-detection

A robust fake news detection system leveraging machine learning models (SVM and Random Forest) to identify political misinformation. Includes preprocessing, training, and evaluation scripts with datasets available for download.

datapreprocessing fakenewsdetection linearsvc machinelearning naturalanguageprocessing nlp python randomforest svm textclassification

Last synced: 21 Jan 2026

https://github.com/batthulavinay/phone-usage-analysis

This project analyzes phone usage patterns in India and predicts the primary use of mobile devices based on various features. The notebook covers data preprocessing, exploratory data analysis (EDA), and model training using multiple classification algorithms.

accuracy-score adaboost classification-report confusion-matrix datapreprocessing datavisualization exploratory-data-analysis gradient-boosting imbalanced-data jupyter-notebook lightgbm logistic-regression matplotlib numpy pandas random-forest scikit-learn seaborn supportvectormachines xgboost

Last synced: 09 Nov 2025

https://github.com/arjunraj79/custom_nids_with_ml

A simple Python script to check the strength of a password based on length, the inclusion of numbers, special characters, and upper/lower case letters.

datapreprocessing dos-attack feature-extraction intrusion-detection malware-analysis malware-detection ml-engineering modeltraining networktrafficanalysis portscanning realtime-detection virtualbox

Last synced: 14 Mar 2025
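
The password check named in the description above (length, numbers, special characters, upper/lower case letters) could look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's script: the function name `password_strength`, the `min_length` default of 8, and the one-point-per-rule scoring are all hypothetical.

```python
import string


def password_strength(password, min_length=8):
    """Score a password from 0 to 5, one point per satisfied rule.

    Hypothetical helper; the rules mirror the criteria listed in the description.
    """
    checks = {
        "length": len(password) >= min_length,
        "digit": any(c.isdigit() for c in password),
        "special": any(c in string.punctuation for c in password),
        "upper": any(c.isupper() for c in password),
        "lower": any(c.islower() for c in password),
    }
    # True counts as 1, so the sum is the number of rules passed.
    return sum(checks.values()), checks


score, checks = password_strength("Tr1cky!pass")
print(score)  # 5

weak_score, weak_checks = password_strength("password")
print(weak_score)  # 2 (long enough and lowercase only)
```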

https://github.com/shoaib-akther-asif/country-wise-quality-of-life-overview

Data scraping with Selenium & visualizing the results through interactive dashboards in Tableau Public.

datanalytics datapreprocessing datascraping datavisualization python selenium-webdriver tableau-public

Last synced: 26 Feb 2025

https://github.com/munawar-code/car_price_predictor

This project is an ML-based car price prediction system. The model is built using Jupyter Notebook for training and evaluation, while a simple one-page website was developed using PyCharm to provide an interface for users to input car details and get price predictions.

datapreprocessing datavisualization exploratory-data-analysis feature-engineering flask-application html-css-javascript linear-regression machine-learning-algorithms matplotlib numpy pandas python scikitlearn-machine-learning

Last synced: 21 Jul 2025

https://github.com/zulfachafidz/titanic_explorer_predicting_survival_with_classification_using_knn_algorithm

Tracking Life Safety with the KNN Predictive Analysis Approach. Leveraging the Titanic Dataset, we apply classification analysis to predict the fate of passengers based on a variety of features.

algorithm algorithms data data-analysis data-mining data-science datamodeling datapreprocessing dataset knn-algorithm knn-classification machine-learning machine-learning-algorithms prediction-model

Last synced: 01 Sep 2025

https://github.com/nataliabeltranarg/traffic-accident-analysis

Analysis of accidents in the US completed as a final project for Computing for Data Science at BSE.

data-science dataanalysis datapreprocessing exploratory-data-analysis metrics random-forest-classifier unit-testing

Last synced: 01 Sep 2025

https://github.com/boratechlife/tensorflow-questions-datasets

A set of TensorFlow question datasets to help you practice machine learning and train models.

data datapreprocessing datasets machinelearning modeltrain questions tensorflow

Last synced: 23 Mar 2025

https://github.com/mdalamin5/data-science-machine-learning-basics

This repository is a comprehensive guide to Machine Learning algorithms, Python OOP, data preprocessing, and visualization using Pandas, NumPy, Seaborn, Scikit-learn, and more. It includes hands-on Jupyter notebooks, modular Python scripts, and a structured ML pipeline for training and evaluating models. 🚀

data-visualization datapreprocessing machine-learning-algorithms object-oriented-programming

Last synced: 24 Mar 2025

https://github.com/chrispsang/customerchurnanalysis

Predicting customer churn using a RandomForestClassifier with detailed EDA, model evaluation, and visualization. Includes a Tableau dashboard for interactive insights.

customerchurn data-analysis data-visualization datapreprocessing machine-learning python scikit-learn tableau

Last synced: 10 Jun 2025

https://github.com/benzerinsio/seattleweather-powerbi

📊 Interactive analysis of Seattle weather data with Power BI, using data preprocessed in SQLite to explore temperatures, precipitation, and weather types.

dashboard datapreprocessing datavisualization powerbi sql sqlite weatherdata

Last synced: 24 Mar 2025

https://github.com/gurpreet0022/crop-fertilizers-recommendation-system-using-ml-

This repository is a part of AICTE - Shell Internship on 'Green Skills using AI technologies' Cycle 3.

data datapreprocessing datavisualization jupyter-notebook machine-learning python

Last synced: 11 Jun 2025

https://github.com/codeofrahul/credit_card_financial_dashboard

Develop a comprehensive credit card weekly dashboard that provides real-time insights into key performance metrics and trends, enabling stakeholders to monitor and analyze credit card operations effectively.

creditcard database datacleaning datapreprocessing financial-analysis powerbidashboard problems-solving sql

Last synced: 25 Mar 2025

https://github.com/ddeepanshu-997/datascience-e-commerce-shopping-details-

In this project I apply data preprocessing techniques to clean the dataset using libraries, derive insights and analyses to find the hot picks in shopping, and use data visualization libraries to surface trends and other aspects, as a small contribution to the field of data science.

cleaning-data data data-science data-visualization dataframe datapreprocessing dataset libraries matplotlib-pyplot numpy pandas plots python visualization

Last synced: 28 Feb 2025

https://github.com/junaidsalim/machine_learning_a-z

This repository contains Python implementations of various machine learning models that I studied during the Machine Learning A-Z course.

association associative-learning data-science datapreprocessing jupyter-notebook machine-learning machinelearning machinelearning-python nlp python regression reinforcement-learning

Last synced: 04 Jul 2025

https://github.com/mhmmdrzkya2000/digitalskillfair38_data_science_2025

Titanic EDA - Explanatory Data Analysis. This repository is the result of a one-week DigitalSkillFair38 Data Science training that I took with Dibimbing.id, focused on the Explanatory Data Analysis (EDA) process applied to the Titanic dataset.

datacleaning datapreprocessing datavisualization explanatory-data-analysis python

Last synced: 23 Apr 2025

https://github.com/ysayaovong/titanic-dataset-analysis

This project involves the analysis of the Titanic dataset to uncover key insights into the survival patterns based on various demographic and class attributes. The analysis leverages Python and its data visualization libraries to explore relationships and draw meaningful conclusions from the data.

dataanalysis datapreprocessing exploratorydataanalysis machinelearning pandas python scikitlearn seaborn titanicdataset

Last synced: 25 Mar 2025

https://github.com/analyticalnahid/data-preprocessing

Analyze your data by applying pre-processing techniques

dataanalysis datapreprocessing dataprocessing

Last synced: 05 Sep 2025

https://github.com/lourduradjou/employee-salary-predictor

A linear regression ML model that predicts an employee's salary from their years of experience.

datapreprocessing linear-regression machine-learning non-linear-regression regularized-linear-regression

Last synced: 13 Jul 2025

https://github.com/sameer6690/sexism_detection

This project focuses on the classification of sexist and non-sexist language using three machine learning models. The models are Logistic Regression, Support Vector Machine (SVM), and Neural Network, applied after preprocessing and feature extraction of the dataset.

datapreprocessing machine-learning-algorithms python

Last synced: 01 Mar 2025

https://github.com/shyamkumarnagilla/cat-and-dog-image-classifier

It is an image classification model to distinguish between images of cats and dogs using data science techniques in Python.

datapreprocessing evaluation modelconstruction prediction training

Last synced: 18 Mar 2025

https://github.com/safwan2003/randomforest_heart_disease_prediction

A machine learning project using Random Forest Classifier to predict heart disease. Includes data preprocessing (with binning), feature selection, and model evaluation.

binning data data-science datapipeline datapreprocessing datavisaulization deep-learning machine-learning python random-forest-classifier streamlit

Last synced: 08 Oct 2025

https://github.com/djdhairya/whatsapp-chat-analysis

WhatsApp chat analysis is a multidimensional process that delves into the content, structure, and dynamics of conversations within the platform. It provides valuable insights for personal reflection, organizational decision-making, and improving communication strategies.

data data-science dataanalytics datapreprocessing machine-learning ml

Last synced: 08 Oct 2025

https://github.com/kevinxaviour/laliga-prediction

An end-to-end La Liga prediction of goals scored and goals conceded using a regression model.

datapreprocessing datascience elasticbeanstalk fastapi machine-learning rds-mysql selenium tableau webscra

Last synced: 10 Oct 2025

https://github.com/batthulavinay/house-price-prediction

This project aims to analyze and predict house prices based on various features such as location, size, and amenities. The dataset is processed and explored using Python, and machine learning models are applied to generate accurate price predictions.

datacleaning datapreprocessing exploratory-data-analysis feature-engineering linear-regression modelevaluation performance-metrics random-forest xgboost

Last synced: 14 Oct 2025

https://github.com/bibek36/dialogue-summarization-with-generative-ai

Welcome to the Dialogue Summarization with Generative AI project! In this project, your main goal is to perform dialogue summarization using cutting-edge language models and investigate how different input techniques impact the quality of generated summaries.

datapreprocessing genai jupyter-notebook large-language-models machine-learning prompt-engineering python

Last synced: 18 Oct 2025

https://github.com/islam-hady9/python-scripts-collection

Welcome to the Python Scripts Collection repository! This repository contains a wide range of Python scripts, each designed to perform a specific function. Whether you are automating tasks, processing data, or exploring various Python functionalities, these scripts are here to assist you. Feel free to browse, use, and contribute to the collection.

automation datapreprocessing programming python pythonscripts scripting task-automation utilities

Last synced: 21 Feb 2025

https://github.com/zahit2121/machine-learning-templates-for-data-cleaning-data-visualization-data-preprocessing-model-training

This repository offers templates for machine learning tasks, focusing on data cleaning, visualization, preprocessing, and model training. Each template provides clear steps and code snippets to streamline your workflow and improve project efficiency.

accuracy-score data-visualization datacleaning datapreprocessing modeltraining python sckiit-learn supervised-learning unsupervised-learning

Last synced: 30 Apr 2025

https://github.com/burhanahmed1/cryptosynth

Bitcoin Sentiment Forecast is a Multimodal approach to Bitcoin price forecasting using NLP and Time Series Analysis

datafusion datapreprocessing eda explainable-ai featureengineering machinelearning multimodal-deep-learning pca predictive-modeling

Last synced: 28 Oct 2025

https://github.com/jigyasag18/multiple-disease-detection-app

This repository contains the implementation of a Multiple Disease Detection System, which employs advanced machine learning techniques for early detection and prediction of prevalent diseases, including diabetes, heart disease, and Parkinson's disease. The system utilizes a variety of patient health metrics such as demographics and medical history.

data datapreprocessing machine-learning machine-learning-algorithms machinelearningmodel prediction python streamlit streamlit-webapp

Last synced: 03 Mar 2025

https://github.com/swethajoseph/crime-pattern-analysis-project

Analysis and visualization of open-source police data from two areas, Leicestershire Street and Northumbria Street, to derive data-driven insights.

apachespark datamanipulation datapreprocessing datavisualization exploratory-data-analysis jupyter-notebook pyspark python sql-query

Last synced: 09 Jul 2025

https://github.com/ayushai/machine-learning

A comprehensive collection of my practice exercises and handwritten notes designed to help you prepare for a Machine Learning Engineer role. This repository covers essential ML concepts, algorithms, and techniques, providing both theoretical insights and practical coding examples.

datapreprocessing datawrangling feature-engineering machine-learning-algorithms

Last synced: 27 Mar 2025

https://github.com/ayushai/nextera-supplies-a-business-analytical-case-study

This project demonstrates an end-to-end data analytics workflow for NextEra Supplies, combining MySQL for database management, Power BI for dynamic visualization, and Python for data exploration and preprocessing. Advanced deep learning techniques (LSTM) are used for accurate sales forecasting, providing actionable insights to drive strategic decisions.

data-visualization datapreprocessing deeplearning mysql-database powerbi python3 timeseries-forecasting

Last synced: 27 Mar 2025

https://github.com/praveendecode/retail-revenue-forecasting

Designed an end-to-end ML model pipeline, forecasting department-wide sales by accounting for holiday markdown effects, spanning data collection to inferencing.

azure collection data datapreprocessing docker exploratory-data-analysis feature-engineering featureimportance model modelbuilding modeldeployment modelselction python report tableau

Last synced: 04 Apr 2025

https://github.com/chaitanyak77/churn_prediction_ann

This repository contains a comprehensive Python script for building and training an Artificial Neural Network (ANN) model to predict customer churn using customer data. The code includes extensive data preprocessing steps, such as scaling, one-hot encoding, and visualization, to ensure the model's accuracy and effectiveness.

artificial-neural-network datapreprocessing deep-learning-algorithms visualizations

Last synced: 10 Jul 2025

https://github.com/sanjay-ar/payment_default_prediction

Payment Default Prediction System is a machine learning pipeline that forecasts the likelihood of payment defaults using historical transaction data and client profiles. Designed to support risk assessment in finance, the system uses classification models to flag high-risk clients for early intervention.

crossvalidation datapreprocessing feature-engineering logistic-regression supervised-machine-learning

Last synced: 19 Jun 2025

https://github.com/shriyaak/machinelearning.studyjournal.1

This repository contains my study and practice of key machine learning concepts, including:

association-rule-learning classification-algorithm clustering-algorithm datapreprocessing regression-models

Last synced: 21 Jun 2025

https://github.com/redayzarra/machine-learning

These are my notes, lessons, models, and code for topics on machine learning. Topics include everything from data pre-processing to logistic regression intuition, and more!

data-preprocessing data-science datapreprocessing linear-models linear-regression linear-regression-models linear-regression-python machine-learning machine-learning-algorithms simple-linear-regression

Last synced: 10 Sep 2025

https://github.com/yadavkaushal/datascience-e-commerce-shopping-details

This project analyzes customer purchase data including details such as location, company, credit card usage, browser info, job roles, and purchase price. It explores patterns in payment methods, spending behavior, and online transactions. Using Pandas, Matplotlib, and Seaborn, we clean, analyze, and visualize key trends to derive actionable insights.

data datacleaning dataframe datapreprocessing dataset libraries matplotlib numpy pandas plots visulaization

Last synced: 24 Dec 2025

https://github.com/batthulavinay/basic-linear-regression

This project demonstrates Basic Linear Regression using Python. The notebook includes dataset loading, exploratory data analysis, model training, evaluation, and visualization of results.

data-visualization datapreprocessing exploratory-data-analysis linear-regression matplotlib modelevaluation pandas-library

Last synced: 13 Apr 2025

https://github.com/batthulavinay/genz_datingapp-eda-and-ml

This project focuses on analyzing data from a GenZ Dating App to uncover insights, trends, and predictive models. The analysis is conducted using Python in a Jupyter Notebook environment.

classification datacleaning datapreprocessing datavisualization exploratory-data-analysis jupyter-notebook matplotlib numpy pandas regression-models scikit-learn seaborn

Last synced: 13 Apr 2025

https://github.com/batthulavinay/ev-population

This repository contains a Jupyter Notebook focused on analyzing Electric Vehicle (EV) population data. The notebook includes data visualizations, exploratory analysis, and key insights.

data-science datacleaning datapreprocessing datavisualization jupyter-notebook matplotlib numpy pandas seaborn

Last synced: 13 Apr 2025

https://github.com/batthulavinay/divorce-status-prediction-eda-and-ml

This project focuses on analyzing data related to divorce status to uncover insights, trends, and predictive models. The analysis is conducted using Python in a Jupyter Notebook environment.

datacleaning datapreprocessing exploratory-data-analysis knn-regression logistic-regression predictive-analytics random-forest xgboost-regression

Last synced: 13 Apr 2025

https://github.com/raghavendranhp/covid19_vaccine_sentiment_analysis

Explore sentiments in COVID-19 vaccine tweets using NLP. Analyze trends, visualize opinions, and uncover public perceptions.

cumulativedensity datapreprocessing eda kerneldensityestimation nlp-machine-learning nltk sentiment-analysis vader-sentiment-analysis wordcloud

Last synced: 24 Feb 2025

https://github.com/shimul-zahan/all-practices-tukitaki

This is a repository for all practice tasks and for learning new things. The environment is already set up, so there is no need to set up a new project or environment.

data data-science datapreprocessing deep-learning machine-learning neural-network practice python visualization

Last synced: 12 Jan 2026

https://github.com/sarahloree/project-3--credit-card-user-churn-prediction

This is the third project I completed as part of the Advanced Machine Learning module from my post-graduate certification in AI/Machine Learning from the University of Texas' McCombs School of Business.

bagging bagging-classifier boosting boosting-classifier cross-validation datapreprocessing eda exploratory-data-analysis hyperparameter-optimization hyperparameter-tuning random-forest random-forest-classifier sampling smote

Last synced: 24 Feb 2025

https://github.com/iv4n-ga6l/functional-dataprocessing-pipeline

A functional data processing pipeline that accepts an input file, allows specifying both input and output formats, applies specified transformations, and produces a resulting output file.

csv data datapreprocessing excel json pandas parquet pipeline python

Last synced: 24 Dec 2025

https://github.com/compcode1/depression-screening-ml

The goal of this project was to build a machine learning model capable of accurately predicting depression in a population where the incidence rate is 15%.

auc datapreprocessing f1-score hyperparameter-tuning threshold xgboost-classifier

Last synced: 18 Mar 2025

https://github.com/bala-1409/milk-production-time-series-forecasting-datascience-project

This project uses time series forecasting to predict future milk production. The data used in this project is monthly milk production data from January 1962 to December 1975. The ARIMA (autoregressive integrated moving average) model is used to forecast milk production. The model is evaluated using various metrics.

acf adf arima-model data-analysis data-science data-visualization datapreprocessing eda exploratory-data-analysis forecasting machine-learning-algorithms pacf python python3 sarimax-model seasonality seasonality-analysis time-series time-series-forecasting trends

Last synced: 22 Mar 2025

https://github.com/priyanshu7639/data_visualization_dashboard

An interactive data visualization tool that combines traditional plotting capabilities with modern AI assistance. It allows users to create and modify visualizations through natural language commands, making data exploration accessible to users of all skill levels.

business-analytics data-analysis data-engineering data-exploration data-science data-visualization datapreprocessing datascience interactive-visualizations matplotlib plotly plotting python research-tool streamlit

Last synced: 24 Feb 2025

https://github.com/muthukumar0908/airbnb-analysis

This project aims to analyze Airbnb data, perform data cleaning and preparation, and develop interactive geospatial visualizations using MySQL, covering availability patterns and location-based trends with the help of a Streamlit app.

datapreprocessing eda mongodb pythonscripting streamlit visualization

Last synced: 30 Mar 2025