An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/mohnish88/e-commerce-data-analysis

I analyzed sales data to identify trends and patterns that support decision-making, and created interactive visualizations to present these insights clearly and communicate what the data implies.

data-analysis data-cleaning jupyter-notebook pandas plotly python python-library sales sales-analysis visulaization

Last synced: 03 May 2026

https://github.com/ababic/dumpling

Fast, flexible, powerful static data anonymisation for SQL dumps

anonymisation cli data-analysis data-science pii pii-redaction postgres privacy rust rust-lang scrubber scrubbing security tooling

Last synced: 03 May 2026

https://github.com/syed-m-nofel/python-data-science-fundamentals

Python notebooks for data manipulation (Pandas/NumPy) and API workflows – from basics to practical examples.

api beginner-friendly data-analysis data-science http-requests jupyter-notebook numpy pandas pandas-dataframe python tutorial

Last synced: 03 May 2026

https://github.com/ggarciajavier/udacity-dalf-project4-identify-fraud-enron-email

Work performed for the 4th project of the Udacity Data Analyst Nanodegree: a machine learning classifier for identifying fraud in the Enron email corpus.

data-analysis data-science machine-learning nlp-machine-learning python python27

Last synced: 03 May 2026

https://github.com/nurulashraf/logistic-regression-loan-prediction

Loan approval prediction using logistic regression based on applicant data, including income, credit history, and property details, after data preparation and feature engineering.

data-analysis data-science loan-prediction logistic-regression machine-learning predictive-modeling python sklearn

Last synced: 03 May 2026

https://github.com/iguptashubham/ev-market-exploration

Market size analysis is a crucial aspect of market research that estimates the potential sales volume within a given market.

data-analysis data-analysis-projects data-science-project forecast projects python

Last synced: 03 May 2026

https://github.com/bpkaur/whats-in-a-name

Exploring a dataset of first names of babies born in the US to uncover interesting stories.

data-analysis datacamp numpy pandas python3

Last synced: 04 May 2026

https://github.com/sanchittechnogeek/rental-data-visualization_python

Statistics and visualization of rental data with python

data-analysis data-science data-visualization statistics

Last synced: 04 May 2026

https://github.com/balajimohan18/foreign-exchange-rate-time-series-datascience-project

This project uses time series analysis to forecast the exchange rate between the euro and the US dollar, applying statistical techniques such as ARIMA to model the data and forecast the rate.

data-analysis data-analytics data-preprocessing data-science data-transformation data-visualization eda exploratory-data-analysis foreign-exchange-rates machine-learning model-fitting predictive-modeling python3 time-series time-series-analysis

Last synced: 14 May 2026

https://github.com/fatihilhan42/book-recommendation-system-with-python

In this project, we build a book recommendation system that recommends similar books based on the genres or ratings the user enters, using a large book dataset. Happy reading!

books data-analysis data-science data-visualization kaggle python recommendation-engine recommendation-system

Last synced: 04 May 2026

https://github.com/hyperplasma/olympic-visualization-analysis

Multidimensional analysis and visualization of Olympic medals, economy, and happiness index.

data-analysis data-visualization matplotlib numpy pandas python wordcloud

Last synced: 04 May 2026

https://github.com/ljadhav25/logistic-regression-data-science-

Logistic regression estimates the probability of an event occurring, such as voting or not voting, based on a given dataset of independent variables.

data-analysis data-science data-visualization logestic-regression machine-learning

Last synced: 04 May 2026
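A minimal sketch of the idea in the description above, assuming a model whose coefficients have already been fitted: the predicted probability is the logistic (sigmoid) function applied to a linear combination of the independent variables. The function name and the coefficient values are hypothetical, chosen only for illustration, and are not taken from the repository.

```python
import math


def predict_probability(intercept: float, coefs: list[float], features: list[float]) -> float:
    """Return P(event = 1 | features) for a fitted logistic regression model.

    `intercept` and `coefs` are assumed to come from an already-fitted model.
    """
    # Linear combination of the independent variables gives the log-odds
    log_odds = intercept + sum(b * x for b, x in zip(coefs, features))
    # The logistic (sigmoid) function maps log-odds into a probability in (0, 1)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients and feature values, for illustration only
p = predict_probability(-1.0, [0.5, 2.0], [2.0, 0.5])
print(f"P(event) = {p:.4f}")
```

Here the log-odds are -1 + 0.5·2 + 2·0.5 = 1, so the probability is 1/(1 + e⁻¹), roughly 0.73; a classifier would then compare it against a threshold such as 0.5.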

https://github.com/drod75/nyc-arrests-analysis

A simple data science project that analyzes and displays data and trends found in the NYC Arrests Year to Date dataset.

data-analysis data-visualization folium jupyter-notebook matplotlib-pyplot nyc-opendata nypd python scikit-learn seaborn

Last synced: 04 May 2026

https://github.com/jacktheprogrammer/time-series-forecasting-and-analysis

A personal project consisting of notebooks I created for time series forecasting and analysis. They use deep learning with TensorFlow, along with the XGBoost, statsmodels, and SciPy Python libraries. The series cover weather, energy consumption, and stocks.

data-analysis data-science deep-neural-networks energy-consumption machine-learning portfolio prophet-facebook prophet-model python python3 scipy statsmodels stocks tensorflow time-series time-series-analysis timeseries-forecasting weather xgboost

Last synced: 05 May 2026

https://github.com/as16082023/motor-vehicle-thefts

Using SQL to analyze vehicle theft patterns across New Zealand, focusing on trends related to specific times and locations.

data-analysis mysql sql

Last synced: 10 Apr 2025

https://github.com/esther-poniatowski/multitask-context-dependent-behavior

Data analysis of neuronal recordings in naive and trained animals performing multiple tasks in active and passive attentional states

cognitive-neuroscience computational-neuroscience data-analysis data-visualization information-processing

Last synced: 26 Mar 2025

https://github.com/mattholy/haka

HaKa is an out-of-the-box tooling system designed for data engineers and data analysts in medium-sized enterprises. It is easy to deploy and scale.

celery data-analysis data-engineering fastapi python uvicorn-gunicorn

Last synced: 19 May 2026

https://github.com/mainak-97/weather-data-analysis-using-python

A comprehensive analysis of time-series weather data using Python and Pandas, focusing on data exploration, cleaning, and uncovering insights.

data-analysis jupyter-notebook pandas pandas-dataframe python python3 time-series-analysis

Last synced: 08 May 2026

https://github.com/samruddhi3012/public-health-data-analysis

Hi! This repo analyzes healthcare data using advanced Microsoft Excel features.

dashboard data-analysis data-visualization healthcare microsoft-excel pivot-chart pivot-tables vlookup

Last synced: 05 Feb 2026

https://github.com/chen0040/spark-tabular-analytics

A Spark statistical inference framework for performing pairwise column analytics on large data tables.

anova chi-square-test confidence-intervals data-analysis hypothesis-testing spark statistical-inference tabular-data

Last synced: 07 Jul 2025

https://github.com/aalekhpatel07/statcan

StatCAN dataset fetcher and cleaner.

census data-analysis data-science statcan

Last synced: 02 Apr 2025

https://github.com/beyzabasarir/brazilian-e-commerce-analysis

PostgreSQL analysis of the Brazilian E-Commerce Dataset by Olist.

data-analysis data-visualization sql

Last synced: 08 Jan 2026

https://github.com/anburocky3/cbse-schools-data

Fetch CBSE school data in seconds and use it in your data projects.

cbse data data-analysis data-science grabber nextjs

Last synced: 24 Jun 2026

https://github.com/grandechowhiskey/fcc-data_analysis-projects

A collection of projects completed as part of the FreeCodeCamp "Data Analysis with Python" certification. These projects cover statistical calculations, data visualization, and trend analysis using real-world datasets.

data-analysis data-visualization matplotlib pandas python3 scikit-learn seaborn

Last synced: 01 May 2026

https://github.com/nero103/airbnb-destination

This is an end-to-end project to identify the ideal destination based on listings and hosts. Strategy: data workflow, SQL analysis, data modeling, data visualization, and findings.

data-analysis data-modeling data-visualization etl etl-pipeline excel microsoft-sql-server powerpoint sql tableau

Last synced: 27 Mar 2026

https://github.com/codesaadumair/pandas_exercises_personal

Personalized enhancements to pandas exercises with comprehensive solutions and practical insights for mastering data analysis in Python.

data-analysis data-science pandas python

Last synced: 09 May 2026

https://github.com/harkishen/Agriculture-DS

An agriculture-based M.Tech data science project that predicts crop growth from previous years' records.

data-analysis pandas python

Last synced: 11 Dec 2025

https://github.com/hyperentangledqubit/shellplot

shellplot: generate plots directly from the terminal via matplotlib or ggplot2 (plotnine)!

data-analysis ggplot2 graphics matplotlib plotnine plotting pyplot terminal

Last synced: 10 May 2026

https://github.com/luizassimoes/q5ga-latency-and-throughput

Quick 5G Analyser: PyQt5 software for simple graphical analysis and chart generation from ping and iperf3 tests.

data-analysis data-visualization pyqt5 python

Last synced: 13 Jun 2026

https://github.com/marielachirinosr/analysis-urgencias-hospital-pitalito

This project involves analyzing emergency room admission data from the E.S.E Hospital Departamental de Pitalito using a star schema model.

bigquery data data-analysis etl-pipeline tableau

Last synced: 21 Jan 2026

https://github.com/data-edd/mastering_sql

A repo documenting my journey to mastering SQL.

data-analysis mysql mysql-database sql

Last synced: 06 Oct 2025

https://github.com/fer-aguirre/cookiecutter-data-analysis-lite

A cookiecutter template for data journalism projects that offers a simplified and beginner-friendly structure.

cookiecutter data-analysis data-journalism project-template python

Last synced: 14 Jun 2025

https://github.com/sadia-khan13/data-preprocessing

Welcome to the Data Preprocessing Repository! This repository showcases resources and implementations related to data preprocessing using Python and Jupyter Notebook.

artificial-intelligence data-analysis data-mining data-preprocessing data-science jupyter-notebook matplotlib numpy pandas python seaborn-python sklearn

Last synced: 11 Apr 2026

https://github.com/diligencefrozen/dcinside-data

Analyzing the Dcinside Frozen Gallery Dataset. #디시

data-analysis dataset

Last synced: 30 May 2026

https://github.com/myles/notebooks

Some of my random Jupyter Notebooks.

data-analysis data-science jupyter-notebooks

Last synced: 18 Jan 2026

https://github.com/nuriadevs/informes-powerbi

This repository contains reports built with Power BI.

data-analysis powerbi

Last synced: 18 Feb 2026

https://github.com/michaelcurrin/yahoo-finance-reports

Use the Yahoo Finance API to get info on shares of interest and report on them

data-analysis data-science python reporting shares stock-market yahoo-finance yahoo-finance-api

Last synced: 07 Oct 2025

https://github.com/prarthana-singh/bangalore-house-price-predictor

🏡 Bangalore House Price Prediction – A Machine Learning model to predict house prices in Bangalore using real estate data. Built with Linear Regression, Python, Pandas, NumPy, and Scikit-Learn.

data-analysis eda house-price-prediction linear-regression machine-learning numpy pandas python real-estate regression scikit-learn

Last synced: 19 Apr 2026

https://github.com/chiragkumargohil/co2-emissions-data-analysis

A Python programme that analyses CO2 emission data from 1997 to 2010. It prints the data, summarises a given year, displays and compares Year vs. Emission graphs for chosen countries, and generates a separate data file for those countries. It was a self-paced project provided by Guru99.

co2-emission data-analysis matplotlib python

Last synced: 28 Aug 2025

https://github.com/rusiru-erandaka/pupil-dilation-signal-classification-pipeline-with-noise-filtering-feature-extraction

In this repository I work with a pupil diameter time series dataset, covering data sampling, blink detection and noise handling, stimulus onset alignment and ensemble averaging, baseline correction, and feature extraction, and finally build a patient classification ML pipeline.

anomaly-detection classification-pipeline data-analysis data-preprocessing data-science time-series

Last synced: 08 Oct 2025

https://github.com/shellynagar27/mobile-sales-analysis

Analyzed 2024 mobile sales data to uncover product trends, customer behavior, and regional insights using Power BI dashboards and structured data modeling.

cleaning-data data-analysis data-visualization dax eda figma modelling powerbi powerquery storytelling wireframe

Last synced: 16 May 2025

https://github.com/maccccd/sql-proficiency-journey

A technical journey of my SQL understanding.

data-analysis sql systems-analysis-and-design uml-class-diagram

Last synced: 15 Feb 2026

https://github.com/allanotieno254/powerbi-dax-filter-context

This repository contains a Power BI project that explores DAX filter context, a crucial concept in DAX calculations. The project focuses on bank loan analysis, demonstrating how different filter contexts affect DAX formulas.

business-intelligence data data-analysis dax dax-functions powerbi powerbi-visuals visualization

Last synced: 08 Jan 2026

https://github.com/aryar-06/linear-regression

A Python project demonstrating basic linear regression with gradient descent and matrix operations, alongside scikit-learn comparison.

data-analysis data-preprocessing educational-project gradient-descent linear-regression machine-learning python regression-algorithms scikit-learn

Last synced: 05 May 2026
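A minimal sketch of basic linear regression fitted with gradient descent, in the spirit of the description above. It assumes a single feature and a mean-squared-error loss; the function names, learning rate, epoch count, and toy data are hypothetical. As a stand-in for the project's scikit-learn comparison, it cross-checks the result against the closed-form least-squares solution.

```python
def fit_linear_regression_gd(xs, ys, lr=0.05, epochs=5000):
    """Fit y ≈ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Residuals of the current model on every data point
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def fit_linear_regression_closed_form(xs, ys):
    """Ordinary least squares for one feature: w = Sxy / Sxx, b = mean_y - w*mean_x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    return w, mean_y - w * mean_x


# Toy data lying exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_regression_gd(xs, ys)
w_cf, b_cf = fit_linear_regression_closed_form(xs, ys)
print(f"gradient descent: w={w:.4f}, b={b:.4f}")
print(f"closed form:      w={w_cf:.4f}, b={b_cf:.4f}")
```

The learning rate has to be small enough for the updates to converge; with unscaled features a value that is too large makes the loss diverge, which is one reason such projects often standardize features first.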

https://github.com/ndiplacide7/r-project

Explore diverse data analysis techniques using R programming combined with advanced machine learning algorithms to uncover insights and create powerful predictive models.

data-analysis data-visualization machine-learning-algorithms r

Last synced: 25 Mar 2025

https://github.com/dcs-training/exploratory-data-analysis-and-visualisation-with-observable-plot

This two-hour workshop teaches you how to follow an exploratory data analysis pipeline with Observable Plot, a new JavaScript library based on the Grammar of Graphics that offers a simple yet expressive interface for creating powerful graphics that are easy to share on the web. See the repository's README for details.

d3 data-analysis data-visualisation javascript observable-notebook

Last synced: 17 May 2026

https://github.com/mehedi-hassan81/mastercourse

Data analysis project analysing renewable energy production across 212 countries, visualizing trends with Tableau. Highlights China's dominance (2,894 TWh) and Paraguay's 100% renewable share.

data-analysis pandas python renewable-energy selenium tableau-dashboards tableau-public web-scraping

Last synced: 08 May 2026

https://github.com/chitranjan806/predicting-on-time-premium-deposits

A predictive analysis project that predicts the rate of on-time premium deposits by policyholders.

analytics-vidhya analytics-vidhya-competition catboostregressor data-analysis data-science linear-regression logistic-regression python3

Last synced: 16 May 2026

https://github.com/shellynagar27/candy-market-share-analysis

Candy Market Share Analysis explores confectionery sales data using Power BI, Python, and Power Query. It uncovers key market trends, top-selling candies, manufacturer performance, and packaging preferences to support data-driven decision-making for industry researchers.

critical-thinking data-analysis data-visualization exploratory-data-analysis powerbi powerquery problem-solving sales-analysis

Last synced: 03 Feb 2026

https://github.com/takshshah-16/pizza_sales_sql

SQL-powered pizza sales analytics project using MySQL Workbench to derive business insights through data exploration and queries.

business-intelligence data-analysis database-management mysql sql

Last synced: 09 Oct 2025

https://github.com/debjyotisaha/sql-projects

Designed and implemented SQL-based projects to analyse and manage datasets efficiently. Demonstrated expertise in writing complex queries, optimizing database performance, and performing data extraction, transformation, and loading (ETL) processes.

data-analysis database sql

Last synced: 09 Oct 2025

https://github.com/priyanshubiswas-tech/priyanshubiswas-tech

SWE-Data Engineer @ EDN | Kubeflow-MLOps | Kubernetes | Databricks | AWS EMR-Lambda-Glue, Eventbridge, SQS-SNS | OCI Multi-Cloud Architect Professional | GCP GA4 | Gen AI | IEEE Brand Amb. | Ex-Chair, PES | Ex-Sec, SB

apache-spark aws data-analysis data-engineering data-visualization dbt hadoop kubernetes python3 sql

Last synced: 21 Jan 2026

https://github.com/ninadpatil09/hospital_emergency_room_analysis

This comprehensive analysis delves into the performance and characteristics of the hospital's emergency room over the past year. By scrutinizing key metrics and patient demographics, this study aims to provide valuable insights for optimizing patient care, resource allocation, and overall operational efficiency.

data-analysis tableau-public visualization

Last synced: 15 Feb 2026

https://github.com/shellynagar27/good-cabs-data-analysis-project

This project is part of CodeBasics Challenge #13, where the goal was to provide actionable insights to the Chief of Operations at Goodcabs, a cab service provider in tier-2 cities of India. The project focused on analyzing key metrics like trip volume, repeat passenger rate, and passenger satisfaction.

critical-thinking data-analysis data-visualization excel exploratory-data-analysis power-bi presentation problem-solving sql storytelling

Last synced: 25 Jan 2026

https://github.com/ibromeat/road-accident-risk

Exploratory Data Analysis of road accident risk predictions — visualizing model stability and distribution of predicted probabilities.

data-analysis jupyter-notebook matplotlib python traffic-data visualization

Last synced: 18 May 2026

https://github.com/anjasfedo/data-analysis

A repo for exploring data analysis.

data-analysis numpy

Last synced: 13 Apr 2026

https://github.com/donmaruko/python-eda-toolkit

A CLI-driven EDA toolkit with 30 commands covering text-related functions, statistical calculations, data visualization, and data manipulation.

data data-analysis data-science data-visualization matplotlib pandas scipy seaborn statistical-analysis statistics wordcloud

Last synced: 06 May 2026

https://github.com/pkjjoshi/behind-the-menu-uncovering-insights-from-restaurant-data

Discover hidden patterns in dining data — from popular cuisine pairings to geographic restaurant clusters

data-analysis data-visualization insights jupyter-notebook pandas python restaurant-data

Last synced: 05 Jul 2025

https://github.com/scarlet-enlight/ml_project

Comparison of different classifiers (KNN, Naive Bayes, Decision Tree) on Sleep Health and Lifestyle Dataset

data-analysis machine-learning

Last synced: 13 Mar 2026

https://github.com/pranav016/exploratory-data-analysis-of-google-app-store-dataset

A data analysis of the Google app store dataset that answers several questions about the data using data visualization techniques.

data-analysis

Last synced: 11 Oct 2025

https://github.com/azaz9026/email-spam-detection

Welcome to the Email Spam Detection project! This repository provides a machine learning model for detecting spam emails using a Naive Bayes classifier and a simple web interface built with Streamlit.

data-analysis data-cleaning data-structures data-visualization deep-learning machine-learning python sql streamlit

Last synced: 14 Apr 2026

https://github.com/ryuzen6/bangalore-real-estate-price-prediction

This is a data science project that predicts real estate prices in Bangalore. Requirements: Jupyter Notebook (for data cleaning and building the linear regression model with various Python libraries), PyCharm (Python IDE for creating the Python Flask server), and Visual Studio Code (to create the UI with HTML, CSS, and JavaScript).

css3 data-analysis data-science html5 javascript jupyter-notebook machine-learning python3

Last synced: 06 May 2026

https://github.com/syarwinaaa09/exploring-nyc-public-school-test-result-scores

📊 analyzing NYC school test scores with python 🐍 to spot top performers 🏆 & trends 📈

data-analysis education pandas python visualization

Last synced: 06 May 2026

https://github.com/hemangsharma/streamingcontentanalyzer

This Streamlit application provides an interactive dashboard for analyzing streaming content data. It allows users to explore movie and TV show ratings, distributions, temporal trends, and genre breakdowns through various visualizations and filters.

dashboard data-analysis data-science data-visualization python streamlit-dashboard streamlit-webapp

Last synced: 02 Apr 2025

https://github.com/silvermete0r/sdu_hackathon_uss_db_analysis

Smart Data Ukimet Hackathon - "Data Modeling" case Solution - Topic: Store Analysis based on Unified Star Schema

data-analysis data-modeling postgresql python sql unified-star-schema

Last synced: 14 Apr 2026

https://github.com/gintuvedula/crime-data-analysis-with-mysql-and-python

This project aims to analyze crime data using MySQL for database management and Python for data analysis and visualization. The objective is to uncover crime trends, hotspots, and patterns to support law enforcement and urban planning efforts.

data-analysis data-exploration database mysql python

Last synced: 05 May 2026

https://github.com/abhinav330/customer-behavior-analysis-linear-regression

This repository explores customer behavior data for an NYC clothing company with both a mobile app and a website. The company wants to understand which platform drives higher sales.

data-analysis data-science data-visualization eda exploratory-data-analysis jupyter jupyter-notebook linear-regression machine-learning machine-learning-algorithms machinelearning-python numpy pandas python regression-analysis

Last synced: 06 May 2026

https://github.com/singhrdeep/croppilot

CropPilot is a lightweight, Python-based command-line tool designed to help small-scale farmers, gardeners, and students manage crop data, track profits, and explore sustainable practices. Built for usability and extensibility.

agriculture data-analysis farm-management open-source python

Last synced: 25 Apr 2025

https://github.com/ksm26/ml-ai-data-science-jobs-in-canada

Explore the latest machine learning, artificial intelligence, and data science job opportunities in Canada. Stay informed about Canadian tech job market trends and find your next career move.

ai-canada ai-careers canada canadian-tech-companies canadian-tech-job-market data data-analysis data-engineering data-science data-science-careers machine-learning prompt-engineering robotics

Last synced: 06 May 2026

https://github.com/evan-dg31/data-science

Exploratory Data Analysis (EDA), Predictive Modeling (Supervised and Unsupervised), Regression, Classification, Clustering

classification clustering data-analysis data-science data-visualization machine-learning matplotlib numpy pandas python regression-analysis seaborn

Last synced: 13 Apr 2026

https://github.com/agb2k/twitter-analyzer

Project to extract tweets based on searches, analyze their data, and autocorrect potentially incorrect words.

data-analysis python tweepy twitter

Last synced: 13 Oct 2025

https://github.com/mr-chang95/udacity_movie_project

Movie Data Analysis and Visualization Project for Udacity's Data Analyst Program. Using Python in Jupyter Notebook.

data-analysis data-visualization jupyter-notebook movie python

Last synced: 13 Apr 2026

https://github.com/samruddhi3012/tata-data-visualization

Hi! This repo contains the dashboard I created using Tableau for TATA Data Visualization Training!

data-analysis data-visualization tableau tata

Last synced: 07 Jan 2026

https://github.com/jsimell/sleepanalysis

A Python data analysis project analyzing the factors affecting sleep quality and the temporal patterns in a single subject's sleep data.

data-analysis matplotlib numpy pandas python scikit-learn seaborn

Last synced: 14 Apr 2026

https://github.com/sco1/xbmini-py

Python Toolkit for the GCDC HAM

data-analysis data-visualization python python3

Last synced: 07 May 2025

https://github.com/sumit9000/submission-of-web-server-log-analysis-assessment

This project analyzes one year of real-world HTTP access logs from the University of Calgary’s computer science server. Using Python, pandas, and regular expressions, we clean and parse the data to extract meaningful insights and answer 10 analytical questions.

data-analysis data-cleaning eda jupyter-notebook log-parsing pandas python realworld-data regex web-log-analysis

Last synced: 14 Apr 2026
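A minimal sketch of the regex-based parsing step described above. It assumes the logs follow the Common Log Format (`host ident authuser [timestamp] "request" status bytes`); the pattern, the function name `parse_clf_line`, and the sample line are hypothetical and not taken from the repository. Lines that do not match are dropped, which is one simple cleaning policy among several possible ones.

```python
import re

# Hypothetical pattern for one Common Log Format line:
# host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)$'
)


def parse_clf_line(line):
    """Parse one log line into a dict, or return None if it is malformed."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None  # malformed lines are discarded during cleaning
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A '-' size means no bytes were returned; record it as 0
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Made-up sample line in Common Log Format
sample = 'local - - [24/Oct/1994:13:41:41 -0600] "GET index.html HTTP/1.0" 200 150'
rec = parse_clf_line(sample)
print(rec)
```

Parsed records like these can then be loaded into a pandas DataFrame (for example with `pd.DataFrame(records)`) to answer questions such as the most requested files or the distribution of status codes.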

https://github.com/ireneflorez/nypd-mvc

Analysis of NYPD Motor Vehicle Collisions

basemap data-analysis folium jupyter-notebook matplot pandas python

Last synced: 08 May 2026