An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/soumya-thoutam/covid-19-impact-on-u.s.-states-and-colleges

Analysis of COVID-19 and its impact on U.S. colleges and states, using SQL and Tableau.

covid-19 dashboard data-analysis data-visualization dataset sql sql-server tableau

Last synced: 04 Sep 2025

https://github.com/stoll-jonathan/sorting_algorithm_analyzer

C++ program which analyses the performance of different sorting algorithms on a dataset of random numbers

bubble-sort data-analysis insertion-sort merge-sort sorting-algorithms

Last synced: 01 Apr 2025

https://github.com/satvikpraveen/rsvp_case_study

A comprehensive IMDB dataset analysis using SQL. Includes database setup, advanced queries, and actionable insights. Organized with files for database creation, queries, and solutions. Features an Entity-Relationship Diagram (ERD), executive summary, and SQL scripts. Perfect for SQL workflows and business intelligence in the film industry.

aggregate-functions business-intelligence common-table-expressions data-analysis data-driven-decisions data-querying database-design entity-relationship-diagram imdb-dataset relational-database sql subqueries-and-joins

Last synced: 11 Jan 2026

https://github.com/apoorvalal/misc_stata_ados

Miscellaneous utility programs in Stata.

data-analysis stata stata-command

Last synced: 04 Feb 2026

https://github.com/hemangsharma/breast-cancer-patient-dashboard

This interactive Streamlit dashboard visualizes insights from the SEER Breast Cancer Dataset (2006-2010)

data-analysis streamlit streamlit-dashboard streamlit-webapp

Last synced: 05 May 2026

https://github.com/skivhisink/econometricnsu

A one-semester master's-level econometrics course for first-year master's students at the Faculty of Economics, Novosibirsk State University (NSU)

data-analysis econometrics economics education nsu r

Last synced: 09 Apr 2025

https://github.com/quantumudit/groceries-basket-analysis

This project performs market basket analysis using Power BI and Python to reveal associations between grocery items. It involves transforming raw transaction data into a processed dataset, creating interactive Power BI reports, and generating key insights through Python, enabling data-driven decision-making.

data-analysis data-visualization pandas powerbi python

Last synced: 12 Apr 2026

https://github.com/bpkaur/exploring-the-evolution-of-linux

This project explores the evolution of the Linux kernel by identifying the top 10 contributors and visualizing commits over the years.

data-analysis data-science datacamp ipynb-jupyter-notebook python3

Last synced: 21 Feb 2026

https://github.com/borjamome/soho_cholera

Cholera deaths in the Soho District (London)

data-analysis data-visualization london r

Last synced: 04 Sep 2025

https://github.com/sreekar0101/electric-vehicle-market-growth-and-incentive-impact-analysis-dashboard

This project involves the development of a comprehensive Tableau dashboard to analyze the growth and market dynamics of electric vehicles (EVs). The dashboard reveals key insights, including a 20% increase in EV adoption over five years, the dominance of Battery Electric Vehicles (BEVs) which make up 60% of the market

data-analysis data-visualization tableau-desktop

Last synced: 07 Jan 2026

https://github.com/kernix13/github-readme-seo-analysis

A Jupyter Notebook analysis of GitHub README and repo SEO to determine what makes a repo rank in the SERPs

accessibility data-analysis readme seo seo-analysis

Last synced: 29 May 2026

https://github.com/kseniatyschuk/excel-data-matcher

Compare and match Excel files via a simple Python GUI

automation data-analysis etl excel gui pandas python3 tkinter

Last synced: 23 Apr 2025

https://github.com/ehsan-behzadi/online-retail-data-analysis-and-preprocessing

This project analyzes and preprocesses the Online Retail dataset to uncover insights into customer purchasing behaviors, sales trends, and product performance. It includes data cleaning, exploration, and visualization, with the goal of enhancing understanding of online retail dynamics.

cohort-analysis data-analysis data-cleaning data-exploration duplicate-detection exploratory-data-analysis-eda feature-encoding feature-engineering handling-missing-values online-retail outlier-detection preprocessing trends-visualization visualization z-score-method

Last synced: 16 Apr 2026

https://github.com/nilayhangarge/data-analysis-with-python

This repository provides a practical introduction to data acquisition and analysis using Pandas. It covers loading datasets, exploring data, manipulating data, and gaining insights through statistical summaries. Ideal for beginners, it offers code examples and explanations to enhance your data manipulation skills using Pandas for Python.

data-acquisition data-analysis data-analytics data-binning data-cleaning data-engineering data-fundamentals data-insights data-integration data-preprocessing data-science data-wrangling numpy pandas python

Last synced: 12 Apr 2026

https://github.com/muthukumar0908/imdb_movie_analysis_with_powerbi

This project analyzes an IMDB movies dataset using Power BI.

data-analysis data-visualization powerbi

Last synced: 12 Jun 2025

https://github.com/wisdom-osborn/data-analytics-course-online-

🔍 Data Analytics with Python: hands-on course materials. Jupyter notebooks, projects, and datasets based on the freeCodeCamp Data Analysis with Python certification. Learn NumPy, Pandas, data cleaning, and visualization through real-world examples.

data data-analysis data-science data-visualization freecodecamp numpy pandas pandas-dataframe project python

Last synced: 19 Apr 2026

https://github.com/bkataru/physics-e.e

Project repository for IB physics extended essay. Topic: Predictive data modeling of a variable binary star’s brightness over a period of time using astrostatistics.

astrometry astronomical-algorithms astronomical-images astronomy astrophotography astrostatistics data-analysis data-science data-visualization modeling physics polynomial-regression regression-analysis

Last synced: 09 Apr 2025

https://github.com/rohitblaze10/netflix_analysis_using_tableau

The Netflix dashboard in Tableau provides a professional and visually captivating interface for users to explore a vast collection of TV shows and series. With seamless navigation and interactive filters, users can easily personalize their recommendations based on release year, genre, duration, and rating.

data data-analysis data-science data-visualization netflix tableau

Last synced: 04 Feb 2026

https://github.com/pngo1997/life-expectancy-logistic-regression

Life expectancy analysis project using logistic regression.

data-analysis logistic-regression r rmarkdown

Last synced: 10 Jun 2026

https://github.com/bhaskarbharati/ibm-datascience-hands-on-lab

A basic hands-on exercise using Jupyter Notebook, completed while taking the IBM course Tools for Data Science.

data-analysis data-science data-visualization datawrangling eda machine-learning

Last synced: 23 Apr 2025

https://github.com/victorlcastro-dsa/pbl-datacamp

This repository features projects from DataCamp's Project-Based Learning (PBL) courses, showcasing practical applications of data analysis, machine learning, and visualization. Explore real-world datasets and interactive results that highlight the skills gained through hands-on learning.

data-analysis data-science data-visualization datacamp-projects hypothesis-testing machine-learning project-based-learning

Last synced: 29 Nov 2025

https://github.com/abhash-rai/analyzing-credit-card-eligibility

This work was performed as part of a BCU undergraduate course.

data-analysis data-visualization ggplot ggplot2 latex r

Last synced: 20 Jan 2026

https://github.com/faysalalmahmud/bd-med-professional-analysis

Analysis of healthcare professionals in Bangladesh through web scraping, data processing, and interactive visualization.

data-analysis data-visualization jupyter-notebook python scraper selenium selenium-webdriver tableau

Last synced: 04 Sep 2025

https://github.com/prakshal0809/power-bi-analytics-dashboard

I have developed a dashboard in Power BI utilizing data from an Excel file. The dashboard effectively visualizes and analyzes the given data.

data-analysis powerbi

Last synced: 22 Feb 2026

https://github.com/ymorsi7/caliwageanalysis

California employment and wage analysis on data from the past decade.

data-analysis data-science ipynb jupyter-notebook

Last synced: 21 Jan 2026

https://github.com/camara94/data_analyse_series_temporelles

In this tutorial, we answer the following questions: 1. Read the Microsoft data using the **Pandas Data reader** package 2. Get the **maximum price** of the stock from **2017 to 2022** 3. What is the **date of the highest price** of the stock? 4. What is the **date of the lowest price** of the stock?

data-analysis data-analysis-python data-science data-structures-and-algorithms data-visualization serie series-forecasting

Last synced: 09 Apr 2025

https://github.com/doughtnerd/pod-old

Read and write Excel data

data data-analysis excel poi-library workbook

Last synced: 21 Jan 2026

https://github.com/anudeepkaddala/bankds

This repository contains a Python-based solution for cleaning, matching, and formatting bank data. The primary goal is to match banks from two datasets based on their names and associate each bank with its respective asset size. The final output is a cleaned dataset with asset sizes in Indian-style currency format.

data-analysis data-science fuzzy-matching pandas python

Last synced: 12 Apr 2026

https://github.com/nsandoya/python_scrp_project

This is a tool made specifically for the Dipaso e-commerce website. You can extract data from it, analyze it, and see keyword, brand, and category frequencies, price distribution, and other market trends, all in a set of friendly statistical tables and graphics (exported from a Jupyter notebook) :)

beautifulsoup4 data data-analysis jupyter-notebook pandas python3

Last synced: 28 Apr 2026

https://github.com/sumit0ubey/internship

This repository showcases the tasks and projects I completed during various internships. It includes work across diverse domains such as: Data Analysis: Exploratory data analysis, data visualization, and insights generation using Python and libraries like Pandas, Matplotlib, and Seaborn. Backend Development: Designing and implementing RESTful API

backend-development data-analysis python-developer

Last synced: 05 Sep 2025

https://github.com/siddhant2105s/airman-database-system

This repository contains the design and implementation of the AirMan System for managing airport operations at London Biggin Hill Airport. It includes an ERD diagram, MySQL scripts for database creation, data insertion, and queries, as well as detailed data definitions and system requirements documentation.

data-analysis database-design database-normalization entity-relationship-diagram entity-relationship-models mysql relational-databases sql-queries

Last synced: 25 Mar 2025

https://github.com/fbarffmann/nosql-challenge

Analyzed 28,000+ UK restaurant records using MongoDB and PyMongo. Queried hygiene scores, location data, and customer ratings.

data-analysis data-cleaning database-analysis json mongodb nosql pymongo python restaurant-data

Last synced: 13 Apr 2026

https://github.com/fbarffmann/sqlalchemy-challenge

Built a Flask API with SQLAlchemy to analyze and visualize Hawaii climate data. Automated data extraction and developed database queries for temperature and precipitation insights.

api climate-data data-analysis data-visualization flask orm python sql sqlalchemy sqlite

Last synced: 13 Apr 2026

https://github.com/alinenog/desenvolve_gb_2022

Grupo Boticário's Desenvolve 2022 training program in the data field

data-analysis data-science googlesheet machine-learning numpy pandas python

Last synced: 13 Apr 2026

https://github.com/hazim-hf/data-science

This course covers basic data science principles, Python programming, and the concept of big data and its types. It explores algorithms, methods, and analyses in data science with practical Python examples. Additionally, it highlights current data technologies for storing and archiving.

data-analysis data-wrangling time-series

Last synced: 04 Jul 2025

https://github.com/nafiealhilaly/first-dash-app

A simple Plotly Dash app to explore and analyze a fictional student assessment dataset

data-analysis data-analytics data-visualization eda plotly-dash python

Last synced: 02 Apr 2025

https://github.com/ironlegion88/media_bias

An end-to-end NLP pipeline to analyze ideological bias in online news media during elections. Uses sentiment analysis, topic modeling (LDA/NMF), and NER to quantify media framing.

data-analysis machine-learning media-bias nlp nltk political-science python scikit-learn sentiment-analysis spacy topic-modeling

Last synced: 13 Apr 2026

https://github.com/lexiortiz/advanced-data-analytics

Structured learning notes, code snippets, and key takeaways from the Google Advanced Data Analytics Professional Certificate. Serves as a personal reference for reinforcing concepts and as a resource for others on a similar learning journey.

data data-analysis data-engineering google python-3 sql

Last synced: 29 May 2026

https://github.com/darrenjolson/pba-analysis-app

Data analysis and visualization tool for professional bowling tournaments, predicting performance across different oil patterns and venues.

bowling data-analysis data-visualization flask pba predictive-analytics python reactjs sports-analytics

Last synced: 13 Apr 2026

https://github.com/nurulashraf/polynomial-regression-manufacturing

A Python project implementing polynomial regression to analyse and predict manufacturing-related data. Features include data preprocessing, model training, and visualisation of results. Ideal for exploring machine learning applications in manufacturing process optimisation.

data-analysis data-visualization machine-learning manufacturing polynomial-regression predictive-modeling process-optimization python regression-models scikit-learn

Last synced: 16 Apr 2026

https://github.com/aimin-nur/visualisasi_bikestore

Data Analyst - Dashboard Bike Store

data-analysis sql visualization

Last synced: 29 Jan 2026

https://github.com/borjamome/accidentes_madrid

Analysis of accidents in Madrid using SQL (2023)

accidentes-coche data-analysis madrid sql

Last synced: 17 Jan 2026

https://github.com/parthds02/e-commerce-data-analysis-with-python

This project focuses on analyzing an e-commerce dataset using Python. The goal is to derive meaningful insights through exploratory data analysis (EDA) and uncover trends and patterns that can drive business decisions.

data-analysis ecommerce exploratory-data-analysis jupyter-notebook pytho sales-analysis visualization

Last synced: 13 Jun 2025

https://github.com/ireneflorez/nypd-mvc

Analysis of NYPD Motor Vehicle Collisions

basemap data-analysis folium jupyter-notebook matplot pandas python

Last synced: 08 May 2026

https://github.com/hassanislam463/data-cleaning-and-modelling-top-5-categories-analysis-forage

This project involves cleaning, merging, and analyzing datasets to identify the top 5 performing categories based on aggregate popularity scores. It includes cleaned datasets, a final merged dataset, visualizations, and a presentation summarizing the tasks and results. Tools used: Microsoft Excel, Python, and PowerPoint.

data-analysis data-visualization microsoft-excel

Last synced: 07 Jan 2026

https://github.com/zen204/renewable-energy-usage-v-electricity-access

Interactive data visualization project created for COSI 116A: Introduction to Information Visualization at Brandeis University (Fall 2024). The project showcases data-driven insights using advanced visualization techniques and user interactivity. Hosted on GitHub Pages.

d3js data-analysis data-visualization electricity github-pages html-css-javascript information-visualization interactive python renewable-energy tableau web-development

Last synced: 08 Feb 2026

https://github.com/sco1/xbmini-py

Python Toolkit for the GCDC HAM

data-analysis data-visualization python python3

Last synced: 07 May 2025

https://github.com/ljadhav25/data-engineering-poc

This repository contains a beginner-level Data Engineering Proof of Concept (POC) project designed for practice. The objective is to provide hands-on experience with data engineering concepts, including data extraction, transformation, loading (ETL), and basic data analysis. This project is ideal for those looking to build foundational skills in da

data-analysis etl matplotlib numpy pandas python

Last synced: 13 Apr 2026

https://github.com/cezlul/analyse-ventes-immobilier

ML solution for Paris real estate analysis: automatic classification of apartments vs. commercial premises (K-means, 91%) and price prediction (regression, R²=0.98) on 26K transactions. Values a €169M portfolio with data-driven strategic recommendations.

data-analysis jupyter-notebook machine-learning matplotlib numpy pandas python sklearn

Last synced: 13 Apr 2026

https://github.com/extwiii/datascience-jhu

Ask the right questions, manipulate data sets, and create visualizations to communicate results - Coursera

biostatistics data-analysis data-science linear-regression multivariate-regression r r-programming toolbox visualization

Last synced: 05 Jul 2025

https://github.com/jameswrigley/laph

A node-based data analysis program.

cpp data-analysis nodes qml

Last synced: 05 Jun 2026

https://github.com/spacebakery/variance-in-weather-project

Statistics for Data Analysis | Variance and Standard Deviation

data-analysis python standard-deviation statistics variance

Last synced: 05 Jul 2025

https://github.com/athari22/investigating-netflix-movies-and-guest-stars-in-the-office

Apply basic Python skills in Introduction to Python and Intermediate Python by processing and visualizing film and television data.

data-analysis data-science data-visualization loop loops matplotlib matplotlib-pyplot netflix numpy office pandas python

Last synced: 11 Apr 2026

https://github.com/mr-chang95/udacity_movie_project

Movie data analysis and visualization project for Udacity's Data Analyst Program, using Python in Jupyter Notebook.

data-analysis data-visualization jupyter-notebook movie python

Last synced: 13 Apr 2026

https://github.com/abhisek-13/fake_news_classifier

The Fake News Classifier is a TensorFlow-based machine learning project that detects and classifies fake news with 97% accuracy. The repository includes a single Python file with complete code for building and training the model, which you can use to create and deploy your own model.

colab-notebook data-analysis data-engineering deep-learning eda kaggle keras machine-learning nlp pandas python tensorflow

Last synced: 13 Apr 2026

https://github.com/ray-chew/pycsam

pyCSAM is a robust approach for approximating geodesic subgrid-scale orographic spectra with applications to weather forecasting and broader data analysis

data-analysis gmted icon-model merit-dem orographic spectral-analysis topography weather-forecast

Last synced: 28 Feb 2025

https://github.com/badranalyst/tips-dataset-analysis-dashboard-with-streamlit-and-plotly

Interactive Streamlit dashboard analyzing the Seaborn 'tips' dataset, which records information on restaurant bills, including total bill amounts, tips, customer demographics (e.g., gender, smoking status), and dining details (e.g., day, time). Visualized with Plotly for insights into tipping patterns.

data-analysis data-analytics data-visualization dataset eda exploratory-data-analysis matplotlib matplotlib-pyplot numpy pandas plotly python seaborn streamlit

Last synced: 13 Apr 2026

https://github.com/busradeveci/student-performance-prediction

A machine learning project to predict student exam performance based on academic, social, and personal features. Built with Python and scikit-learn.

data-analysis kaggle linear-regression machine-learning predictive-modeling python scikit-learn student-performance

Last synced: 25 Apr 2025

https://github.com/singhrdeep/croppilot

CropPilot is a lightweight, Python-based command-line tool designed to help small-scale farmers, gardeners, and students manage crop data, track profits, and explore sustainable practices. Built for usability and extensibility.

agriculture data-analysis farm-management open-source python

Last synced: 25 Apr 2025

https://github.com/shivamsharma32/customer-churn-analysis-power-bi-

This project is about analyzing and visualizing customer churn data using Power BI. Customer churn is the percentage of customers who stop doing business with a company over a given period of time. It is an important metric for businesses to understand why customers leave and how to retain them.

data-analysis dataanalytics datavisualization powerbi

Last synced: 15 Jan 2026

https://github.com/nmelgar/lego_my_data

Data visualization project for selling LEGO in bulk.

csv data-analysis data-visualization data-viz google-sheets tableau

Last synced: 08 Jan 2026

https://github.com/kittonn/data-analysis-freecodecamp

freeCodeCamp data analysis projects.

data-analysis freecodecamp

Last synced: 05 Apr 2025

https://github.com/anthonytlei/graphsql

Lightweight SQL-to-GraphQL connector for querying GraphQL endpoints using SQL syntax.

connector data-analysis dbapi graphql graphsql python sql sqlalchemy superset

Last synced: 09 Apr 2026

https://github.com/prady2309/car-price-prediction

Multiple Linear Regression Project

data-analysis data-science machine-learning python

Last synced: 20 May 2026

https://github.com/pkjjoshi/behind-the-menu-uncovering-insights-from-restaurant-data

Discover hidden patterns in dining data — from popular cuisine pairings to geographic restaurant clusters

data-analysis data-visualization insights jupyter-notebook pandas python restaurant-data

Last synced: 05 Jul 2025

https://github.com/aravindnathan02/bi-projects

Data Analysis and Visualization projects involving only BI tools (Power BI, Tableau, MS Excel).

data-analysis data-visualisation ms-excel powerbi tableau

Last synced: 08 Jan 2026

https://github.com/joaquinmoron/airbnb-eda-python

Airbnb EDA: cleaning, exploration, and visualization in Python (pandas, matplotlib, seaborn).

airbnb data-analysis eda matplotlib pandas python seaborn

Last synced: 13 Apr 2026

https://github.com/manojrathod0777/loan-prediction

Predict loan approval status using machine learning techniques. This project demonstrates data preprocessing, feature engineering, model training, and evaluation, along with an interactive Streamlit app for real-time predictions. Ideal for financial decision-making.

classification-models data-analysis data-science financial-analytics jupyter-notebook loan-prediction machine-learning predictive-modeling python streamlit-app

Last synced: 13 Apr 2026

https://github.com/marianamartiyns/rfm-cluster-analysis

Customer behavior and sales analysis, including data cleaning, RFM calculation, churn analysis and customer clustering.

cluster-analysis data-analysis data-cleaning data-visualization pyhton

Last synced: 16 Mar 2025

https://github.com/luminati-io/Indeed-dataset-samples

A sample dataset of over 1000 Indeed job listings, extracted using the Bright Data API, ideal for market analysis and growth.

api data-analysis datasets indeed jobs web-scraping

Last synced: 09 Apr 2025

https://github.com/luminati-io/Walmart-dataset-samples

A sample dataset of over 1000 Walmart products, extracted using the Bright Data API, ideal for consumer market insights and competitor analysis.

api data-analysis dataset walmart walmart-scraper web-scraping

Last synced: 09 Apr 2025

https://github.com/luminati-io/Target-dataset-samples

A sample dataset of over 1000 Target products, extracted using the Bright Data API, ideal for brand reputation, tracking inventory, and optimizing prices.

api data-analysis data-mining datasets target web-scraper web-scraping

Last synced: 09 Apr 2025

https://github.com/leandrocollares/nyc-film-permits

NYC film permits: an exploratory data analysis

data-analysis data-visualization pandas plotly

Last synced: 05 Jul 2025

https://github.com/khushi-sabarad/8-week-sql-challenge

Case study solutions for the #8WeekSQLChallenge by Danny Ma

8weeksqlchallenge case-study data-analysis mysql sql

Last synced: 06 Sep 2025

https://github.com/hari7261/playwithdata-python

This is one of the repositories where I have collected many data science and machine learning questions along with their solutions. I hope you will find it more useful than some other platforms. Thank you, and happy exploring.

data-analysis data-science data-science-learning machienlearning matplotlib matplotlib-python ml numpy numpy-arrays numpy-library pandas pandas-dataframe pandas-library python python-script sklearn

Last synced: 13 Apr 2026

https://github.com/tj2904/lfb-callout-analysis

An investigation into London Fire Brigade's callout data.

data-analysis decsion-tree kmeans lfb-incidents london-fire-brigade pandas python seaborn

Last synced: 13 Apr 2026

https://github.com/shellynagar27/business-insights-360-project

A comprehensive dashboard that provides a better understanding of the business's market standing, key focus areas for optimization, underperforming customers, and year-wise financial insights, aiding inventory planning and performance tracking. It can also be used to answer any number of "why" questions, depending on the situation.

dashboard data-analysis data-visualization dax-languague dax-studio excel performance-optimization power-bi reporting sql storage-manager

Last synced: 27 Jan 2026

https://github.com/alanjamlu34/bike-dataset

This is the final project for the Dicoding class "Menjadi Data Analist" (Becoming a Data Analyst)

data-analysis streamlit-dashboard

Last synced: 19 Oct 2025