An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/nishnash54/sidba

CSV to MongoDB with type conversion

csv-converter data-analysis mongodb statistics

Last synced: 02 May 2026

https://github.com/shreeparab1890/unicorns-of-india-till-sep-2022-analysis-eda

This ipython notebook is the Exploratory data analysis (EDA) of the Unicorns of India till Sep 2022.

analysis data-analysis eda exploratory-data-analysis matplotlib-pyplot numpy pandas plotly

Last synced: 02 May 2026

https://github.com/isaqueiros/motorpremium-predictions-mlpclassifier

This Jupyter Notebooks is an initial study of the application of sklearn neural network MLP Classifier model. The model is applied to dataset MotorPremiums, which is supplied separately in .csv format.

data-analysis data-science machine-learning neural-network python sklearn-library

Last synced: 02 May 2026

https://github.com/chuxinh/our-data-manual

All in one place for our data science learning journey by Chuxin and Melody

data-analysis data-science machine-learning python

Last synced: 09 Jun 2026

https://github.com/benzerinsio/breastcancer-eda

📊 Análise Exploratória de Dados (EDA) - Câncer de Mama | Exploração de características clínicas para identificar padrões e relações no diagnóstico de câncer de mama.

analise-de-dados analise-exploratoria analise-exploratoria-de-dados data-analysis data-visualization diagnosis eda exploratory-data-analysis health-care medical-data python seaborn

Last synced: 02 May 2026

https://github.com/badranalyst/movie-correlation-analysis-in-python

This project analyzes movie data correlations using Python libraries like Pandas, NumPy, Seaborn, and Matplotlib. It examines relationships between attributes such as ratings, genres, and box office performance to uncover trends that inform recommendations and enhance understanding of movie success factors.

data data-analysis dataset jupyter jupyter-notebook matplotlib matplotlib-pyplot numpy pandas python seaborn

Last synced: 03 May 2026

https://github.com/bhavna-kale/cars-eda-project

Project analyzing used car market data to identify high-impact price drivers and depreciation curves, presented through an interactive web application.

data-analysis excel matplotlib numpy pandas python3 searborn streamlit

Last synced: 03 May 2026

https://github.com/monteirooscar98/tarifas-publicas-sp-dieese

Extração de dados através de WebScraping no site do Dieese e Analise em relação as Tarifas Públicas do Município de São Paulo.

data-analysis data-visualization python webscraping

Last synced: 03 May 2026

https://github.com/zients/tw-lottery-recommandation

Taiwan lottery draw analyzer & number recommender with Transformer ML model. Supports 539, 649, 638, 3D, and 4D lotteries.

cli data-analysis lottery machine-learning python pytorch taiwan transformer

Last synced: 03 May 2026

https://github.com/rohitinu6/tesla-price-prediction

A machine learning project that predicts future stock price movements using Logistic Regression, SVC, and XGBoost with engineered financial features.

data-analysis data-visualization feature-engineering financial-analysis logistic-regression machine-learning matplotlib python scikit-learn seaborn stock-market stock-price-prediction support-vector-machine time-series xgboost

Last synced: 03 May 2026

https://github.com/maddieemihle/python-challenge

Creating a Python script that analyzes financial records and election results

data-analysis python

Last synced: 09 Jun 2026

https://github.com/vipulbunny/restaurant-insight-analysis

A comprehensive data analysis project exploring restaurant ratings, locations, and customer sentiments. This project includes data preprocessing, descriptive analysis, geospatial mapping, sentiment analysis, and price-rating correlations using Python and visualization tools.

data-analysis data-preprocessing data-visualization folium geospatial geospatial-analysis geospatial-visualization machine-learning nlp pandas python restaurant-insights seaborn sentiment-analysis

Last synced: 03 May 2026

https://github.com/devlucho/modelos-predictivos

Modelos predictivos utilizando los algoritmos de Regresión Lineal, Regresión Logística y Árboles de Decisión.

data-analysis jupyter-notebook python3

Last synced: 03 May 2026

https://github.com/syed-m-nofel/python-data-science-fundamentals

Python notebooks for data manipulation (Pandas/NumPy) and API workflows – from basics to practical examples.

api beginner-friendly data-analysis data-science http-requests jupyter-notebook numpy pandas pandas-dataframe python tutorial

Last synced: 03 May 2026

https://github.com/ggarciajavier/udacity-dalf-project4-identify-fraud-enron-email

Work performed for the 4th project of the Udacity Data Analyst Nanodegree: machine learning classifier for identifying fraud in Enron email corpus.

data-analysis data-science machine-learning nlp-machine-learning python python27

Last synced: 03 May 2026

https://github.com/iguptashubham/ev-market-exploration

So, market size analysis is a crucial aspect of market research that determines the potential sales volume within a given market

data-analysis data-analysis-projects data-science-project forecast projects python

Last synced: 03 May 2026

https://github.com/bpkaur/whats-in-a-name

Exploring dataset of first names of babies born in the US in order to uncover interesting stories

data-analysis datacamp numpy pandas python3

Last synced: 04 May 2026

https://github.com/mindlessmuse666/titanic-data-visualization

Проект по визуализации данных о пассажирах Титаника с использованием библиотек Python Matplotlib, Seaborn и Plotly.

data-analysis data-visualization matplotlib pandas plotly python seaborn titanic

Last synced: 04 May 2026

https://github.com/damisparks/become_data_analyst

Are you new to Data Analysis ? Here you will find simple notebook that will help through your journey. These are personal projects I work on and still working.

data data-analysis data-visualization matplotlib numpy pandas-tutorial

Last synced: 04 May 2026

https://github.com/rtgrt5645/numpy-lab

🧮 Explore, manipulate, and visualize data with NumPy to enhance your Python skills in scientific computing and data analysis.

array-operations data-analysis data-science jupyter-notebook machine-learning numerical-computing numpy numpy-arrays numpy-library numpy-python python python3 scientific-computing

Last synced: 04 May 2026

https://github.com/abhinav330/customer-behavior-analysis-linear-regression

This repository explores customer behavior data for an NYC clothing company with both a mobile app and website. They want to understand which platform drives higher sales.

data-analysis data-science data-visualization eda exploratory-data-analysis jupyter jupyter-notebook linear-regression machine-learning machine-learning-algorithms machinelearning-python numpy pandas python regression-analysis

Last synced: 06 May 2026

https://github.com/ascender1729/leetcode_scraper

Extract topic tags from LeetCode problems to streamline interview preparation.

beautifulsoup coding-interview data-analysis graphql leetcode python scraper web-scraping

Last synced: 20 Jun 2026

https://github.com/dcs-training/spatial_dynamics

Use of QGIS and R to analyse first and second order geospatial effects. Go to the Readme file

data-analysis geographical-data gis qgis r statistics

Last synced: 23 Oct 2025

https://github.com/jigyasag18/bird-strikes-in-aviation-project

This project analyzes over a decade of U.S. bird strike data (2000–2011) to evaluate safety risks, damage trends, and cost implications in aviation. Using PostgreSQL for database management and Power BI for dashboard visualization, it uncovers critical insights into when, where, and how wildlife impacts aircraft. Key findings inform strategically.

bird-strike-prevention bird-strike-prevention-in-real-airport data data-analysis data-analysis-project data-visualisation data-visualization data-visualization-project data-visualizations database dataset dax-query postgresql postgresql-database powerbi powerbi-desktop powerbi-report powerbi-visuals sql sql-database

Last synced: 09 May 2026

https://github.com/prathmesh2507/global-stock-intelligence-dashboard

Interactive Global Stock Market Analytics Dashboard built using Python, YFinance, Pandas, Streamlit, and Plotly. Analyze 20+ countries and 400+ top stocks with advanced visualizations and financial insights.

dashboard data-analysis data-visualization python stock-analysis streamlit

Last synced: 15 Jun 2026

https://github.com/badranalyst/tips-dataset-analysis-dashboard-with-streamlit-and-plotly

Interactive Streamlit dashboard analyzing the Seaborn 'tips' dataset, which records information on restaurant bills, including total bill amounts, tips, customer demographics (e.g., gender, smoking status), and dining details (e.g., day, time). Visualized with Plotly for insights into tipping patterns.

data-analysis data-analytics data-visualization dataset eda exploratory-data-analysis matplotlib matplotlib-pyplot numpy pandas plotly python seaborn streamlit

Last synced: 13 Apr 2026

https://github.com/gunifiri/duckdb-ghw

🦆 Accelerate analytics with DuckDB's integration for GitHub workflows, enabling efficient data handling and processing directly within your repositories.

analytics analytics-engine big-data columnar-storage data-analysis data-science database duckdb in-memory-database open-source parquet python query-planner r sql

Last synced: 29 Apr 2026

https://github.com/browndwarf/contracosta

Wavelength dependent starspot contrast with Kepler/K2 and TESS

astronomy data-analysis

Last synced: 23 Jan 2026

https://github.com/ksm26/ml-ai-data-science-jobs-in-canada

Explore the latest machine learning, artificial intelligence, and data science job opportunities in Canada. Stay informed about Canadian tech job market trends and find your next career move.

ai-canada ai-careers canada canadian-tech-companies canadian-tech-job-market data data-analysis data-engineering data-science data-science-careers machine-learning prompt-engineering robotics

Last synced: 06 May 2026

https://github.com/bala-1409/tableau-visualization-viz.-project

This repository contains Visualization Projects which is visualized through Tableau Software, by using the visualization we can gain multiple insights and strategies which helps to develop the business for gaining high profit margins and also it provides social values in some cases to calculate damages and intensity by calamities.

dashboard data-analysis data-science data-visualization exploratory-data-analysis tableau tableau-dashboards tableau-public visualization

Last synced: 04 Feb 2026

https://github.com/lu-m-dev/biostatistics-eda

Exploratory data analysis and visualization system for biostatistical research

biostatistics data-analysis data-visualization eda

Last synced: 25 Jun 2026

https://github.com/sugumarsrinivasan/sql-datawarehouse-project

Building Mordern datawarehouse with SQL Server, including ETL Processes, data modeling, and data analytics.

data-analysis data-analytics data-engineering data-lake data-science data-warehouse datawarehousing etl etl-pipeline medallion-architecture sql sql-query sql-server

Last synced: 19 Jun 2026

https://github.com/ray-chew/pycsam

pyCSAM is a robust approach for approximating geodesic subgrid-scale orographic spectra with applications to weather forecasting and broader data analysis

data-analysis gmted icon-model merit-dem orographic spectral-analysis topography weather-forecast

Last synced: 28 Feb 2025

https://github.com/abhisek-13/fake_news_classifier

The Fake News Classifier is a TensorFlow-based machine learning project that detects and classifies fake news with 97% accuracy. The repository includes a single Python file with complete code for building and training the model, which you can use to create and deploy your own model.

colab-notebook data-analysis data-engineering deep-learning eda kaggle keras machine-learning nlp pandas python tensorflow

Last synced: 13 Apr 2026

https://github.com/jofaval/sonar

Binary Classification of Sonar Signals of Rocks and Metal cylinders in 1987

data-analysis data-science data-visualization machine-learning python scikit-learn sonar uci

Last synced: 09 Apr 2026

https://github.com/pinedah/sleep-data-analysis-exercise

Análisis de un dataset médico sobre el sueño, explorando duración, calidad y factores relacionados. Incluye limpieza de datos, EDA y visualizaciones con Python (pandas, numpy, matplotlib, seaborn, scipy).

data-analysis data-science escom numpy pandas python school-project scipy

Last synced: 13 Apr 2026

https://github.com/marielachirinosr/nyc-taxi-trip-exploration-2019-2020

Explores passenger behavior & impact of COVID-19 on NYC taxi industry (Q1 2019-2020).

bigquery data data-analysis data-visualization python sql tableau

Last synced: 15 Jun 2026

https://github.com/athari22/investigating-netflix-movies-and-guest-stars-in-the-office

Apply basic Python skills in Introduction to Python and Intermediate Python by processing and visualizing film and television data.

data-analysis data-science data-visualization loop loops matplotlib matplotlib-pyplot netflix numpy office pandas python

Last synced: 11 Apr 2026

https://github.com/spacebakery/variance-in-weather-project

Statistics for Data Analysis | Variance and Standard Deviation

data-analysis python standard-deviation statistics variance

Last synced: 05 Jul 2025

https://github.com/abhiram-kandiyana/us-bikeshare-analysis

Explorative analsis on a bike-share system (Motivate) to understand it's pain points

data-analysis data-visualization

Last synced: 26 Mar 2025

https://github.com/namratha2301/python-dashboard-streamlit

Experimenting with Streamlit. Streamlit app provides an interactive visualization of the best-selling books, showcasing trends, top-selling books, top authors, genre distributions, and sales by decade.

css dashboard data-analysis pandas plotly python seaborn streamlit

Last synced: 05 May 2026

https://github.com/kishorep26/school-recommendation-system

Intelligent school recommendation system that matches students with suitable educational institutions based on preferences and performance metrics

bootstrap data-analysis decision-support edtech education education-technology flask matching-algorithm python recommendation-system school-finder school-search student-portal web-application

Last synced: 06 May 2026

https://github.com/psychelzh/cogstruct-old

Data Analysis on Cognitive Structure

cognition data-analysis intelligence psychology

Last synced: 25 Oct 2025

https://github.com/drill-n-bass/dealavo-project

Cartesian product from dictionary to list of dictionaries and faster methods for finding index than the `index` method.

data-analysis data-analysis-python matplotlib pandas python python3 random timeit

Last synced: 06 May 2026

https://github.com/jbn/vaquero

A Python library for iterative and interactive data wrangling at laptop-scale.

data data-analysis data-cleaning data-mining dirty-data elt etl etl-framework

Last synced: 10 Jun 2026

https://github.com/extwiii/datascience-jhu

Ask the right questions, manipulate data sets, and create visualizations to communicate results - Coursera

biostatistics data-analysis data-science linear-regression multivariate-regression r r-programming toolbox visualization

Last synced: 05 Jul 2025

https://github.com/madhursinghbhadoriya/data_analysis_fifa-players

• Using NumPy, Matplotlib, Pandas, etc processed important Information and Characteristic traits on Jupyter Notebook.

analysis data-analysis data-science graphs jupyter-notebook pandas python

Last synced: 07 May 2026

https://github.com/edanur-y/variable-analysis-of-banks-ratio-data

Testing variables for multicollinearity, multivariate normality and analyzing outliers and missing values. ⭕SPSS 🔵R

data-analysis log-transformation missing-values-analysis multicollinearity normality-test r spss

Last synced: 10 Jun 2026

https://github.com/srinibas-masanta/infosys-springboard-internship

An interactive Power BI dashboard developed during my Infosys Springboard Internship to visualize Indian election trends. It integrates historical and live API data to analyze vote shares, turnout patterns, and demographic insights across constituencies, helping news agencies report results in real time.

dashboard data-analysis data-cleaning data-collection data-visualization dax-functions powerbi

Last synced: 25 Feb 2026

https://github.com/vimlesh-gupta/blinkit_data_analytics_project

End-to-end Blinkit data analytics project using Python, SQL Server & Power BI

blinkit data-analysis eda pandas powerbi python sql-server

Last synced: 06 May 2026

https://github.com/korniichuk/pydatan-homework

Python Data Analysis course homework

course data-analysis data-analysis-python python python3

Last synced: 06 May 2026

https://github.com/cezlul/analyse-ventes-immobilier

Solution ML d'analyse immobilière parisienne : classification automatique appartements vs commerces (K-means, 91%) et prédiction prix (régression, R²=0.98) sur 26K transactions. Valorise portefeuille 169M€ avec recommandations stratégiques data-driven.

data-analysis jupyter-notebook machine-learning matplotlib numpy pandas python sklearn

Last synced: 13 Apr 2026

https://github.com/campagnucci/exercitando_pandas

Exercícios práticos de pandas com dados abertos da educação de São Paulo

data-analysis data-science education-data exercises pandas-tutorial

Last synced: 28 Jan 2026

https://github.com/limatix/limatix

Limatix datacollect and processtrak tools

data-analysis python scientific-workflows

Last synced: 23 Jan 2026

https://github.com/DCS-training/IntroToStatistics

This is a repository which contains all the materials to be used in the introduction to statistics course. Go to the readme file

data-analysis r rmarkdown statistics

Last synced: 25 Apr 2025

https://github.com/fbarffmann/home_sales

Analyzed 25,000+ home sales using PySpark and SparkSQL. Identified pricing trends by year built, home features, and view rating. Optimized query run-time by 70% using caching.

aws big-data data-analysis home-sales parquet pyspark python spark spark-sql sql

Last synced: 06 May 2026

https://github.com/sco1/xbmini-py

Python Toolkit for the GCDC HAM

data-analysis data-visualization python python3

Last synced: 07 May 2025

https://github.com/himanshubhosale25/ai-insightful-quiz-analytics

This project analyzes student quiz performance data, providing visualizations and AI-generated feedback. It uses FastAPI for the backend, React for the frontend, and OpenAI LLMs to deliver personalized insights and actionable recommendations for students.

data-analysis fastapi openai-api react student-performance

Last synced: 11 Mar 2025

https://github.com/9dl/usbfalcon

Automatically copies files from plugged USB drives to a specified location, enabling quick data retrieval for analysis.

automation data-analysis data-retrieval ethical-hacking file-copying usb

Last synced: 27 Oct 2025

https://github.com/rlalpha49/anisearch-model

AniSearchModel leverages Sentence-BERT (SBERT) models to generate embeddings for synopses, enabling the calculation of semantic similarities between descriptions. This allows users to find the most similar anime or manga based on a given description.

anime api data-analysis data-merging embeddings flask hugging-face-datasets kaggle-datasets machine-learning manga natural-language-processing nlp python sentence-bert similarity-search

Last synced: 06 May 2026

https://github.com/anderson-andre-p/uber-data-analysis

This repository contains a comprehensive data analysis project focused on Uber rides. The dataset used in this project is a spreadsheet obtained from Uber, containing data related to ride details, such as pick-up and drop-off locations, date and time of the ride, and the fare amount.

data-analysis data-science data-visualization python

Last synced: 15 Jun 2026

https://github.com/farzeen-2001/blinkit-sales-analysis-using-powerbi

The project provides an overview about the BlinkIt Sales performances

data-analysis data-visualization datacleaning excel powerbi

Last synced: 24 Jan 2026

https://github.com/code-jl/nfl-kicker-predictor

A sophisticated Python application that provides real-time NFL kicker statistics and performance analysis with an intuitive graphical interface.

beautifulsoup data-analysis data-visualization espn football gui nfl prediction python real-time-analytics real-time-data sport-analytics sports-data statistics tkinter web-scraping

Last synced: 01 Jun 2026

https://github.com/ireneflorez/nypd-mvc

Analysis of NYPD Motor Vehicle Collisions

basemap data-analysis folium jupyter-notebook matplot pandas python

Last synced: 08 May 2026

https://github.com/nature40/casestudies

Case studies for testing the functionality of database systems, sensors, etc

casestudies data-analysis data-visualization database

Last synced: 02 May 2026

https://github.com/gaurabkundu1/road-accident-data-analysis

This is an Excel project on Road Accident Data Analysis in the form of an interactive Dashboard.

dashboard data-analysis data-vizualisation excel road-accidents

Last synced: 24 Jan 2026

https://github.com/garcane/unicorn-companies-analysis

Tracking unicorn startups (valued at $1B+) provides valuable insights for investors and analysts to identify high-growth industries and emerging trends.

data-analysis exploratory-data-analysis financial-analysis investor postgresql sql

Last synced: 24 Jan 2026

https://github.com/amoghkori/working-with-apache-spark-mllib

Implemented Apache Spark MLLib to analyze a large car dataset, predict car selling prices, and gain insights into the car market.

amazon-web-services data-analysis data-visualization exploratory-data-analysis linear-regression machine-learning model-selection pyspark python random-forest sagemaker spark

Last synced: 13 Apr 2026

https://github.com/aneeshmurali-n/global-superstore-sales-dashboard---power-bi-stunning-dark-theme

This Power BI dashboard provides a comprehensive view of sales data, enabling users to analyze sales trends, identify top-performing regions, and gain insights into customer behavior.

dark-theme dashboard data-analysis data-science data-visualization powerbi salesdashboard

Last synced: 28 Jan 2026

https://github.com/rahulchouhan1/sql-data-warehouse-project

Building a modern data warehouse with SQL Server, including ETL Processes, data modeling, and analytics.

data-analysis data-cleaning data-engineering data-science data-warehouse datascience etl etl-pipeline sql sql-query sql-server

Last synced: 24 Jan 2026

https://github.com/hdgiacon/power_bi_projects

Repositório contendo cursos, dashboards e projeto relacionados à análise de dados e Power BI.

data-analysis data-engineering data-visualization microsoft-power-bi

Last synced: 24 Jan 2026

https://github.com/mysto-007/cyclistic-bike-share-analysis

Analyzed the dataset of Cyclistic Rental Service as the Capstone project for Google Data Analytics SpecializationAnalyzed the dataset of Cyclistic bike-share (Capstone project for Google Data Analytics Specialization)

bigquery data-analysis excel ms-sql-server sql tableau tableau-public

Last synced: 16 Mar 2026

https://github.com/parthds02/e-commerce-data-analysis-with-python

This project focuses on analyzing an e-commerce dataset using Python. The goal is to derive meaningful insights through exploratory data analysis (EDA) and uncover trends and patterns that can drive business decisions.

data-analysis ecommerce exploratory-data-analysis jupyter-notebook pytho sales-analysis visualization

Last synced: 13 Jun 2025

https://github.com/urbanekda/upwork_dashboard

A data analysis project examining trends and patterns in the data science job market on Upwork. This project analyzes job postings, requirements, and market demands to provide insights into the freelance data science ecosystem.

data-analysis data-science data-science-projects data-visualization freelance jupyter-notebook python streamlit

Last synced: 07 May 2026

https://github.com/borjamome/accidentes_madrid

Análisis de Accidentes en Madrid en SQL (2023)

accidentes-coche data-analysis madrid sql

Last synced: 17 Jan 2026

https://github.com/annnieglez/fraud-detection-eda

Fraud Detection - Exploratory Data Analysis (EDA). Analyzing financial transactions to detect fraud patterns using Python and Tableau. Libraries: Pandas, Seaborn and Matplotlib. Key Focus: Data cleaning, fraud trends, high-risk transactions, time-based patterns

data-analysis data-science data-visualization eda fraud-detection fraud-prevention matplotlib seaborn

Last synced: 28 Jan 2026

https://github.com/yash1882/music-store-data-analysis

A project focuses on analyzing music store data using SQL ♬

begineer-friendly data-analysis music music-store-data music-store-data-analysis sql-project

Last synced: 28 Jan 2026

https://github.com/karlyndiary/coffee-shop-sales-analysis

Comprehensive analysis of coffee shop sales utilizing Pandas for data cleaning and exploratory data analysis (EDA), complemented by Streamlit for creating interactive data visualization dashboards.

data-analysis data-cleaning data-preprocessing data-visualization eda pandas streamlit streamlit-dashboard

Last synced: 07 May 2026

https://github.com/anurag-ghosh-12/library_management_system_sql

This project showcases the development of a comprehensive Library Management System utilizing Structured Query Language (SQL). It demonstrates a practical application of relational database principles to efficiently manage library resources, member information, and borrowing/returning transactions.

data-analysis data-visualisation dbms-project sql

Last synced: 29 Jan 2026