An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/monteirooscar98/tarifas-publicas-sp-dieese

Data extraction via web scraping from the Dieese website and analysis of public tariffs in the Municipality of São Paulo.

data-analysis data-visualization python webscraping

Last synced: 03 May 2026

https://github.com/zients/tw-lottery-recommandation

Taiwan lottery draw analyzer & number recommender with Transformer ML model. Supports 539, 649, 638, 3D, and 4D lotteries.

cli data-analysis lottery machine-learning python pytorch taiwan transformer

Last synced: 03 May 2026

https://github.com/syed-m-nofel/python-data-science-fundamentals

Python notebooks for data manipulation (Pandas/NumPy) and API workflows – from basics to practical examples.

api beginner-friendly data-analysis data-science http-requests jupyter-notebook numpy pandas pandas-dataframe python tutorial

Last synced: 03 May 2026

https://github.com/ankitgmishra/machinelearning

Continuously deepening my understanding of, and expertise in, machine learning through ongoing study and hands-on practice.

artificial-intelligence data-analysis data-cleaning data-gathering machine-learning machinel-learning-algorithms matplotlib numpy pandas python seaborn

Last synced: 03 May 2026

https://github.com/joelfaldin/data-analysis

A collection of data-analysis projects I've built over time! ✨⛏️

data-analysis python r

Last synced: 03 May 2026

https://github.com/devesh8423/machine_learning

Machine Learning practice projects, Jupyter notebooks, and datasets for learning regression, classification, and data analysis.

classification data-analysis data-science data-visualization jupyter-notebook machine-learning matplotlib ml-project numpy-library pandas python regression sckit-learn seaborn

Last synced: 03 May 2026

https://github.com/donmaruko/flask-data-analysis

Flask API for statistical calculations. Data analysis, cleansing, visualization, and manipulation. Documented by Swagger.

api api-rest data-analysis data-science data-visualization datascience flasgger matplotlib pandas seaborn sqlite wordcloud

Last synced: 03 May 2026

https://github.com/bpkaur/whats-in-a-name

Exploring a dataset of first names of babies born in the US to uncover interesting stories.

data-analysis datacamp numpy pandas python3

Last synced: 04 May 2026

https://github.com/r13i/cheapest-phone-call

A small challenge to find the best phone operator to use based on call price.

big-data big-data-analytics cheapest data-analysis data-cruncher pandas phone-number pricelist

Last synced: 04 May 2026

https://github.com/soham7998/data-analysis-projects

Data analysis projects I have completed, each of which gave me hands-on experience. The projects showcase different concepts, visualizations, and more.

data data-analysis data-science machine-learning nlp python soham visualization

Last synced: 04 May 2026

https://github.com/mr-chang95/sf_data_visualization

In this personal project, I am interested in examining all of the active businesses in the San Francisco Bay Area while performing some simple data visualizations, mainly on categorical variables.

business data-analysis data-visualization jupyter-notebook pandas python san-francisco

Last synced: 04 May 2026

https://github.com/sagarprajapat2004/data-analysis-visualization

Downloaded and analyzed a dataset from Kaggle using NumPy and Pandas, created visualizations with Matplotlib and Seaborn, and developed a Flask web application to showcase data insights and conclusions.

data-analysis data-modeling data-visualization exploratory-data-analysis flask python statical-analysis

Last synced: 04 May 2026

https://github.com/ljadhav25/logistic-regression-data-science-

Logistic regression estimates the probability of an event occurring, such as voted or didn’t vote, based on a given data set of independent variables.

data-analysis data-science data-visualization logestic-regression machine-learning

Last synced: 04 May 2026

https://github.com/tomijuarez/lemmatisation

Lemmatisation fully implemented in Java.

algorithms data-analysis data-science java-8 lemmatization oop

Last synced: 08 Apr 2025

https://github.com/jendives2000/regressions

A linear regression analysis to determine the strength of the relationship between the number of reviews and sales for a retail company.

data-analysis linear-regression pearson-correlation-coefficient regression

Last synced: 04 May 2026

https://github.com/flytomarsz/bike-sharing-system-analysis

This analysis project aims to identify bike rental behavior in 2012 in the Capital Bikeshare system, Washington, D.C., USA. The project is part of my Data Analysis study at Dicoding.

data-analysis data-visualization jupyter-notebook python streamlit

Last synced: 04 May 2026

https://github.com/tasosfotiadis/time-series-analysis-and-forecasting-of-cryptocurrency-prices

Forecasted Cardano (ADA) cryptocurrency prices using time series analysis. The project involved data preprocessing, trend and seasonality analysis, and model building with ARIMA, SARIMA, and LSTM. Models were evaluated using metrics like MAE and MAPE, providing insights for financial decision-making.

applied-st classical-statistical-models data-analysis deep-learning lstm machine-learning neural-network python r time-series

Last synced: 05 May 2026

https://github.com/zafir100100/cancer-stage-prediction

This code predicts cancer data using various regression models, calculates their average R-squared scores, and prints the best model.

cross-validation data-analysis data-preprocessing decision-trees gradient-boosting linear-regression machine-learning-algorithms numpy pandas random-forest regression scikit-learn

Last synced: 05 May 2026

https://github.com/cicku/en.650.672

Homework for EN.650.672

analytics data-analysis numpy pandas

Last synced: 05 May 2026

https://github.com/monish-nallagondalla/universal-bank

Credit Card Ownership Prediction: a machine learning project that predicts credit card ownership using features such as age and income, balancing class distributions to improve accuracy.

classification-models credit-card-prediction data-analysis data-classification decision-tree-classifier imbalanced-datasets machine-learning model-evaluation python scikit-learn

Last synced: 05 May 2026

https://github.com/aryar-06/linear-regression

A Python project demonstrating basic linear regression with gradient descent and matrix operations, alongside scikit-learn comparison.

data-analysis data-preprocessing educational-project gradient-descent linear-regression machine-learning python regression-algorithms scikit-learn

Last synced: 05 May 2026

https://github.com/meinhere/dicoding-analisis-data

Data analysis submission on an e-commerce theme, built as a Streamlit app.

data-analysis data-mining e-commerce python streamlit

Last synced: 05 May 2026

https://github.com/benjaminrose/data-analysis-book

A Jupyter Book for my Spring 2025 PHY 5381 class on Data Analysis

book data-analysis data-science data-visualization jupyter-book open-book python r statistics-course

Last synced: 06 May 2026

https://github.com/ryuzen6/bangalore-real-estate-price-prediction

This is a data science project that predicts the cost of real estate in Bangalore. Requirements: Jupyter Notebook (for data cleaning and building the linear regression model with various Python libraries), PyCharm (Python IDE for creating the Python Flask server), and Visual Studio Code (to create the UI with HTML, CSS, and JavaScript).

css3 data-analysis data-science html5 javascript jupyter-notebook machine-learning python3

Last synced: 06 May 2026

https://github.com/syarwinaaa09/exploring-nyc-public-school-test-result-scores

📊 analyzing NYC school test scores with python 🐍 to spot top performers 🏆 & trends 📈

data-analysis education pandas python visualization

Last synced: 06 May 2026

https://github.com/sankaran-s2001/us-traffic-accidents-analysis-python-eda

Exploratory data analysis of US traffic accidents from 2016-2023, analyzing patterns by time, location, weather, and severity using Python data science libraries.

data-analysis data-science data-visualization eda matplolib numpy pandas python

Last synced: 06 May 2026

https://github.com/abhinav330/customer-behavior-analysis-linear-regression

This repository explores customer behavior data for an NYC clothing company with both a mobile app and website. They want to understand which platform drives higher sales.

data-analysis data-science data-visualization eda exploratory-data-analysis jupyter jupyter-notebook linear-regression machine-learning machine-learning-algorithms machinelearning-python numpy pandas python regression-analysis

Last synced: 06 May 2026

https://github.com/harryrlk/data_analysis_showcase

This repository showcases my data analysis and visualization projects using Excel, Python, R, and Tableau. Some projects are under NDA, so key figures and specific numbers are not included, but brief overviews and methodologies are provided. Feel free to explore and contact me for further details.

data-analysis data-science data-visualization excel portfolio python r tableau

Last synced: 06 May 2026

https://github.com/superpandas-ai/superpandas

Adding LLM integration to Pandas library

ai data-analysis llm pandas

Last synced: 06 May 2026

https://github.com/josepablodmg/python--linear-regression-advertising

A linear regression analysis to predict sales based on advertising spending across TV, radio, and newspaper channels. The project includes exploratory data analysis, model training, coefficient visualization, and residual analysis.

advertising data-analysis exploratory-data-analysis linear-regression machine-learning python regression scikit-learn visualization

Last synced: 06 May 2026

https://github.com/korniichuk/pydatan-homework

Python Data Analysis course homework

course data-analysis data-analysis-python python python3

Last synced: 06 May 2026

https://github.com/karlyndiary/coffee-shop-sales-analysis

Comprehensive analysis of coffee shop sales utilizing Pandas for data cleaning and exploratory data analysis (EDA), complemented by Streamlit for creating interactive data visualization dashboards.

data-analysis data-cleaning data-preprocessing data-visualization eda pandas streamlit streamlit-dashboard

Last synced: 07 May 2026

https://github.com/badranalyst/exploratory-data-analysis-on-salaries-dataset

Performing EDA on a dataset related to salaries, exploring relationships between factors like job titles, industries, and locations. Insights are visualized with plots to identify trends and disparities in salary data.

data-analysis dataset eda exploratory-data-analysis pandas python

Last synced: 07 May 2026

https://github.com/rohansoni45/whatsapp-chat-analysis

This project involves analyzing WhatsApp chat data to extract valuable insights. Using Python and various libraries like Pandas and Matplotlib, the project processes and visualizes chat statistics such as message frequency, most active participants, and sentiment analysis.

chat-analysis data-analysis data-science matplotlib pandas python sentiment-analysis streamlit visualization web-app word-cloud

Last synced: 07 May 2026

https://github.com/warazkhan/airplane-crashes-and-fatalities-since-1908-

This project analyzes airplane crash data (1908 - 2008)✈️📊 to uncover trends in aviation accidents, fatalities, and safety improvements. Using exploratory data analysis (EDA) and data visualization, we examine key factors influencing crashes, identify high-risk regions, and explore advancements in aviation safety.

data-analysis data-visualization exploratory-data-analysis

Last synced: 10 Jun 2026

https://github.com/jpgiant/gujaratrainfallanalysis_2021

Analysis of the rainfall that occurred in the districts of Gujarat state in 2021.

data-analysis exploratory-data-analysis exploratory-data-visualizations matplotlib numpy pandas-python python

Last synced: 07 May 2026

https://github.com/satyam4229/identify-employee-attrition

A model that predicts employee attrition at a company by examining employee records. The dataset includes features such as salary, environment, age, gender, and experience.

data-analysis data-science data-visualization jupyter-notebook kaggle python

Last synced: 08 May 2026

https://github.com/fahamidur/cuisine-analysis

This project analyzes recipes from AllRecipes.com to reveal global cooking patterns, nutritional trends, and cultural food differences, offering data-driven insights for food enthusiasts and researchers.

beautifulsoup data-analysis datavisualization pandas selenium tableau-public webscraping

Last synced: 08 May 2026

https://github.com/samjoesilvano/password_strength_prediction_using_nlp

Developed a predictive model to categorize passwords as Strong, Good, or Weak, enhancing security and reducing breach risks. The project involves cleaning and analyzing data from an SQL database, using the TF-IDF technique for transformation, and implementing a Logistic Regression model to achieve accurate classifications.

data-analysis data-classification data-cleaning data-visualization logistic-regression machine-learning natural-language-processing pandas password-security password-strength python scikit-learn sql tf-idf

Last synced: 08 May 2026

https://github.com/shelton-beep/trading-algorithm

A simple trading algorithm for SPY ETF using a moving average crossover strategy. This project analyzes SPY weekly price data, implements a buy/sell algorithm, and tracks performance metrics to evaluate profitability and risk. Ideal for learning algorithmic trading basics and financial data analysis.

data-analysis financial-analysis investment-strategy jupyter-notebook pandas python quantitative-finance technical-analysis time-series-analysis trading-strategies

Last synced: 08 May 2026

https://github.com/sumit-sinha9/sales-analysis

Analyzing 12 months' worth of sales data

data-analysis pandas python visualization

Last synced: 08 May 2026

https://github.com/satvikpraveen/numpymasterpro

A hands-on, production-ready toolkit to master NumPy — from first principles to real-world applications. Includes modular Jupyter notebooks, reusable utility scripts, cheatsheets, and advanced projects like K-Means clustering from scratch.

broadcasting data-analysis data-science data-source data-visualization jupyter-notebook kmeans-clustering linear-algebra machine-learning matrix-algebra numerical-computation numpy numpy-broadcasting numpy-examples numpy-tutorial open-source python scientific-computing standardization vectorization

Last synced: 08 May 2026

https://github.com/l1ght14/customer-churn-prediction

Predict customer churn using machine learning models like Logistic Regression and Random Forest. Includes data preprocessing, model evaluation, feature importance, and insights to drive retention strategies.

churn-prediction classification customer-churn customer-churn-prediction data-analysis logistic-regression machine-learning python random-forest scikit-learn telecom

Last synced: 09 May 2026

https://github.com/drod75/burger_king_analysis

A simple analysis of a Burger King dataset.

data-analysis data-visualization jupyter-notebook pandas python seaborn

Last synced: 09 May 2026

https://github.com/rizkipragustono/data_analysis_spark

Exploration: Data Analysis using Spark

apache-spark data-analysis pyspark python spark-sql sql

Last synced: 09 May 2026

https://github.com/emanoelcampos/python-onemonth

This repository contains educational materials and projects developed during a Python course offered by OneMonth. It covers Python basics, intermediate concepts, web development with Flask, and data analysis with pandas. The course is structured into weeks, each focusing on a different aspect of Python programming and its applications.

data-analysis flask jupyter-notebook onemonth python python3

Last synced: 09 May 2026

https://github.com/zxjahid/matplotlib

A comprehensive guide to mastering data visualization with Matplotlib through hands-on examples and advanced techniques. 🚀📊

candlestick candlestick-chart cheatsheet data-analysis data-visualization gtk jupyter-notebook maps matplotlib-python pandas thesis-template tk tutorial wx

Last synced: 09 May 2026

https://github.com/master-helix/ibm-data-analyst-certification-stock-analysis-project

A mini-project repository from my IBM certification, involving stock analysis and plotting for Tesla and GameStop.

analytics data data-analysis data-visualization ibm matplotlib pandas python web-scraping

Last synced: 09 May 2026

https://github.com/zenithclown/finfolio

A Personal Finance Management Tool for the Developers, by the Developer

data-analysis data-science finance finance-application finance-management good-habits personal-finance portfolio

Last synced: 04 Feb 2026

https://github.com/gaboelc/analysis-of-the-employment-situation-in-costa-rica-2018-2022

This is an analysis with data extracted from the INEC in order to identify the changes that occurred in the Costa Rican labor market before, during and after the COVID-19 pandemic.

costa-rica data-analysis empleo employment

Last synced: 24 Mar 2025

https://github.com/marvinmarnold/oipm_stop_search

OIPM's analysis on Stop & Search (frisk) activity by the New Orleans Police Department.

data-analysis frisk new-orleans oipm police search stop

Last synced: 22 Jul 2025

https://github.com/datastalker/survival-cox

This repository contains an R script for performing survival analysis on breast cancer surgery data from the University of Chicago's Billings Hospital. The analysis includes Kaplan-Meier estimation and Cox Proportional Hazards modeling to assess patient survival.

breast-cancer-prediction cox-model data-analysis data-science data-visualization epidemiology kaplan-meier r survival-analysis

Last synced: 02 Apr 2025

https://github.com/salma-mamdoh/exploring-the-evolution-of-linux-project

My project for learning the basics of analysis on DataCamp.

data-analysis datacamp pandas python time-series-analysis

Last synced: 09 May 2026

https://github.com/sdley/cas_pratiques_a_rendre

Practical data processing exercises with Python.

data-analysis pandas python

Last synced: 09 May 2026

https://github.com/monish-nallagondalla/algerian_forest_fires

This project predicts forest fires in Algeria using machine learning models. The dataset includes various meteorological and environmental features such as temperature, humidity, and wind speed. The app cleans the data and builds models to predict the likelihood of forest fires based on historical data and environmental conditions.

data-analysis data-science datacleaning flask forest-fire-prediction machine-learning meteorological-data python regression-models ridge-regression

Last synced: 09 May 2026

https://github.com/prgermux/defect-finder

Defect Finder is an interactive Python-based GUI application for detecting and analyzing mechanical and non-mechanical defects in data. It provides defect visualization, periodicity analysis, and statistical insights, making it ideal for research and quality control workflows.

data-analysis defect-detection gui pyqt5 python quality-control statistics visualization

Last synced: 24 Mar 2025

https://github.com/yeopster/datascience_notebook

A compilation of my notebooks based on Kaggle datasets.

data-analysis data-science kaggle notebook python

Last synced: 10 May 2026

https://github.com/shridhar1504/rafik-s-kitchen-data-analysis

The project analyzes the sales and expense data of a famous fast-food restaurant. It focuses on gaining insights that will boost future sales and inform business strategies to improve profit margins. Tools used: SQL, Python, Power BI, and MS Office.

business-analytics business-intelligence data-analysis data-analytics data-visualization eda ms-office powerbi-report powerpoint-presentations python sql-server

Last synced: 10 May 2026

https://github.com/imrandil/sql_practice_with_analysis

SQL practice using a PostgreSQL database, with Docker used to set up Postgres. Loving the SQL way.

data-analysis docker markdown postgres sql

Last synced: 10 May 2026

https://github.com/vikktor93/datascience-spotify

Analysis of Spotify dataset containing the top songs currently trending for over 70 countries.

data-analysis data-science data-scientist jupyter-notebook kaggle matplotlib pandas seaborn

Last synced: 10 May 2026

https://github.com/greenpau/esqrunner

Run Elasticsearch queries and create metrics based on the query results in an Elasticsearch database.

data-analysis elasticsearch query-builder querydsl

Last synced: 10 May 2026

https://github.com/szuzick/us-immigration-presidential-analysis

Power BI dashboard analyzing 40 years of U.S. immigration data across presidential administrations (1981-2020)

dashboard data-analysis data-visualization government-data immigration powerbi powerbi-dashboards powerbi-visuals presidential-analysis

Last synced: 10 Jun 2026

https://github.com/vikpires/ds_tips-dataset

Individual project for the Avanti 2024.2 data science bootcamp, aiming to analyze and observe patterns in the "Tips" dataset.

data-analysis data-science data-visualization exploratory-data-analysis jupyter-notebook matplotlib numpy pandas python seaborn tips

Last synced: 17 Sep 2025

https://github.com/pipe199x/end-to-end-prediction-california

End-to-end prediction project using various technologies to predict housing prices in California.

california-housing data-analysis machine-learning python

Last synced: 11 May 2026

https://github.com/monarch1108/customerinsights-kmeans

Understanding customers using KMeans and RFM (recency, frequency, and monetary) analysis.

data-analysis data-visualization kmeans-clustering machine-learning matplotlib numpy pandas scikit-learn

Last synced: 11 May 2026

https://github.com/szuzick/hr-analytics-pipeline

End-to-end HR analytics solution using PostgreSQL, dbt, and Power BI

data-analysis data-visualization database-maintenance dbt hr-analytics insights postgresql powerbi sql

Last synced: 10 Jun 2026

https://github.com/parthds02/customer-segmentation-with-kmeans-clustering

Analyze customer behavior using Python and KMeans Clustering on transactional data. Features RFM analysis, data cleaning, clustering insights, and actionable visualizations to support business decision-making.

data-analysis data-visualization feature-engineering kmeans-clustering numpy pandas vscode

Last synced: 11 May 2026

https://github.com/hrosicka/czechpopulationestimation

This GitHub repository contains Python code for data analysis and population prediction in the Czech Republic up to the year 2050. The code is written in Python and utilizes the Pandas and Matplotlib libraries.

data-analysis data-visualization matplotlib matplotlib-figures matplotlib-pyplot pandas pandas-dataframe pandas-library pandas-python python python3

Last synced: 11 May 2026

https://github.com/analysisbyvivek/crime-data

Analyzes crime patterns across different areas, exploring factors such as crime type, weapon usage, demographic influences, and geographic distribution to uncover trends in frequency, correlations, and hotspots.

apache-superset data-analysis eda jupyter-notebook python

Last synced: 11 May 2026

https://github.com/soyuid/bakery-data-analyst

This Bakery Data Analysis project was created to help bakery owners understand their sales patterns. Through in-depth data analysis, it aims to provide useful insights to improve sales and operational strategies.

bakery data-analysis python sales visualization

Last synced: 24 Mar 2025

https://github.com/OdessaZ/Portfolio-Projects

This is a repository I have created to showcase skills, share projects and track my progress in Data Analytics and Data Science

applied-mathematics data-analysis data-science excel jupyter-notebook matplotlib-pyplot pandas portfolio python r r-studio seaborn sql statistics

Last synced: 12 May 2026

https://github.com/jayita11/customer-engagement-insights-for-yelp-restaurant-business-success

This project analyzes Yelp restaurant data using SQLite, Python, and Tableau to explore user engagement, reviews, and ratings. It provides insights into restaurant success across cities, regions, and user behavior.

customer-engagement data-analysis interactive-visualizations json python ratings review sqlite3 tableau-dashboards-for-data-visualization yelp-restaurants

Last synced: 12 May 2026

https://github.com/ggarciajavier/udacity-dalf-project2-wrangle-openstreetmap-data

Work performed for the 2nd project of Udacity Data Analyst Nanodegree: OpenStreetMap data wrangling and analysis.

data-analysis openstreetmap python sql

Last synced: 12 May 2026

https://github.com/min-thway-htut/r-programming

Repository for R-Programming

data-analysis r-programming

Last synced: 10 Jun 2026

https://github.com/leopeng1995/neuralsql

Make DataStore More Intelligent

data-analysis mongodb sql

Last synced: 12 May 2026