An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/lyubov0406/data_analyst_portfolio

This repository collects pet projects that demonstrate my data analytics skills.

data-analysis matplotlib numpy pandas portfolio python scipy seaborn sql tableau visualization

Last synced: 09 Apr 2026

https://github.com/PanosChatzi/Healthcare_and_Bioinformatics_Analyses

This repo contains the final assignments of the Data Analyst bootcamp by Workearly. Python and SQL were used to complete the assignments.

data-analysis data-cleaning data-visualisation jupyter matplotlib pandas python seaborn

Last synced: 05 Aug 2025

https://github.com/badranalyst/restaurant-reviews-sentiment-analysis-nlp-case-study

This project analyzes restaurant reviews using Natural Language Processing (NLP) for sentiment analysis. It covers data exploration, pre-processing (NLTK text cleaning), model building, prediction, and deployment. The goal is to predict sentiment from reviews using Python libraries such as Pandas, NumPy, Matplotlib, and Seaborn.

data-analysis data-science eda exploratory-data-analysis matplotlib-pyplot model model-building numpy pandas pre-processing predictive-modeling python seaborn

Last synced: 13 Apr 2026

https://github.com/guilherme-marcello/r-data-analysis-piechart

Reading RDS files, processing and presentation in pie charts

data-analysis data-visualization pie-chart r

Last synced: 13 Jul 2025

https://github.com/deliprofesor/customerseg-customer-segmentation-and-shopping-analysis

This project performs data exploration, segmentation, and modeling of wholesale customer data using clustering algorithms, PCA, and decision trees to analyze purchasing behavior and predict customer channel preferences.

clustering customer-segmentation data-analysis data-visualization dbscan decision-tree gmm kmeans machine-learning pca

Last synced: 24 Jun 2025

https://github.com/techshot25/graduateadmissions

Looking at the probability of being accepted into a graduate program using a machine learning model

bayesian-regression correlation-matrices data-analysis data-science linear-regression machie-learning random-forest-regression regression ridge-regression

Last synced: 25 Feb 2025

https://github.com/srinibas-masanta/hotel-revenue-analysis-dashboard

This project focuses on analyzing hotel booking data to uncover key metrics and insights that drive revenue management decisions. By creating an interactive Power BI dashboard, the project aims to improve strategic decision-making, optimize occupancy rates, and enhance overall financial performance within the hospitality industry.

business-analytics data-analysis data-science data-visualization dax-functions hospitality powerbi

Last synced: 12 Jan 2026

https://github.com/jhermienpaul/google-data-analytics-program

Hands-on learning materials from the 8-course Google Data Analytics Professional Certificate program, covering foundational data skills, tools, and real-world business problem-solving

bigquery dashboard data-analysis data-analytics data-modeling data-storytelling data-visualization data-wrangling descriptive-analytics diagnostic-analytics etl-pipeline r-programming rstudio sql tableau

Last synced: 13 Jul 2025

https://github.com/kiran-kumar-k3/sales-performance-dashboard

The Sales Performance Dashboard is an interactive Python-based web application that visualizes and analyzes sales data, providing actionable insights through dynamic charts and metrics.

data-analysis python streamlit

Last synced: 20 May 2026

https://github.com/ashwin331133/sql-healthcare-data

This repository contains SQL queries designed to analyze health care data. The queries focus on patient demographics, encounter costs, and flu shot statistics, aiming to provide insights into patient behavior and financial impacts. The datasets include information on patient encounters, flu shots, and hospital admissions.

data-analysis mysql sql

Last synced: 29 Oct 2025

https://github.com/uofuepibio/intro-r-ggplot2-quarto

Introduction to R via ggplot2 and quarto

data-analysis ggplot2 quarto r r-programming rmarkdown workshop

Last synced: 29 Jun 2026

https://github.com/scailfin/rob-webapi-flask

Default RESTful Web API implementation for the Reproducible Open Benchmarks for Data Analysis Platform (ROB) using the Flask web framework.

benchmarks data-analysis reproducibility webapi

Last synced: 17 Mar 2026

https://github.com/archanakokate/bank_term_deposit_prediction

Build a Decision Tree classifier to predict if the client will subscribe to a Term Deposit based on their demographic and behavioral data.

data-analysis data-visualization exploratory-data-analysis machine-learning

Last synced: 14 Sep 2025

https://github.com/myktorijus/retention-cohort

Extracted cohort data using SQL in BigQuery focusing on weekly retention from week 0 to week 6

bigquery data-analysis data-visualization powerbi sql

Last synced: 13 Jul 2025

https://github.com/georgiifirsov/educational-research-work

Educational research project from the 3rd year (6th semester). Topic: ARMA models in time series analysis

arma data-analysis jupyter-notebook python time-series time-series-analysis tsa

Last synced: 27 Apr 2026

https://github.com/gui-sitton/carsells

In this project I am an analyst on the Crankshaft List. Hundreds of free vehicle advertisements are published on the site every day. I need to study the data collected over the last few years and determine which factors influence the price of a vehicle.

data data-analysis data-analysis-python data-science data-visualization python

Last synced: 20 May 2026

https://github.com/karanch10/fraudshield

FraudShield is a machine learning credit card fraud detection system that analyzes transaction attributes to identify suspicious activities in real time. Built with Python, SQL, and Django, it provides a user-friendly interface for fraud prediction using OpenBanking APIs and advanced detection techniques. Ideal for businesses and individuals.

data-analysis data-science data-visualization machine-learning python3

Last synced: 20 May 2026

https://github.com/jiteshshelke/codsoft

A repository showcasing three machine learning projects—Titanic Survival Prediction, Movie Rating Prediction, and Iris Flower Classification—completed during CodSoft's Data Science Internship. 🚀

codsoft codsoftinternship data-analysis data-science linear-regression logistic-regression machine-learning machine-learning-algorithms python

Last synced: 20 May 2026

https://github.com/tabibyte/azerbaijani-rapper-lyrics-data-analysis

Lyrics Data Analysis of Azerbaijani Rappers

azerbaijan data-analysis rappers

Last synced: 22 Jul 2025

https://github.com/patricksferraz/aqw-madrid-data-analysis

Interactive analysis and visualization of Madrid's air quality and weather data (2001-2016) using Python, Dash, and Jupyter. Features interactive maps, statistical analysis, and data visualization tools.

air-quality dash data-analysis data-engineering data-science data-visualization data-wrangling environmental-data environmental-science interactive-dashboard jupyter jupyter-notebook madrid open-data pandas plotly python statistical-analysis time-series weather-data

Last synced: 30 Jan 2026

https://github.com/fer-aguirre/cookiecutter-data-analysis-extensive

A cookiecutter template for data analysis projects using Python.

cookiecutter data-analysis project-template python

Last synced: 09 Apr 2025

https://github.com/iwasakiyuuki/data-analysis-platform-airflow-dag

A collection of Airflow DAGs for automating data collection into our on-premises data analysis platform.

airflow airflow-dags data-analysis data-collection

Last synced: 13 May 2025

https://github.com/steviecurran/prediction-plot

Code that performs machine learning (k-nearest neighbours regression) and plots the predicted versus measured values

astrophysics c data-analysis high-redshift machine-learning pgplot python statistics tensorflow visualization

Last synced: 20 May 2026

https://github.com/gappeah/credit-card-transactions-fraud-detection-project

The Credit Card Transactions Fraud Detection Project repository is designed to analyse and detect fraudulent transactions in credit card data.

data-analysis postgresql sql

Last synced: 12 Jul 2025

https://github.com/shrutiijoshi/corporate-campus-hiring-analysis

This project analyzes corporate campus hiring trends for fresh graduates in India.

dashboard data-analysis data-visualization excel powerbi

Last synced: 09 Mar 2026

https://github.com/vetrivel07/flight-price-prediction

Developed a flight price prediction model using Python, analyzing historical data to forecast airfare prices and help travelers make informed booking decisions

data-analysis data-visualization jupyter-notebook numpy pandas python

Last synced: 15 Jun 2025

https://github.com/ahnaf19/clean_bankingdata

Here I practiced simple ETL tasks. I know how to perform these tasks in SQL; here I explored doing them with pandas as well.

data-analysis data-cleaning pandas python

Last synced: 19 Apr 2026

https://github.com/lunarwhite/lake-george-viz

Lake George data analysis and visualization, ANU COMP1730/6730

data-analysis python

Last synced: 01 Nov 2025

https://github.com/elissorokin/data-analyst-portfolio

This is a repository where I showcase my skills, share projects, and track my progress in data analysis and Data Science.

ab-testing data data-analysis datalense matplotlib numpy pandas plotly portfolio postgresql python scipy seaborn sql statistical-analysis

Last synced: 09 Apr 2026

https://github.com/farzeennimran/fashion-mnist-dataset-classification-using-neural-network

Implementation of a Multi-layer Perceptron classifier with hyperparameter tuning and k-fold cross-validation employing GridSearchCV for classifying images on the Fashion MNIST dataset 👗👚👖

artificial-intelligence data-analysis data-mining data-science dataset deep-learning fashion-mnist-dataset gridsearchcv hyperparameter-tuning kfold-cross-validation machine-learning multilayer-perceptron-network neural-network numpy pandas python sklearn

Last synced: 03 Apr 2026

https://github.com/nemat-al/multivariate_data_analysis

Tasks for Multivariate Data Analysis Course @ ITMO University

data-analysis multivariate-analysis python

Last synced: 20 May 2026

https://github.com/acerbilab/svbmc

Stacking Variational Bayesian Monte Carlo (S-VBMC) algorithm for combining Variational Bayesian Monte Carlo (VBMC) posteriors to boost inference performance.

bayesian-inference data-analysis machine-learning model-fitting python stacking variational-inference

Last synced: 20 Jan 2026

https://github.com/silasberger/charts-analysis

Data set collection, preprocessing, and analysis of single and album charts

charts data-analysis data-mining data-science dataset music

Last synced: 14 Sep 2025

https://github.com/buildwithlal/introduction-to-data-science-in-python-coursera

Introduction to Data Science in Python, part of the Applied Data Science using Python Specialization from the University of Michigan, offered by Coursera

data-analysis matplotlib numpy pandas

Last synced: 03 May 2026

https://github.com/ranxi2001/predicting-mental-health-risk

Data analysis case study: mental health prediction (data source: Kaggle)

data-analysis data-visualization eda

Last synced: 27 Jun 2025

https://github.com/cassiofb-dev/fide-rating-analysis

The plot speaks for itself

chess data-analysis fide hans rating

Last synced: 15 Jun 2025

https://github.com/samruddhi3012/rfm-analysis

Hi there! In this project I have performed Sales Analysis (RFM Analysis) using SQL and Tableau.

data-analysis data-visualization mssqlserver rfm-analysis segmentation tableau

Last synced: 27 Jun 2025

https://github.com/kineticloom/plydb-fun-nfl-analyst

Analyze NFL data with your AI agent

data-analysis football-analytics nfl

Last synced: 15 May 2026

https://github.com/srummanf/elnino-anomaly-study

Study on El Niño’s impact on Chennai groundwater sustainability

data-analysis machine-learning python satellite-imagery-analysis

Last synced: 15 May 2026

https://github.com/theashishmavii/job-trends-analyzer-automation

End-to-end automation: job scraping, data analysis, and trends reporting for job seekers and researchers.

automation beautifulsoup data-analysis open-source pandas python selenium webscraping

Last synced: 07 Aug 2025

https://github.com/245839/automobile-analysis

Analysis of data on cars imported to the USA, performed in Python using data analysis libraries in the Jupyter environment.

data-analysis jupyter-notebook python

Last synced: 20 May 2026

https://github.com/fortunewalla/birdstrikes

Birdstrikes database created for PostgreSQL, with simple sample queries

birdstrikes csv data-analysis data-science database dataset pgsql postgresql practice sample sql sql-query workshop

Last synced: 02 Oct 2025

https://github.com/vlad1343/data-visualisation

Python project showcasing interactive and static visualizations using Plotly and Matplotlib. It includes analysis of CSV, JSON, and API data, turning complex datasets into clear, insightful charts.

anova api csv-files data-analysis data-visualization json matplotlib matplotlib-pyplot pandas pandas-python plotly python3 seaborn seaborn-python

Last synced: 08 Apr 2026

https://github.com/faizantkhan/python_matplotlib

Matplotlib is a powerful Python library for creating visualizations and plots. It’s widely used for data representation, making complex information more accessible and interpretable. It offers various types of plots, including line graphs, scatter plots, bar charts, histograms, and more

data-analysis data-analytics data-engineering data-science data-visualization deep-learning graphs line machine-learning machine-learning-algorithms matplotlib matplotlib-pyplot matplotlib-python python

Last synced: 20 May 2026

https://github.com/vzamboulingame/data-portfolio

This repository showcases my projects in Python and SQL, highlighting my skills in data analysis & visualization.

data-analysis data-portfolio data-science data-science-portfolio data-science-projects data-visualization jupyter-notebook portfolio python sql

Last synced: 20 May 2026

https://github.com/farhad-here/tegenx

TeGenX: Multilingual Text Generation App. TeGenX is a lightweight, interactive text generation application built with Streamlit. It leverages multiple pre-trained transformer models to generate text in both English and Persian.

data-analysis data-science deep-learning happytransformer huggingface nlp python stream text-generation text-generator textgeneration transformer web-application

Last synced: 25 Jan 2026

https://github.com/gabrielramirezv/rnaseq_2025_notas

Repository for RNA-seq class from the Undergraduate Program in Genomic Sciences.

data-analysis r rna

Last synced: 29 Mar 2025

https://github.com/samruddhi3012/health-care-analytics

Hi! This repo covers healthcare analytics using advanced Microsoft Excel.

dashboard data-analysis data-visualization healthcare microsoft-excel pivot-chart pivot-tables vlookup

Last synced: 29 Mar 2025

https://github.com/saravanansuriya/energy-consumption-analysis

This project analyzes the energy usage and greenhouse gas (GHG) emissions of Ontario's Broader Public Sector (BPS) organizations, leveraging a comprehensive database of reported data in Power BI.

data-analysis data-cleaning powerbi python-script

Last synced: 22 Mar 2025

https://github.com/12danielll/neurogenomics_project

This project focuses on analyzing sequencing data to understand molecular mechanisms of neurological diseases and predict the effectiveness of immunotherapy in breast cancer patients. It integrates Python and R scripts for data processing, statistical analysis, and visualization, alongside a comprehensive report detailing methods and findings.

bioinformatics biostatistics clustering clustering-algorithms data-analysis data-visualization deseq2 differential-gene-expression functional-analysis immune-therapy machine-learning neurological-disease neuroscience pca-analysis python r seurat single-cell-analysis

Last synced: 06 Apr 2026

https://github.com/s-narasimman/zepto_inventory_sql_data_analysis

This project focuses on data cleaning, exploration, and analysis of product information from the Zepto dataset using SQL. It provides actionable insights into pricing, stock availability, discounts, and category-level performance.

aggregation categorization csv data-analysis data-cleaning kaggle postgresql sql zepto

Last synced: 16 May 2026

https://github.com/deborangueira/campeonado_kaggle_2025

Development of a machine learning model to predict the success of startups. The goal is to identify which companies are most likely to become success stories in the market.

computacao data-analysis desafio kaggle modulo3 ponderada

Last synced: 16 May 2026

https://github.com/pabi1234810/data_analysis_zepto

A comprehensive SQL-based business intelligence solution for analyzing grocery store product data, inventory management, and pricing strategies. This project demonstrates end-to-end data analysis workflow from raw data exploration to actionable business insights.

analytics csv data-analysis data-science database excel kaggle kaggle-dataset mathematics pgadmin4 sql utf-8 zepto

Last synced: 01 Nov 2025

https://github.com/alinababer/data-science-and-insight-agent-rag-llama3-lava-llm

Data-Science-and-Insight-Agent-RAG-LLama3-Lava-LLM-Django-WebApplication is an advanced AI-driven chatbot designed to assist in data science, document analysis, and image interpretation. This repository contains the Data Science Agent of this project.

artificial-neural-networks classifcation data-analysis data-engineering data-visualization datascience large-language-models llama2 lstm machine-learning python random-forest regression

Last synced: 01 Jan 2026

https://github.com/an0n1mity/spamclassifiereval

A repository for evaluating the misclassification rate of spam classification models using a threshold-based approach.

data-analysis machine-learning natural-language-processing python-programming spam-classification text-classification

Last synced: 02 Nov 2025

https://github.com/omdoshi13/pricing-of-laptops-using-ml

Data Analysis, training Machine Learning models, and Model Evaluation and Refinement for Pricing of Laptops dataset.

data-analysis data-analysis-project datascience google-colab jupyter-notebook machine-learning matplotlib model-evaluation model-refinement numpy pandas python scikit-learn

Last synced: 09 Apr 2026

https://github.com/jakobzmrzlikar/pca-on-genomes

An analysis of human genome mutations from different populations.

data-analysis genome-analysis pca-analysis

Last synced: 16 May 2025

https://github.com/edanur-y/abalone-age-prediction-with-regression-models

Comparing the performances of simple linear, multiple linear, multi-layer perceptron and k-nearest neighbors regressions on abalone data to predict the age.

data-analysis hyperparameter-tuning missing-values-analysis outlier-analysis python recursive-feature-elimination

Last synced: 20 May 2026

https://github.com/ksharma67/eda-on-ipl

This Python notebook analyzes IPL matches from 2008 to 2020 using Python packages such as pandas, matplotlib, and seaborn.

data-analysis data-science eda matplotlib numpy pandas python seaborn

Last synced: 07 May 2026

https://github.com/jwt218/sinc

MATLAB Standardization and Isotope Normalization for CSIA (with integrated correction and uncertainty quantification)

data-analysis geochemistry isotopes matlab

Last synced: 23 Jun 2025

https://github.com/sebastianofazzino/ibm-data-science-professional-certificate

In this repository I've stored exercises and projects I've been working on while attending IBM Data Science Professional Certificate, using Python and its libraries.

data-analysis data-mining data-science data-structures data-visualization database machine-learning matplotlib numpy pandas python regression seaborn sql

Last synced: 09 Apr 2026

https://github.com/pentalpha/eu-car-emissions-analysis-2015

Analysis of CO₂ emissions from passenger cars in E.U. countries, 2015.

data-analysis data-science dataset jupyter-notebook python python3

Last synced: 15 May 2026

https://github.com/pentalpha/bti-performance-study

A series of analyses of a large amount of data about the grades of students in the Information Technology course at UFRN

analysis big-data clustering data-analysis data-science data-visualization ipynb ipython jupyter-notebook performance-analysis plot python python3

Last synced: 15 May 2026

https://github.com/mtholahan/advanced-mysqlquery-tuning-mini-project

Analyzed EuroCup 2016 data with advanced SQL queries. Imported CSV datasets into MySQL, designed schema with match, player, and referee details, and implemented queries covering match outcomes, penalty shootouts, player stats, bookings, substitutions, and referee activity to explore tournament dynamics.

bootcamp data-analysis data-engineering data-modeling database eurocup football mysql queries soccer sports springboard sql

Last synced: 15 May 2026

https://github.com/adrianlardies/multi-asset-financial-analysis

Comparative analysis of bitcoin, gold and S&P 500 in relation to macroeconomic indicators (VIX, interest rate, CPI). We explore the evolution of a $100 monthly investment in these assets, presenting visualizations to evaluate their performance and potential as financial diversification tools.

data-analysis data-science matplotlib pandas python seaborn

Last synced: 09 May 2026

https://github.com/hemant-kumar786/heart-disease-prediction

Heart Disease Analysis project in RStudio using statistical methods and data visualization. Includes data cleaning, exploratory data analysis (EDA), correlation study, and insights on key health indicators influencing heart disease.

correlation-study data-analysis data-visualization eda healthcare heart-disease r rstudio statical-analysis

Last synced: 02 Nov 2025

https://github.com/namratagulati/tweets_analysis

This repository focuses on sentiment analysis of Twitter data using Python, Natural Language Processing (NLP), and the Natural Language Toolkit (NLTK). The goal is to extract valuable insights from social media discussions, such as word frequency, hashtag trends, and sentiment patterns.

analysis data-analysis natural-language-processing nlp-machine-learning nltk-corpus nltk-python sentiment-analysis twitter-sentiment-analysis

Last synced: 07 Aug 2025

https://github.com/jofaval/boston-housing

Regression Analysis into the Boston Housing in-demand pricing in 1978

boston-housing data-analysis data-science data-visualization machine-learning python regression

Last synced: 16 May 2026