Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/mumtaz4118/scraping-medium-and-data-analytics

The file DataExtraction.py extracts information from the JSON files scraped by the scraper medium_scrapper_post.py. To extract information from JSON files scraped by medium_scrapper_tag_archive.py (which scrapes the tags archive), use Data_Extraction_Archive_Tags.py instead.

data data-analysis data-analytics data-extraction data-preprocessing data-science data-scraping deep-learning machine-learning python

Last synced: 07 Nov 2024

https://github.com/mumtaz4118/nlp-course

Programming Assignments and Lectures for Stanford's CS 224N: Natural Language Processing with Deep Learning

course data data-analysis data-analytics data-science data-visualization deep-learning education machine-learning natural-language-processing neural-network transfer-learning

Last synced: 07 Nov 2024

https://github.com/mumtaz4118/amazon-iphone-12-data-scrapped

Beautiful Soup is a Python library for getting data out of HTML, XML, and other markup languages.

data-analysis data-extraction data-science data-scraping html mark-up python

Last synced: 07 Nov 2024

https://github.com/mateusoliveira30/top-intelligent-people

This project performs an exploratory analysis of the top_intelligent_people_in_the_world_5000.csv dataset, featuring some of the world's most intelligent individuals. Using pandas and matplotlib, the analysis includes checking for missing values, describing variables, and visualizing data.

data-analysis graphics kaggle-dataset python3

Last synced: 08 Nov 2024

https://github.com/tmmvn/analytics-notebooks

A collection of data analytics notebooks created while testing out JetBrains Datalore

ai algorithms data-analysis datalore elements-of-ai helsinki-university-mooc python

Last synced: 07 Nov 2024

https://github.com/chrispsang/healthcare-dataanalysis

Analyze synthetic patient data to identify trends, improve healthcare delivery, and predict patient outcomes using machine learning models. Includes data exploration, preprocessing, model building, and visualizations.

data-analysis data-science data-visualization healthcare jupyter-notebook machine-learning python

Last synced: 06 Nov 2024

https://github.com/thisisashukla/survival-analysis

Hands-On Survival Analysis in Python

data-analysis data-science survival-analysis

Last synced: 07 Nov 2024

https://github.com/kheriberto/pandas_and_seabron_project

In this project I showcase my ability to use pandas and seaborn to reshape, transform, and plot data.

data-analysis pandas python seaborn

Last synced: 10 Nov 2024

https://github.com/vidyadnina/cyclistic-sql-tableau-project

Trip data analysis for a bike-sharing service company using SQL and Tableau.

bigquery dashboard data-analysis data-analytics-sql data-cleaning data-visualization sql

Last synced: 12 Oct 2024

https://github.com/manjit-baishya-datascience/flipkart-laptop-listing-eda

This project analyzes laptop price data from Flipkart using AutoScraper for web scraping. It includes data loading, EDA, cleaning, statistical analysis, and visualization. The goal is to derive insights for pricing strategies and market positioning. Explore the repository for detailed documentation and code.

data-analysis ecommerce-platform flipkart laptop python

Last synced: 13 Nov 2024

https://github.com/mrsamsonn/monolithic-polylithic-crystal-segmentation

A grid segmentation algorithm for clustering crystal structures using diffraction patterns. Useful in material science and nanotechnology, this code enables detailed analysis of crystals for research and industrial applications.

clustering crystal-structure crystallography data-analysis diffraction-patterns grid-segmentation image-processing k-means machine-learning matertial-science nanotechnology python research-project research-tools scientific-computing

Last synced: 02 Nov 2024

https://github.com/mlund2k/project-1-baseball-performance-vs.-attendance

Project assets for my first exploratory data analysis: Baseball Performance vs. Attendance.

bigquery data-analysis data-cleaning data-visualization excel rstudio sql tableau tidyverse

Last synced: 12 Oct 2024

https://github.com/darkdk123/handwashing-discovery-analysis

A guided bootcamp project analyzing the original data behind Dr. Ignaz Semmelweis's discovery of the benefits of handwashing at Vienna General Hospital in the 1840s.

data-analysis data-science data-visualization matplotlib-pyplot numpy pandas plotly-python python seaborn-plots

Last synced: 07 Nov 2024

https://github.com/darkdk123/house-valuation-model

A bootcamp challenge project to create an ML model that predicts house prices in Boston, Massachusetts, from multiple parameters using multivariable regression.

data-analysis data-science data-visualization matplotlib-pyplot multivariate-regression predictive-modeling statistics

Last synced: 07 Nov 2024

https://github.com/jbn/vaquero

A Python library for iterative and interactive data wrangling at laptop-scale.

data data-analysis data-cleaning data-mining dirty-data elt etl etl-framework

Last synced: 13 Nov 2024

https://github.com/thesfinox/fit-the-data

Data analysis using Wolfram Mathematica

analysis data data-analysis lab mathematica wolfram wolfram-mathematica

Last synced: 06 Nov 2024

https://github.com/thesfinox/mltools

A collection of simple tools for data science and machine learning projects.

ai data-analysis data-science data-visualization logging machine-learning matplotlib neural-network python toolbox

Last synced: 06 Nov 2024

https://github.com/spaghettifunk/gvb

Analysis of GVB in Amsterdam

data-analysis public-transportation

Last synced: 08 Nov 2024

https://github.com/virajbhutada/telecom-customer-churn-prediction

Predict and prevent customer churn in the telecom industry with this project. Harness the power of advanced analytics and Machine Learning on a diverse dataset to develop a robust classification model. Gain deep insights into customer behavior and identify critical factors influencing churn using interactive Power BI visualizations.

churn-prediction classification-models customer-attrition-analysis customer-churn-prediction data-analysis data-science decision-tree-classifier eda logistic-regression machine-learning machine-learning-algorithms machine-learning-models pandas powerbi powerbi-desktop python random-forest-classifier roc-curve xgboost-classifier

Last synced: 11 Nov 2024

https://github.com/virajbhutada/article-clustered-recommendation-system-ml

This project aims to redefine content discovery by delivering personalized article recommendations tailored to individual user preferences. We use advanced machine learning techniques like PCA and K-means clustering to analyze user behavior and article characteristics to provide highly accurate recommendations.

anaconda article-recommendation clustering-algorithm data-analysis data-science keras-tensorflow machine-learning machine-learning-algorithms ml-models numpy pandas plotly python scikit-learn scipy

Last synced: 15 Oct 2024

https://github.com/vaishnavipaithane/bellabeat-data-analysis-case-study

This capstone project was done as a part of Google Data Analytics Professional Certificate course.

bigquery data-analysis sql tableau

Last synced: 12 Oct 2024

https://github.com/marcogdepinto/olympichistoryanalysis

Python visual analysis of the Olympic Games history. Kaggle gold medal with 15000+ views, 200+ upvotes and 100+ comments.

data-analysis data-science jupyter-notebook olympic-games python seaborn

Last synced: 10 Nov 2024

https://github.com/virajbhutada/amazonprime-tableaudashboard

Uncover valuable insights into Amazon Prime content and trends through an interactive Tableau dashboard. Delve into a comprehensive analysis of shows, movies, and user engagement data. Utilize interactive visualizations to gain a deeper understanding of the Amazon Prime platform.

content-types data-analysis data-visualization streaming-analytics tableau user-engagement

Last synced: 11 Nov 2024

https://github.com/madeiradata/microsoft-data-analysts-club

Open-source Repository of Useful Scripts and Solutions for Microsoft Data Analysts

data-analysis data-visualization microsoft-data-analysis powerbi powerbi-report

Last synced: 06 Nov 2024

https://github.com/kwonnayeon/medium-post-projects

Contains the code and projects from my Medium posts. I share what I've learned through trial and error to help others tackle similar work smoothly.

data-analysis data-science data-visualization medium-articles python r-language sql

Last synced: 13 Nov 2024

https://github.com/ljadhav25/django-data-analyzer

Django Data Analyzer is a web application built using the Django framework, designed to streamline data analysis tasks. Users can upload CSV files containing data for analysis. The application utilizes the powerful data manipulation capabilities of Python libraries like pandas and numpy to perform various analyses on the uploaded data.

data-analysis data-visualization django-application matplotlib numpy pandas python seaborn

Last synced: 10 Nov 2024

https://github.com/sunsided/esc2024

Exploratory Data Analysis on the ESC 2024 results

csv data-analysis eurovision-song-contest scraping

Last synced: 02 Nov 2024

https://github.com/pranav016/exploratory-data-analysis-of-google-app-store-dataset

A data analysis of the Google app store dataset that answers a few questions about the data using data visualization techniques.

data-analysis

Last synced: 05 Nov 2024

https://github.com/ksharma67/eda-on-ipl

This Python notebook analyzes IPL matches from 2008 to 2020 using Python packages such as pandas, matplotlib, and seaborn.

data-analysis data-science eda matplotlib numpy pandas python seaborn

Last synced: 06 Nov 2024

https://github.com/buildwithlal/introduction-to-data-science-in-python-coursera

Introduction to Data Science in Python, part of the Applied Data Science using Python Specialization from the University of Michigan, offered on Coursera

data-analysis matplotlib numpy pandas

Last synced: 08 Nov 2024

https://github.com/pranav016/exploratory-data-analysis-of-sp500-dataset

This is a data analysis that I performed on the S&P 500 dataset, answering a few questions through data visualization techniques.

data-analysis

Last synced: 05 Nov 2024

https://github.com/maazie-khan/olympics-data-enigeering

Worked with Azure Data Factory, Databricks, Data Lake Storage, and Synapse Analytics to build an ETL pipeline for processing and analyzing Olympic Games data from Kaggle.

azure big-data data-analysis dataengineering devops pipeline

Last synced: 14 Oct 2024

https://github.com/faezeh-gholamrezaie/visual-google-scholar-search

A Python script that searches Google Scholar for specific keywords and visually presents the results in various chart formats, enabling researchers to analyze trends and insights in academic literature.

academic academic-research academic-trends ai ai-research bibliometrics data-analysis data-visualization google-scholar publication-analysis python research-trends scholarly scholarly-data word-cloud

Last synced: 07 Nov 2024

https://github.com/subratamondal1/heart-attack-prediction

Heart Attack Prediction of patients based on the required data. Data Ingestion - Data Preparation - Exploratory Data Analysis (EDA) - Modelling - Evaluation.

data-analysis data-science data-visualization kaggle-dataset machine-learning matplotlib-pyplot numpy pandas python3 scikit-learn seaborn

Last synced: 12 Nov 2024

https://github.com/findmyway/dataframe-in-julia

A quick introduction to DataFrame in Julia for users coming from Python

data-analysis dataframe julia jupyter-notebook

Last synced: 12 Oct 2024

https://github.com/lewismakau/portfolio-projects

This repository contains the data files and SQL files for the projects in my portfolio.

data-analysis data-cleaning data-structures data-visualization database google-analytics microsoft-sql-server mysql powerbi tableau

Last synced: 12 Oct 2024

https://github.com/the-pinbo/dimensionalityredux-pca-vs-autoencoders

Comparative study of PCA and Autoencoders for effective dimensionality reduction, assessed through PSNR and SSIM metrics.

autoencoder-mnist autoencoders data-analysis dimensionality-reduction image-compression mnist neural-networks pca psnr ssim

Last synced: 06 Nov 2024

https://github.com/abidshafee/google.colaboratory_projects

This repository contains a collection of interactive Python notebooks (ipynb) from some of my projects on Data Science, Machine Learning (ML), and Natural Language Processing (NLP).

colaboratory data-analysis data-science lstm machine-learning nlp statistics time-series

Last synced: 06 Nov 2024

https://github.com/nickenshidqia/sql-for-financial-data-analysis

Design SQL queries to generate accurate and timely financial reports including Profit and Loss statements, Balance Sheets, and Cash Flow statements

azure-data-studio data-analysis finance microsoft-sql-server sql

Last synced: 12 Oct 2024

https://github.com/emanoelcampos/python-onemonth

This repository contains educational materials and projects developed during a Python course offered by OneMonth. It covers Python basics, intermediate concepts, web development with Flask, and data analysis with pandas. The course is structured into weeks, each focusing on a different aspect of Python programming and its applications.

data-analysis flask jupyter-notebook onemonth python python3

Last synced: 07 Nov 2024

https://github.com/alinenog/desenvolve_gb_2022

Grupo Boticário's Desenvolve 2022 training program in the data field

data-analysis data-science googlesheet machine-learning numpy pandas python

Last synced: 06 Nov 2024

https://github.com/1401dev/iowa-liquor-retail-sales-analysis

This repository contains the analysis of Iowa liquor retail sales data, aimed at uncovering sales trends and forecasting future sales patterns. The project involves data cleaning, preparation, and advanced time series analysis using Microsoft SQL Server and Google Colab.

customer-behavior data-analysis data-cleaning data-science data-visualization exploratory-data-analysis forecasting google-colab machine-learning microsoft-sql-server pandas prophet python retail-analytics retail-sales sales-forecasting sales-performance sql statsmodels time-series-analysis

Last synced: 12 Oct 2024

https://github.com/alexgenovese/react-charts-covid-19-data

Examples with COVID-19 data using different charting libraries: G2, G2Plot, Plotly, ApexCharts

data-analysis data-science data-visualization react reactjs

Last synced: 07 Nov 2024

https://github.com/ramapinnimty/udacity-mlfoundation-nanodegree

This is a repository containing solutions to the assignments that are a part of the Udacity Machine Learning Foundation Nanodegree program.

assignments data-analysis python3 statistics udacity-machine-learning-nanodegree

Last synced: 08 Nov 2024

https://github.com/itsmeyogesh22/people-s-analytics-case-study

Part of Danny Ma's virtual apprenticeship online program, "People's Analytics Case Study" aims to demonstrate the practical use of various SQL concepts such as materialized views, snapshot data, and historical data.

danny-ma data-analysis dataanalysis datascience mssqlserver pgadmin4 postgresql snapshot-data sqlserver t-sql

Last synced: 12 Oct 2024

https://github.com/zimmi48/nixpkgs-issues

Analysis of nixpkgs issue lifetimes.

data-analysis github-api nixpkgs

Last synced: 06 Nov 2024

https://github.com/isaacmaffeis/imad-2023

Model Identification and Data Analysis (IMAD) | University course

data data-analysis data-science model model-identification

Last synced: 12 Nov 2024

https://github.com/baguilar6174/python-jupyter-notebooks

Explore data analysis projects with Python, Jupyter and more tools. Discover stunning visualizations and reveal meaningful information in datasets to make informed decisions.

data-analysis jupyter-notebook kaggle pandas python

Last synced: 07 Nov 2024

https://github.com/shreeparab1890/chat-analyzer

A data analysis project for analyzing WhatsApp chats.

data-analysis numpy pandas python

Last synced: 08 Nov 2024

https://github.com/prakashjha1/new-analysis-using-llm-locally

An interactive news analysis tool built with Streamlit and local LLMs. This app allows users to analyze and gain insights from the latest news articles using advanced language models, all running locally. Explore trends, sentiment, and key topics with an intuitive interface.

artificial-intelligence data-analysis data-science llms ollama python streamlit

Last synced: 12 Oct 2024

https://github.com/shreeparab1890/flipkart-laptops-analysis-eda

This IPython notebook is an exploratory data analysis (EDA) of the laptops listed on Flipkart.

data-analysis eda exploratory-data-analysis matplotlib-pyplot numpy pandas plotly

Last synced: 08 Nov 2024

https://github.com/shreeparab1890/indian-elections-2019-analysis-eda

This IPython notebook is an exploratory data analysis (EDA) of the Indian Lok Sabha Elections 2019.

data data-analysis data-science data-visualization eda exploratory-data-analysis matplotlib numpy pandas plotly python python3 visualization

Last synced: 08 Nov 2024

https://github.com/shreeparab1890/unicorns-of-india-till-sep-2022-analysis-eda

This IPython notebook is an exploratory data analysis (EDA) of the unicorns of India up to Sep 2022.

analysis data-analysis eda exploratory-data-analysis matplotlib-pyplot numpy pandas plotly

Last synced: 08 Nov 2024

https://github.com/datalopes1/desafio_delivery

A challenge from the Universidade dos Dados Subscription Club that simulates the real-world demands placed on a data analyst

data-analysis jupyter python

Last synced: 11 Oct 2024

https://github.com/tj2904/lfb-callout-analysis

An investigation into London Fire Brigade's callout data.

data-analysis decsion-tree kmeans lfb-incidents london-fire-brigade pandas python seaborn

Last synced: 07 Nov 2024

https://github.com/astropenguin/optimap

Optimized integrated intensity map method for spectral cubes

astronomy data-analysis data-science python python3 radio-astronomy spectral-cubes

Last synced: 05 Nov 2024

https://github.com/bineet-ratna-shakya/data-science-salary-analysis

Analyzing a dataset containing salaries of data science professionals from 2020 to 2023.

data-analysis data-science data-visualization jupyter numpy pandas python

Last synced: 11 Oct 2024

https://github.com/alinababer/data-science-and-insight-agent-rag-llama3-lava-llm

Data-Science-and-Insight-Agent-RAG-LLama3-Lava-LLM-Django-WebApplication is an advanced AI-driven chatbot designed to assist in data science, document analysis, and image interpretation. This repository contains the data science agent of this project.

artificial-neural-networks classifcation data-analysis data-engineering data-visualization datascience large-language-models llama2 lstm machine-learning python random-forest regression

Last synced: 12 Oct 2024

https://github.com/sehgal-vishal/sql-nyc-collision-analysis

This analysis is based on the collisions (accidents) that happened in New York City. I used SQL Server for exploratory data analysis (EDA).

data-analysis database eda sql-server

Last synced: 12 Oct 2024

https://github.com/an4pdm/relatorio-de-vendas

This project was built with the tools offered by Power BI in order to improve my knowledge of ETL. The data used came from the Kaggle website.

data-analysis data-visualization database etl powerbi

Last synced: 13 Nov 2024

https://github.com/bertiewooster/ipywidgets

Interactive data visualizations in a Jupyter Notebook, following the tutorial at https://python.plainenglish.io/interactive-visualizations-with-pandas-seaborn-and-ipywidgets-173e5d7d6a5e

data-analysis data-science data-visualization ipython-notebook ipywidgets juypter-notebook python

Last synced: 13 Nov 2024

https://github.com/dionixius7/titanic-disaster-ml-model

This project predicts the survival of passengers on the Titanic using the Kaggle Titanic Disaster Dataset. The dataset contains information about passengers, such as age, gender, and class. Several machine learning algorithms are applied to build a predictive model that estimates each passenger's chance of survival as accurately as possible.

data-analysis data-science data-visualization eda knn-classifier machine-learning neural-network python scikit-learn svm tensorflow titanic-kaggle titanic-survival-prediction

Last synced: 17 Nov 2024

https://github.com/l2nce/datamining-study

Introduction to data mining

data-analysis data-mining matplotlib numpy panda

Last synced: 11 Nov 2024

https://github.com/madhursinghbhadoriya/data_analysis_fifa-players

Processed important information and characteristic traits in a Jupyter Notebook using NumPy, Matplotlib, pandas, etc.

analysis data-analysis data-science graphs jupyter-notebook pandas python

Last synced: 05 Nov 2024

https://github.com/madhursinghbhadoriya/data_analysis_sales_insights_using_tableau

Performed data cleaning using MySQL. Performed data analysis and ETL in Tableau. Created an interactive dashboard with significant information about sales insights, profit, and revenue analysis.

data-analysis data-visualization dataanalysis etl mysql tableau-dashboards tableau-desktop

Last synced: 05 Nov 2024

https://github.com/changyeop-yang/study-datasciencefoundation

Big data science and analytics play a major role in this decade. Cleaning and preparing data for analysis is still a challenge, as are performing basic visualization, modeling the data, curve-fitting the data, and finally presenting your findings and wowing the audience.

data-analysis ios kyungpook-national-university swift

Last synced: 06 Nov 2024

https://github.com/tsffarias/my-books

Exploratory analysis of my dataset 'All_the_Books_I_read', which contains all the books I've read

books data-analysis python tableau

Last synced: 05 Nov 2024

https://github.com/balajimohan18/tableau-visualization-project

This repository contains visualization projects built with Tableau. The visualizations yield insights and strategies that help a business grow and achieve higher profit margins, and in some cases they provide social value by helping to reduce damage from calamities.

data-analysis data-science data-visualization exploratory-data-analysis tableau tableau-public

Last synced: 14 Nov 2024

https://github.com/virajbhutada/diamond-price-estimator

This project develops a predictive model to estimate diamond prices based on characteristics like carat, cut, color, and clarity. It covers data preprocessing, feature engineering, model selection, training, and evaluation. The final product is a web app where users can input diamond attributes to get accurate and instant price predictions.

cross-validation css data-analysis data-science-projects data-visualization eda feature-engineering html hyperparameter-tuning jupyter-notebooks machine-learning ml-algorithms model-deployment model-selection performance-optimization predictive-modeling python python-app user-interface

Last synced: 11 Nov 2024

https://github.com/balajimohan18/loan-classification-datascience-project

This project uses machine learning algorithms to predict the classification of loan status. The dataset is loaded and transformed using SQL to obtain a clean dataset containing valid information.

classification data-analysis data-cleaning data-science data-visualization loan-prediction loan-status machine-learning sql supervised-learning

Last synced: 14 Nov 2024

https://github.com/virajbhutada/hr-analytics-excel-sql-tableau-powerbi

Explore a comprehensive HR Analytics portfolio showcasing data analysis and visualization skills. Featuring dashboards in Power BI, Excel, and Tableau, along with SQL queries for deeper insights. A holistic view of expertise in HR analytics, data visualization, and database management. Let's dive into the game of data insights!

data-analysis data-management data-visualization excel hr-analytics interactive-dashboards portfolio-project postgresql powerbi powerbi-visuals sql sql-queries tableau tableau-public

Last synced: 11 Nov 2024

https://github.com/balajimohan18/milk-production-time-series-forecasting-datascience-project

This project uses time series forecasting to predict future milk production. The data used in this project is monthly milk production data from January 1962 to December 1975. The ARIMA (autoregressive integrated moving average) model is used to forecast milk production. The model is evaluated using various metrics.

acf adf data-analysis data-cleaning data-science data-visualization eda exploratory-data-analysis machine-learning pacf seasonality time-series trends

Last synced: 14 Nov 2024

https://github.com/aniketmondal/dataanalysis

Contains cleaning, transformation, and exploratory analysis of various data sets using Python Pandas, NumPy, re, random, etc.

analysis data-analysis data-science pandas python

Last synced: 07 Nov 2024

https://github.com/abdelrahmanbayoumi/titanic-machine-learning-from-disasters

Given a training set of samples listing passengers who survived or did not survive the Titanic disaster, can our model determine, from a test dataset that does not contain survival information, whether the passengers in that dataset survived?

data-analysis data-science data-visualization machine-learning pandas

Last synced: 05 Nov 2024