Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.


Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.
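As a rough illustration of the four steps named above (inspecting, cleansing, transforming, modeling), here is a minimal sketch using only the Python standard library. The records, field names, and helper functions are made up for this example and do not come from any project listed on this page.

```python
from statistics import mean

# Hypothetical raw records; the values are invented for illustration.
raw = [
    {"city": "Oslo", "temp_c": "4.5"},
    {"city": "Oslo", "temp_c": ""},  # missing value
    {"city": "Lima", "temp_c": "19.0"},
    {"city": "Lima", "temp_c": "21.0"},
]


def inspect(rows):
    """Inspect: count rows whose temperature is missing."""
    return sum(1 for r in rows if not r["temp_c"].strip())


def cleanse(rows):
    """Cleanse: drop rows whose temperature is missing."""
    return [r for r in rows if r["temp_c"].strip()]


def transform(rows):
    """Transform: convert temperature strings to floats."""
    return [{"city": r["city"], "temp_c": float(r["temp_c"])} for r in rows]


def model(rows):
    """Model (here just a summary): mean temperature per city."""
    by_city = {}
    for r in rows:
        by_city.setdefault(r["city"], []).append(r["temp_c"])
    return {city: mean(vals) for city, vals in by_city.items()}


missing = inspect(raw)
summary = model(transform(cleanse(raw)))
print(missing, summary)
```

Real analyses usually replace the last step with a statistical or machine-learning model, but the order of the stages tends to stay the same.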

https://github.com/njanakiev/folderstats

Python module that collects detailed statistics from a folder structure

data-analysis filesystem pandas python statistics

Last synced: 06 Nov 2024
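To give a feel for what collecting statistics from a folder structure can involve, the sketch below walks a directory with the standard library and records each file's relative path, extension, and size. This is not the folderstats API; the function name `collect_folder_stats` and the record fields are assumptions made for illustration.

```python
import os
import tempfile


def collect_folder_stats(root):
    """Walk root and return one record per file: relative path, extension, size."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            records.append({
                "path": os.path.relpath(path, root),
                "extension": os.path.splitext(name)[1].lstrip("."),
                "size": os.path.getsize(path),  # size in bytes
            })
    return records


# Build a small throwaway folder so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    with open(os.path.join(tmp, "a.txt"), "wb") as f:
        f.write(b"hello")
    with open(os.path.join(tmp, "sub", "b.csv"), "wb") as f:
        f.write(b"x,y\n")
    stats = collect_folder_stats(tmp)

total_size = sum(r["size"] for r in stats)
print(len(stats), total_size)
```

A list of flat records like this can be loaded into a pandas DataFrame for further analysis, which matches the pandas tag on the entry above.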

https://github.com/alanderex/pydata-pandas-workshop

Material for my PyData Jupyter & Pandas workshops. I'm also available for personal in-house training on request.

data-analysis jupyter-notebook pandas visualisation workshop

Last synced: 15 Oct 2024

https://github.com/easonlai/azure_openai_langchain_sample

This repository contains various examples of how to use LangChain to interact in natural language with a large language model (LLM) from Azure OpenAI Service.

azure-openai azure-openai-api azure-openai-service csv data-analysis langchain langchain-python openai python python3

Last synced: 10 Nov 2024

https://github.com/apachecn/pandas-cookbook-code-notes

:book: Pandas Cookbook: annotated source code

code data-analysis notes pandas python

Last synced: 12 Nov 2024

https://github.com/paezha/spatial-analysis-r

Open Educational Resource for teaching spatial data analysis and statistics with R

data-analysis open-educational-resource r r-package r-spatial rstats spatial-data-analysis spatial-statistics statistics

Last synced: 27 Oct 2024

https://github.com/renumics/sliceguard

A library for detecting problematic data segments in structured and unstructured data with few lines of code.

data-analysis data-cleaning data-curation data-exploration data-science data-visualization deep-learning eda exploratory-data-analysis machine-learning python visualization

Last synced: 27 Oct 2024

https://github.com/b0o/apple-autofill-domains

Apple's allowed autofill domains

apple data-analysis github-actions web-scraping

Last synced: 29 Oct 2024

https://github.com/cvjena/libmaxdiv

Implementation of the Maximally Divergent Intervals algorithm for Anomaly Detection in multivariate spatio-temporal time-series.

anomalydetection anomalydiscovery data-analysis data-mining datamining machine-learning machine-learning-library machinelearning time-series timeseries

Last synced: 05 Nov 2024

https://github.com/dask-contrib/dask-awkward

Native Dask collection for awkward arrays, and the library to use it.

columnar-format dask data-analysis data-science data-structure jagged-array python ragged-array

Last synced: 11 Nov 2024

https://github.com/staircase-dev/staircase

A powerful data analysis package based on mathematical step functions. Strongly aligned with pandas.

analysis data-analysis data-structures library numpy pandas python step-function stepfunction

Last synced: 30 Oct 2024

https://github.com/airoldilab/sgd

An R package for large scale estimation with stochastic gradient descent

big-data data-analysis gradient-descent r statistics

Last synced: 12 Oct 2024

https://github.com/404notf0und/FXY

Security-Scenes-Feature-Engineering-Toolkit, Continuous Integration. A feature-extraction tool for security data.

data-analysis data-mining feature-engineering machine-learning security security-scenes

Last synced: 04 Aug 2024

https://github.com/randyzwitch/streamlit-embedcode

Streamlit component for embedding code snippets such as GitHub gists, CodePen snippets, Gitlab snippets, etc.

data-analysis data-science data-visualization python streamlit streamlit-component

Last synced: 11 Oct 2024

https://github.com/404notf0und/fxy

Security-Scenes-Feature-Engineering-Toolkit, Continuous Integration. A feature-extraction tool for security data.

data-analysis data-mining feature-engineering machine-learning security security-scenes

Last synced: 07 Nov 2024

https://github.com/jmwoloso/pychattr

Python Channel Attribution (pychattr) - A Python implementation of the excellent R ChannelAttribution library

channel-attribution data-analysis data-science machine-learning python python-channel-attribution rpy2 wrapper

Last synced: 02 Aug 2024

https://github.com/yusufcinarci/data-science-projects

This repo contains beginner-to-upper-level projects in the field of data science. I will add the projects I complete in this field every day, in the hope that they will be useful to those who, like me, are interested in data science and are just starting...

data-analysis data-science data-science-projects jupyter jupyter-notebook python

Last synced: 07 Nov 2024

https://github.com/chiphuyen/metrotwitter

What Twitter reveals about the differences between cities and the monoculture of the Bay Area

data-analysis data-visualization emojis nlp nlp-datasets python twitter twitter-dataset

Last synced: 08 Nov 2024

https://github.com/okfn-brasil/serenata-notebooks

Notebooks from Operação Serenata de Amor | ** This repository does not receive frequent updates **

data-analysis ipynb jupyter-notebook python

Last synced: 11 Oct 2024

https://github.com/wrinth/data_analyst_projects

Projects created from Udacity's Data Analyst Nanodegree

data-analysis python udacity-nanodegree

Last synced: 05 Nov 2024

https://github.com/spratiher9/sparkora

Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟

apache apache-spark data data-analysis data-analysis-python data-analytics easy-to-use eda exploratory-data-analysis open-source opensource pyspark python python3 toolkit

Last synced: 27 Oct 2024

https://github.com/AllenInstitute/openscope_databook

OpenScope databook: a collaborative, versioned, data-centric collection of foundational analyses for reproducible systems neuroscience 🐁🧠🔬🖥️📈

dandi-archive data-analysis data-visualization nwb python reproducible-research visualization

Last synced: 12 Nov 2024

https://github.com/districtdatalabs/cultivar

Multidimensional data explorer and visualization tool.

data-analysis data-exploration data-management visualization

Last synced: 11 Nov 2024

https://github.com/DistrictDataLabs/cultivar

Multidimensional data explorer and visualization tool.

data-analysis data-exploration data-management visualization

Last synced: 08 Nov 2024

https://github.com/VUKOZ-OEL/3d-forest

Visualization, processing and analysis of Lidar point clouds, mainly focused on forest environment. New version of 3D Forest. Process files with terabytes of data. Edit new point attributes. Simple addition of new features by plugins.

3d classification cpp cross-platform data-analysis desktop-application editor forest gui interactive-visualization las laser-scanning lidar opengl plugins point-cloud qt scientific-computing segmentation tree

Last synced: 03 Aug 2024

https://github.com/angelfp/visualpic

Data Visualization for Particle-in-Cell Codes.

data-analysis data-visualization openpmd particle-in-cell python vtk

Last synced: 09 Nov 2024

https://github.com/isxcode/spark-yun

Big data computing platform based on Spark <至轻云 (Zhiqingyun): building a big data computing platform / data middle platform>

data-analysis docker platform spark

Last synced: 04 Sep 2024

https://github.com/ulikoehler/uliengineering

A Python library for calculations performed in electronics engineering

data-analysis data-science electronics engineering python

Last synced: 26 Oct 2024

https://github.com/palewire/first-python-notebook

A step-by-step guide to analyzing data with Python and the Jupyter notebook.

altair data-analysis data-journalism education journalism jupyter jupyter-notebook jupyterlab news pandas python sphinx tutorial

Last synced: 31 Oct 2024

https://github.com/cosmoduende/r-youtube-personal-history-analysis

Explore your activity on YouTube with R: how to analyze and visualize your personal data history. Find out how you consume YouTube using a copy of your personal data from Google Takeout.

data-analysis data-analytics data-visualization data-viz google-takeout r-data r-language r-programming youtube youtube-accounts youtube-analytics youtube-api youtube-data youtube-data-analysis youtube-data-api youtube-data-api-v3 youtube-data-scraping youtube-dataset youtube-scrape youtube-scraper

Last synced: 07 Nov 2024

https://github.com/asnelt/mixedvines

Python package for canonical vine copula trees with mixed continuous and discrete marginals

c-vines copula copula-models copulae copulas data-analysis dependency-analysis dependency-modeling modeling python regular-vines statistics

Last synced: 27 Oct 2024

https://github.com/theengineeringworld/statistics-using-python

These files are part of the YouTube course "Statistics Using Python" offered by The Engineering World: http://youtube.com/theengineeringworld

cleaning data-analysis data-mining data-science data-visualization database jupyter-notebooks python python3 statistics

Last synced: 08 Nov 2024

https://github.com/dcoles/prometheus-pandas

Pandas integration for Prometheus.

data-analysis jupyter-notebook pandas prometheus python

Last synced: 26 Oct 2024

https://github.com/zblz/naima

Derivation of non-thermal particle distributions through MCMC spectral fitting

astronomy astrophysics data-analysis gamma-ray-astronomy python

Last synced: 26 Oct 2024

https://github.com/kb22/GitHub-User-Insights-using-API

The project involves using the GitHub API using user authentication to fetch information such as commits and repositories for that specific user and store them as CSV files for data collection and analysis.

api data-analysis data-science data-scraping github-api python

Last synced: 08 Nov 2024

https://github.com/contextlab/computational-neuroscience

Short undergraduate course taught at University of Pennsylvania on computational and theoretical neuroscience. Provides an introduction to programming in MATLAB, single-neuron models, ion channel models, basic neural networks, and neural decoding.

computational-neuroscience course-materials data-analysis matlab modeling neuron problem-set simulation

Last synced: 06 Nov 2024

https://github.com/ContextLab/computational-neuroscience

Short undergraduate course taught at University of Pennsylvania on computational and theoretical neuroscience. Provides an introduction to programming in MATLAB, single-neuron models, ion channel models, basic neural networks, and neural decoding.

computational-neuroscience course-materials data-analysis matlab modeling neuron problem-set simulation

Last synced: 07 Aug 2024

https://github.com/cdnjs/cf-stats

📈 Monthly usage statistics from Cloudflare for the cdnjs.cloudflare.com domain - The #1 free and open source CDN built to make life easier for developers.

cdnjs cloudflare data data-analysis statistics stats usage usage-data usage-reports

Last synced: 07 Nov 2024

https://github.com/elysian01/data-purifier

A Python library for Automated Exploratory Data Analysis, Automated Data Cleaning, and Automated Data Preprocessing For Machine Learning and Natural Language Processing Applications in Python.

data-analysis data-cleaning data-cleaning-pipeline data-preprocessing data-science data-visualization datapurifier eda exploratory-data-analysis jupyter python-lib python-library python3

Last synced: 07 Nov 2024

https://github.com/briatte/dsr

Introduction to Data Science with R (Sciences Po, Paris, 2023)

course data-analysis data-science data-visualization r statistics

Last synced: 27 Oct 2024

https://github.com/nicolaskruchten/scipy2021

Data Visualization as the First and Last Mile of Data Science: Plotly Express and Dash

data-analysis data-science data-visualization python visualization

Last synced: 08 Nov 2024

https://github.com/lunarwhite/covid-social-analysis

Apply ML to Weibo sentiment. Sentiment analysis and visualization of Weibo text during the COVID-19 pandemic.

crawling data-analysis machine-learning nlp python vizualization

Last synced: 06 Nov 2024

https://github.com/SOCR/SOCRAT

A Dynamic Web Toolbox for Interactive Data Processing, Analysis, and Visualization

data-analysis data-science data-visualization socr statistics visual-analytics visualization

Last synced: 03 Nov 2024

https://github.com/dfinke/psduckdb

PSDuckDB is a PowerShell module that provides seamless integration with DuckDB, enabling efficient execution of analytical SQL queries directly from the PowerShell environment.

data-analysis data-science duckdb powershell sql

Last synced: 27 Oct 2024

https://github.com/lkuffo/data-viz

More than 50 examples of data visualization and analysis in Matplotlib, Pandas, Seaborn, Plotly, Bokeh, and Networkx

data-analysis data-science dataviz geoviz jupyter jupyter-notebook matplotlib networkx pandas plotly python seaborn

Last synced: 05 Nov 2024

https://github.com/root-11/tablite

Multiprocessing-enabled, out-of-memory data analysis library for tabular data.

data-analysis data-science datatype disk etl excel filereader pandas pivot-tables python table tabular-data

Last synced: 11 Oct 2024

https://github.com/tstreamdoth/instacart-market-basket-analysis

Use Instacart public dataset to report which products are often shopped together. 🍋🍉🥑🥦

data-analysis data-science instacart market-basket-analysis

Last synced: 28 Oct 2024
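Finding products that are often bought together usually starts with counting how many baskets contain each pair of items. The sketch below shows that counting step with the standard library. The orders are invented and are not Instacart data, and this is not necessarily the method the repository above uses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical orders; each inner list is one basket.
orders = [
    ["lemon", "avocado", "broccoli"],
    ["lemon", "avocado"],
    ["watermelon", "lemon"],
    ["avocado", "lemon", "watermelon"],
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    counts = Counter()
    for basket in baskets:
        # Sort and deduplicate so each pair has a single canonical key.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(orders)
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

Dividing each pair count by the number of baskets gives the pair's support, the usual first measure in market basket analysis.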

https://github.com/rafzamb/sknifedatar

sknifedatar is a package that serves primarily as an extension to the modeltime 📦 ecosystem, in addition to providing some functionality for spatial data and visualization.

data data-analysis data-science data-visualization forecasting r statistics time-series

Last synced: 05 Aug 2024

https://github.com/fjosw/pyerrors

Error propagation and statistical analysis for Markov chain Monte Carlo simulations in lattice QCD and statistical mechanics using autograd

autocorrelation autograd automatic-differentiation condensed-matter correlation data-analysis error-propagation lattice-field-theory lattice-qcd markov-chain monte-carlo particle-physics physics python qcd statistical-analysis statistical-mechanics

Last synced: 26 Oct 2024

https://github.com/ZijieZhaoMMHW/m_mhw1.0

A MATLAB toolbox to detect and analyze marine heatwaves (MHWs).

climate-science data-analysis heatwaves marine-heatwaves matlab

Last synced: 08 Aug 2024

https://github.com/czyt1988/data-workbench

Data processing software developed with Qt (C++)

data-analysis graphicsview qt qt-workflow qt5 workflow

Last synced: 08 Nov 2024

https://github.com/sharmaroshan/Insurance-Claim-Prediction

This project predicts the insurance claim for each user in the data set. Machine learning algorithms for regression analysis are used, and data visualization is performed to support the analysis.

beginner classification data-analysis data-visualization eda evaluation-metrics finance machine-learning radar-chart

Last synced: 08 Aug 2024

https://github.com/kennbroorg/poorskeme

OSINT - Data Visualization - Blockchain - Awareness - Scam

data-analysis data-visualization python scam smart-contracts visualization

Last synced: 11 Nov 2024

https://github.com/dfinke/PSDuckDB

PSDuckDB is a PowerShell module that provides seamless integration with DuckDB, enabling efficient execution of analytical SQL queries directly from the PowerShell environment.

data-analysis data-science duckdb powershell sql

Last synced: 23 Aug 2024

https://github.com/khanhnamle1994/world-cup-2018

An exploratory data analysis and data visualization project for World Cup 2018

data-analysis data-visualization

Last synced: 10 Nov 2024

https://github.com/braph-software/BRAPH-2

BRAPH 2.0 is a comprehensive software package for the analysis and visualization of brain connectivity data, offering flexible customization, rich visualization capabilities, and a platform for collaboration in neuroscience research.

biomedical-engineering brain-connectivity-analysis brain-research computational-neuroscience connectomics data-analysis data-science data-visualization deep-learning graph-theory machine-learning matlab network-analysis neuroimaging neuroscience open-source reproducible-research research-tools scientific-software toolbox

Last synced: 12 Nov 2024

https://github.com/leeper/make-example

An example of using make for a data analysis project

data-analysis make manuscript reproducible-research

Last synced: 28 Oct 2024

https://github.com/atapas/covid-19

COVID-19 World is yet another project to build a dashboard-like app to showcase data related to COVID-19 (coronavirus).

analytics countries covid covid-19 covid-19-india covid19 dashboard data-analysis data-visualization jamstack react reactjs recharts saas showcase virus visualization

Last synced: 07 Nov 2024

https://github.com/inphyt/covid19-italy-integrated-surveillance-data

COVID-19 integrated surveillance data provided by the Italian Institute of Health and processed via UnrollingAverages.jl to deconvolve the weekly moving averages.

covid-19 covid19-data data data-analysis data-structures data-visualization data-wrangling database dataset epidemiological-data epidemiology italy italy-data italy-dataset open-data surveillance surveillance-data time-series time-series-analysis

Last synced: 12 Nov 2024

https://github.com/stellar/stellar-etl

Stellar ETL will enable real-time analytics on the Stellar network

bitcoin blockchain data-analysis ethereum etl-framework etl-pipeline stellar stellar-lumens stellar-network

Last synced: 06 Nov 2024

https://github.com/kwokhing/yandexcatboost-python-demo

Demo of the capability of the Yandex CatBoost gradient boosting classifier on a fictitious IBM HR dataset obtained from Kaggle. Data exploration, cleaning, preprocessing, and model tuning are performed on the dataset.

catboost data-analysis data-preprocessing data-science feature-selection gradient-boosting gradient-boosting-classifier one-hot-encode pandas pearson-correlation python python27 seaborn variance-analysis visualization yandex-catboost

Last synced: 12 Oct 2024

https://github.com/davidchall/ipaddress

Data analysis for IP addresses and networks

cyber data-analysis ip-address ipv4 ipv6 r vctrs

Last synced: 13 Aug 2024