Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Data analysis

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

https://github.com/scikit-learn/scikit-learn

scikit-learn: machine learning in Python

data-analysis data-science machine-learning python statistics

Last synced: 30 Jul 2024

https://github.com/pydata/pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

alignment data-analysis data-science flexible pandas python

Last synced: 09 Aug 2024

https://github.com/pandas-dev/pandas

Flexible and powerful data analysis / manipulation library for Python, providing labeled data structures similar to R data.frame objects, statistical functions, and much more

alignment data-analysis data-science flexible pandas python

Last synced: 31 Jul 2024

https://github.com/metabase/metabase

The simplest, fastest way to get business intelligence and analytics to everyone in your company :yum:

analytics bi business-intelligence businessintelligence clojure dashboard data data-analysis data-visualization database metabase mysql postgres postgresql reporting slack sql-editor visualization

Last synced: 30 Jul 2024

https://github.com/gradio-app/gradio

Build and share delightful machine learning apps, all in Python. 🌟 Star to support our work!

data-analysis data-science data-visualization deep-learning deploy gradio gradio-interface hacktoberfest interface machine-learning models python python-notebook ui ui-components

Last synced: 30 Jul 2024

https://github.com/microsoft/Data-Science-For-Beginners

10 Weeks, 20 Lessons, Data Science for All!

data-analysis data-science data-visualization pandas python

Last synced: 31 Jul 2024

https://gchq.github.io/CyberChef/

The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis

compression data-analysis data-manipulation encoding encryption hashing parsing

Last synced: 31 Jul 2024

https://github.com/gchq/cyberchef

The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis

compression data-analysis data-manipulation encoding encryption hashing parsing

Last synced: 01 Aug 2024

https://github.com/gchq/CyberChef

The Cyber Swiss Army Knife - a web app for encryption, encoding, compression and data analysis

compression data-analysis data-manipulation encoding encryption hashing parsing

Last synced: 30 Jul 2024

https://github.com/allinurl/goaccess

GoAccess is a real-time web log analyzer and interactive viewer that runs in a terminal in *nix systems or through your browser.

analytics apache c caddy cli command-line dashboard data-analysis gdpr goaccess google-analytics monitoring ncurses nginx privacy real-time terminal tui web-analytics webserver

Last synced: 30 Jul 2024

https://github.com/dataease/dataease

An open source data visualization and analysis tool that anyone can use.

apache-doris business-intelligence data-analysis data-visualization echarts kettle superset tableau

Last synced: 30 Jul 2024

https://github.com/airbytehq/airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

bigquery change-data-capture data data-analysis data-collection data-engineering data-integration data-pipeline elt etl java mssql mysql pipeline postgresql python redshift s3 self-hosted snowflake

Last synced: 30 Jul 2024

https://github.com/sinaptik-ai/pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

ai csv data data-analysis data-science database datalake gpt-3 gpt-4 llm pandas sql

Last synced: 02 Aug 2024

https://github.com/Sinaptik-AI/pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

ai csv data data-analysis data-science database datalake gpt-3 gpt-4 llm pandas sql

Last synced: 31 Jul 2024

https://github.com/gventuri/pandas-ai

Chat with your database (SQL, CSV, pandas, polars, mongodb, noSQL, etc). PandasAI makes data analysis conversational using LLMs (GPT 3.5 / 4, Anthropic, VertexAI) and RAG.

ai csv data data-analysis data-science database datalake gpt-3 gpt-4 llm pandas sql

Last synced: 03 Aug 2024

https://github.com/OpenRefine/OpenRefine

OpenRefine is a free, open source power tool for working with messy data and improving it

data-analysis data-science data-wrangling datacleaning datacleansing datajournalism datamining java journalism opendata reconciliation wikidata

Last synced: 31 Jul 2024

https://github.com/Kanaries/pygwalker

PyGWalker: Turn your pandas dataframe into an interactive UI for visual analysis

data-analysis data-exploration dataframe matplotlib pandas plotly tableau tableau-alternative visualization

Last synced: 31 Jul 2024

https://github.com/guipsamora/pandas_exercises

Practice your pandas skills!

data-analysis exercise pandas practice tutorial

Last synced: 31 Jul 2024

https://github.com/jindaxiang/akshare

AKShare is an elegant, simple, open source financial data interface library for Python, built for human beings!

academic akshare asset-pricing bond currency data data-analysis data-science datasets economic-data economics finance finance-api financial-data fundamental futures option quant stock

Last synced: 03 Aug 2024

https://github.com/akfamily/akshare

AKShare is an elegant, simple, open source financial data interface library for Python, built for human beings!

academic akshare asset-pricing bond currency data data-analysis data-science datasets economic-data economics finance finance-api financial-data fundamental futures option quant stock

Last synced: 31 Jul 2024

https://github.com/gonum/gonum

Gonum is a set of numeric libraries for the Go programming language. It contains libraries for matrices, statistics, optimization, and more

data-analysis go golang graph matrix scientific-computing statistics

Last synced: 30 Jul 2024

https://github.com/Alluxio/alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud

alluxio data-analysis data-orchestration hadoop memory-speed presto spark tensorflow virtual-distributed-filesystem

Last synced: 31 Jul 2024

https://github.com/scikit-learn-contrib/imbalanced-learn

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning

data-analysis data-science machine-learning python statistics

Last synced: 30 Jul 2024

https://github.com/jeecgboot/JimuReport

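Open source visual reporting, an alternative to commercial BI. JimuReport is a reporting tool with an Excel-like interface in which reports are designed online by drag and drop. A strong complement to low-code products! Features include report design, chart reports, print design, and big-screen dashboard design, all completely free. Guided by a product philosophy of "simple, easy to use, professional", it greatly reduces the difficulty of report development, shortens development cycles, and solves all kinds of reporting problems.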

bi bigscreen birt data-analysis data-visualization dataease datav echart echarts finereport highcharts ireport jasperreport jfreechart metabase print redash report superset tableau

Last synced: 31 Jul 2024

https://github.com/rhiever/Data-Analysis-and-Machine-Learning-Projects

Repository of teaching materials, code, and data for my data analysis and machine learning projects.

data-analysis data-science evolutionary-algorithm ipython-notebook machine-learning python

Last synced: 31 Jul 2024

https://github.com/airbnb/knowledge-repo

A next-generation curated knowledge sharing platform for data scientists and other technical professions.

data data-analysis data-science knowledge

Last synced: 31 Jul 2024

https://github.com/flyteorg/flyte

Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

data data-analysis data-science dataops declarative fine-tuning flyte golang grpc kubernetes kubernetes-operator llm machine-learning mlops orchestration-engine production production-grade python scale workflow

Last synced: 31 Jul 2024

https://github.com/SpiderClub/weibospider

:zap: A distributed crawler for Weibo, built with Celery and Requests.

data-analysis distributed-crawler python3 sina weibo weibospider

Last synced: 31 Jul 2024

https://github.com/cube2222/octosql

OctoSQL is a query tool that allows you to join, analyse and transform data from multiple databases and file formats using SQL.

cli csv data-analysis go golang json mysql nosql postgresql query query-engine redis sql

Last synced: 31 Jul 2024

https://github.com/javascriptdata/danfojs

Danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.

danfojs data-analysis data-analytics data-manipulation data-science dataframe javascript pandas plotting-charts stream-data stream-processing table tensorflow tensors

Last synced: 31 Jul 2024

https://github.com/opensource9ja/danfojs

Danfo.js is an open source, JavaScript library providing high performance, intuitive, and easy to use data structures for manipulating and processing structured data.

danfojs data-analysis data-analytics data-manipulation data-science dataframe javascript pandas plotting-charts stream-data stream-processing table tensorflow tensors

Last synced: 02 Aug 2024

https://github.com/microsoft/TaskWeaver

A code-first agent framework for seamlessly planning and executing data analytics tasks.

agent ai-agents code-interpreter copilot data-analysis llm openai

Last synced: 31 Jul 2024

https://github.com/BrambleXu/pydata-notebook

Chinese translation notes for Python for Data Analysis, 2nd edition (2017)

chinese-translation data-analysis jupyter-notebook pandas python-for-data-analysis

Last synced: 31 Jul 2024

https://github.com/Nyandwi/machine_learning_complete

A comprehensive machine learning repository containing 30+ notebooks on different concepts, algorithms and techniques.

computer-vision data-analysis data-science data-visualization datascience deep-learning keras machine-learning matplotlib neural-networks nlp numpy open-source pandas python scikit-learn seaborn tensorflow

Last synced: 01 Aug 2024

https://github.com/whoiskatrin/sql-translator

SQL Translator is a tool for converting natural language queries into SQL code using artificial intelligence. This project is 100% free and open source.

data-analysis data-engineering dataquery datascience dataset openai postgresql query sql

Last synced: 31 Jul 2024

https://github.com/ResidentMario/missingno

Missing data visualization module for Python.

data-analysis data-visualization missing-data pandas python

Last synced: 30 Jul 2024

https://github.com/has2k1/plotnine

A Grammar of Graphics for Python

data-analysis grammar graphics plotting python

Last synced: 31 Jul 2024

https://github.com/lancedb/lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, with more integrations coming.

apache-arrow computer-vision data-analysis data-analytics data-centric data-format data-science dataops deep-learning duckdb embeddings llms machine-learning mlops python rust

Last synced: 31 Jul 2024

https://github.com/eto-ai/lance

Modern columnar data format for ML and LLMs implemented in Rust. Convert from parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, with more integrations coming.

apache-arrow computer-vision data-analysis data-analytics data-centric data-format data-science dataops deep-learning duckdb embeddings llms machine-learning mlops python rust

Last synced: 02 Aug 2024

https://github.com/antonycourtney/tad

A desktop application for viewing and analyzing tabular data

csv data-analysis data-science database desktop-application duckdb parquet-viewer pivot-tables pivots tabular-data

Last synced: 31 Jul 2024

https://github.com/aksnzhy/xlearn

High performance, easy-to-use, and scalable machine learning (ML) package, including linear model (LR), factorization machines (FM), and field-aware factorization machines (FFM), with Python and command-line interfaces.

data-analysis data-science factorization-machines ffm fm machine-learning statistics

Last synced: 30 Jul 2024

https://github.com/pydata/pandas-datareader

Extract data from a wide range of Internet sources into a pandas DataFrame.

data data-analysis dataset econdb economic-data fama-french finance financial-data fred html pandas pydata python stock-data

Last synced: 31 Jul 2024

https://github.com/quadratichq/quadratic

Quadratic | Data Science Spreadsheet with Python & SQL

data data-analysis data-engineering data-science etl python quadratic spreadsheet sql wasm webgl

Last synced: 01 Aug 2024

https://github.com/jm199504/Financial-Knowledge-Graphs

Workflow for building a small financial knowledge graph (neo4j / python / cypher / KG)

cypher data-analysis graph-database neo4j python

Last synced: 31 Jul 2024

https://github.com/justinzm/gopup

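Data interfaces: Baidu, Google, Toutiao, and Weibo indexes, macroeconomic data, interest rate data, currency exchange rates, "qianlima" (rising-star) and unicorn companies, Xinwen Lianbo news transcripts, film and TV box office data, university lists, epidemic data…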

covid19-data data data-analysis data-science datasets economic-data gopup index-data python

Last synced: 31 Jul 2024

https://github.com/root-project/root

The official repository for ROOT: analyzing, storing and visualizing big data, scientifically

c-plus-plus cling data-analysis geometry graphics hacktoberfest interpreter machine-learning mathematics parallel physics python root root-cern statistics visualization

Last synced: 30 Jul 2024

https://github.com/apache/incubator-devlake

Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.

dashboard-friendly data data-analysis data-engineering data-integration data-transfers devops domain-layer dora etl golang hacktoberfest integration jira open-source user-friendly

Last synced: 01 Aug 2024

https://github.com/AAChartModel/AAChartKit-Swift

📈📊📱💻🖥️ An elegant modern declarative data visualization chart framework for iOS, iPadOS and macOS. Extremely powerful, supports line, spline, area, areaspline, column, bar, pie, scatter, angular gauges, arearange, areasplinerange, columnrange, bubble, box plot, error bars, funnel, waterfall and polar chart types. It also covers donut, radar, and mixed charts, offering dozens of infographic chart types in all, enough to fully meet everyday work needs.

animation area-chart bar-chart bubble-chart chart column-chart data-analysis data-visualization draw graph graphics ios line-chart model objective-c pie plot spline-chart swift view

Last synced: 04 Aug 2024

https://github.com/ajcr/100-pandas-puzzles

100 data puzzles for pandas, ranging from short and simple to super tricky (60% complete)

data-analysis numpy pandas python

Last synced: 31 Jul 2024

https://github.com/mito-ds/mito

The mitosheet package, trymito.io, and other public Mito code.

data data-analysis data-science data-visualization jupyter pandas python streamlit-component

Last synced: 31 Jul 2024

https://github.com/MacroAnalyst/Linear_Algebra_With_Python

Lecture Notes for Linear Algebra Featuring Python. This series of lecture notes will walk you through all the must-know concepts that set the foundation of data science or advanced quantitative skillsets. Suitable for statisticians, econometricians, quantitative analysts, data scientists, and others who want to quickly refresh their linear algebra with the help of Python computation and visualization.

computational-science data-analysis data-science data-visualization diagonalization eigenvalues eigenvectors gram-schmidt jupyter linear-algebra linear-transformations mathematics matrix matrix-calculations multivariate-normal-distribution null-space python singular-value-decomposition symmetric-matrices vector-space

Last synced: 30 Jul 2024

https://github.com/weijie-chen/Linear-Algebra-With-Python

Lecture Notes for Linear Algebra Featuring Python. This series of lecture notes will walk you through all the must-know concepts that set the foundation of data science or advanced quantitative skillsets. Suitable for statisticians, econometricians, quantitative analysts, data scientists, and others who want to quickly refresh their linear algebra with the help of Python computation and visualization.

computational-science data-analysis data-science data-visualization diagonalization eigenvalues eigenvectors gram-schmidt jupyter linear-algebra linear-transformations mathematics matrix matrix-calculations multivariate-normal-distribution null-space python singular-value-decomposition symmetric-matrices vector-space

Last synced: 31 Jul 2024

https://github.com/justmarkham/pandas-videos

Jupyter notebook and datasets from the pandas video series

data-analysis data-cleaning data-science jupyter-notebook pandas python tutorial

Last synced: 31 Jul 2024

https://github.com/lana-k/sqliteviz

Instant offline SQL-powered data visualisation in your browser

charting csv data-analysis pivot pivot-table plotly plotting sql sqlite visualization

Last synced: 31 Jul 2024

https://github.com/chris1610/pbpython

Code, Notebooks and Examples from Practical Business Python

data-analysis data-visualization datascience pandas python scikit-learn

Last synced: 02 Aug 2024

https://github.com/h2oai/datatable

A Python package for manipulating 2-dimensional tabular data structures

data-analysis data-structure ftrl performance python

Last synced: 31 Jul 2024

https://github.com/bellingcat/octosuite

GitHub Data Analysis Framework.

data-analysis github

Last synced: 01 Aug 2024

https://github.com/elementary-data/elementary

The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.

analytics-engineer bigquery data-analysis data-governance data-lineage data-observability data-pipeline data-pipelines data-reliability data-warehouse dataops dbt dbt-artifacts dbt-packages lineage redshift snowflake

Last synced: 01 Aug 2024

https://github.com/404notf0und/AI-for-Security-Learning

Security scenarios, AI-based security algorithms, and industry practice in security data analysis

data-analysis data-mining machine-learning security

Last synced: 02 Aug 2024

https://github.com/jadianes/spark-py-notebooks

Apache Spark & Python (pySpark) tutorials for Big Data Analysis and Machine Learning as IPython / Jupyter notebooks

big-data bigdata data-analysis data-science ipython ipython-notebook machine-learning mllib notebook pyspark python spark

Last synced: 07 Aug 2024