{"id":13766568,"url":"https://victorcouste.github.io/data-tools/","last_synced_at":"2025-05-10T22:31:11.847Z","repository":{"id":53730624,"uuid":"347338418","full_name":"victorcouste/data-tools","owner":"victorcouste","description":"Data Tools Subjective List","archived":false,"fork":false,"pushed_at":"2023-08-22T18:55:45.000Z","size":522,"stargazers_count":83,"open_issues_count":7,"forks_count":12,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-04-25T01:02:06.280Z","etag":null,"topics":["awesome","awesome-list","data","data-architecture","data-tools","datatools","list","modern","open-source","opensource","tools"],"latest_commit_sha":null,"homepage":"https://victorcouste.github.io/data-tools/","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/victorcouste.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2021-03-13T10:22:06.000Z","updated_at":"2025-02-14T19:45:16.000Z","dependencies_parsed_at":"2024-01-10T19:12:17.332Z","dependency_job_id":null,"html_url":"https://github.com/victorcouste/data-tools","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victorcouste%2Fdata-tools","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victorcouste%2Fdata-tools/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victorcouste%2Fdata-tools/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/victorcouste%2Fdata-tools/manifests","owner_url":"https://repos.ecos
yste.ms/api/v1/hosts/GitHub/owners/victorcouste","download_url":"https://codeload.github.com/victorcouste/data-tools/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252931480,"owners_count":21827109,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["awesome","awesome-list","data","data-architecture","data-tools","datatools","list","modern","open-source","opensource","tools"],"created_at":"2024-08-03T16:00:57.529Z","updated_at":"2025-05-10T22:31:11.560Z","avatar_url":"https://github.com/victorcouste.png","language":null,"funding_links":[],"categories":["Not exhaustive and personal list of \"modern\" Data Tools and Projects"],"sub_categories":[],"readme":"\n## Not exhaustive and personal list of \"modern\" Data Tools and Projects\n\n[![Suggest a Data Tool !](https://img.shields.io/badge/Suggest-a%20Data%20Tool-green)](https://github.com/victorcouste/data-tools/issues/new)\n\n- [Data Architecture articles](#architecture)\n- [Data Ingestion / Data Onboarding / ETL / ELT](#ingestion)\n- [Reverse ETL](#reverse)\n- [Data Collection / Product Analytics / Customer Data](#collection)\n- [Transformation / Preparation / Cleaning / Wrangling](#transformation)\n- [SQL Tools / Editors](#sqltools)\n- [SQL Engines](#sql)\n- [BI / Reporting / Data Visualization](#bi)\n- [Data Quality / Profiling / Observability](#quality)\n- [Data Management / Lineage / Catalog / Governance](#management)\n- [DataOps / Data Fabric](#ops)\n- [Orchestration / Workflow](#orchestration)\n- [Storage / Database](#storage)\n- [Data Privacy / Security / 
Identity](#privacy)\n- [Others](#others)\n\nThis list excludes file-system storage and traditional databases and, for now, data science, virtualization, and streaming tools. It also excludes the embedded tools and services offered by the three main public Cloud providers (Google Cloud, Microsoft Azure and AWS).\n\n\u003ca name=\"architecture\"\u003e\u003c/a\u003e**Data Architecture**\n- [Emerging Architectures for Modern Data Infrastructure](https://a16z.com/2020/10/15/the-emerging-architectures-for-modern-data-infrastructure)\n- [The Modern Data Stack: Past, Present, and Future](https://blog.getdbt.com/future-of-the-modern-data-stack)\n- [Data Mesh Principles and Logical Architecture](https://martinfowler.com/articles/data-mesh-principles.html) and a [Data Warehouse comparison](https://blog.starburstdata.com/data-mesh-the-answer-to-the-data-warehouse-hypocrisy)\n- [The Building Blocks of a Modern Data Platform](https://towardsdatascience.com/the-building-blocks-of-a-modern-data-platform-92e46061165)\n- [Two steps towards a modern data platform](https://medium.com/bigdatarepublic/two-steps-towards-a-modern-data-platform-37c74e7c104b)\n- [What your data team is using: The analytics stack](https://technically.dev/posts/what-your-data-team-is-using)\n- [The Top 20 Most Commonly Used Data Engineering Tools](https://www.secoda.co/blog/the-top-20-most-commonly-used-data-engineering-tools)\n- [Data Stacks For Fun \u0026 Nonprofit](https://towardsdatascience.com/data-stacks-for-fun-nonprofit-part-ii-d375d824abf3)\n- [The Future of Business Intelligence is Open Source](https://maximebeauchemin.medium.com/the-future-of-business-intelligence-is-open-source-9b654595773a)\n- [What is Data Observability?](https://towardsdatascience.com/what-is-data-observability-40b337971e3e)\n\n\u003ca name=\"ingestion\"\u003e\u003c/a\u003e**Data Ingestion / Data Onboarding / ETL / ELT**\n- [Flatfile](https://flatfile.io) Data Onboarding platform\n- [Fivetran](https://fivetran.com) Cloud data integration platform\n- 
[Matillion](https://www.matillion.com) Cloud data integration platform\n- [Apache Gobblin](https://gobblin.apache.org) Open Source distributed data integration framework\n- [Singer](https://www.singer.io) \"Open Source standard for writing scripts that move data\"\n- [Meltano](https://meltano.com) Open Source ELT for the DataOps\n- [Airbyte](https://airbyte.io) Open Source data integration platform\n- [Stitch](https://www.stitchdata.com) Simple, extensible Cloud ETL platform (Talend)\n- [Hevo](https://hevodata.com) No-code data pipeline as a service\n- [Apache Hop](http://hop.incubator.apache.org) Open Source data integration platform project\n- [Meroxa](https://meroxa.com) Real-time data ingestion infrastructure\n- [Portable](https://portable.io) Cloud Hosted ELT Platform\n- Talend, StreamSets, Alooma (Google), Xplenty, Striim, Panoply, Stambia, HVR\n\n\u003ca name=\"reverse\"\u003e\u003c/a\u003e**Reverse ETL**\n- [Census](https://www.getcensus.com) Operational analytics platform, move data from data warehouse to apps\n- [Hightouch](https://www.hightouch.io) Sync customer data to SaaS business platforms\n- [Grouparoo](https://www.grouparoo.com) Open Source framework to move data between database and Cloud apps\n\n\u003ca name=\"collection\"\u003e\u003c/a\u003e**Data Collection / Product Analytics / Customer Data**\n- [Segment](https://segment.com) Customer data platform (CDP) (Twilio)\n- [RudderStack](https://rudderstack.com) Customer data pipeline, event tracking\n- [Snowplow](https://snowplowanalytics.com) Data collection platform \n- [Freshpaint](https://www.freshpaint.io) Collect, control, and deliver customer data\n- [PostHog](https://posthog.com) Open Source Product Analytics platform\n- [Amplitude](https://amplitude.com) Product Analytics platform\n- [Iteratively](https://iterative.ly) Product Analytics platform « Capture customer data you trust »  \n- [Avo](https://www.avo.app) Product Analytics platform\n- [Mixpanel](https://mixpanel.com) Product 
analytics platform\n- [Indicative](https://www.indicative.com) Product analytics platform \n- [Heap](https://heap.io) Product analytics platform\n- [Supermetrics](https://supermetrics.com) Get marketing data for reporting, analytics and storage\n\n\u003ca name=\"transformation\"\u003e\u003c/a\u003e**Transformation / Preparation / Cleaning / Wrangling**\n- [Trifacta](https://www.trifacta.com) Data Wrangling for Cloud (or Hadoop) platforms and storages\n- [dbt](https://www.getdbt.com) Transform with SQL from command line ([Open Source](https://github.com/fishtown-analytics/dbt)) or Cloud\n- [Dataform](https://dataform.co) Collaboration on SQL pipelines in Cloud data warehouses (Google)\n- [Pano](https://www.pano.dev) Open Source data preparation for Cloud data warehouses\n- [Rasgo](https://www.rasgoml.com) Data preparation for Data Scientists\n- [Mito](https://www.trymito.io) JupyterLab extension to generate pandas Python code from a spreadsheet\n- [DataPrep](https://dataprep.ai/) Prepare data in Python\n- [OpenRefine](https://openrefine.org) \"A free, open source, powerful tool for working with messy data\"\n\n\u003ca name=\"sqltools\"\u003e\u003c/a\u003e**SQL Tools / Editors**\n- [Count](https://count.co) \"The BI notebook built for analysts\"\n- [PopSQL](https://popsql.com) \"Modern SQL editor\"\n- [DataGrip](https://www.jetbrains.com/datagrip) IDE for SQL (JetBrains)\n- [DBeaver](https://dbeaver.io) Free (or Enterprise and Cloud editions) universal database tool\n- [sq](https://sq.io) \"swiss-army knife for data\", SQL in command line for relational data\n- [SqlDBM](https://sqldbm.com) Develop Database Models\n- [Querybook](https://www.querybook.org) Open Source SQL query and Big Data IDE via a notebook interface\n- [Soda SQL](https://github.com/sodadata/soda-sql) Data testing, monitoring, and profiling for SQL-accessible data\n- [SQLFluff](https://github.com/sqlfluff/sqlfluff) SQL Linting and Auto-formatting for Humans\n\n\u003ca 
name=\"sql\"\u003e\u003c/a\u003e**SQL Engines**\n- [Trino](https://trino.io) Open Source high perf and distributed SQL query engine (formerly PrestoSQL)\n- [Starburst](https://www.starburst.io) Cloud or On-premises SQL engine (based on [Trino](https://trino.io))\n- [AWS Athena](https://aws.amazon.com/athena) Interactive SQL query service for Amazon S3 (based on Presto)\n- [DataFusion](https://github.com/apache/arrow-datafusion) Query execution engine using Apache Arrow as its in-memory format\n\n\u003ca name=\"bi\"\u003e\u003c/a\u003e**BI / Reporting / Data Visualization**\n- [Metabase](https://www.metabase.com) Open Source business intelligence tool\n- [Apache Superset](https://superset.apache.org) Open Source modern data exploration and visualization platform\n- [Apache ECharts](https://echarts.apache.org) Open Source JavaScript Visualization Library\n- [Cube.js](https://cube.dev) Open Source Analytical API platform\n- [Grafana](https://grafana.com) Open Source analytics \u0026 monitoring solution\n- [Looker](https://looker.com) BI and Analytics Platform (Google)\n- [Redash](https://redash.io) Data visualisation and Dashboarding with SQL (Databricks)\n- [Mode](https://mode.com) Collaborative data platform that combines SQL, R, Python, and visual analytics\n- [Sigma](https://www.sigmacomputing.com) Cloud analytics solution\n- [Hex](https://hex.tech) Collaborative SQL + Python-based notebooks\n- [Lux](https://github.com/lux-org/lux) Python library and API for Intelligent Visual Discovery\n- [y42](https://www.y42.com) \"No-Code Business Intelligence\" platform\n- [Knowage](https://www.knowage-suite.com) Open Source Business Analytics Suite\n- [Rakam](https://rakam.io) Data platform for building analytics interface (dbt integration)\n- [Datawrapper](https://www.datawrapper.de) Enrich stories and articles with data visualization\n- [D3](https://d3js.org) JavaScript library for visualizing data with HTML, SVG, and CSS\n- [Lightdash](https://www.lightdash.com) Open 
source BI tool fully integrated with dbt projects\n- Tableau, PowerBI, Sisense, Qlik, Spotfire, ThoughtSpot, Chartio (Atlassian), Domo, Toucan Toco\n\n\u003ca name=\"quality\"\u003e\u003c/a\u003e**Data Quality / Profiling / Observability**\n- [Monte Carlo](https://www.montecarlodata.com) \"Data Reliability Delivered\"\n- [Datafold](https://www.datafold.com) Data Observability platform\n- [Great Expectations](https://greatexpectations.io) Open Source data quality, profiling \u0026 validation\n- [Bigeye](https://www.bigeye.com) Automatic data quality monitoring\n- [Anomalo](https://www.anomalo.com) Validate and document your data warehouse\n- [Trackplan](https://trackplan.io) \"Schema Management for Behavioural Data Tracking\"\n- [lightup](https://www.lightup.ai) Cloud data quality indicators provider\n\n\u003ca name=\"management\"\u003e\u003c/a\u003e**Data Management / Lineage / Catalog / Governance**\n- [Datakin](https://datakin.com) DataOps solution, Data Lineage\n- [Marquez](https://marquezproject.github.io/marquez) Open Source metadata and data governance project\n- [DataHub](https://datahubproject.io) Open Source metadata search \u0026 discovery tool\n- [Amundsen](https://www.amundsen.io) Open Source data discovery and metadata engine\n- [Data Galaxy](https://www.datagalaxy.com/en) Data Governance platform with Data Catalog and Data Lineage\n- [Zeenea](https://zeenea.com) Cloud-native Data Catalog\n- [Alation](https://www.alation.com) Data Governance and Data Catalog platform\n- [Collibra](https://www.collibra.com) Data Governance and Data Catalog platform\n- [Secoda](https://www.secoda.co) Data Discovery and Data Catalog\n- [MANTA](https://getmanta.com) Data Lineage platform\n- [data.world](https://data.world) Cloud-native Data Catalog\n- [Stemma](https://www.stemma.ai/) SaaS managed version of Amundsen\n- [Egeria](https://egeria.odpi.org/) Open Metadata and Governance\n\n\u003ca name=\"ops\"\u003e\u003c/a\u003e**DataOps / Data Fabric**\n- 
[Atlan](https://atlan.com) \"the modern data workspace\", Data Management \u0026 DataOps\n- [Nessie](https://projectnessie.org) DataOps for Data Lakes, a \"Git-Like Experience for your Data Lake\"\n- [Nexla](https://www.nexla.com) DataOps platform \"to delivery data for Analytics, AI and Operations\"\n- [Keboola](https://www.keboola.com) DataOps platform\n- [Saagie](https://www.saagie.com) DataOps platform\n- [DataKitchen](https://datakitchen.io) DataOps platform\n- [DAGsHub](https://dagshub.com) GitHub for data\n- [Unravel](https://www.unraveldata.com) DataOps platform\n- [Upsolver](https://www.upsolver.com) \"Compute and pipeline layer between your data lake and the analytics tools\"\n- [Cinchy](https://www.cinchy.com) \"Autonomous Data Fabric\" and Data Management platform\n\n\u003ca name=\"orchestration\"\u003e\u003c/a\u003e**Orchestration / Workflow**\n- [Apache Airflow](https://airflow.apache.org) Open Source workflow scheduler platform\n- [Dagster](https://dagster.io) Open Source \"Data orchestrator for machine learning, analytics, and ETL\"\n- [Prefect](https://www.prefect.io) Workflow management system and platform for dataflow automation\n- [Apache DolphinScheduler](https://dolphinscheduler.apache.org) Distributed and visual workflow scheduler system\n- [Luigi](https://github.com/spotify) Python package to build complex pipelines of batch jobs\n\n\u003ca name=\"storage\"\u003e\u003c/a\u003e**Storage / Database**\n- [DuckDB](https://duckdb.org) In-process SQL OLAP database (SQLite-like, column-oriented)\n- [ClickHouse](https://clickhouse.tech/) Open-source OLAP database management system\n- [DoltHub](https://www.dolthub.com) \"the true Git for data experience in a SQL database\"\n- [DVC](https://dvc.org) Data Version Control\n- [Materialize](https://materialize.com) Event Streaming Database\n- [Warp 10](https://www.warp10.io) Advanced Time Series Platform\n- Snowflake, Firebolt, BigQuery, Redshift, Apache Cassandra, MongoDB, InfluxDB, QuestDB, Neo4j, 
SingleStore(MemSQL)\n\n\u003ca name=\"privacy\"\u003e\u003c/a\u003e**Data Privacy / Security / Identity**\n- [Immuta](https://www.immuta.com) \"Self-Service Data Access with Automated Privacy Control\"\n- [Okera](https://www.okera.com) Cloud data security, \"Universal Data Authorization\"\n- [Privacera](https://privacera.com) SaaS Access Governance Solution\n- [Apache Ranger](https://ranger.apache.org) Framework to enable, monitor and manage comprehensive data security\n- [Baffle](https://baffle.io) Cloud security with a \"transparent data security mesh\"\n- [Privitar](https://www.privitar.com) Enterprise Data Privacy Software\n- [ReachFive](https://www.reachfive.com) Identity \u0026 Access Management\n- [Okta](https://www.okta.com) Trusted platform to secure identities, from customers to workforce\n\n\u003ca name=\"others\"\u003e\u003c/a\u003e**Others**\n- [Opendatasoft](https://www.opendatasoft.com) Data sharing platform \n- [Streamlit](https://streamlit.io) Turns data scripts into shareable data web apps\n- [Transform Data](https://transformdata.io) Shared data interface and metrics repository \n- [White Label Data](https://docs.whitelabeldata.com) Platform for building and deploying custom data applications\n- [Flat Data](https://octo.github.com/projects/flat-data) Bring working datasets into your GitHub repositories and version them\n\n**And finally don't hesitate to:**\n- [Star](https://github.com/victorcouste/data-tools/stargazers) this GitHub repository\n- Suggest an interesting new data tool with a [pull request](https://github.com/victorcouste/data-tools/pulls), an [issue](https://github.com/victorcouste/data-tools/issues/new) or a [message](https://github.com/victorcouste)\n- Share [this list](https://victorcouste.github.io/data-tools) in your network\n- Enjoy and Have Fun 
!\n\nVictor\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/victorcouste.github.io%2Fdata-tools%2F","html_url":"https://awesome.ecosyste.ms/projects/victorcouste.github.io%2Fdata-tools%2F","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/victorcouste.github.io%2Fdata-tools%2F/lists"}