{"id":13557553,"url":"https://github.com/macbre/data-flow-graph","last_synced_at":"2025-10-27T21:31:45.415Z","repository":{"id":62566611,"uuid":"92527962","full_name":"macbre/data-flow-graph","owner":"macbre","description":"Uses your app logs to visualize how the data moves between the code, database, HTTP services, message queue, external storages etc.","archived":false,"fork":false,"pushed_at":"2024-04-15T23:26:28.000Z","size":325,"stargazers_count":23,"open_issues_count":6,"forks_count":2,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-02-01T08:41:51.273Z","etag":null,"topics":["d3-visualization","d3js","data-flow","database","elasticsearch","graph","graphviz","graphviz-dot","kibana","mysql","performance-visualization","query","sankey-diagram","sql","sql-logs","sus","sustainability","visualization"],"latest_commit_sha":null,"homepage":"https://macbre.github.io/data-flow-graph/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/macbre.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-05-26T16:12:38.000Z","updated_at":"2024-12-30T22:23:03.000Z","dependencies_parsed_at":"2024-09-18T20:20:33.767Z","dependency_job_id":null,"html_url":"https://github.com/macbre/data-flow-graph","commit_stats":{"total_commits":120,"total_committers":2,"mean_commits":60.0,"dds":0.01666666666666672,"last_synced_commit":"ac35914da4d1a607e76a7098cb59138b267deb15"},"previous_names":[],"tags_count":5,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/macbre%2Fdata-flow-graph","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macbre%2Fdata-flow-graph/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macbre%2Fdata-flow-graph/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/macbre%2Fdata-flow-graph/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/macbre","download_url":"https://codeload.github.com/macbre/data-flow-graph/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238561376,"owners_count":19492704,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["d3-visualization","d3js","data-flow","database","elasticsearch","graph","graphviz","graphviz-dot","kibana","mysql","performance-visualization","query","sankey-diagram","sql","sql-logs","sus","sustainability","visualization"],"created_at":"2024-08-01T12:04:25.047Z","updated_at":"2025-10-27T21:31:45.010Z","avatar_url":"https://github.com/macbre.png","language":"Python","readme":"# data-flow-graph\n\n[![PyPI](https://img.shields.io/pypi/v/data_flow_graph.svg)](https://pypi.python.org/pypi/data_flow_graph)\n[![Build Status](https://travis-ci.org/macbre/data-flow-graph.svg?branch=master)](https://travis-ci.org/macbre/data-flow-graph)\n\n![Graphviz example](https://raw.githubusercontent.com/macbre/data-flow-graph/master/docs/data-flow-example.png)\n\nTakes application logs from Elasticsearch (because you do have logs, right?) 
and **visualizes how your data flows through the database**, allowing you to quickly identify **which parts of your code insert, update, delete, or read data in specific DB tables**.\n\nThis can be extended to handle:\n\n* message queues (Redis, RabbitMQ, [`Scribe`](https://github.com/facebookarchive/scribe), ...)\n* HTTP services communication (GET, POST requests)\n* Amazon's S3 storage operations\n* tcpdump / varnishlog traffic between the hosts\n* (*use your imagination*)\n\n`data-flow-graph` uses the [d3.js](https://d3js.org/) library to visualize the data flow (heavily inspired by [this demo](http://bl.ocks.org/Neilos/584b9a5d44d5fe00f779) by Neil Atkinson).\nAlternatively, you can generate a `*.gv` file and render it using [Graphviz](https://www.graphviz.org/).\n\n# [Live demo](https://macbre.github.io/data-flow-graph/)\n\n## Graphs sharing\n\n### Via Gist\n\nFor easy dataflow sharing you can **[upload](https://gist.github.com/macbre/ddf5742b8293062cc78b767fccb5197b) graph data in TSV form to Gist** and [**have it visualized**](https://macbre.github.io/data-flow-graph/gist.html#ddf5742b8293062cc78b767fccb5197b). [Specific gist revisions](https://macbre.github.io/data-flow-graph/gist.html#ef35fb2e6ea7cc617d59090ab1e89618@e3cadc15b51967093a5eae1dff8229cffb0df120) are also supported.\n\n### Via S3\n\nYou can also **upload a TSV file to your S3 bucket** (and have [CORS set up there](https://github.com/macbre/data-flow-graph/issues/20)). 
Navigate to [tsv.html](https://macbre.github.io/data-flow-graph/tsv.html) or [check the example](https://macbre.github.io/data-flow-graph/tsv.html#https://s3.amazonaws.com/s3.macbre.net/data_flow/database.tsv) from [elecena.pl](https://github.com/elecena/data-flow/tree/master/output).\n\n## `dataflow.tsv`\n\nThe visualization is generated from a TSV file with the following format:\n\n```\n(source node)\\t(edge label)\\t(target node)\\t(edge weight - optional)\\t(optional metadata displayed in edge on-hover tooltip)\n```\n\n## Example\n\n```tsv\n# a comment - will be ignored by the visualization layer\nmq/request.php\t_update\tmysql:shops\t0.0148\tQPS: 0.1023\nsphinx:datasheets\tsearch\tElecena\\Services\\Sphinx\t0.1888\tQPS: 1.3053\nmysql:products\tgetImagesToFetch\tImageBot\t0.0007\tQPS: 0.0050\nsphinx:products\tsearch\tElecena\\Services\\Sphinx\t0.0042\tQPS: 0.0291\nsphinx:products\tgetIndexCount\tElecena\\Services\\Sphinx\t0.0001\tQPS: 0.0007\nsphinx:products\tproducts\tElecena\\Services\\Search\t0.0323\tQPS: 0.2235\ncurrency.php\t_\tmysql:currencies\t0.0001\tQPS: 0.0008\nsphinx:products\tgetLastChanges\tStatsController\t0.0002\tQPS: 0.0014\nmysql:suggest\tgetSuggestions\tElecena\\Services\\Sphinx\t0.0026\tQPS: 0.0181\nmq/request.php\t_delete\tmysql:shops_stats\t0.0004\tQPS: 0.0030\nsphinx:parameters\tgetDatabaseCount\tParameters\t0.0002\tQPS: 0.0010\n```\n\n\u003e Node names can be categorized by prefixing them with a `label` followed by `:` (e.g. `mysql:foo`, `sphinx:index`, `solr:products`, `redis:queue`)\n\n## Generating a TSV file for data flow\n\nYou can write your own tool to analyze logs. It just needs to emit a TSV file that matches the above format.\n\n[`sources/elasticsearch/logs2dataflow.py`](https://github.com/macbre/data-flow-graph/blob/master/sources/elasticsearch/logs2dataflow.py) is provided as an example; it was used to generate the TSV for the [demo](https://macbre.github.io/data-flow-graph/) of this tool. 
24 hours of logs from [elecena.pl](https://elecena.pl/) were analyzed (over 1 million SQL queries).\n\n## Python module\n\n```\npip install data_flow_graph\n```\n\nPlease refer to the `/test` directory for examples of how to use the helper functions to generate Graphviz and TSV-formatted data flows.\n\n### Generating a Graphviz dot file\n\n```python\nfrom data_flow_graph import format_graphviz_lines\n\nlines = [{\n    'source': 'Foo \"bar\" test',\n    'metadata': '\"The Edge\"',\n    'target': 'Test \"foo\" 42',\n}]\n\ngraph = format_graphviz_lines(lines)\n```\n\n### Generating a TSV file\n\n```python\nfrom data_flow_graph import format_tsv_lines\n\nlines = [\n    {\n        'source': 'foo',\n        'edge': 'select',\n        'target': 'bar',\n    },\n    {\n        'source': 'foo2',\n        'edge': 'select',\n        'target': 'bar',\n        'value': 0.5,\n        'metadata': 'test'\n    },\n]\n\ntsv = format_tsv_lines(lines)\n```\n\n## Links\n\n* [vis.js](https://github.com/almende/vis) for visualization ([a graph example](http://etn.io/))\n* [Interactive \u0026 Dynamic Force-Directed Graphs with D3](https://medium.com/ninjaconcept/interactive-dynamic-force-directed-graphs-with-d3-da720c6d7811)\n* [d3.js curved links graph](https://bl.ocks.org/mbostock/4600693)\n* [Bi-directional hierarchical sankey diagram](http://bl.ocks.org/Neilos/584b9a5d44d5fe00f779)\n","funding_links":[],"categories":["Python","database"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmacbre%2Fdata-flow-graph","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmacbre%2Fdata-flow-graph","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmacbre%2Fdata-flow-graph/lists"}