{"id":17436088,"url":"https://github.com/timjjting/task-lineage-generator","last_synced_at":"2026-01-21T18:13:06.223Z","repository":{"id":257951838,"uuid":"868209949","full_name":"TimJJTing/Task-Lineage-Generator","owner":"TimJJTing","description":"A simple Task Lineage Diagram Generator","archived":false,"fork":false,"pushed_at":"2024-10-07T14:23:59.000Z","size":831,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-07T03:08:51.998Z","etag":null,"topics":["d3","dag","data-visualization","golang","graphviz-dot","lineage","task"],"latest_commit_sha":null,"homepage":"https://timjjting.github.io/Task-Lineage-Generator/interface/index.html","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TimJJTing.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-05T18:58:00.000Z","updated_at":"2024-10-07T14:24:03.000Z","dependencies_parsed_at":"2024-10-17T04:40:12.327Z","dependency_job_id":null,"html_url":"https://github.com/TimJJTing/Task-Lineage-Generator","commit_stats":null,"previous_names":["timjjting/task-lineage-generator"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TimJJTing%2FTask-Lineage-Generator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TimJJTing%2FTask-Lineage-Generator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TimJJTing%2FTask-Lineage-Generator/releases","manifests_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TimJJTing%2FTask-Lineage-Generator/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/TimJJTing","download_url":"https://codeload.github.com/TimJJTing/Task-Lineage-Generator/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":228552378,"owners_count":17935803,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["d3","dag","data-visualization","golang","graphviz-dot","lineage","task"],"created_at":"2024-10-17T10:02:09.947Z","updated_at":"2026-01-21T18:13:06.182Z","avatar_url":"https://github.com/TimJJTing.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Task Lineage Generator\n\nA simple tool for rendering task lineage diagrams written in Golang. Try [Live Demo](https://timjjting.github.io/Task-Lineage-Generator/interface/index.html)\n\n![Task Lineage Diagram (Colored)](dot.svg)\n\n![Task Lineage Diagram (interactive)](demo.gif)\n\n## Usage\n\n```text\nUsage:\n  tlg [flags]\n\nFlags:\n  -k, --config          Path for the yaml configuration file. (default \"config.yaml\")\n  -c, --color           Color mode. If turned on, the output is colored.\n  -f, --format string   Output file format, one of [svg, dot, png, jpg]. (default \"svg\")\n  -g, --group           Group Layer. If turned on, nodes under the same layer are grouped together, which means they are placed next to each other if possible. (recommended)\n  -h, --help            help for tlg\n  -i, --input string    Root directory for yaml files. 
(default \".\")\n  -l, --layout string   Graph Layout. Currently support [circo, dot, fdp, neato, osage, patchwork, sfdp, twopi]. (default \"dot\")\n  -n, --no-reach        Turn this on to skip the reachability analysis.\n  -o, --output string   Output file path. (default \"graph\")\n  -r, --reach string    Output file path for the reachability analysis report. (default \"reachability.json\")\n  -s, --size string     Graph size. Currently only support [fhd, a3]. (default \"fhd\")\n```\n\n### Task YAMLs\n\nA task configuration (input) should look like this:\n\n```yaml\ntask: \"task_name\"\ntask_id: \"task_0\"\nstart_date: \"2024-01-01\"\nend_date: \"2024-12-31\"\nfrequency: 1\nunit: \"day\"\nqueue: \"cl\"\nlevel: \"lv1\" # tld uses this to group tasks\nruntime:\n  directory: \"/data/tasks\"\n  executable: \"python\"\n  file: \"task\"\n  extension: \"py\"\ndependency: # dependency tasks, this entry is optional\n  - task_id: \"task_1\" # links to another task\n    storage: \"s3://data/tasks/task_1\"\n    unit: \"day\"\n    frequency: 1\n    start_date: \"2024-01-01\"\n    end_date: \"2024-12-31\"\n  - task_id: \"task_2\" # links to another task\n    storage: \"s3://data/tasks/task_2\"\n    unit: \"week\"\n    frequency: 2\n    start_date: \"2024-01-01\"\n    end_date: \"2024-12-31\"\n```\n\n### Config YAML\n\nThe configuration file, `config.yaml`, should look like this:\n\n```yaml\n# color mapping for task levels\ncolors:\n  # level_name: color_hex\n  lv0: \"#0074FF\"\n  lv1: \"#17C3FF\"\n  lv2: \"#7ADEF4\"\n  lv3: \"#40FDCB\"\n  default: \"#FFAE10\"\n```\n\nYou can have your own mapping.\n\n## Run the Source\n\nYou need to have your Golang 1.20+ environment up and set, then you can do the following:\n\n### Test Run\n\n```sh\ngo run main.go {-i [taskYamlRootDir]} {-o [targetFile]} {-f [format]} {-g} {-l [layout]} {-c} {-n} {-r [reachabilityFile]}\n```\n\nFor example\n\n```sh\n# for colored lineage\ngo run main.go -i ./mock-tasks -o dot.svg -f svg -l dot -c -g\n# for 
bw lineage\ngo run main.go -i ./mock-tasks -o dot-bw.svg -f svg -l dot -g\n```\n\nThe first command yields this:\n\n![Task Lineage Diagram (Colored)](dot.svg)\n\nand the second yields this:\n\n![Task Lineage Diagram (BW)](dot-bw.svg)\n\n### Compile and Run\n\n```sh\n# compile\nGOFLAGS=-mod=mod go build -o bin/tlg main.go\n\n# run the binary\n./bin/tlg {-i [taskYamlRootDir]} {-o [targetFile]} {-f [format]} {-g} {-l [layout]} {-c} {-n} {-r [reachabilityFile]}\n```\n\n## Project Organization\n\n- `cmd`: the CLI\n- `dot`: graph construction and rendering\n- `interface`: simple interactive web interface\n- `mock-tasks`: a set of mock YAML files that simulate a real-world data pipeline configuration\n- `reader`: YAML reader for task configurations\n- `schema`: data schema\n\n## Project Background\n\nThe data team I work with has built a custom data pipeline consisting of ETL processes and machine learning models. When a task fails or is delayed, all tasks downstream of it must be rerun. The challenge lies in determining the cascading effects of rerunning a task, as our pipeline includes over 250 tasks with complex interdependencies. Apart from common issues like data inconsistency and database overloads, understanding the downstream impact of triggering an upstream task has been a significant challenge.\n\nWhen I started thinking about a solution, the concept of a graph immediately came to mind. If each task could be represented as a node and each dependency as an edge, we could visualize the entire pipeline as a graph. Visual programming languages like [PureData](https://puredata.info/) and [TouchDesigner](https://derivative.ca/) have successfully implemented similar concepts by representing functions and data flows as nodes and edges. 
The beauty of using a graph is that, beyond simply visualizing the pipeline, it could also let us query and analyze task dependencies in more depth.\n\n[Apache Airflow](https://airflow.apache.org/), built explicitly to manage data pipelines, has all the features we wanted and was the obvious candidate. However, adopting it would have required substantial effort: rewriting the entire pipeline, deploying it to our servers, and getting the whole team up to speed. Given the constraints at the time, I set out to develop a simpler solution that could integrate seamlessly into our existing pipeline without code migration. I believed that a well-designed visualization tool would resolve the lack of clarity.\n\n### Project Goal\n\n- Visualize task lineages as Directed Acyclic Graphs (DAGs)\n- Keep the solution as simple as possible\n- Integrate seamlessly with our existing pipeline\n- Automatically refresh the visualization upon task specification (YAML) updates\n- Provide interactivity to enhance usability\n\n### Project Development\n\n#### Technical Decisions\n\nAt the time, I frequently used Graphviz to create charts, and I realized it could be an ideal tool for visualizing task lineages. The key was to ensure it ran automatically whenever task specifications were updated. Since we use GitLab for version control, I leveraged GitLab CI to trigger jobs on code commits. The next question was how to encapsulate Graphviz within a CI environment and generate lineage diagrams from task specifications.\n\nAfter some research, I found the Golang package [goccy/go-graphviz](https://github.com/goccy/go-graphviz), which provided all the necessary features, including graph representation and traversal via a Golang API. Using it together with the [YAML package](https://gopkg.in/yaml.v3) for reading task specifications, I was able to compile the entire project into a small binary that others could use, or package it into a Docker image for integration into CI tasks. 
The workflow was structured as follows:\n\n```mermaid\nflowchart LR\n    subgraph tool[\"Task Lineage Generator CLI\"]\n    reader[\"YAML Reader\"] --\u003e node[\"Draw Nodes\"] --\u003e edge[\"Draw Edges\"] --\u003e out1[\"Construct Graph Meta\"] --\u003e out2[\"Output Diagram\"]\n    end\n    yaml[\"Task Configs (YAML)\"] --\u003e tool\n    tool --\u003e diag[\"Lineage Diagrams\"]\n    tool --\u003e meta[\"Graph Meta *reachability.json*\"]\n```\n\nThe CI/CD pipeline was set up as follows:\n\n1. Import task configurations as submodules of a main repo\n2. On every task configuration update, update the main repo's submodule reference, which triggers its CI\n3. Run Task Lineage Generator in the main repo's CI\n4. At the end of the CI run, publish the lineage diagram\n\n#### Interactive Interface\n\nTo enhance the diagram's usability, I developed a simple website to host the SVGs with basic interactivity. Staying true to the simplicity goal, I avoided frameworks and used only HTML, CSS, and D3.js. This let me implement essential features such as panning, zooming, highlighting, and tooltips that display detailed information for each task. This minimal interactivity resolved the clarity issues and made the visualization practical to use (see `/interface`).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimjjting%2Ftask-lineage-generator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftimjjting%2Ftask-lineage-generator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftimjjting%2Ftask-lineage-generator/lists"}