https://github.com/semcod/mdflow
- Host: GitHub
- URL: https://github.com/semcod/mdflow
- Owner: semcod
- License: apache-2.0
- Created: 2026-05-03T07:53:01.000Z (about 1 month ago)
- Default Branch: main
- Last Pushed: 2026-05-03T08:52:54.000Z (about 1 month ago)
- Last Synced: 2026-05-03T10:19:39.744Z (about 1 month ago)
- Language: HTML
- Size: 1.02 MB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
# mdflow
## AI Cost Tracking
   
  
- 🤖 **LLM usage:** $0.7500 (5 commits)
- 👤 **Human dev:** ~$200 (2.0h @ $100/h, 30min dedup)
Generated on 2026-05-03 using [openrouter/qwen/qwen3-coder-next](https://openrouter.ai/qwen/qwen3-coder-next)
---
**Markdown dependency analyzer: extract all dependencies, generate diagrams and charts.**
`mdflow` parses Markdown files and extracts their structural elements:
headings, links, fenced code blocks (including `markpact:*` embedded file references),
list items, TOON/YAML quality sections, and document metadata.
It then generates Mermaid diagrams, HTML reports, and Markdown summaries.
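
To make the code-block extraction concrete, here is a minimal stdlib-only sketch. It is not mdflow's parser: the `CodeBlock` record, the `extract_code_blocks` function, and the assumed info-string shape `markpact:<type> path=<path>` are illustrative guesses based on the description above.

```python
import re
from dataclasses import dataclass
from typing import Optional

FENCE = "`" * 3  # built at runtime so this example can itself sit inside a fence

# Assumed info-string shape: markpact:<type> path=<path>
MARKPACT_RE = re.compile(r"markpact:(\S+)(?:\s+path=(\S+))?")


@dataclass
class CodeBlock:  # hypothetical record, not mdflow's own model
    info: str            # raw info string after the opening fence
    content: str         # block body without the fence lines
    start_line: int      # 1-based line of the opening fence
    end_line: int        # 1-based line of the closing fence
    markpact_type: Optional[str] = None
    markpact_path: Optional[str] = None


def extract_code_blocks(text: str) -> list[CodeBlock]:
    """Collect fenced code blocks; an unterminated block is dropped."""
    blocks: list[CodeBlock] = []
    info: Optional[str] = None
    start, body = 0, []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if info is None:
            if stripped.startswith(FENCE):
                info, start, body = stripped[3:].strip(), lineno, []
        elif stripped == FENCE:
            block = CodeBlock(info, "\n".join(body), start, lineno)
            match = MARKPACT_RE.match(info)
            if match:
                block.markpact_type, block.markpact_path = match.groups()
            blocks.append(block)
            info = None
        else:
            body.append(line)
    return blocks
```

The sketch only handles backtick fences of exactly three characters; tilde fences and longer fences would need extra cases.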
---
## What it extracts
| Element | Details |
|---|---|
| **Headings** | Full H1-H6 hierarchy, anchor slugs |
| **Links** | `[text](href)`, classified as internal / external / anchor / image |
| **Code blocks** | Language, content, line range, `markpact:type path=...` metadata |
| **List items** | Depth, parent heading, clean text |
| **TOON sections** | ALERTS, REFACTOR, HOTSPOTS, HEALTH, NEXT, RISKS, PIPELINES… |
| **Document metadata** | `## Metadata` key/value lists |
| **Cross-doc dependencies** | Links between files, `markpact` embedded file paths |
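
As an illustration of the link classification in the table above, the following sketch sorts `[text](href)` links into the four categories. The names `LINK_RE`, `classify_link`, and `extract_links` are hypothetical, and the rules (leading `!` means image, leading `#` means anchor, a URL scheme means external) are one plausible reading rather than mdflow's actual logic.

```python
import re

# Matches [text](href) and ![alt](src); a leading "!" marks an image.
# An optional title after the href, e.g. (href "title"), is skipped.
LINK_RE = re.compile(r"(!?)\[([^\]]*)\]\(([^)\s]+)[^)]*\)")

# A URL scheme such as "https:" or "mailto:" at the start of the href.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def classify_link(href: str, is_image: bool = False) -> str:
    """Return one of: image, anchor, external, internal."""
    if is_image:
        return "image"
    if href.startswith("#"):
        return "anchor"
    if SCHEME_RE.match(href) or href.startswith("//"):
        return "external"
    return "internal"


def extract_links(text: str) -> list[tuple[str, str, str]]:
    """Return (text, href, kind) for every inline link in the text."""
    return [
        (m.group(2), m.group(3), classify_link(m.group(3), bool(m.group(1))))
        for m in LINK_RE.finditer(text)
    ]
```

Links classified as `internal` are the natural input for a cross-document dependency edge, since they point at another file in the same tree.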
---
## Generated outputs
| Output | Description |
|---|---|
| `{stem}_report.html` | Self-contained HTML report with all diagrams (Mermaid.js) |
| `{stem}_report.md` | Markdown summary with inline Mermaid |
| `{stem}_heading_mindmap.mermaid` | Mindmap of heading hierarchy |
| `{stem}_section_flow.mermaid` | Section flowchart with code/link annotations |
| `{stem}_code_pie.mermaid` | Pie chart of code blocks by language |
| `{stem}_markpact_graph.mermaid` | Graph of embedded file references |
| `{stem}_alerts_graph.mermaid` | TOON alerts & refactor tasks flowchart |
| `{stem}_workflow.mermaid` | DOQL workflow steps diagram |
| `dependency_graph.html` | Cross-document dependency graph (directory scan) |
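
For a sense of what a file like `{stem}_code_pie.mermaid` might contain, this sketch builds a Mermaid pie chart from a list of code-block languages. The function name `code_pie`, the default title, and the `plain` label for untagged blocks are assumptions, not mdflow's output format.

```python
from collections import Counter


def code_pie(languages, title="Code blocks by language"):
    """Build Mermaid pie-chart source from an iterable of language tags."""
    # Untagged blocks (empty string or None) are grouped under "plain".
    counts = Counter(lang or "plain" for lang in languages)
    lines = [f"pie title {title}"]
    for lang, n in counts.most_common():
        # Double quotes would end the label early, so swap them out.
        label = lang.replace('"', "'")
        lines.append(f'    "{label}" : {n}')
    return "\n".join(lines)


print(code_pie(["python", "bash", "python", ""]))
```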
---
## Installation
```bash
# Clone or copy the mdflow/ directory, then:
pip install -e .
# No mandatory dependencies: pure stdlib.
```
---
## Usage
### Python API
```python
from mdflow import MdFlow
flow = MdFlow()
# ── Single file ──────────────────────────────────────────────
doc = flow.parse("SUMR.md")
print(doc.title) # "Ze źródeł"
print(len(doc.headings)) # 24
print([ts.name for ts in doc.toon_sections]) # ['HEALTH', 'REFACTOR', ...]
print(doc.metadata) # {'name': 'redsl', 'version': '1.2.45', ...}
# Access markpact embedded file references
for cb in doc.markpact_blocks:
    print(f"markpact:{cb.markpact_type} path={cb.markpact_path}")
# Get TOON quality metrics
metrics = flow.toon_metrics(doc)
print(metrics["health"]) # {'cc_mean': 20.0, 'critical': 7}
print(metrics["refactors"][:3]) # list of refactor tasks
# Get all Mermaid diagrams as strings (no files written)
diagrams = flow.diagrams(doc)
print(diagrams["section_flow"]) # flowchart TD ...
# Generate reports to disk
flow.report(doc, "output/") # writes HTML + MD + .mermaid files
# ── Directory scan ───────────────────────────────────────────
docs, graph = flow.scan("docs/", "output/")
print(f"{len(docs)} files, {len(graph.edges)} dependency edges")
```
### CLI
```bash
# Analyze a single file
mdflow analyze SUMR.md --output output/
# Select formats
mdflow analyze SUMR.md --format html,md
# Scan a directory
mdflow scan docs/ --output output/
# Print a specific Mermaid diagram to stdout
mdflow diagram SUMR.md --diagram section_flow
mdflow diagram SUMR.md --diagram list # list available diagrams
# Write diagram to file
mdflow diagram SUMR.md --diagram alerts_graph -o alerts.mermaid
```
## Mermaid validation
Every generated `.mermaid` file is automatically validated before writing.
Detected issues are printed inline and written as tickets to `TODO.md`:
```
[mdflow] 1 error(s) output/SUMR_section_flow.mermaid
  [BACKTICK_IN_LABEL] Backtick inside node label (line 5): ...
[mdflow] 1 validation ticket(s) written to TODO.md
```
Validation checks: `EMPTY_DIAGRAM`, `NO_DIAGRAM_TYPE`, `BACKTICK_IN_LABEL`,
`DUPLICATE_NODE_ID`, `MINDMAP_ILLEGAL_CHARS`.
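
The sketch below shows how three of these checks could work. It reuses the check names from the list above, but the function `validate_mermaid`, the `DIAGRAM_TYPES` tuple, and the exact rules are assumptions; mdflow's `validators.py` may apply different or stricter logic. For simplicity, any backtick on any line counts as `BACKTICK_IN_LABEL`.

```python
# Diagram keywords accepted on the first non-blank line (an illustrative subset).
DIAGRAM_TYPES = (
    "flowchart", "graph", "mindmap", "pie", "sequenceDiagram",
    "classDiagram", "stateDiagram", "erDiagram", "gantt",
)


def validate_mermaid(source: str) -> list[tuple[str, str]]:
    """Return (code, message) pairs; an empty list means no issue was found."""
    non_blank = [ln.strip() for ln in source.splitlines() if ln.strip()]
    if not non_blank:
        return [("EMPTY_DIAGRAM", "Diagram has no content")]

    issues = []
    if not non_blank[0].startswith(DIAGRAM_TYPES):
        issues.append(("NO_DIAGRAM_TYPE", f"Unrecognised first line: {non_blank[0]!r}"))

    # Simplification: flag every line that contains a backtick.
    for lineno, line in enumerate(source.splitlines(), start=1):
        if "`" in line:
            issues.append(
                ("BACKTICK_IN_LABEL", f"Backtick inside node label (line {lineno})")
            )
    return issues
```

Returning structured `(code, message)` pairs rather than printing keeps the same result usable for both inline output and ticket writing.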
---
## Quality tooling
mdflow uses [`prefact`](https://github.com/semcod/prefact) and
[`pyqual`](https://github.com/semcod/pyqual) for automated code quality gates.
```bash
# Run full quality loop (prefact scan → ruff → pytest → LLM fix on fail)
task quality # alias: pyqual run
# Scan for code issues (duplicate imports, wildcard imports, …)
task prefact # alias: prefact scan -p .
# Auto-fix detected issues
task prefact-fix # alias: prefact fix -p .
```
A **git pre-commit hook** (`.git/hooks/pre-commit`) runs all checks automatically
before every commit and blocks on failures, writing tickets to `TODO.md`.
---
## Testing
### Unit tests
```bash
pytest tests/ -v
```
### E2E / CLI tests (TestQL)
142 scenarios covering CLI commands, output file validation, and integration
with real semcod workspace projects:
```bash
# All scenarios
task testql-run
# Smoke only (help, subcommands)
task testql-smoke
# Full E2E (analyze, scan, diagram, semcod projects, mermaid validation)
task testql-e2e
# Single scenario
testql run testql-scenarios/02_cli_analyze_e2e.testql.toon.yaml
```
Scenarios in `testql-scenarios/`:
| File | Tests | Scope |
|---|---|---|
| `01_cli_help_version` | 16 | help, subcommand help |
| `02_cli_analyze_e2e` | 35 | analyze: HTML/MD/mermaid output |
| `03_cli_scan_e2e` | 13 | scan: per_file output, dependency graph |
| `04_cli_diagram_e2e` | 23 | diagram: list, stdout, file, unknown name |
| `05_e2e_semcod_projects` | 30 | prefact, pyqual, planfile, goal SUMD.md |
| `06_e2e_mermaid_validation` | 22 | backtick-free labels, pie title format |
---
## Architecture
```
mdflow/
├── __init__.py        # MdFlow façade (high-level API)
├── models.py          # Data classes: MdDocument, DependencyGraph, …
├── parser.py          # Core Markdown parser (stdlib only)
├── validators.py      # Mermaid diagram validator + TODO.md ticket writer
├── analyzers/
│   └── __init__.py    # DependencyAnalyzer, StructureAnalyzer,
│                      # CodeInventoryAnalyzer, ToonAnalyzer
├── generators/
│   ├── __init__.py
│   ├── mermaid.py     # All Mermaid diagram generators
│   ├── html.py        # Self-contained HTML report (split into helpers)
│   └── markdown.py    # Markdown summary report (split into helpers)
└── cli.py             # argparse CLI entry point
```
---
## Examples
### Basic
- **`examples/basic/01_parse_single_file.py`**: Parse and inspect a single document
- **`examples/basic/02_generate_reports.py`**: Generate HTML, Markdown, and Mermaid reports
- **`examples/basic/03_diagrams_as_strings.py`**: Get diagrams as strings (no file I/O)
- **`examples/basic/04_cli_basics.sh`**: CLI: `analyze`, `scan`, `diagram`
### Advanced
- **`examples/advanced/01_directory_scan.py`**: Scan a directory, build dependency graphs
- **`examples/advanced/02_toon_analysis.py`**: Extract TOON quality metrics
- **`examples/advanced/03_custom_diagram_pipeline.py`**: Custom HTML with selected diagrams
### API / Extensibility
- **`examples/api/01_low_level_parser.py`**: Use `MdParser` directly
- **`examples/api/02_custom_analyzer.py`**: Build your own analyzer
### semcod workspace
- **`examples/semcod/analyze_prefact.py`**: Parse `prefact/SUMD.md`, extract TOON metrics
- **`examples/semcod/scan_semcod_workspace.py`**: Scan 6 semcod projects, cross-project TOON summary
- **`examples/semcod/toon_comparison.py`**: CC/alerts/refactors comparison table across projects
- **`examples/semcod/04_cli_semcod.sh`**: CLI shell examples for the semcod workspace
```bash
python examples/semcod/toon_comparison.py
python examples/semcod/scan_semcod_workspace.py
```
---
## Supported TOON sections
`mdflow` recognises these TOON section names inside `toon` / `yaml` code blocks
and in blocks tagged `markpact:analysis`:
`ALERTS` · `REFACTOR` · `HOTSPOTS` · `HEALTH` · `NEXT` · `RISKS` · `PIPELINES`
· `DUPLICATES` · `WARNINGS` · `MODULES` · `EVOLUTION` · `COUPLING`
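
As a rough illustration of section recognition, the sketch below scans the text of one code block for the names listed above. It assumes that a section header is the name at the start of a line, followed directly by `[` or `:` (for example `HEALTH[2]:` or `NEXT:`). That header shape is a guess, and `find_toon_sections` is a hypothetical helper, not part of mdflow.

```python
import re

TOON_SECTIONS = (
    "ALERTS", "REFACTOR", "HOTSPOTS", "HEALTH", "NEXT", "RISKS", "PIPELINES",
    "DUPLICATES", "WARNINGS", "MODULES", "EVOLUTION", "COUPLING",
)

# Assumed header shape: NAME at the start of a line, then "[" or ":".
SECTION_RE = re.compile(
    r"^(%s)(?=[\[:])" % "|".join(TOON_SECTIONS), re.MULTILINE
)


def find_toon_sections(block_text: str) -> list[str]:
    """Return recognised section names in order of first appearance."""
    seen: list[str] = []
    for match in SECTION_RE.finditer(block_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen
```

Unknown headers such as `NOTES:` are ignored, and indented keys are not matched because the pattern is anchored at the start of the line.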
---
## Extension points
- **Custom extractor**: subclass or monkey-patch `MdParser`
- **Custom diagram**: call `flow.diagrams(doc)` and extend the `mermaid` module
- **Graphviz output**: install `graphviz` Python package and use
`DependencyGraph` data directly
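
One way to approach the Graphviz route is to emit DOT text from the graph's edges, as sketched below. The usage example above shows that `DependencyGraph` exposes `edges`, but the shape of each edge is not documented here, so this sketch takes plain `(source, target)` string pairs. `to_dot` is a hypothetical helper.

```python
def to_dot(edges, name="dependencies"):
    """Render (source, target) string pairs as a Graphviz DOT digraph."""

    def quote(value: str) -> str:
        # DOT double-quoted IDs need backslashes and quotes escaped.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    lines = [f"digraph {quote(name)} {{"]
    for source, target in edges:
        lines.append(f"    {quote(source)} -> {quote(target)};")
    lines.append("}")
    return "\n".join(lines)


print(to_dot([("README.md", "docs/api.md"), ("docs/api.md", "docs/cli.md")]))
```

The resulting text can be saved to a `.dot` file for the `dot` command-line tool, or passed to `graphviz.Source` if the `graphviz` Python package is installed.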
---
## License
Licensed under Apache-2.0.