{"id":51412957,"url":"https://github.com/mateffy/datamark","last_synced_at":"2026-07-04T16:02:12.484Z","repository":{"id":367804777,"uuid":"1254291568","full_name":"mateffy/datamark","owner":"mateffy","description":"A toolkit for creating Markdown-based file formats and mapping AST nodes (headings/codeblock/todos/...) to structured data.","archived":false,"fork":false,"pushed_at":"2026-06-27T17:53:59.000Z","size":8183,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2026-06-27T19:19:22.279Z","etag":null,"topics":["file-format","lexer","markdown","parser","typescript"],"latest_commit_sha":null,"homepage":"https://datamark.md","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mateffy.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":"AGENTS.md","dco":null,"cla":null}},"created_at":"2026-05-30T11:38:51.000Z","updated_at":"2026-06-27T17:53:09.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/mateffy/datamark","commit_stats":null,"previous_names":["mateffy/datamark"],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/mateffy/datamark","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mateffy%2Fdatamark","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mateffy%2Fdatamark/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mateffy%2Fdatamark/releases","
manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mateffy%2Fdatamark/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mateffy","download_url":"https://codeload.github.com/mateffy/datamark/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mateffy%2Fdatamark/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":35127443,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-07-04T02:00:05.987Z","response_time":113,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["file-format","lexer","markdown","parser","typescript"],"created_at":"2026-07-04T16:02:11.812Z","updated_at":"2026-07-04T16:02:12.474Z","avatar_url":"https://github.com/mateffy.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cbr/\u003e\n\n\u003cp align=\"center\"\u003e\n  \u003cpicture\u003e\n    \u003csource media=\"(prefers-color-scheme: dark)\" srcset=\"./packages/documentation/public/datamark-logo-light.svg\"\u003e\n    \u003cimg alt=\"datamark logo\" src=\"./packages/documentation/public/datamark-logo.svg\" align=\"center\" width=\"275\"\u003e\n  \u003c/picture\u003e\n\u003c/p\u003e\n\n\u003cbr/\u003e\n\n# `datamark` – Use Markdown as a Data Format\n\n**datamark** is a TypeScript library for turning Markdown 
documents into typed, validated objects — and back again. It gives you a **unified AST with a native section tree**, a lightweight **format system** for bidirectional parse/stringify, and typed **Markdown builder primitives** for serialization.\n\n\n\u003cbr\u003e\n\n## Why datamark?\n\nMarkdown is everywhere: blogs, changelogs, API docs, READMEs, and more. But Markdown is a text format, not a data format. It can be hard to extract structured data from it, and even harder to write valid Markdown back after partial edits.\n\n- *\"Changelogs are hand-written, but we want to validate them and extract structured release data.\"*\n- *\"We use frontmatter for blog post metadata, but we need structured data about the sections for table of contents and SEO.\"*\n- *\"I want to edit all headings in a Markdown document and reserialize it without losing formatting.\"*\n- *\"API docs are Markdown files with code blocks. We want to extract endpoints and examples as typed data.\"*\n- *\"I want to make Markdown code snippets executable.\"*\n\ndatamark bridges that gap: you define a format with a frontmatter schema, an output schema, and parse/stringify functions. The resulting data is fully typed and validated, and you can convert it back to Markdown without losing formatting.\n\n| SDK | What it does |\n|---|---|\n| **Format SDK** | Define `datamark()` formats: typed frontmatter, validated output, bidirectional parse/stringify |\n| **Parse SDK** | Parse Markdown into a typed AST with frontmatter, section tree, headings, code blocks, lists, tables, and todos |\n| **Stringify SDK** | Build Markdown with typed primitives: `heading()`, `codeBlock()`, `list()`, `frontmatter()`, `paragraph()`, and more |\n\nBring your own validator. Zod, Valibot, ArkType, TypeBox — anything implementing Standard Schema v1 works out of the box.\n\n---\n\n## Quick start\n\n### 1. 
Install\n\n```bash\nnpm install datamark\n# or\nbun add datamark\n```\n\nFor validation, also install your schema library:\n\n```bash\nnpm install zod\n```\n\n### 2. Parse your first document\n\nDefine a format — frontmatter schema, output schema, parse function:\n\n```typescript\nimport { datamark } from \"datamark\";\nimport { inlineText, textContent, findAll, isCodeBlock } from \"datamark/parse\";\nimport { frontmatter, heading, paragraph, codeBlock } from \"datamark/stringify\";\nimport * as z from \"zod\";\n\nexport const PlanFormat = datamark({\n  frontmatterSchema: z.object({ id: z.string() }),\n  schema: z.object({\n    id: z.string(),\n    title: z.string(),\n    steps: z.array(z.object({\n      description: z.string(),\n      scripts: z.array(z.string()),\n    })),\n  }),\n\n  parse(doc) {\n    const id = doc.frontmatter.id; // typed as string\n    const titleSection = doc.root.children.find((n) =\u003e n.type === \"section\") as any;\n    const title = titleSection ? inlineText(titleSection.heading.children) : \"\";\n\n    const steps = titleSection\n      ? titleSection.children\n          .filter((n: any) =\u003e n.type === \"section\")\n          .map((section: any) =\u003e {\n            const scripts = findAll(section, (n) =\u003e isCodeBlock(n, \"javascript\")).map(\n              (n: any) =\u003e n.value\n            );\n            const description = textContent(section).trim();\n            return { description, scripts };\n          })\n      : [];\n\n    return { id, title, steps };\n  },\n\n  stringify(data) {\n    let md = frontmatter({ id: data.id }) + heading(data.title) + \"\\n\\n\";\n    for (const step of data.steps) {\n      md += heading(\"Step\", 2) + \"\\n\\n\" + paragraph(step.description) + \"\\n\\n\";\n      for (const script of step.scripts) {\n        md += codeBlock(script, \"javascript\") + \"\\n\\n\";\n      }\n    }\n    return md;\n  },\n});\n```\n\n### 3. 
Use it\n\n```typescript\nconst input = `---\nid: plan-001\n---\n# Q3 Roadmap\n\n## Step\n\nSet up the project.\n\n\\`\\`\\`javascript\nnpm init -y\n\\`\\`\\`\n\n## Step\n\nImplement the core features.\n`;\n\nlet data = PlanFormat.parse(input);\nconsole.log(data.id);     // \"plan-001\"\nconsole.log(data.title);  // \"Q3 Roadmap\"\nconsole.log(data.steps[0].description); // \"Set up the project.\"\n\ndata.title = \"Roadmap for Q3\";\n\nconst md = PlanFormat.stringify(data);\n// Back to Markdown, ready to save\n```\n\n---\n\n## What datamark is\n\n- **A typed Markdown parser.** Frontmatter is parsed as YAML. The body becomes a proper AST with headings, paragraphs, code blocks, lists, tables, blockquotes — and a native section tree where H1/H2/H3 are parent nodes, not flat siblings.\n- **A format system.** Define `datamark()` with a schema and parse/stringify functions. The result is fully typed and validated.\n- **A Markdown builder.** Typed primitives like `heading()`, `codeBlock()`, `list()`, `frontmatter()` — no string concatenation, no indentation bugs.\n\n## What datamark is NOT\n\n- **It is not a static site generator.** It parses and transforms Markdown, but it does not build HTML pages or apply themes.\n- **It is not a Markdown renderer.** It produces data and Markdown strings, not HTML.\n- **It does not stream.** Input string in, typed object out.\n- **It is not a general-purpose parser generator.** It is specifically designed for Markdown documents.\n\n---\n\n## Core concepts\n\n### Section tree\n\nUnlike flat ASTs where headings are just block nodes, datamark nests them. 
Every heading becomes a `SectionNode` that owns everything beneath it until the next heading of equal or shallower depth (that is, the same or a higher level, so an H2 section ends at the next H2 or H1).\n\n```typescript\nimport { parse } from \"datamark\";\n\nconst doc = parse(`\n# Title\n\nIntro paragraph.\n\n## Section A\n\nSome text.\n\n### Subsection A1\n\nMore text.\n\n## Section B\n\nFinal text.\n`);\n\nconst topSection = doc.root.children[0] as any;\nconsole.log(topSection.heading.depth); // 1\n\nconst subSections = topSection.children.filter((n: any) =\u003e n.type === \"section\");\nconsole.log(subSections.length); // 2 (Section A and Section B)\n```\n\nThis makes traversal intuitive: find a section, look inside its `children` for nested sections, code blocks, lists, or paragraphs.\n\n### Frontmatter validation\n\nFrontmatter is validated before your parse function runs. If `frontmatterSchema` is provided, `doc.frontmatter` is typed inside your parse function, with no casting needed.\n\n```typescript\nimport { datamark } from \"datamark\";\nimport * as z from \"zod\";\n\nconst BlogFormat = datamark({\n  frontmatterSchema: z.object({\n    title: z.string(),\n    date: z.string(),\n    author: z.string(),\n  }),\n  parse(doc) {\n    const meta = doc.frontmatter; // typed as { title, date, author }\n    // ...\n  },\n});\n```\n\nIf the YAML is malformed, you get `FrontmatterError`. If it fails schema validation, you get `ValidationError` with structured issue data.\n\n### Markdown builder primitives\n\nThe Stringify SDK gives you typed functions for every Markdown construct. No more template literal indentation bugs:\n\n```typescript\nimport { frontmatter, heading, paragraph, codeBlock, list } from \"datamark/stringify\";\n\nconst markdown = [\n  frontmatter({ title: \"API Guide\", version: \"2.0\" }),\n  heading(\"Authentication\", 2),\n  paragraph(\"Use Bearer tokens for all requests.\"),\n  codeBlock(\"fetch('/api', { headers: { Authorization: 'Bearer ...' 
} });\", \"javascript\"),\n  list([\"Install the SDK\", \"Configure your API key\", \"Make your first request\"]),\n].join(\"\\n\\n\");\n```\n\n---\n\n## SDK layers\n\ndatamark is three focused SDKs that compose together:\n\n| Layer | Import | Purpose |\n|---|---|---|\n| **Parse SDK** | `datamark/parse` | Parse documents, extract frontmatter, query the AST with `find`, `findAll`, `textContent`, `extractTodoItems`, `sectionsAtDepth`, `splitBy`, etc. |\n| **Format SDK** | `datamark` | Define `datamark()` formats with typed frontmatter, validated output, and bidirectional parse/stringify |\n| **Stringify SDK** | `datamark/stringify` | Build Markdown with typed primitives: `heading`, `paragraph`, `codeBlock`, `list`, `blockquote`, `frontmatter`, `strong`, `em`, `link`, etc. |\n\nUse only what you need. If you just want to parse Markdown and query it, import `datamark/parse`. If you want full bidirectional formats, use the Format SDK. If you just need to generate Markdown strings, use the Stringify SDK.\n\n---\n\n## Philosophy\n\n- **Markdown as a data format.** Not just human-readable text — structured, typed, validated data that happens to render everywhere.\n- **One definition, two directions.** Define `parse` and `stringify` in the same place. Round-trip by design.\n- **Bring your own validator.** Standard Schema v1 means Zod, Valibot, ArkType, TypeBox, and anything else compliant — no lock-in.\n- **AST-native section tree.** Headings are parent nodes, not flat siblings. Traversal is intuitive, not imperative.\n- **Deterministic.** No global state, no side effects. 
Same input, same output, every time.\n\n---\n\n## Documentation\n\n| What you need | Link |\n|---------------|------|\n| Get started in 5 minutes | [Quickstart](https://datamark.md/docs/quickstart) |\n| Parse SDK reference | [Parse SDK](https://datamark.md/docs/parse) |\n| Stringify SDK reference | [Stringify SDK](https://datamark.md/docs/stringify) |\n| Format SDK reference | [Format SDK](https://datamark.md/docs/template) |\n| Understand the AST | [AST deep dive](https://datamark.md/docs/explanation/ast) |\n| Real-world examples | [Examples](https://datamark.md/docs/examples) |\n| Compare to alternatives | [Comparisons](https://datamark.md/compare) |\n\nFull documentation lives at **[datamark.md](https://datamark.md)**.\n\n---\n\n## License\n\nMIT\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmateffy%2Fdatamark","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmateffy%2Fdatamark","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmateffy%2Fdatamark/lists"}