{"id":15102714,"url":"https://github.com/lennartpollvogt/markdown-to-data","last_synced_at":"2025-08-19T04:31:15.923Z","repository":{"id":255988607,"uuid":"854053020","full_name":"lennartpollvogt/markdown-to-data","owner":"lennartpollvogt","description":"Convert markdown and its elements (tables, lists, code, etc.) into structured, easily processable data formats like lists and hierarchical dictionaries (or JSON), with support for parsing back to markdown.","archived":false,"fork":false,"pushed_at":"2024-12-15T20:33:21.000Z","size":182,"stargazers_count":3,"open_issues_count":1,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-15T20:49:32.521Z","etag":null,"topics":["dictionaries","json","lists","markdown","markdown-parser","markdown-to-data","markdown-to-json","md","parser","parsing","tables"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lennartpollvogt.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-09-08T09:28:11.000Z","updated_at":"2024-12-15T20:32:21.000Z","dependencies_parsed_at":"2024-09-08T11:26:37.690Z","dependency_job_id":"6c669e2b-c661-4d9a-b037-43232b2c409d","html_url":"https://github.com/lennartpollvogt/markdown-to-data","commit_stats":{"total_commits":2,"total_committers":1,"mean_commits":2.0,"dds":0.0,"last_synced_commit":"5b6803c073757a0ff26b9551a79b5b157f41a024"},"previous_names":["lennartpollvogt/markdown-to-data"],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/repositories/lennartpollvogt%2Fmarkdown-to-data","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lennartpollvogt%2Fmarkdown-to-data/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lennartpollvogt%2Fmarkdown-to-data/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lennartpollvogt%2Fmarkdown-to-data/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lennartpollvogt","download_url":"https://codeload.github.com/lennartpollvogt/markdown-to-data/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":230317806,"owners_count":18207804,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dictionaries","json","lists","markdown","markdown-parser","markdown-to-data","markdown-to-json","md","parser","parsing","tables"],"created_at":"2024-09-25T19:05:19.457Z","updated_at":"2025-08-19T04:31:15.907Z","avatar_url":"https://github.com/lennartpollvogt.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# markdown-to-data\n\nConvert markdown and its elements (tables, lists, code, etc.) 
into structured, easily processable data formats like lists and hierarchical dictionaries (or JSON), with support for parsing back to markdown.\n\n## Status\n\n- [x] Detect, extract and convert markdown building blocks into Python data structures\n- [x] Provide two formats for parsed markdown:\n  - [x] List format: Each building block as a separate dictionary in a list\n  - [x] Dictionary format: Nested structure using headers as keys\n- [x] Convert parsed markdown to JSON\n- [x] Parse markdown data back to a markdown-formatted string\n  - [x] Add options to select which data gets parsed back to markdown\n- [x] Extract specific building blocks (e.g., only tables or lists)\n- [x] Support for task lists (checkboxes)\n- [x] Enhanced code block handling with language detection\n- [x] Comprehensive blockquote support with nesting\n- [x] Consistent handling of definition lists\n- [x] Provide comprehensive documentation\n- [x] Add more test coverage (215 test cases)\n- [x] Publish on PyPI\n- [x] Add line numbers (`start_line` and `end_line`) to parsed markdown elements\n- [ ] Align with edge cases of the [CommonMark Specification](https://spec.commonmark.org/0.31.2/)\n\n## Quick Overview\n\n### Install\n\n```bash\npip install markdown-to-data\n```\n\n### Basic Usage\n\n```python\nfrom markdown_to_data import Markdown\n\nmarkdown = \"\"\"\n---\ntitle: Example text\nauthor: John Doe\n---\n\n# Main Header\n\n- [ ] Pending task\n    - [x] Completed subtask\n- [x] Completed task\n\n## Table Example\n| Column 1 | Column 2 |\n|----------|----------|\n| Cell 1   | Cell 2   |\n\n´´´python\ndef hello():\n    print(\"Hello World!\")\n´´´\n\"\"\"\n\nmd = Markdown(markdown)\n\n# Get parsed markdown as list\nprint(md.md_list)\n# Each building block is a separate dictionary in the list\n\n# Get parsed markdown as nested dictionary\nprint(md.md_dict)\n# Headers are used as keys for nesting content\n\n# Get information about markdown elements\nprint(md.md_elements)\n```\n\n### Output 
Formats\n\n#### List Format (`md.md_list`)\n\n```python\n[\n    {\n        'metadata': {'title': 'Example text', 'author': 'John Doe'},\n        'start_line': 2,\n        'end_line': 5\n    },\n    {\n        'header': {'level': 1, 'content': 'Main Header'},\n        'start_line': 7,\n        'end_line': 7\n    },\n    {\n        'list': {\n            'type': 'ul',\n            'items': [\n                {\n                    'content': 'Pending task',\n                    'items': [\n                        {\n                            'content': 'Completed subtask',\n                            'items': [],\n                            'task': 'checked'\n                        }\n                    ],\n                    'task': 'unchecked'\n                },\n                {'content': 'Completed task', 'items': [], 'task': 'checked'}\n            ]\n        },\n        'start_line': 9,\n        'end_line': 11\n    },\n    {\n        'header': {'level': 2, 'content': 'Table Example'},\n        'start_line': 13,\n        'end_line': 13\n    },\n    {\n        'table': {'Column 1': ['Cell 1'], 'Column 2': ['Cell 2']},\n        'start_line': 14,\n        'end_line': 16\n    },\n    {\n        'code': {\n            'language': 'python',\n            'content': 'def hello():\\n    print(\"Hello World!\")'\n        },\n        'start_line': 18,\n        'end_line': 21\n    }\n]\n```\n\n#### Dictionary Format (`md.md_dict`)\n\n```python\n{\n    'metadata': {'title': 'Example text', 'author': 'John Doe'},\n    'Main Header': {\n        'list_1': {\n            'type': 'ul',\n            'items': [\n                {\n                    'content': 'Pending task',\n                    'items': [\n                        {\n                            'content': 'Completed subtask',\n                            'items': [],\n                            'task': 'checked'\n                        }\n                    ],\n                    'task': 
'unchecked'\n                },\n                {'content': 'Completed task', 'items': [], 'task': 'checked'}\n            ]\n        },\n        'Table Example': {\n            'table_1': {'Column 1': ['Cell 1'], 'Column 2': ['Cell 2']},\n            'code_1': {\n                'language': 'python',\n                'content': 'def hello():\\n    print(\"Hello World!\")'\n            }\n        }\n    }\n}\n```\n\n#### MD Elements (`md.md_elements`)\n\n```python\n{\n    'metadata': {\n        'count': 1,\n        'positions': [0],\n        'variants': ['2_fields'],\n        'summary': {}\n    },\n    'header': {\n        'count': 2,\n        'positions': [1, 3],\n        'variants': ['h1', 'h2'],\n        'summary': {'levels': {1: 1, 2: 1}}\n    },\n    'list': {\n        'count': 1,\n        'positions': [2],\n        'variants': ['task', 'ul'],\n        'summary': {'task_stats': {'checked': 2, 'unchecked': 1, 'total_tasks': 3}}\n    },\n    'table': {\n        'count': 1,\n        'positions': [4],\n        'variants': ['2_columns'],\n        'summary': {'column_counts': [2], 'total_cells': 2}\n    },\n    'paragraph': {\n        'count': 4,\n        'positions': [5, 6, 7, 8],\n        'variants': [],\n        'summary': {}\n    }\n}\n```\n\nThe enhanced `md_elements` property now provides:\n\n- **Extended variant tracking**: Headers show level variants (h1, h2, etc.), tables show column counts, lists identify task lists\n- **Summary statistics**: Detailed analytics for each element type including task list statistics, language distribution for code blocks, header level distribution, table cell counts, and blockquote nesting depth\n- **Better performance**: Fixed O(n²) performance issue with efficient indexing\n- **Consistent output**: Variants are sorted lists instead of sets for predictable results\n\n### Parse back to markdown (`to_md`)\n\nThe `Markdown` class provides a method to parse markdown data back to markdown-formatted strings.\nThe `to_md` method 
comes with options to customize the output:\n\n```python\nfrom markdown_to_data import Markdown\n\nmarkdown = \"\"\"\n---\ntitle: Example\n---\n\n# Main Header\n\n- [x] Task 1\n    - [ ] Subtask\n- [ ] Task 2\n\n## Code Example\n´´´python\nprint(\"Hello\")\n´´´\n\"\"\"\n\nmd = Markdown(markdown)\n```\n\n**Example 1**: Include specific elements\n\n```python\nprint(md.to_md(\n    include=['header', 'list'],  # Include all headers and lists\n    spacer=1  # One empty line between elements\n))\n```\n\nOutput:\n\n```markdown\n# Main Header\n\n- [x] Task 1\n  - [ ] Subtask\n- [ ] Task 2\n```\n\n**Example 2**: Include by position and exclude specific types\n\n```python\nprint(md.to_md(\n    include=[0, 1, 2],  # Include first three elements\n    exclude=['code'],   # But exclude any code blocks\n    spacer=2           # Two empty lines between elements\n))\n```\n\nOutput:\n\n```markdown\n---\ntitle: Example\n---\n\n# Main Header\n\n- [x] Task 1\n  - [ ] Subtask\n- [ ] Task 2\n```\n\n#### Using `to_md_parser` Function\n\nThe `to_md_parser` function can be used directly to convert markdown data structures to markdown text:\n\n```python\nfrom markdown_to_data import to_md_parser\n\ndata = [\n    {\n        'metadata': {\n            'title': 'Document'\n        }\n    },\n    {\n        'header': {\n            'level': 1,\n            'content': 'Title'\n        }\n    },\n    {\n        'list': {\n            'type': 'ul',\n            'items': [\n                {\n                    'content': 'Task 1',\n                    'items': [],\n                    'task': 'checked'\n                }\n            ]\n        }\n    }\n]\n\nprint(to_md_parser(data=data, spacer=1))\n```\n\nOutput:\n\n```markdown\n---\ntitle: Document\n---\n\n# Title\n\n- [x] Task 1\n```\n\n## Supported Markdown Elements\n\n### Metadata (YAML frontmatter)\n\n```python\nmetadata = '''\n---\ntitle: Document\nauthor: John Doe\ntags: markdown, documentation\n---\n'''\n\nmd = 
Markdown(metadata)\nprint(md.md_list)\n```\n\nOutput:\n\n```python\n[\n    {\n        'metadata': {\n            'title': 'Document',\n            'author': 'John Doe',\n            'tags': ['markdown', 'documentation']\n        },\n        'start_line': 2,\n        'end_line': 6\n    }\n]\n```\n\n### Headers\n\n```python\nheaders = '''\n# Main Title\n## Section\n### Subsection\n'''\n\nmd = Markdown(headers)\nprint(md.md_list)\n```\n\nOutput:\n\n```python\n[\n    {\n        'header': {'level': 1, 'content': 'Main Title'},\n        'start_line': 2,\n        'end_line': 2\n    },\n    {\n        'header': {\n            'level': 2,\n            'content': 'Section'\n        },\n        'start_line': 3,\n        'end_line': 3\n    },\n    {\n        'header': {'level': 3, 'content': 'Subsection'},\n        'start_line': 4,\n        'end_line': 4\n    }\n]\n```\n\n### Lists (Including Task Lists)\n\n```python\nlists = '''\n- Regular item\n    - Nested item\n- [x] Completed task\n    - [ ] Pending subtask\n1. Ordered item\n    1. 
Nested ordered\n'''\n\nmd = Markdown(lists)\nprint(md.md_list)\n```\n\nOutput:\n\n```python\n[\n    {\n        'list': {\n            'type': 'ul',\n            'items': [\n                {\n                    'content': 'Regular item',\n                    'items': [\n                        {'content': 'Nested item', 'items': [], 'task': None}\n                    ],\n                    'task': None\n                },\n                {\n                    'content': 'Completed task',\n                    'items': [\n                        {\n                            'content': 'Pending subtask',\n                            'items': [],\n                            'task': 'unchecked'\n                        }\n                    ],\n                    'task': 'checked'\n                }\n            ]\n        },\n        'start_line': 2,\n        'end_line': 5\n    },\n    {\n        'list': {\n            'type': 'ol',\n            'items': [\n                {\n                    'content': 'Ordered item',\n                    'items': [\n                        {'content': 'Nested ordered', 'items': [], 'task': None}\n                    ],\n                    'task': None\n                }\n            ]\n        },\n        'start_line': 6,\n        'end_line': 7\n    }\n]\n```\n\n### Tables\n\n```python\ntables = '''\n| Header 1 | Header 2 |\n|----------|----------|\n| Value 1  | Value 2  |\n| Value 3  | Value 4  |\n'''\n\nmd = Markdown(tables)\nprint(md.md_list)\n```\n\nOutput:\n\n```python\n[\n    {\n        'table': {\n            'Header 1': ['Value 1', 'Value 3'],\n            'Header 2': ['Value 2', 'Value 4']\n        },\n        'start_line': 2,\n        'end_line': 5\n    }\n]\n```\n\n### Code Blocks\n\n```python\ncode = '''\n´´´python\ndef example():\n    return \"Hello\"\n´´´\n\n´´´javascript\nconsole.log(\"Hello\");\n´´´\n'''\n\nmd = Markdown(code)\nprint(md.md_list)\n```\n\nOutput:\n\n```python\n[\n    {\n        'code': {\n  
          'language': 'python',\n            'content': 'def example():\\n    return \"Hello\"'\n        },\n        'start_line': 2,\n        'end_line': 5\n    },\n    {\n        'code': {'language': 'javascript', 'content': 'console.log(\"Hello\");'},\n        'start_line': 7,\n        'end_line': 9\n    }\n]\n```\n\n### Blockquotes\n\n```python\nblockquotes = '''\n\u003e Simple quote\n\u003e Multiple lines\n\n\u003e Nested quote\n\u003e\u003e Inner quote\n\u003e Back to outer\n'''\n\nmd = Markdown(blockquotes)\nprint(md.md_list)\n```\n\nOutput:\n\n```python\n[\n    {\n        'blockquote': [\n            {'content': 'Simple quote', 'items': []},\n            {'content': 'Multiple lines', 'items': []}\n        ],\n        'start_line': 2,\n        'end_line': 3\n    },\n    {\n        'blockquote': [\n            {\n                'content': 'Nested quote',\n                'items': [{'content': 'Inner quote', 'items': []}]\n            },\n            {'content': 'Back to outer', 'items': []}\n        ],\n        'start_line': 5,\n        'end_line': 7\n    }\n]\n```\n\n### Definition Lists\n\n```python\ndef_lists = '''\nTerm\n: Definition 1\n: Definition 2\n'''\n\nmd = Markdown(def_lists)\nprint(md.md_list)\n```\n\nOutput:\n\n```python\n[\n    {\n        'def_list': {'term': 'Term', 'list': ['Definition 1', 'Definition 2']},\n        'start_line': 2,\n        'end_line': 4\n    }\n]\n```\n\n## Limitations\n\n- Some extended markdown flavors might not be supported\n- Inline formatting (bold, italic, links) is currently not parsed\n- Table alignment specifications are not preserved\n\n## Contributing\n\nContributions are welcome! 
Please feel free to submit a Pull Request or open an issue.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flennartpollvogt%2Fmarkdown-to-data","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flennartpollvogt%2Fmarkdown-to-data","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flennartpollvogt%2Fmarkdown-to-data/lists"}