{"id":15172793,"url":"https://github.com/serlo/mediawiki-parser","last_synced_at":"2025-10-26T04:30:29.193Z","repository":{"id":57637713,"uuid":"111007934","full_name":"serlo/mediawiki-parser","owner":"serlo","description":"This project aims to develop a parser for mediawiki markdown on the basis of Parsing Expression Grammars. ","archived":false,"fork":false,"pushed_at":"2021-08-23T19:30:49.000Z","size":182,"stargazers_count":13,"open_issues_count":2,"forks_count":2,"subscribers_count":8,"default_branch":"main","last_synced_at":"2024-09-28T10:04:29.031Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/serlo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-11-16T18:33:56.000Z","updated_at":"2024-02-26T01:44:18.000Z","dependencies_parsed_at":"2022-09-02T03:23:04.690Z","dependency_job_id":null,"html_url":"https://github.com/serlo/mediawiki-parser","commit_stats":null,"previous_names":["vroland/mediawiki-parser"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serlo%2Fmediawiki-parser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serlo%2Fmediawiki-parser/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serlo%2Fmediawiki-parser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/serlo%2Fmediawiki-parser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/serlo","download_url":"https://codeload.github.com/serlo/mediawiki-parser/tar.gz/refs/he
ads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":219864036,"owners_count":16555943,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-09-27T10:04:29.119Z","updated_at":"2025-10-26T04:30:28.811Z","avatar_url":"https://github.com/serlo.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# mediawiki-parser\nThis project aims to develop a parser for a subset of MediaWiki markup on the basis of Parsing Expression Grammars. \nIt currently features a generated parser and test generation from a specification document. A simple binary to read from a file and write YAML to stdout is provided.\n\n## Disclaimer\n\nThe goal of mediawiki-parser is *not* full compatibility with MediaWiki and all of its quirks. It is intended to be used if rejecting exotic or malformed input is fine. \nThe markup supported is currently largely oriented towards the needs of a specific MediaWiki Project and will likely not change drastically without external contributions. 
\n\nIf you want to parse any MediaWiki with all its weirdness, take a look at [Parse Wiki Text](https://github.com/portstrom/parse_wiki_text) instead.\n\n## Currently supported MediaWiki:\n\n* Text formatting: `''italic'', '''bold''', \u003cmath\u003e\\LaTex\u003c/math\u003e, \u003ccode\u003e\u003c/code\u003e, ...`\n* Paragraphs\n* Heading hierarchies\n* Lists\n* Internal references (files) `[[File.ext|option|caption]]`\n* External references `[https://example.com/ example]`\n* Tables\n* Generic templates `{{name|anon_arg|arg=value}}`\n* Galleries\n* Generic html tags and comments `\u003cthing\u003econtent\u003c/thing\u003e`\n\n## Known Limitations\n\nThis project has some known limitations, which might or might not be lifted in the future. \nPart of this comes from treating WikiText as a context-free formal language, which is not entirely true.\n\n* `{,},[,]`  cannot be used in plain text, as they normally indicate special syntax. However, using them in math or `\u003cnowiki\u003e` is fine.\n* Indentation is currently not parsed as `pre`.\n* Templates are only parsed on a syntactical level; they have no effect on their content whatsoever.\n\n\n## Example\n\nParsing will result in either a syntax tree with position information (mostly omitted here for conciseness):\n\nInput:\n``` markdown\nthis is some ''formatted'' [https://example.com example] text.\n```\nOutput (as pseudo-YAML):\n``` yaml\n---\ntype: document\nposition: ...\ncontent:\n  - type: paragraph\n    position: ...\n    content:\n      - type: text\n        position: ...\n        text: \"this is some \"\n      - type: formatted\n        position: ...\n        markup: italic\n        content:\n          - type: text\n            position:\n              start:\n                offset: 15\n                line: 1\n                col: 16\n              end:\n                offset: 24\n                line: 1\n                col: 25\n            text: formatted\n      - type: text\n        position: 
...\n        text: \" \"\n      - type: externalreference\n        position: ...\n        target: \"https://example.com\"\n        caption:\n          - type: text\n            position: ...\n            text: example\n      - type: text\n        position: ...\n        text: \" text.\"\n```\n\nOr a syntax error (here is a pretty representation):\n```\nERROR in line 1 at column 57: Could not continue to parse, expected one of: ''', [, \u003c!--, '', [[, EOF, \"\\n\", {{, [ \t], opening html tag, \u003c, normal text\n1 | this is some ''formatted'' [https://example.com example]] text.\n2 |\n```\n\n## API\n\nThe library provides a straightforward `parse()` function:\n\n```rust\nlet input = \"Hello World\";\nlet result = mediawiki_parser::parse(\u0026input)\n    .expect(\"Parsing of the input failed!\");\nprintln!(\"{}\", \u0026serde_yaml::to_string(\u0026result).unwrap());\n```\n\nThe result is a custom abstract syntax tree (AST). See the documentation for details.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fserlo%2Fmediawiki-parser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fserlo%2Fmediawiki-parser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fserlo%2Fmediawiki-parser/lists"}