https://github.com/cxuesong/mwparserfromscratch
A basic .NET Library for parsing wikitext into AST.
- Host: GitHub
- URL: https://github.com/cxuesong/mwparserfromscratch
- Owner: CXuesong
- License: apache-2.0
- Created: 2016-10-12T13:22:54.000Z (about 8 years ago)
- Default Branch: master
- Last Pushed: 2024-08-11T05:58:14.000Z (4 months ago)
- Last Synced: 2024-10-30T06:27:39.098Z (about 2 months ago)
- Topics: ast, mediawiki, parsing, wikitext
- Language: C#
- Homepage:
- Size: 319 KB
- Stars: 18
- Watchers: 5
- Forks: 5
- Open Issues: 5
Metadata Files:
- Readme: README.md
- License: LICENSE
[CXuesong.MW.MwParserFromScratch](https://www.nuget.org/packages/CXuesong.MW.MwParserFromScratch) | ![CXuesong.MW.MwParserFromScratch](https://img.shields.io/nuget/vpre/CXuesong.MW.MwParserFromScratch?style=flat-square) ![NuGet version (CXuesong.MW.WikiClientLibrary)](https://img.shields.io/nuget/dt/CXuesong.MW.MwParserFromScratch.svg?style=flat-square)
# MwParserFromScratch
A .NET library for parsing wikitext into an AST. The repository is still under development, but it can already handle most of the wikitext syntax.
[FuGet Gallery](https://www.fuget.org/packages/CXuesong.MW.MwParserFromScratch/0.3.0-int.6): See library classes and API documentation.
## Usage
This package is available on NuGet. You can install it using one of the following commands:
```
# Package Management Console
Install-Package CXuesong.MW.MwParserFromScratch -Pre
# .NET CLI
dotnet add package CXuesong.MW.MwParserFromScratch -v 0.3.0-int.6
```
After adding a reference to this library, import the namespaces:
```c#
using MwParserFromScratch;
using MwParserFromScratch.Nodes;
```
Then pass the text to the parser:
```c#
var parser = new WikitextParser();
var text = "Paragraph.\n* Item1\n* Item2\n";
var ast = parser.Parse(text);
```
Now `ast` contains the `Wikitext` instance, the root of the AST.
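As a minimal sketch of what can be done with the root node, the snippet below collects the names of all templates in a piece of wikitext. It relies only on members that the `SimpleDemo` sample in this README also uses (`EnumDescendants()`, `Name`, `ToPlainText()`). The class name `AstWalkSketch` and the method `ListTemplateNames` are hypothetical names introduced for illustration, and `Template` is assumed to be the template node type in `MwParserFromScratch.Nodes`.

```c#
using System;
using System.Collections.Generic;
using System.Linq;
using MwParserFromScratch;
using MwParserFromScratch.Nodes;

// Hypothetical helper class, not part of the library.
static class AstWalkSketch
{
    // Parses the given wikitext and returns the plain-text name of every
    // template found anywhere in the tree, in document order.
    public static List<string> ListTemplateNames(string wikitext)
    {
        if (wikitext == null) throw new ArgumentNullException(nameof(wikitext));
        var parser = new WikitextParser();
        var ast = parser.Parse(wikitext);
        return ast.EnumDescendants()
            .OfType<Template>()                       // assumed node type for {{...}}
            .Where(t => t.Name != null)
            .Select(t => t.Name.ToPlainText().Trim()) // user-friendly text, not raw wikitext
            .ToList();
    }
}
```

Calling `AstWalkSketch.ListTemplateNames("{{Foo}} text {{Bar|x=1}}")` should yield one entry per template. The exact normalization of the names (case, whitespace) is left to the caller here.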
You can also take a look at `ConsoleTestApplication1`, which contains some demos. `SimpleDemo` illustrates how to search and replace in the AST.
```c#
static void SimpleDemo()
{
    // Fills the missing template parameters.
    var parser = new WikitextParser();
    var templateNames = new [] {"Expand section", "Cleanup"};
    var text = @"==Hello==
{{Expand section|
date=2010-10-05
}}
{{Cleanup}}
This is a nice '''paragraph'''.
==References==
{{Reflist}}
";
    var ast = parser.Parse(text);
    // Convert the code snippets to nodes
    var dateName = parser.Parse("date");
    var dateValue = parser.Parse(DateTime.Now.ToString("yyyy-MM-dd"));
    Console.WriteLine("Issues:");
    // Search and set
    foreach (var t in ast.EnumDescendants().OfType<Template>()
        .Where(t => templateNames.Contains(MwParserUtility.NormalizeTemplateArgumentName(t.Name))))
    {
        // Get the argument by name.
        var date = t.Arguments["date"];
        if (date != null)
        {
            // To print the wikitext instead of user-friendly text, use ToString()
            Console.WriteLine("{0} ({1})", t.Name.ToPlainText(), date.Value.ToPlainText());
        }
        // Update/Add the argument
        t.Arguments.SetValue(dateName, dateValue);
    }
    Console.WriteLine();
    Console.WriteLine("Wikitext:");
    Console.WriteLine(ast.ToString());
}
```
The console output is as follows:
```wiki
Issues:
Expand section (2010-10-05)

Wikitext:
==Hello==
{{Expand section|
date=2017-02-26}}
{{Cleanup|date=2017-02-26}}
This is a nice '''paragraph'''.
==References==
{{Reflist}}
```
`ParseAndPrint` can roughly print out the parsed tree. Here's a runtime example:
```
Please input the wikitext to parse, use EOF (Ctrl+Z) to accept:
==Hello==
* ''Item1''
* [[Item2]]
---------
test
^Z
Parsed AST
Wikitext [==Hello==\r\n* ''Item1]
.Paragraph [==Hello==\r]
..PlainText [==Hello==\r]
.ListItem [* ''Item1''\r]
..PlainText [ ]
..FormatSwitch ['']
..PlainText [Item1]
..FormatSwitch ['']
..PlainText [\r]
.ListItem [* [[Item2]]\r]
..PlainText [ ]
..WikiLink [[[Item2]]]
...Run [Item2]
....PlainText [Item2]
..PlainText [\r]
.ListItem [---------\r]
..PlainText [\r]
.Paragraph [
    {
        {"format", "json"},
        {"action", "query"},
        {"prop", "revisions"},
        {"rvlimit", "1"},
        {"rvprop", "content"},
        {"titles", title}
    };
    var response = client.PostAsync(EndPointUrl, new FormUrlEncodedContent(requestContent)).Result;
    var root = JObject.Parse(response.Content.ReadAsStringAsync().Result);
    var content = (string) root["query"]["pages"].Children().First().Value["revisions"][0]["*"];
    var parser = new WikitextParser();
    return parser.Parse(content);
}
```
You may need the `Newtonsoft.Json` NuGet package to parse JSON.
## Limitations
* For now it does not support table syntax, but I'll work on this.
* Text inside parser tags (rather than normal HTML tags) will not be parsed and will be preserved in `ParserTag.Content`. For certain parser tags, you can parse the `Content` again to get the AST.
* It may handle some pathological cases differently from the MediaWiki parser, e.g. `{{{{{arg}}` (see Issue #1).
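The parser-tag limitation above can be worked around roughly as follows. This is a sketch under stated assumptions, not part of the library: `ParserTagSketch` and `ParseTagContents` are hypothetical names, `ParserTag.Content` is assumed to be a string that may be null or empty (for example for self-closing tags), and whether a given tag is treated as a parser tag depends on the parser's configuration.

```c#
using System;
using System.Collections.Generic;
using System.Linq;
using MwParserFromScratch;
using MwParserFromScratch.Nodes;

// Hypothetical helper class, not part of the library.
static class ParserTagSketch
{
    // Parses the outer wikitext, then parses the raw Content of every
    // parser tag a second time and returns the resulting inner ASTs.
    public static List<Wikitext> ParseTagContents(string wikitext)
    {
        if (wikitext == null) throw new ArgumentNullException(nameof(wikitext));
        var parser = new WikitextParser();
        var ast = parser.Parse(wikitext);
        var innerTrees = new List<Wikitext>();
        foreach (var tag in ast.EnumDescendants().OfType<ParserTag>())
        {
            // Skip tags without content (assumption: Content can be null or empty).
            if (string.IsNullOrEmpty(tag.Content)) continue;
            innerTrees.Add(parser.Parse(tag.Content));
        }
        return innerTrees;
    }
}
```

Each returned `Wikitext` is a separate tree, so edits made to it are not reflected in the outer AST unless the modified text is written back to the tag's `Content`.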