https://github.com/cxuesong/mwparserfromscratch

A basic .NET Library for parsing wikitext into AST.
[CXuesong.MW.MwParserFromScratch](https://www.nuget.org/packages/CXuesong.MW.MwParserFromScratch) ![NuGet version (CXuesong.MW.MwParserFromScratch)](https://img.shields.io/nuget/vpre/CXuesong.MW.MwParserFromScratch?style=flat-square) ![NuGet downloads (CXuesong.MW.MwParserFromScratch)](https://img.shields.io/nuget/dt/CXuesong.MW.MwParserFromScratch.svg?style=flat-square)

# MwParserFromScratch

A .NET library for parsing wikitext into an AST. The repository is still under development, but it can already handle most wikitext syntax.

[FuGet Gallery](https://www.fuget.org/packages/CXuesong.MW.MwParserFromScratch/0.3.0-int.6): See library classes and API documentation.

## Usage

This package is available on NuGet. You can install it using one of the following commands:

```
# Package Management Console
Install-Package CXuesong.MW.MwParserFromScratch -Pre
# .NET CLI
dotnet add package CXuesong.MW.MwParserFromScratch -v 0.3.0-int.6
```

After adding a reference to this library, import the namespaces:

```c#
using MwParserFromScratch;
using MwParserFromScratch.Nodes;
```

Then pass the text to the parser:

```c#
var parser = new WikitextParser();
var text = "Paragraph.\n* Item1\n* Item2\n";
var ast = parser.Parse(text);
```

Now `ast` contains the `Wikitext` instance, which is the root of the AST.
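To get a feel for what the root node contains, you can walk its descendants. The following is a minimal sketch rather than part of the library: `AstWalkSketch` and `DescribeNodes` are hypothetical names, and it relies only on `EnumDescendants()` and `ToString()`, both of which appear in the demos in this README.

```c#
using System;
using System.Collections.Generic;
using System.Linq;
using MwParserFromScratch;
using MwParserFromScratch.Nodes;

// Hypothetical helper for illustration; not part of MwParserFromScratch.
static class AstWalkSketch
{
    // Returns one "TypeName: wikitext" line per descendant node of the parsed text.
    public static List<string> DescribeNodes(string wikitext)
    {
        var parser = new WikitextParser();
        var ast = parser.Parse(wikitext);
        return ast.EnumDescendants()
            // Escape line breaks so that every node stays on a single line.
            .Select(node => node.GetType().Name + ": "
                + node.ToString().Replace("\r", "\\r").Replace("\n", "\\n"))
            .ToList();
    }
}
```

Calling `AstWalkSketch.DescribeNodes("Paragraph.\n* Item1\n* Item2\n")` and writing each returned line to the console should list the paragraph, the list items, and their inline children in document order.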

You can also take a look at `ConsoleTestApplication1`, which contains some demos. `SimpleDemo` illustrates how to search and replace in the AST.

```c#
// Requires: using System; using System.Linq;
static void SimpleDemo()
{
    // Fills the missing template parameters.
    var parser = new WikitextParser();
    var templateNames = new [] {"Expand section", "Cleanup"};
    var text = @"==Hello==
{{Expand section|
date=2010-10-05
}}
{{Cleanup}}
This is a nice '''paragraph'''.
==References==
{{Reflist}}
";
    var ast = parser.Parse(text);
    // Convert the code snippets to nodes
    var dateName = parser.Parse("date");
    var dateValue = parser.Parse(DateTime.Now.ToString("yyyy-MM-dd"));
    Console.WriteLine("Issues:");
    // Search and set
    foreach (var t in ast.EnumDescendants().OfType<Template>()
        .Where(t => templateNames.Contains(MwParserUtility.NormalizeTemplateArgumentName(t.Name))))
    {
        // Get the argument by name.
        var date = t.Arguments["date"];
        if (date != null)
        {
            // To print the wikitext instead of user-friendly text, use ToString()
            Console.WriteLine("{0} ({1})", t.Name.ToPlainText(), date.Value.ToPlainText());
        }
        // Update/Add the argument
        t.Arguments.SetValue(dateName, dateValue);
    }
    Console.WriteLine();
    Console.WriteLine("Wikitext:");
    Console.WriteLine(ast.ToString());
}
```

The console output is as follows:

```wiki
Issues:
Expand section (2010-10-05)

Wikitext:
==Hello==
{{Expand section|
date=2017-02-26}}
{{Cleanup|date=2017-02-26}}
This is a nice '''paragraph'''.
==References==
{{Reflist}}
```
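If you only want to report which templates lack an argument, without modifying the AST, the same calls can be used in a read-only pass. This is a sketch under assumptions: `TemplateAuditSketch` and `FindTemplatesMissing` are hypothetical names, and the template name is compared after a plain `Trim()` rather than through `MwParserUtility`.

```c#
using System;
using System.Collections.Generic;
using System.Linq;
using MwParserFromScratch;
using MwParserFromScratch.Nodes;

// Hypothetical helper for illustration; not part of MwParserFromScratch.
static class TemplateAuditSketch
{
    // Returns the plain-text names of all templates that do not have
    // an argument with the given name. The AST is not modified.
    public static List<string> FindTemplatesMissing(string wikitext, string argumentName)
    {
        var parser = new WikitextParser();
        var ast = parser.Parse(wikitext);
        return ast.EnumDescendants().OfType<Template>()
            // As in SimpleDemo, the indexer yields null when the argument is absent.
            .Where(t => t.Arguments[argumentName] == null)
            .Select(t => t.Name.ToPlainText().Trim())
            .ToList();
    }
}
```

For example, `TemplateAuditSketch.FindTemplatesMissing(text, "date")` returns a list of template names that you can print or log before deciding whether to call `Arguments.SetValue`.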

`ParseAndPrint` prints a rough outline of the parsed tree. Here is an example run:

```
Please input the wikitext to parse, use EOF (Ctrl+Z) to accept:
==Hello==
* ''Item1''
* [[Item2]]
---------
test
^Z
Parsed AST
Wikitext [==Hello==\r\n* ''Item1]
.Paragraph [==Hello==\r]
..PlainText [==Hello==\r]
.ListItem [* ''Item1''\r]
..PlainText [ ]
..FormatSwitch ['']
..PlainText [Item1]
..FormatSwitch ['']
..PlainText [\r]
.ListItem [* [[Item2]]\r]
..PlainText [ ]
..WikiLink [[[Item2]]]
...Run [Item2]
....PlainText [Item2]
..PlainText [\r]
.ListItem [---------\r]
..PlainText [\r]
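.Paragraph [
```

You can also fetch the wikitext of a page through the MediaWiki API and parse it. The opening lines of the snippet are missing here, so the method signature, the endpoint URL, and the `HttpClient` setup below are assumptions.

```c#
// Assumed signature: the original opening of this method is missing.
static Wikitext FetchAndParse(string title)
{
    // Assumed endpoint; replace it with the api.php URL of your wiki.
    const string EndPointUrl = "https://en.wikipedia.org/w/api.php";
    var client = new HttpClient();
    // Ask for the content of the latest revision of the page.
    var requestContent = new Dictionary<string, string>
    {
        {"format", "json"},
        {"action", "query"},
        {"prop", "revisions"},
        {"rvlimit", "1"},
        {"rvprop", "content"},
        {"titles", title}
    };
    var response = client.PostAsync(EndPointUrl, new FormUrlEncodedContent(requestContent)).Result;
    var root = JObject.Parse(response.Content.ReadAsStringAsync().Result);
    // The pages object is keyed by page ID, so take its first property.
    var content = (string) root["query"]["pages"].Children<JProperty>().First().Value["revisions"][0]["*"];
    var parser = new WikitextParser();
    return parser.Parse(content);
}
```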

You may need the `Newtonsoft.Json` NuGet package to parse the JSON response.
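Blocking on `.Result` can deadlock in UI or ASP.NET contexts, so an asynchronous variant may be preferable. The following is a sketch, not code from the repository: `FetchSketch`, `ExtractContent`, and `FetchAndParseAsync` are hypothetical names, the endpoint URL is passed in by the caller, and the JSON layout is assumed to be the `action=query` response with `prop=revisions` used above.

```c#
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using MwParserFromScratch;
using MwParserFromScratch.Nodes;
using Newtonsoft.Json.Linq;

// Hypothetical helper for illustration; not part of MwParserFromScratch.
static class FetchSketch
{
    // A single shared HttpClient avoids exhausting sockets on repeated calls.
    private static readonly HttpClient Client = new HttpClient();

    // Extracts the wikitext of the first returned page from the JSON response.
    public static string ExtractContent(string json)
    {
        var root = JObject.Parse(json);
        // "pages" is keyed by page ID, so take the first property's value.
        var page = root["query"]["pages"].Children<JProperty>().First().Value;
        return (string) page["revisions"][0]["*"];
    }

    // endPointUrl is the api.php URL of the target wiki (supplied by the caller).
    public static async Task<Wikitext> FetchAndParseAsync(string endPointUrl, string title)
    {
        var requestContent = new Dictionary<string, string>
        {
            {"format", "json"},
            {"action", "query"},
            {"prop", "revisions"},
            {"rvlimit", "1"},
            {"rvprop", "content"},
            {"titles", title}
        };
        using (var response = await Client.PostAsync(endPointUrl, new FormUrlEncodedContent(requestContent)))
        {
            response.EnsureSuccessStatusCode();
            var json = await response.Content.ReadAsStringAsync();
            return new WikitextParser().Parse(ExtractContent(json));
        }
    }
}
```

Keeping `ExtractContent` separate from the network call lets you check the JSON handling against a saved response without contacting a wiki.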

## Limitations

* For now it does not support table syntax, but I'll work on this.
* Text inside parser tags (rather than normal HTML tags) will not be parsed and will be preserved in `ParserTag.Content`. For certain parser tags, you can parse the `Content` again to get the AST.
* It may handle some pathological cases differently from the MediaWiki parser, e.g. `{{{{{arg}}` (see Issue #1).
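
The second limitation can be worked around by feeding the preserved text back into the parser. The following is a minimal sketch: `ParserTagSketch` and `ParseTagContents` are hypothetical names, and which tags the parser treats as parser tags depends on its configuration, which is not covered here.

```c#
using System;
using System.Collections.Generic;
using System.Linq;
using MwParserFromScratch;
using MwParserFromScratch.Nodes;

// Hypothetical helper for illustration; not part of MwParserFromScratch.
static class ParserTagSketch
{
    // Parses the preserved content of every parser tag a second time and
    // returns one AST per tag that has non-empty content.
    public static List<Wikitext> ParseTagContents(string wikitext)
    {
        var parser = new WikitextParser();
        var ast = parser.Parse(wikitext);
        return ast.EnumDescendants().OfType<ParserTag>()
            .Where(tag => !string.IsNullOrEmpty(tag.Content))
            .Select(tag => parser.Parse(tag.Content))
            .ToList();
    }
}
```

Only do this for tags whose content is itself wikitext. Re-parsing the content of a tag that holds raw text would produce a misleading AST.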