https://github.com/wiki-connect/parsewiki
A library that helps parse wikitext data
- Host: GitHub
- URL: https://github.com/wiki-connect/parsewiki
- Owner: wiki-connect
- License: gpl-3.0
- Created: 2023-10-23T15:23:10.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2023-12-18T23:06:47.000Z (over 2 years ago)
- Last Synced: 2025-04-05T22:45:01.733Z (about 1 year ago)
- Topics: mediawiki, parser, wikipedia, wikitext, wikitext-parser
- Language: PHP
- Homepage:
- Size: 28.3 KB
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
# WikiConnect ParseWiki
A PHP library for parsing MediaWiki-style content from raw wikitext.
---
## Overview
This library allows you to extract:
- Templates (single, multiple, nested)
- Internal wiki links
- External links
- Citations (references)
- Categories (with or without display text)
It is intended for PHP projects that need to read or edit wiki-formatted text.
---
## Project Structure
- `ParserTemplates`: Parses multiple templates.
- `ParserTemplate`: Parses a single template.
- `ParserInternalLinks`: Parses internal wiki links.
- `ParserExternalLinks`: Parses external links.
- `ParserCitations`: Parses citations and references.
- `ParserCategories`: Parses categories from wiki text.
- `DataModel` classes:
  - `Attribute`
  - `Citation`
  - `ExternalLink`
  - `InternalLink`
  - `Parameters`
  - `Template`
- `tests/`: Contains PHPUnit test files:
  - `ParserCategoriesTest`
  - `ParserCitationsTest`
  - `ParserExternalLinksTest`
  - `ParserInternalLinksTest`
  - `ParserTemplatesTest`
  - `ParserTemplateTest`
  - `DataModel` tests:
    - `AttributeTest`
    - `ParametersTest`
    - `TemplateTest`
---
## Features
- ✅ Parse single and multiple templates.
- ✅ Support nested templates.
- ✅ Handle named and unnamed template parameters.
- ✅ Extract internal links with or without display text.
- ✅ Extract external links with or without labels.
- ✅ Parse citations, including attributes and special characters.
- ✅ Parse categories, including custom namespaces, whitespace, and special characters.
- ✅ PHPUnit tests for every parser and for the `Attribute`, `Parameters`, and `Template` data models.
---
## Wikitext Features Support
| Feature | Read | Modify | Replace |
|---------|------|--------|---------|
| **Templates** | ✅ Yes | ✅ Yes | ✅ Yes |
| **Parameters** | ✅ Yes | ✅ Yes | ✅ Yes |
| **Citations** | ✅ Yes | ✅ Yes | ✅ Yes |
| **Citations > Attributes** | ✅ Yes | ✅ Yes | ✅ Yes |
| **Internal Links** | ✅ Yes | | |
| **External Links** | ✅ Yes | | |
| **Categories** | ✅ Yes | | |
| **HTML Tags** | | | |
| **Parser Functions** | | | |
| **Tables** | | | |
| **Sections** | | | |
| **Magic Words** | | | |
> 💡 **Note:** Some features are partially supported or under development. Contributions are welcome!
---
## Requirements
- PHP 8.0 or higher
- PHPUnit 9 or higher (only needed to run the test suite)
---
## Installation
```bash
composer require wiki-connect/parsewiki
```
Make sure PSR-4 autoloading is set up for the `WikiConnect\ParseWiki` namespace. With Composer, this usually means including `vendor/autoload.php` once in your entry script.
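As a quick check that the package is reachable after installation, the following sketch loads Composer's autoloader and tests for one of the classes used in the examples of this README. The helper name `parseWikiAvailable` is hypothetical and not part of the library; it assumes the standard Composer layout with `vendor/autoload.php` under the project root.

```php
<?php
// Hypothetical helper (not part of ParseWiki): load Composer's autoloader
// from a project root and report whether the ParseWiki classes are reachable.
function parseWikiAvailable(string $projectRoot): bool
{
    $autoload = rtrim($projectRoot, '/\\') . '/vendor/autoload.php';
    if (is_file($autoload)) {
        require_once $autoload;
    }

    // Class name taken from the usage examples in this README.
    return class_exists('WikiConnect\\ParseWiki\\ParserTemplates');
}

echo parseWikiAvailable(__DIR__)
    ? "ParseWiki is loadable\n"
    : "ParseWiki not found; run composer require first\n";
```

If the check fails, the usual cause is that the script runs from a directory other than the one containing `vendor/`.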
---
## Running Tests
```bash
vendor/bin/phpunit tests
```
### Test Coverage
- **Templates:** Single, multiple, nested, named/unnamed parameters.
- **Internal Links:** Simple, with display text, special characters.
- **External Links:** With/without labels, multiple links, whitespace handling.
- **Citations:** With/without attributes, special characters.
- **Categories:** Simple, with display text, custom namespaces, whitespaces, special characters.
---
## Example Usage
### Parsing Templates
```php
use WikiConnect\ParseWiki\ParserTemplates;
$text = '{{Infobox person|name=John Doe|birth_date=1990-01-01}}';
$parser = new ParserTemplates($text);
$templates = $parser->getTemplates();
foreach ($templates as $template) {
    echo $template->getName();
    print_r($template->getParameters());
}
```
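To illustrate what nested template support involves, here is a minimal plain-PHP sketch that finds top-level `{{...}}` spans by counting brace depth, so an inner template stays inside its parent. It is not the library's implementation, and the function name `findTopLevelTemplates` is made up for this example; it ignores edge cases such as `{{{parameter}}}` syntax and `<nowiki>` sections.

```php
<?php
/**
 * Return the top-level {{...}} spans in $text, keeping nested templates
 * inside their parent. Illustrative only; not ParseWiki's own code.
 *
 * @return string[]
 */
function findTopLevelTemplates(string $text): array
{
    $found = [];
    $depth = 0;
    $start = 0;
    $length = strlen($text);

    for ($i = 0; $i < $length - 1; $i++) {
        $pair = substr($text, $i, 2);

        if ($pair === '{{') {
            if ($depth === 0) {
                $start = $i; // remember where the outermost template begins
            }
            $depth++;
            $i++; // skip the second opening brace
        } elseif ($pair === '}}' && $depth > 0) {
            $depth--;
            $i++; // $i now points at the second closing brace
            if ($depth === 0) {
                $found[] = substr($text, $start, $i - $start + 1);
            }
        }
    }

    return $found;
}

print_r(findTopLevelTemplates('{{Cite|title={{lang|en|Hi}}}} and {{Stub}}'));
```

A depth counter is used here because a single regular expression cannot reliably match arbitrarily nested braces.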
### Parsing and Editing a Single Template
```php
use WikiConnect\ParseWiki\ParserTemplate;
$text = '{{Infobox_Person|name=John Doe|birth_date=1990-01-01}}';
$parser = new ParserTemplate($text);
$template = $parser->getTemplate();
// Edit the template: rename it and add a parameter
$template->setName('Infobox person');
$template->parameters->set('birth_place', '[[New York City|New York]]');

// Serialize the edited template back to wikitext
$new_template = $template->toString();
echo $new_template;
// {{Infobox person|name=John Doe|birth_date=1990-01-01|birth_place=[[New York City|New York]]}}
```
### Parsing Internal Links
```php
use WikiConnect\ParseWiki\ParserInternalLinks;
$text = 'See [[Main Page|the main page]] and [[Help]].';
$parser = new ParserInternalLinks($text);
$links = $parser->getTargets();
foreach ($links as $link) {
    echo 'Target: ' . $link->getTarget() . PHP_EOL;
    echo 'Text: ' . ($link->getText() ?? $link->getTarget()) . PHP_EOL;
}
```
### Parsing External Links
```php
use WikiConnect\ParseWiki\ParserExternalLinks;
$text = 'Visit [https://example.com Example Site] and [https://nolabel.com].';
$parser = new ParserExternalLinks($text);
$links = $parser->getLinks();
foreach ($links as $link) {
    echo 'URL: ' . $link->getLink() . PHP_EOL;
    echo 'Label: ' . ($link->getText() ?: 'No label') . PHP_EOL;
}
```
### Parsing Citations
```php
use WikiConnect\ParseWiki\ParserCitations;
$text = 'Some text with a citation.<ref>This is a citation</ref>';
$parser = new ParserCitations($text);
$citations = $parser->getCitations();
foreach ($citations as $citation) {
    echo 'Content: ' . $citation->getContent() . PHP_EOL;
    echo 'Attributes: ' . $citation->getAttributes() . PHP_EOL;
}
```
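For readers unfamiliar with the `<ref>` syntax that the citation parser targets, the sketch below pulls the attribute string and content out of each citation with a regular expression. It is a plain-PHP illustration under simplifying assumptions, not how ParseWiki works internally: `extractRefs` is a hypothetical name, self-closing tags such as `<ref name="x" />` are skipped, and attribute values containing a slash are not handled.

```php
<?php
/**
 * Extract <ref>...</ref> citations with a regular expression.
 * Illustrative sketch only; not ParseWiki's own code.
 *
 * @return array<int, array{attributes: string, content: string}>
 */
function extractRefs(string $text): array
{
    $refs = [];

    // [^>/]* stops at a slash, so self-closing <ref ... /> tags never match.
    // \b keeps <references /> from being mistaken for a <ref> tag.
    preg_match_all('~<ref\b([^>/]*)>(.*?)</ref>~is', $text, $matches, PREG_SET_ORDER);

    foreach ($matches as $match) {
        $refs[] = [
            'attributes' => trim($match[1]),
            'content'    => trim($match[2]),
        ];
    }

    return $refs;
}

print_r(extractRefs('Fact.<ref name="smith">Smith 2020</ref> More.<ref>Plain</ref>'));
```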
### Parsing Categories
```php
use WikiConnect\ParseWiki\ParserCategories;
$text = 'Some text [[Category:Science]] and [[Category:Math|Displayed]].';
$parser = new ParserCategories($text);
$categories = $parser->getCategories();
foreach ($categories as $category) {
    echo 'Category: ' . $category . PHP_EOL;
}
```
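The category forms mentioned above (plain, with display text, custom namespace, extra whitespace) can be pictured with a small regular-expression sketch. This is an independent illustration, not the library's parser; the function `extractCategories` and its `$namespace` parameter are assumptions made for the example.

```php
<?php
/**
 * Find [[Category:Name]] and [[Category:Name|Display]] links.
 * Illustrative sketch only; not ParseWiki's own code.
 *
 * @return array<int, array{name: string, display: ?string}>
 */
function extractCategories(string $text, string $namespace = 'Category'): array
{
    // Tolerates whitespace around the namespace, the colon, and the name.
    $pattern = '~\[\[\s*' . preg_quote($namespace, '~')
        . '\s*:\s*([^\]|]+?)\s*(?:\|([^\]]*))?\]\]~iu';

    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $categories = [];
    foreach ($matches as $match) {
        $categories[] = [
            'name'    => $match[1],
            // Group 2 is absent when the link has no pipe part.
            'display' => $match[2] ?? null,
        ];
    }

    return $categories;
}

print_r(extractCategories('Some text [[Category:Science]] and [[Category:Math|Displayed]].'));
```

Passing a different `$namespace`, for example a localized one, reuses the same pattern for wikis that do not use the English `Category` prefix.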
---
## Author
Developed with ❤️ by Gerges.