https://github.com/camchenry/wikidump
A Rust library for parsing Mediawiki website XML dumps
mediawiki rust wikidump wikipedia
- Host: GitHub
- URL: https://github.com/camchenry/wikidump
- Owner: camchenry
- License: mit
- Created: 2019-09-10T12:15:03.000Z (about 5 years ago)
- Default Branch: master
- Last Pushed: 2024-09-01T03:29:32.000Z (2 months ago)
- Last Synced: 2024-10-10T02:28:53.860Z (26 days ago)
- Topics: mediawiki, rust, wikidump, wikipedia
- Language: Rust
- Homepage:
- Size: 476 KB
- Stars: 5
- Watchers: 3
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
README
# wikidump
This crate processes Mediawiki XML dump files and turns them into easily
consumed pieces of data for language analysis, natural language processing,
and other applications.

## Example
```rust
use wikidump::{config, Parser};

// Configure the parser for English Wikipedia dumps.
let parser = Parser::new()
    .use_config(config::wikipedia::english());

// Parse the dump file into a site with its pages and revisions.
let site = parser
    .parse_file("tests/enwiki-articles-partial.xml")
    .expect("Could not parse wikipedia dump file.");

assert_eq!(site.name, "Wikipedia");
assert_eq!(site.url, "https://en.wikipedia.org/wiki/Main_Page");
assert!(!site.pages.is_empty());

// Print every page title followed by the text of each revision.
for page in site.pages {
    println!("Title: {}", page.title);

    for revision in page.revisions {
        println!("\t{}", revision.text);
    }
}
```
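
As a rough illustration of the "language analysis" use case, the text of each revision could be handed to an ordinary string-processing function. The sketch below is not part of wikidump: `word_frequencies` is a hypothetical helper that uses only the Rust standard library, and it assumes the revision text is available as a `&str`. It does not strip wiki markup, so link brackets and similar syntax are simply treated as word separators.

```rust
use std::collections::HashMap;

/// Hypothetical helper (not provided by wikidump): count how often each
/// word occurs in a piece of text, ignoring case.
///
/// A "word" here is a maximal run of alphanumeric characters, so
/// punctuation and wiki markup such as `[[` and `]]` act as separators.
pub fn word_frequencies(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text
        .split(|c: char| !c.is_alphanumeric())
        .filter(|w| !w.is_empty())
    {
        // Normalize to lowercase so "Cat" and "cat" share one entry.
        *counts.entry(word.to_lowercase()).or_insert(0) += 1;
    }
    counts
}
```

Inside the inner loop of the example, a call such as `word_frequencies(&revision.text)` would then produce one frequency table per revision, assuming `revision.text` is a `String` as the `println!` usage suggests.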