https://github.com/bxt/wptemplates
Collect template information from MediaWiki markup
- Host: GitHub
- URL: https://github.com/bxt/wptemplates
- Owner: bxt
- License: mit
- Created: 2013-03-19T13:51:19.000Z (over 13 years ago)
- Default Branch: master
- Last Pushed: 2016-10-21T05:15:50.000Z (over 9 years ago)
- Last Synced: 2025-10-14T03:33:58.319Z (8 months ago)
- Language: Ruby
- Size: 77.1 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.txt
README
# Wptemplates
[Build Status](https://travis-ci.org/bxt/wptemplates)
[Gem Version](http://badge.fury.io/rb/wptemplates)
Gem for collecting template information from MediaWiki markup.
It helps you extract useful machine-readable data from
Wikipedia articles, since a lot of useful information is
encoded in templates.
Currently only templates and links are parsed; all other markup is ignored.
## Installation
Add this line to your application's Gemfile:
```ruby
gem 'wptemplates'
```
And then execute:
```sh
$ bundle
```
Or install it yourself as:
```sh
$ gem install wptemplates
```
## Usage
To parse a piece of markup simply call:
```ruby
require 'wptemplates'

ast = Wptemplates.parse("{{foo | bar | x = 3 }} baz [[bam (2003)|]]y")
```
You will get an instance of `Wptemplates::Soup`, which is an array of
`Wptemplates::Template`, `Wptemplates::Link` and `Wptemplates::Text` nodes.
You can explore the AST with these methods:
```ruby
ast.templates.is_a?(Array) && ast.templates.length # => 1
ast.text # => " baz bamy"
```
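To make the structure described above concrete, here is a hypothetical, self-contained sketch of a "soup" as an `Array` subclass holding template, link and text nodes. The `Struct`s below are stand-ins, not the gem's real classes, and the assumption that templates contribute no plain text is inferred from the `ast.text` example above.

```ruby
# Hypothetical stand-ins for Wptemplates::Text, ::Link and ::Template;
# the gem's real classes may look different.
Text     = Struct.new(:text)
Link     = Struct.new(:link, :text)
Template = Struct.new(:name, :params) do
  # Assumption: templates contribute nothing to the plain text.
  def text
    ""
  end
end

# A "soup" is an Array of nodes plus a few convenience readers.
class Soup < Array
  def templates
    grep(Template)
  end

  def links
    grep(Link)
  end

  # Plain text is the concatenation of every node's text.
  def text
    map(&:text).join
  end
end

ast = Soup.new([
  Template.new(:foo, {}),
  Text.new(" baz "),
  Link.new("Bam (2003)", "bamy")
])

ast.templates.length # => 1
ast.text             # => " baz bamy"
```

Subclassing `Array` keeps ordinary indexing such as `ast[0]` working while adding the filtered views.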
To find template data:
```ruby
ast[0].name # => :foo
ast[0].params[0].text # => " bar "
ast[0].params[:x].text # => "3"
ast.all_templates_of(:foo).map{|t| t.params[:x].text} # => ["3"]
ast.navigate(:foo, :x) {|p|p.text} # => "3"
ast.navigate(:foo, :y) {|p|p.text} # => nil
```
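Note that `params[0]` keeps its surrounding spaces while `params[:x]` is trimmed. This matches MediaWiki's own rule: positional parameters keep their whitespace, while the names and values of named parameters are trimmed. The sketch below illustrates that rule for a flat template without nesting; `parse_flat_template` is a hypothetical helper, not part of the gem, and it numbers positional parameters from 0 like the examples above (MediaWiki itself numbers them from 1).

```ruby
# Hypothetical helper: split a flat template (no nested {{...}} or [[...]])
# into a name and a Hash of parameters. Not the gem's parser.
def parse_flat_template(markup)
  inner = markup.strip.delete_prefix("{{").delete_suffix("}}")
  name, *parts = inner.split("|", -1)
  params = {}
  position = 0
  parts.each do |part|
    key, sep, value = part.partition("=")
    if sep.empty?
      # Positional parameter: surrounding whitespace is kept.
      params[position] = part
      position += 1
    else
      # Named parameter: both name and value are trimmed.
      params[key.strip.to_sym] = value.strip
    end
  end
  [name.strip.to_sym, params]
end

name, params = parse_flat_template("{{foo | bar | x = 3 }}")
name       # => :foo
params[0]  # => " bar "
params[:x] # => "3"
params[:y] # => nil
```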
You can access the links via:
```ruby
ast.links.length # => 1
ast.links[0].text # => "bamy"
ast.all_links.map{|l| l.link} # => ["Bam (2003)"]
```
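The link example relies on two MediaWiki conventions: the pipe trick (`[[bam (2003)|]]` displays as `bam`) and the link trail (the `y` right after `]]` becomes part of the link text). The following is a hypothetical sketch of both, not the gem's implementation. It assumes no namespaces and no nested markup, it only handles the trailing-parenthesis and comma forms of the pipe trick, and it limits the link trail to the letters a-z, as on the English Wikipedia.

```ruby
# Hypothetical sketch, not the gem's parser.
# Captures: target, optional label after "|", and the a-z link trail.
LINK_RE = /\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]([a-z]*)/

# "[[bam (2003)|]]" displays as "bam"; "[[Boston, Massachusetts|]]" as "Boston".
def pipe_trick(target)
  if (m = target.match(/\A(.*?)\s*\([^()]*\)\z/))
    m[1]
  else
    target.split(",", 2).first.strip
  end
end

def parse_links(markup)
  markup.scan(LINK_RE).map do |target, label, trail|
    target = target.strip
    label = pipe_trick(target) if label == "" # "[[x|]]": apply the pipe trick
    label ||= target                          # "[[x]]": display the target
    # Page names get their first letter capitalized.
    page = target.sub(/\A./) { |c| c.upcase }
    { link: page, text: label + trail }
  end
end

links = parse_links("{{foo | bar | x = 3 }} baz [[bam (2003)|]]y")
links.map { |l| l[:link] } # => ["Bam (2003)"]
links.map { |l| l[:text] } # => ["bamy"]
```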
## Developing
Here's some useful info if you want to improve/customize this gem.
### Getting Started
Check out the project, run `bundle` and then `rake` to see whether the tests
pass. Run `rake -T` to list the available rake tasks.
### Markup
MediaWiki markup is not trivial to parse, and compatibility issues
are always possible. There's a useful help page about
[templates][tmplh] and a [markup spec][mspec]. For links there
is a page about [links][linkh] and one about the [pipe trick][ptrkh].
There is also a page with the [BNF for links][lnbnf].
### Known Issues
* If a template contains an image, the pipes inside the image markup start a new parameter
* Namespaced links are not recognized
* Templates inside links are not recognized
* Link contents are not HTML-decoded
* `nowiki`, `pre` and `math` blocks might cause problems
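The first known issue is typical of splitting template arguments on every `|`: an image such as `[[File:X.png|thumb|Caption]]` contains pipes of its own. One common remedy is to split only at bracket depth zero. The sketch below shows that idea; `split_params` is a hypothetical helper, not code from the gem, and it ignores `nowiki`, `pre` and `math` blocks.

```ruby
# Hypothetical sketch: split template arguments on "|" only when not inside
# [[...]] or nested {{...}}, so pipes in image or link markup stay put.
def split_params(inner)
  parts = [+""]
  depth = 0
  i = 0
  while i < inner.length
    two = inner[i, 2]
    if two == "[[" || two == "{{"
      depth += 1
      parts.last << two
      i += 2
    elsif (two == "]]" || two == "}}") && depth > 0
      depth -= 1
      parts.last << two
      i += 2
    elsif inner[i] == "|" && depth.zero?
      parts << +"" # top-level pipe: start a new parameter
      i += 1
    else
      parts.last << inner[i]
      i += 1
    end
  end
  parts
end

markup = "Infobox | image = [[File:X.png|thumb|Caption]] | name = Y"
markup.split("|").length # => 5 (the naive split breaks the image apart)
split_params(markup)
# => ["Infobox ", " image = [[File:X.png|thumb|Caption]] ", " name = Y"]
```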
## Contributing
1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create new Pull Request
[tmplh]: http://en.wikipedia.org/wiki/Help:Template#Usage_syntax "English Wikipedia Template help page, syntax section"
[mspec]: http://www.mediawiki.org/wiki/Markup_spec "MediaWiki Markup spec"
[linkh]: http://en.wikipedia.org/wiki/Help:Link "English Wikipedia Link help page"
[ptrkh]: http://en.wikipedia.org/wiki/Help:Pipe_trick "English Wikipedia Pipe trick help page"
[lnbnf]: http://www.mediawiki.org/wiki/Markup_spec/BNF/Links "MediaWiki Link BNF"