https://github.com/m-elbably/wikinmongo
Simple node.js script to import Wikipedia XML dump into MongoDB database.
mongodb nodejs
- Host: GitHub
- URL: https://github.com/m-elbably/wikinmongo
- Owner: m-elbably
- License: other
- Created: 2013-10-02T14:22:59.000Z (about 11 years ago)
- Default Branch: master
- Last Pushed: 2020-09-04T03:57:00.000Z (about 4 years ago)
- Last Synced: 2023-08-04T10:54:36.938Z (over 1 year ago)
- Topics: mongodb, nodejs
- Language: JavaScript
- Homepage: https://m-elbably.github.io/wikinmongo
- Size: 601 KB
- Stars: 3
- Watchers: 4
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
WikinMongo 0.1.0
==============

Overview
--------
This is a simple Node.js script that imports a Wikipedia XML dump into a MongoDB database.
Environment
-----------
* GNU/Linux, Windows, Mac
* Node.js `0.10.18` +
* Node.js Modules
    * xml-object-stream: `0.2.0` +
    * mongodb: `1.3.19` +
    * cli-color: `0.2.3` + (can be removed to use only stdout)
* MongoDB `2.2` +

Data Source
-----------
Wikipedia XML dump file (uncompressed)
http://dumps.wikimedia.org

Page Document Structure
-----------------------
````js
{
    title: string,
    ns: string,
    id: number,
    revision: {
        id: number,
        parentid: number,
        timestamp: date,
        contributor: {
            username: string,
            id: number,
            ip: string
        },
        comment: string,
        text: string,
        sha1: string,
        model: string,
        format: string
    }
}
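
// Illustrative only: a made-up page in the shape above. Every value here is
// an assumption for demonstration, not real dump data or output of this script.
// In MediaWiki XML dumps a contributor usually carries either username and id
// (registered users) or ip (anonymous edits), so ip is left out below.
const examplePage = {
    title: 'Example article',
    ns: '0',
    id: 12345,
    revision: {
        id: 67890,
        parentid: 67889,
        timestamp: new Date('2012-01-01T00:00:00Z'),
        contributor: {
            username: 'ExampleUser',
            id: 42
        },
        comment: 'Fix typo',
        text: 'Wikitext of the article goes here.',
        sha1: 'placeholder-hash',
        model: 'wikitext',
        format: 'text/x-wiki'
    }
};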
````

Usage
-----
node app.js `db` `dump` `drop`
Arguments:
````
db: MongoDB database
dump: Wikipedia dump XML file (uncompressed)
drop: Drop the pages collection (if it exists) before inserting new documents
````
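
The three positional arguments listed above could be interpreted roughly as in the sketch below. This is not taken from `app.js`: the helper name `parseArgs`, the error message, and the rule that only the literal word `drop` enables dropping are assumptions made for illustration. It also uses modern JavaScript syntax, so it assumes a newer Node.js than the minimum version listed under Environment.

```javascript
// Hypothetical helper, not part of wikinmongo: maps the positional
// arguments `db`, `dump` and the optional `drop` to an options object.
function parseArgs(argv) {
  // argv is expected to be process.argv.slice(2)
  const [db, dump, drop] = argv;
  if (!db || !dump) {
    throw new Error('Usage: node app.js <db> <dump> [drop]');
  }
  return {
    db: db,                  // MongoDB connection string
    dump: dump,              // path to the uncompressed XML dump
    drop: drop === 'drop'    // assumption: only the literal word enables dropping
  };
}

const options = parseArgs([
  'mongodb://localhost:27017/wiki',
  '/media/Data/enwiki.xml',
  'drop'
]);
console.log(options);
```

Leaving out the third argument gives `drop: false` in this sketch, so the existing pages collection would be kept.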
Example:
`node app.js 'mongodb://localhost:27017/wiki' '/media/Data/enwiki.xml' drop`

Notes
-----
* Importing a full Wikipedia dump takes a while (several hours; the 2012 dump took about 3 hours 30 minutes on a quad-core machine)
* Importing a full Wikipedia dump requires up to 40 GB of storage space
* index.coffee must be compiled with `coffee -c index.coffee` before use

License
-------
This project is BSD (2 clause) licensed.