https://github.com/codelibs/fess-ds-wikipedia
- Host: GitHub
- URL: https://github.com/codelibs/fess-ds-wikipedia
- Owner: codelibs
- License: apache-2.0
- Created: 2022-06-02T07:12:59.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2025-03-05T14:09:02.000Z (4 months ago)
- Last Synced: 2025-03-05T15:23:46.856Z (4 months ago)
- Language: Java
- Size: 88.9 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README

Wikipedia Data Store for Fess
==========================

## Overview
Wikipedia Data Store is a data store plugin for Fess that crawls Wikipedia pages from a dump file.
## Download
See [Maven Repository](https://repo1.maven.org/maven2/org/codelibs/fess/fess-ds-wikipedia/).
## Installation
See the [Plugin](https://fess.codelibs.org/14.2/admin/plugin-guide.html) section of the Administration guide.
### Crawling Setting
```
# Parameter
url=http://download.wikimedia.org/jawiki/latest/jawiki-latest-pages-articles.xml.bz2
limit=10000

# Script
lang="ja"
filetype=format
filename=title
url="https://ja.wikipedia.org/wiki/" + encodedTitle
host="ja.wikipedia.org"
site="ja.wikipedia.org"
title=title
content=content
digest=digest
anchor=
content_length=content.length()
last_modified=timestamp
timestamp=timestamp
```
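
The `url` line in the script above joins a fixed prefix with `encodedTitle`. This README does not say how the plugin derives `encodedTitle`, so the following is only a rough sketch of one plausible approach: replace spaces with underscores, then percent-encode the result as UTF-8. The class `WikipediaUrlSketch` and its methods `encodeTitle` and `pageUrl` are hypothetical names for illustration and are not part of the plugin.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; the plugin's actual encoding may differ.
final class WikipediaUrlSketch {

    // Assumption: spaces become underscores (as in Wikipedia page URLs),
    // and the remaining characters are percent-encoded as UTF-8.
    static String encodeTitle(String title) {
        String underscored = title.trim().replace(' ', '_');
        // No spaces remain after the replacement, so URLEncoder's
        // form-style "space to +" rule does not come into play here.
        return URLEncoder.encode(underscored, StandardCharsets.UTF_8);
    }

    // Mirrors the script line: url="https://ja.wikipedia.org/wiki/" + encodedTitle
    static String pageUrl(String host, String title) {
        return "https://" + host + "/wiki/" + encodeTitle(title);
    }

    public static void main(String[] args) {
        // prints https://ja.wikipedia.org/wiki/Main_Page
        System.out.println(pageUrl("ja.wikipedia.org", "Main Page"));
    }
}
```

To crawl a dump in another language, the `host`, `site`, and `lang` values in the script, along with the dump file in the `url` parameter, would presumably change together.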