https://github.com/wbstack/queryservice-updater
A custom updater wrapping the regular Wikidata updater for updating multiple sites in multiple namespaces.
- Host: GitHub
- URL: https://github.com/wbstack/queryservice-updater
- Owner: wbstack
- License: gpl-3.0
- Created: 2020-11-22T10:50:42.000Z (over 5 years ago)
- Default Branch: main
- Last Pushed: 2024-02-09T10:34:50.000Z (about 2 years ago)
- Last Synced: 2024-02-09T12:11:42.717Z (about 2 years ago)
- Topics: blazegraph, sparql, wbstack, wdqs, wikibase
- Language: Java
- Homepage:
- Size: 219 KB
- Stars: 0
- Watchers: 9
- Forks: 2
- Open Issues: 10
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.md
README
> ℹ️ Issues for this repository are tracked on [Phabricator](https://phabricator.wikimedia.org/project/board/5563/) - ([Click here to open a new one](https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?tags=wikibase_cloud))
Try to push this upstream somehow...
https://gerrit.wikimedia.org/r/#/c/wikidata/query/rdf/+/589408/
**The idea of speeding up the updater from version 1**
The idea is to avoid repeated JVM startup costs by keeping one updater process running all the time, rather than shelling out for each update, which is what the first version's glue code did.
I also investigated Nailgun, but it is not really maintained anymore, so it is probably not a good direction to go in.
This can be trialed at https://repl.it/languages/java
Just copy the code from the class into the default Main.java there and run the 2 lines below.
```
javac -classpath .:/run_dir/junit-4.12.jar:target/dependency/* -d . Main.java
java -classpath .:/run_dir/junit-4.12.jar:target/dependency/* org.wikidata.query.rdf.tool.WbStackUpdate --sparqlUrl sparql
```
This updater approach would require minimal changes to the query service code and would use the WHOLE of the current updater.
Pulling in new code would be easy; the only thing we would need to look out for is changes to the params passed into the updater that we manipulate.
The runUpdater.sh script could then be altered to take the main class from an ENV var, and voila!
TODO: check with the WDQS team whether this would mess up any WDQS internals, and get their general thoughts.
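The long-running approach described above can be sketched as a loop that stays inside a single JVM and runs each update in-process. This is only an illustration under stated assumptions: `runLoop`, `fetchBatches` and `runUpdate` are hypothetical names, batches are reduced to plain strings, and the internals of the actual `WbStackUpdate` class are not shown in this README.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class LongRunningUpdaterSketch {

    /**
     * Repeatedly fetches pending batches and handles each one in-process,
     * so the JVM is started once instead of once per update.
     * All parameter names are hypothetical. Returns the number of batches handled.
     */
    public static int runLoop(Supplier<List<String>> fetchBatches,
                              Consumer<String> runUpdate,
                              int loopLimit,
                              long sleepMillis) {
        int processed = 0;
        for (int i = 0; i < loopLimit; i++) {
            for (String batch : fetchBatches.get()) {
                runUpdate.accept(batch); // stand-in for invoking the wrapped updater
                processed++;
            }
            try {
                Thread.sleep(sleepMillis); // pause between polls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt flag and stop
                break;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> handled = new ArrayList<>();
        // Fake batch source that always returns the same two hypothetical batches.
        int count = runLoop(() -> List.of("wiki-a", "wiki-b"), handled::add, 3, 0L);
        System.out.println(count + " batches handled: " + handled);
    }
}
```

Passing the batch source and the update step in as functions keeps the loop independent of the query service code, which matches the goal of changing as little of the upstream updater as possible.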
## Running locally
1. In IntelliJ IDEA, create a run configuration for `org.wikidata.query.rdf.tool.WbStackUpdate`
2. Set the VM options, for example to `-Xmx64m`
3. Set the environment variables to:
```
WBSTACK_API_ENDPOINT=http://localhost:3030/
WBSTACK_BATCH_SLEEP=0
WBSTACK_LOOP_LIMIT=1000000000
WBSTACK_WIKIBASE_SCHEME=http
```
4. Start docker with `docker-compose up`
5. Once everything has initialized, you should be able to run the new configuration.
6. Every time the fake API is polled, new items are inserted into Wikibase, and the updater keeps running indefinitely.
7. (Optional) Use [VisualVM](https://visualvm.github.io/) for profiling.
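For orientation, the environment values from step 3 could be read into a small settings object along these lines. This is a hypothetical sketch, not the code used by `WbStackUpdate`: the class and method names are invented, the fallback values simply reuse the examples from this README, and the numeric values are treated as plain numbers because their units are not stated here.

```java
import java.util.Map;

public class UpdaterEnvSketch {

    /** Hypothetical holder for the four values listed in step 3. */
    public static final class Settings {
        public final String apiEndpoint;
        public final long batchSleep;   // unit not stated in the README
        public final long loopLimit;
        public final String wikibaseScheme;

        Settings(String apiEndpoint, long batchSleep, long loopLimit, String wikibaseScheme) {
            this.apiEndpoint = apiEndpoint;
            this.batchSleep = batchSleep;
            this.loopLimit = loopLimit;
            this.wikibaseScheme = wikibaseScheme;
        }
    }

    /** Returns the value for key, or the fallback when it is missing or empty. */
    static String valueOrDefault(Map<String, String> env, String key, String fallback) {
        String value = env.get(key);
        return (value == null || value.isEmpty()) ? fallback : value;
    }

    /** Builds settings from an environment map; fallbacks are assumptions for local runs. */
    public static Settings fromEnv(Map<String, String> env) {
        return new Settings(
            valueOrDefault(env, "WBSTACK_API_ENDPOINT", "http://localhost:3030/"),
            Long.parseLong(valueOrDefault(env, "WBSTACK_BATCH_SLEEP", "0")),
            Long.parseLong(valueOrDefault(env, "WBSTACK_LOOP_LIMIT", "1000000000")),
            valueOrDefault(env, "WBSTACK_WIKIBASE_SCHEME", "http"));
    }

    public static void main(String[] args) {
        Settings s = fromEnv(System.getenv());
        System.out.println(s.wikibaseScheme + " " + s.apiEndpoint
            + " sleep=" + s.batchSleep + " limit=" + s.loopLimit);
    }
}
```

Taking the environment as a `Map` rather than calling `System.getenv()` directly makes the parsing easy to exercise with test values.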
## GitHub Actions Test CI
The test CI runs a Wikibase instance that is populated by the `seeder/` scripts. After some passes of the queryservice-updater, the query service is queried for any inserted rows.
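As an illustration of the final check, a count query could be sent to the query service over the standard SPARQL protocol, which accepts the query text in a `query` URL parameter. The sketch below only builds the request URI; the endpoint URL and the query are assumptions and are not taken from the CI configuration of this repository.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class InsertedRowsQuerySketch {

    /** Hypothetical query that counts all triples; the real CI check may differ. */
    public static final String COUNT_QUERY =
        "SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }";

    /** Builds a SPARQL-protocol GET URI by URL-encoding the query text. */
    public static URI buildQueryUri(String sparqlEndpoint, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create(sparqlEndpoint + "?query=" + encoded);
    }

    public static void main(String[] args) {
        // The endpoint below is a placeholder, not the address used by the CI setup.
        URI uri = buildQueryUri("http://localhost:9999/sparql", COUNT_QUERY);
        System.out.println(uri);
    }
}
```

The resulting URI could then be requested with any HTTP client, for example `java.net.http.HttpClient`, and a non-zero count would suggest that the updater has written data.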
When debugging the CI configuration locally, you can run:
```sh
docker-compose -f docker-compose.yml -f docker-compose.ci.yml up
```
If changes aren't taking effect, you can try removing the image to force a rebuild:
```sh
docker rmi queryservice-updater_wdqs-updater
```