https://github.com/rdmpage/wikicite-search
Search engine for publications in Wikidata
https://github.com/rdmpage/wikicite-search
Last synced: 11 months ago
JSON representation
Search engine for publications in Wikidata
- Host: GitHub
- URL: https://github.com/rdmpage/wikicite-search
- Owner: rdmpage
- Created: 2021-05-17T15:30:49.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2023-01-04T16:10:09.000Z (over 3 years ago)
- Last Synced: 2025-03-10T18:48:06.445Z (about 1 year ago)
- Language: JavaScript
- Size: 735 KB
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 1
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# WikiCite Search
A search engine for publications in Wikidata. Rather than directly search Wikidata we download a subset of items corresponding to publications (for example, those relevant to taxonomy), convert to a simple JSON format (CSL) then add to an Elasticsearch index. We support basic search by metadata, and also a reconciliation service.
Goal is to have a simple search interface that (a) finds an article if we have it, and (b) display links to full text if available.
## API
### Search
### Search by page
### Reconciliation API
## Searching
### By [journal, volume, page] tuple
To locate an article by [journal, volume, page] tuple (e.g., `Int. J. Primatol., 21: 934.` we need to index the start and end pages of an article as integers so that we can do a range query. Hence when creating the index
```
"startpage": {
"type": "integer"
},
"endpage": {
"type": "integer"
}
```
Remaining challenge is mapping abbreviated journal names to actual journals. To help this I add ISO 4 abbreviations to journals where ever possible.
## Elasticsearch
Local server for development and debugging set up using Docker.
Remote set up on Bitnami. Note that following [Bitnami’s instructions](https://docs.bitnami.com/google/apps/elasticsearch/administration/connect-remotely/) we need to do this:
```
network.host: Specify the hostname or IP address where the server will be accessible. Set it to 0.0.0.0 to listen on every interface.
```
To do this launch the SSH console in Bitnami, then:
```
sudo nano /opt/bitnami/elasticsearch/config/elasticsearch.yml
```
and edit
```
network:
host: 127.0.0.1
```
Then we make sure to open the port using the [Google Console](https://docs.bitnami.com/google/faq/administration/use-firewall/). For direct connection to Elasticsearch (not recommended) open 9200, for connection via Apache with basic authentication open port 80 (note that this means you will need to add ELASTIC_USERNAME and ELASTIC_PASSWORD to the Heroku Config Vars. To set up Apache follow the instructions at [Add Basic Authentication And TLS Using Apache](https://docs.bitnami.com/google/apps/elasticsearch/administration/add-basic-auth-and-tls/).
## Icons

[prettycons on Flaticon](https://www.flaticon.com/free-icon/file_1126709?term=pdf&page=1&position=68&page=1&position=68&related_id=1126709&origin=search)