https://github.com/matijapiskorec/dict

Blazingly fast full-text Wiktionary search in command line

Host: GitHub
URL: https://github.com/matijapiskorec/dict
Owner: matijapiskorec
License: mit
Created: 2020-03-21T17:41:57.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2020-03-21T18:47:13.000Z (about 5 years ago)
Last Synced: 2024-11-01T18:37:45.093Z (6 months ago)
Language: Shell
Size: 20.3 MB
Stars: 3
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-blazingly-fast - dict - Blazingly fast full-text Wiktionary search in command line (Shell)

README

# dict

Blazingly fast full-text search of the English Wiktionary from the command line. Includes the word type and basic definitions for over 781K word meanings in the English Wiktionary.

![dict screencast](https://raw.githubusercontent.com/matijapiskorec/dict/master/image/dict-tty-screencast.gif)

Run from the command line (make sure the script is executable with `chmod +x dict`):
```
dict
```

Use `Tab` to select multiple entries for output. By default the search is anchored to the beginning of each entry; delete the prepended `^` to search anywhere in the text.
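
The effect of the leading `^` can be illustrated without `fzf`. The sketch below uses `grep` as a stand-in for the matcher, which is an assumption made purely for illustration: `dict` itself relies on `fzf`, and the function names and the layout of the sample file are hypothetical.

```shell
#!/bin/sh
# Illustration only: grep stands in for fzf's matching.

# search_prefix FILE TERM
# Print entries that start with TERM (comparable to a query with a leading ^).
# TERM is interpreted by grep as a basic regular expression.
search_prefix() {
    gzip -dc "$1" | grep -- "^$2"
}

# search_anywhere FILE TERM
# Print entries that contain TERM anywhere (comparable to deleting the ^).
# TERM is matched as a fixed string.
search_anywhere() {
    gzip -dc "$1" | grep -F -- "$2"
}
```

For a file with the entries `cat noun` and `concat verb`, `search_prefix file.gz cat` prints only the first entry, while `search_anywhere file.gz cat` prints both. With `fzf` itself, the `--multi` option enables `Tab` selection and `--query` sets the initial query; whether `dict` uses exactly these options is an assumption.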

## Prerequisites

Make sure you have `gzip` and `fzf` installed. On Arch Linux, install them with:
```
sudo pacman -S gzip fzf
```

To retrieve and parse the English Wiktionary data you additionally need `wget`, `bzip2` and `perl`. On Arch Linux, install them with:
```
sudo pacman -S wget bzip2 perl
```
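
On other distributions the package names may differ, so it can help to check which of the tools are already on `PATH` before installing anything. A minimal sketch, assuming a POSIX shell; the helper name `missing_tools` is hypothetical and not part of this repository:

```shell
#!/bin/sh
# missing_tools NAME...
# Print the name of every listed command that cannot be found on PATH,
# one per line. Prints nothing if all of them are available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

Running `missing_tools gzip fzf wget bzip2 perl` lists only the tools that still need to be installed.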

## Retrieving Wiktionary data

You can retrieve the original Wiktionary database dump from this [mirror](https://ftp.acc.umu.se/mirror/wikimedia.org/dumps/enwiktionary/). To download a specific version with `wget` (about 705 MB):
```
wget https://ftp.acc.umu.se/mirror/wikimedia.org/dumps/enwiktionary/20200301/enwiktionary-20200301-pages-articles.xml.bz2
```
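
A download of this size can end up truncated, and `bzip2 -t` tests an archive's integrity without writing the decompressed data. A minimal sketch of such a check; the helper name `check_dump` and its messages are hypothetical:

```shell
#!/bin/sh
# check_dump FILE
# Succeed only if FILE exists and passes bzip2's integrity test.
check_dump() {
    [ -f "$1" ] || { echo "missing: $1" >&2; return 1; }
    bzip2 -t "$1" 2>/dev/null || { echo "corrupt or truncated: $1" >&2; return 1; }
    echo "ok: $1"
}
```

For example, `check_dump enwiktionary-20200301-pages-articles.xml.bz2` returns a non-zero status if the file is missing or damaged. Testing the full dump has to read the whole file, so it may take a while.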

You can also preview just the first 5 MB of the file without saving it to disk, using `wget`, `bzcat` and `vim`:
```
wget -qO- https://ftp.acc.umu.se/mirror/wikimedia.org/dumps/enwiktionary/20200301/enwiktionary-20200301-pages-articles.xml.bz2 | bzcat | head --bytes=5M | vim -
```

Once you have downloaded the database dump, you can decompress it with `bzip2` (add `-k` to keep the compressed file, which the parsing step below takes as input):
```
bzip2 -d enwiktionary-20200301-pages-articles.xml.bz2
```

Or you can use `bzcat` to preview the first 5 MB in `vim`:
```
bzcat enwiktionary-20200301-pages-articles.xml.bz2 | head --bytes=5MB | vim -
```

## Parsing Wiktionary data

The `wiktionary-parse` script in the `parse/` folder parses the raw English Wiktionary data. Once you have downloaded the data using the instructions above, you can parse it with (make sure the script is executable with `chmod +x wiktionary-parse`):
```
data/wiktionary-parse enwiktionary-20200301-pages-articles.xml.bz2 > enwiktionary-20200301.txt
```

Because `dict` works with a compressed text file, you have to compress the final output with `gzip`:
```
gzip -c enwiktionary-20200301.txt > enwiktionary-20200301.txt.gz
```
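
Before pointing `dict` at the compressed file, it may be worth confirming that the archive is intact and decompresses back to the same text. A minimal sketch of that round-trip check; the helper name `compress_and_verify` is hypothetical:

```shell
#!/bin/sh
# compress_and_verify FILE
# Write FILE.gz next to FILE (keeping the original), then check that the
# archive is intact and that it decompresses to exactly the same bytes.
compress_and_verify() {
    src=$1
    gzip -c "$src" > "$src.gz" || return 1
    gzip -t "$src.gz" || return 1            # integrity test of the archive
    gzip -dc "$src.gz" | cmp -s - "$src"     # byte-for-byte comparison
}
```

For example, `compress_and_verify enwiktionary-20200301.txt && echo verified` prints `verified` only if both checks pass.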
