https://github.com/pymorphy2-fork/dawg-python
Pure-python reader for DAWGs created by dawgdic C++ library or DAWG Python extension. Fork of https://github.com/pytries/DAWG-Python
- Host: GitHub
- URL: https://github.com/pymorphy2-fork/dawg-python
- Owner: pymorphy2-fork
- License: mit
- Created: 2023-09-03T19:26:18.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2025-04-04T06:35:46.000Z (9 months ago)
- Last Synced: 2025-04-04T07:31:10.066Z (9 months ago)
- Topics: dawg
- Language: Python
- Homepage:
- Size: 3.99 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 3
Metadata Files:
- Readme: README.md
- Changelog: CHANGES.md
- License: LICENSE
# DAWG2-Python
[Tests](https://github.com/pymorphy2-fork/DAWG-Python/actions/workflows/python-tests.yml)
[Coverage](https://coveralls.io/github/pymorphy2-fork/DAWG-Python?branch=master)
This pure-Python package provides read-only access to files created by the
[dawgdic][1] C++ library and the
[DAWG][2] Python package.
This package cannot create DAWGs. It only reads DAWGs built
by the [dawgdic][1] C++ library or the
[DAWG][2] Python extension module. The main
purpose of DAWG2-Python is to provide access to DAWGs without
requiring compiled extensions. It is also quite fast under PyPy (see
the benchmarks).
# Installation
```commandline
pip install DAWG2-Python
```
# Usage
The aim of DAWG2-Python is to be API- and binary-compatible with
[DAWG][2] wherever possible.
First, create a DAWG using the
[DAWG][2] module:
```python
import dawg

# Example input: any iterable of unicode keys.
data = ['foo', 'bar', 'foobar']
d = dawg.DAWG(data)
d.save('words.dawg')
```
This DAWG can then be loaded without requiring C extensions:
```python
import dawg_python
d = dawg_python.DAWG().load('words.dawg')
```
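Because the two packages aim to share an API, calling code can prefer the C extension and fall back to the pure-Python reader. The following is a minimal sketch of that pattern, not something this package ships: `load_dawg` and `dawg_impl` are hypothetical names introduced here, and the sketch assumes `DAWG().load(path)` returns the loaded object in both packages, as it does in the snippet above.

```python
# Prefer the compiled DAWG extension; fall back to the pure-Python reader.
try:
    import dawg as dawg_impl
except ImportError:
    try:
        import dawg_python as dawg_impl
    except ImportError:
        dawg_impl = None  # neither package is installed


def load_dawg(path, impl=dawg_impl):
    """Load a saved DAWG with whichever implementation is available.

    `impl` is any module-like object exposing a `DAWG` class
    (hypothetical parameter, useful for testing).
    """
    if impl is None:
        raise RuntimeError("install DAWG or DAWG2-Python to read DAWG files")
    return impl.DAWG().load(path)
```

Only reading goes through the fallback. Building and saving a DAWG still requires the [DAWG][2] extension.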
Please consult the [DAWG][2] docs for detailed
usage. Some features (such as constructor parameters and the `save` method) are
intentionally unsupported.
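As a rough illustration of read-only use, the sketch below queries a loaded DAWG with the two operations the benchmarks list for the plain `DAWG` class: membership (`in`) and `prefixes`. The helper name `lookup` and the file name `words.dawg` are assumptions made for this example. The query runs only if the package is installed and the file exists.

```python
import os

try:
    import dawg_python  # installed by the DAWG2-Python distribution
except ImportError:
    dawg_python = None  # keep the sketch importable without the package


def lookup(d, word):
    """Return (is_key, prefixes) for `word` using read-only DAWG calls."""
    # `word in d` tests membership; `d.prefixes(word)` returns the stored
    # keys that are prefixes of `word`.
    return word in d, d.prefixes(word)


path = 'words.dawg'  # hypothetical file saved earlier with dawg.DAWG(...).save()
if dawg_python is not None and os.path.exists(path):
    d = dawg_python.DAWG().load(path)
    print(lookup(d, 'foobar'))
```

Value lookups such as `d[key]` or `d.get(key)` are marked "not supported" for the plain `DAWG` class in the benchmarks, which list them only for `BytesDAWG` and `RecordDAWG`.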
# Benchmarks
Benchmark results (100k Unicode words, integer values (the lengths of the
words), PyPy 1.9, MacBook Air, i5 1.8 GHz):

    dict __getitem__ (hits): 11.090M ops/sec
    DAWG __getitem__ (hits): not supported
    BytesDAWG __getitem__ (hits): 0.493M ops/sec
    RecordDAWG __getitem__ (hits): 0.376M ops/sec

    dict get() (hits): 10.127M ops/sec
    DAWG get() (hits): not supported
    BytesDAWG get() (hits): 0.481M ops/sec
    RecordDAWG get() (hits): 0.402M ops/sec

    dict get() (misses): 14.885M ops/sec
    DAWG get() (misses): not supported
    BytesDAWG get() (misses): 1.259M ops/sec
    RecordDAWG get() (misses): 1.337M ops/sec

    dict __contains__ (hits): 11.100M ops/sec
    DAWG __contains__ (hits): 1.317M ops/sec
    BytesDAWG __contains__ (hits): 1.107M ops/sec
    RecordDAWG __contains__ (hits): 1.095M ops/sec

    dict __contains__ (misses): 10.567M ops/sec
    DAWG __contains__ (misses): 1.902M ops/sec
    BytesDAWG __contains__ (misses): 1.873M ops/sec
    RecordDAWG __contains__ (misses): 1.862M ops/sec

    dict items(): 44.401 ops/sec
    DAWG items(): not supported
    BytesDAWG items(): 3.226 ops/sec
    RecordDAWG items(): 2.987 ops/sec

    dict keys(): 426.250 ops/sec
    DAWG keys(): not supported
    BytesDAWG keys(): 6.050 ops/sec
    RecordDAWG keys(): 6.363 ops/sec

    DAWG.prefixes (hits): 0.756M ops/sec
    DAWG.prefixes (mixed): 1.965M ops/sec
    DAWG.prefixes (misses): 1.773M ops/sec

    RecordDAWG.keys(prefix="xxx"), avg_len(res)==415: 1.429K ops/sec
    RecordDAWG.keys(prefix="xxxxx"), avg_len(res)==17: 36.994K ops/sec
    RecordDAWG.keys(prefix="xxxxxxxx"), avg_len(res)==3: 121.897K ops/sec
    RecordDAWG.keys(prefix="xxxxx..xx"), avg_len(res)==1.4: 265.015K ops/sec
    RecordDAWG.keys(prefix="xxx"), NON_EXISTING: 2450.898K ops/sec

Under CPython, expect it to be about 50x slower. Memory consumption of
DAWG2-Python should be the same as that of
[DAWG][2].
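For readers who want comparable numbers on their own machine, an "ops/sec" figure can be approximated with the standard `timeit` module. This is a minimal sketch and not the project's `bench.speed` module: the helper `ops_per_sec` is a hypothetical name, and a small `dict` stands in for the 100k-word data set.

```python
import timeit


def ops_per_sec(func, number=100_000):
    """Call `func` `number` times and return the calls per second."""
    elapsed = timeit.timeit(func, number=number)
    return number / elapsed


# Stand-in data: keys mapped to their lengths, as in the benchmark setup.
words = {"word%d" % i: len("word%d" % i) for i in range(1000)}

rate = ops_per_sec(lambda: words["word500"])
print("dict __getitem__ (hits): %.3fM ops/sec" % (rate / 1e6))
```

A loaded `BytesDAWG` or `RecordDAWG` can be timed the same way by replacing the lambda. The results depend heavily on the interpreter, as the CPython versus PyPy gap shows.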
# Current limitations
- This package is not capable of creating DAWGs;
- all the limitations of [DAWG][2] apply.
Contributions are welcome!
# Contributing
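- Development happens at GitHub: <https://github.com/pymorphy2-fork/dawg-python>
- Issue tracker: <https://github.com/pymorphy2-fork/dawg-python/issues>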
Feel free to submit ideas, bugs or pull requests.
## Running tests and benchmarks
Make sure [pytest][3] is installed and run
```commandline
$ pytest .
```
from the source checkout. Tests should pass under Python 3.8, 3.9, 3.10, 3.11 and PyPy3 >= 7.3.
In order to run benchmarks, type
```commandline
$ pypy3 -m bench.speed
```
This runs benchmarks under PyPy (they are about 50x slower under
CPython).
## Authors & Contributors
- Mikhail Korobov
- [@bt2901](https://github.com/bt2901)
- [@insolor](https://github.com/insolor)
The algorithms are from the [dawgdic][1]
C++ library by Susumu Yata & contributors.
# License
This package is licensed under the MIT License.
[1]: https://code.google.com/p/dawgdic/
[2]: https://github.com/pymorphy2-fork/DAWG
[3]: https://docs.pytest.org/en/7.4.x/getting-started.html