https://github.com/gyorilab/adeft
Tool for disambiguating acronyms and abbreviations in text for NLP applications
acronym-disambiguation
- Host: GitHub
- URL: https://github.com/gyorilab/adeft
- Owner: gyorilab
- License: bsd-2-clause
- Created: 2018-11-05T20:08:03.000Z (over 6 years ago)
- Default Branch: master
- Last Pushed: 2024-06-12T17:32:23.000Z (11 months ago)
- Last Synced: 2025-04-30T14:27:06.389Z (23 days ago)
- Topics: acronym-disambiguation
- Language: Python
- Homepage:
- Size: 11.1 MB
- Stars: 22
- Watchers: 5
- Forks: 10
- Open Issues: 1
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
# Adeft

Adeft (Acromine based Disambiguation of Entities From Text context) is a
utility for building models to disambiguate acronyms and other abbreviations of
biological terms in the scientific literature. It makes use of an
implementation of the [Acromine](http://www.chokkan.org/research/acromine/)
algorithm developed by the [NaCTeM](http://www.nactem.ac.uk/index.php) at the
University of Manchester to identify possible longform expansions for
shortforms in a text corpus. It allows users to build disambiguation models to
disambiguate shortforms based on their text context. A growing number of
pretrained disambiguation models are publicly available to download through
adeft.

#### Citation

If you use Adeft in your research, please cite the paper in the Journal of
Open Source Software:

Steppi A, Gyori BM, Bachman JA (2020). Adeft: Acromine-based Disambiguation of
Entities from Text with applications to the biomedical literature. *Journal of
Open Source Software,* 5(45), 1708, https://doi.org/10.21105/joss.01708

## Installation

Adeft works with Python versions 3.5 and above. It is available on PyPI and can be installed with the command

    $ pip install adeft

Adeft's pretrained machine learning models can then be downloaded with the command

    $ python -m adeft.download
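
As a quick sanity check after the two commands above, the following minimal sketch tests whether the package is importable and, if so, reports how many shortforms currently have a model. The helper `is_installed` is a hypothetical name introduced here for illustration and is not part of Adeft. The only Adeft name used is the `available_models` dictionary mentioned in this README, and the sketch assumes that it reflects the models downloaded so far.

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be located on the Python path.

    Hypothetical helper for illustration; it is not provided by Adeft.
    """
    return importlib.util.find_spec(package) is not None


if is_installed("adeft"):
    # `available_models` is the shortform -> model name dictionary
    # described in this README.
    from adeft import available_models

    print(f"adeft is importable; {len(available_models)} shortforms have models")
else:
    print("adeft is not importable; run `pip install adeft` first")
```

If the reported count is zero, the download step has most likely not been run yet.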

If you choose to install by cloning this repository

    $ git clone https://github.com/indralab/adeft.git

you should also run

    $ python setup.py build_ext --inplace

at the top level of your local repository in order to build the extension module
for alignment-based longform detection and scoring.

## Using Adeft

A dictionary of available models can be imported with `from adeft import available_models`.

The dictionary maps shortforms to model names. It's possible for multiple equivalent
shortforms to map to the same model.

Here's an example of running a disambiguator for ER on a list of texts:

```python
from adeft.disambiguate import load_disambiguator

er_dd = load_disambiguator('ER')
...
er_dd.disambiguate(texts)
```

Users may also build and train their own disambiguators. See the documentation
for more info.

## Documentation

Documentation is available at
[https://adeft.readthedocs.io](http://adeft.readthedocs.io)

Jupyter notebooks illustrating Adeft workflows are available under `notebooks`:

- [Introduction](notebooks/introduction.ipynb)
- [Model building](notebooks/model_building.ipynb)

## Testing

Adeft uses `pytest` for unit testing, and uses GitHub Actions as a
continuous integration environment. To run tests locally, make sure
to install the test-specific requirements listed in setup.py as

```bash
pip install adeft[test]
```

and download all pre-trained models as shown above.
Then run `pytest` in the top-level `adeft` folder.

## Funding

Development of this software was supported by the Defense Advanced Research
Projects Agency under awards W911NF018-1-0124 and W911NF-15-1-0544, and the
National Cancer Institute under award U54-CA225088.