Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ghxm/eucy
Tools to analyse EU legal text using spaCy.
Last synced: about 15 hours ago
- Host: GitHub
- URL: https://github.com/ghxm/eucy
- Owner: ghxm
- License: mit
- Created: 2021-03-06T12:33:45.000Z (almost 4 years ago)
- Default Branch: master
- Last Pushed: 2024-06-10T15:36:35.000Z (7 months ago)
- Last Synced: 2024-11-06T02:42:56.866Z (about 2 months ago)
- Language: Python
- Homepage:
- Size: 440 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 15
Metadata Files:
- Readme: README.md
- Changelog: HISTORY.rst
- License: LICENSE
# euCy
> [!NOTE]
> Please note that this tool is still under active development and lacks proper documentation. The code is not stable and the API may change in the future. Feel free to use it, but be aware of the risks, and contribute to its development if you can.

Tool to annotate EU legal text and compute related measures, such as complexity statistics, based on spaCy.
## Installation
You can install the package from GitHub using pip:
```shell
pip install git+https://github.com/ghxm/euCy.git
```

## Usage
> [!NOTE]
> This is a very bare-bones example of how to use the package. There is no stable API yet, nor are there convenience functions.

```python
import urllib.request

import spacy
from eucy import utils
from eucy.eucy import EuWrapper

nlp = spacy.blank('en')
eu_wrapper = EuWrapper(nlp)

# get html from url
url = "https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=celex%3A32014R1286"
html = urllib.request.urlopen(url).read().decode('utf-8')

# extract text from the HTML and read it in
text = utils.text_from_html(html)
doc = eu_wrapper(text)

# print complexity stats
print(doc._.complexity)

# get lists of citations, recitals and articles
citations = doc.spans['citations']
recitals = doc.spans['recitals']
articles = doc.spans['articles']

# get the article elements of the first article and print its first paragraph
ae_1 = doc._.article_elements[0]
print(ae_1['pars'][0])
```
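The URL in the usage example passes the CELEX number `32014R1286` in the `uri` query parameter as `celex:32014R1286`, with the colon percent-encoded as `%3A`. To fetch other documents the same way, a small helper could build such URLs. The `eurlex_url` function below is hypothetical and not part of euCy; it is a minimal sketch that assumes the English `legal-content/EN/TXT/?uri=celex:...` pattern from the example also works for the document you want.

```python
from urllib.parse import quote


def eurlex_url(celex: str) -> str:
    """Build an EUR-Lex URL for a CELEX number.

    Hypothetical helper, not part of euCy. Assumes the URL pattern
    used in the usage example.
    """
    # Percent-encode "celex:<number>"; with safe="" the ':' becomes '%3A'.
    uri = quote(f"celex:{celex}", safe="")
    return f"https://eur-lex.europa.eu/legal-content/EN/TXT/?uri={uri}"


print(eurlex_url("32014R1286"))
```

For `32014R1286` this reproduces the URL used in the usage example, so the result can be passed to `urllib.request.urlopen` in the same way.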
For a more extensive overview of usage and functionality, see the `tests` folder in the meantime.
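The usage example relies on `utils.text_from_html` to turn the EUR-Lex page into plain text before it reaches `EuWrapper`. How euCy implements this step is not documented here. Purely to illustrate the idea, the crude stdlib-only approximation below (the hypothetical `html_to_text`, a swapped-in technique based on Python's `html.parser`) drops `<script>` and `<style>` content and joins the remaining text nodes with newlines. It is an assumption-laden sketch and will likely not match euCy's output, which may depend on EUR-Lex's page structure.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect non-empty text nodes, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only visible, non-whitespace text.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Rough HTML-to-text conversion.

    Hypothetical stand-in, NOT euCy's utils.text_from_html.
    """
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = "<html><head><style>p {}</style></head><body><p>Article 1</p><p>Subject matter</p></body></html>"
print(html_to_text(page))
```

Because each text node becomes its own line, article headings and paragraphs stay separated. This is a simple choice that a real EU-text pipeline would probably refine.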
## Credits
This package was created with Cookiecutter and the `briggySmalls/cookiecutter-pypackage` project template.
- Cookiecutter: https://github.com/audreyr/cookiecutter
- `briggySmalls/cookiecutter-pypackage`: https://github.com/briggySmalls/cookiecutter-pypackage