https://github.com/andrefs/cetem_publico
Python WRAPPER for the CETEM Publico Corpus
- Host: GitHub
- URL: https://github.com/andrefs/cetem_publico
- Owner: andrefs
- License: mit
- Created: 2019-11-15T13:55:15.000Z
- Default Branch: master
- Last Pushed: 2019-11-16T15:58:15.000Z
- Last Synced: 2025-02-26T08:15:28.653Z
- Language: Python
- Size: 10.7 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- License: LICENSE.rst
cetem-publico
=============

``cetem-publico`` is a Python wrapper for the CETEMPublico corpus. It
takes care of downloading, storing and importing the corpus into NLTK.

**THIS IS STILL A WORK IN PROGRESS; THE API MIGHT BREAK WITHOUT WARNING.**

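
The downloading and storing step described above is commonly implemented as a download-once, cache-locally helper. The sketch below shows that general pattern using only the standard library; ``ensure_cached``, its parameters and any URL passed to it are hypothetical and do not describe how ``cetem-publico`` itself works internally.

.. code-block:: python

    import os
    import urllib.request


    def ensure_cached(url: str, cache_dir: str, filename: str) -> str:
        """Return a local path to `filename`, downloading it from `url`
        only if it is not already present in `cache_dir`.

        Hypothetical helper illustrating the caching pattern; it is not
        part of the cetem-publico API.
        """
        os.makedirs(cache_dir, exist_ok=True)
        path = os.path.join(cache_dir, filename)
        if not os.path.exists(path):
            # Download to a temporary name first, then rename, so an
            # interrupted download never looks like a valid cached copy.
            tmp_path = path + ".part"
            urllib.request.urlretrieve(url, tmp_path)
            os.replace(tmp_path, path)
        return path

A second call with the same ``cache_dir`` and ``filename`` returns the existing path without touching the network, which is what keeps repeated loads of a multi-gigabyte corpus cheap.
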
Installing
----------

Install and update using ``pip`` (add ``--user`` to install for the current user only):

.. code-block:: text

    pip install --upgrade cetem-publico

A Simple Example
----------------
.. code-block:: python

    import CETEMPublico

    # Load a small (about 10 KB) sample of the corpus...
    cp = CETEMPublico.load()

    # ...or load the full corpus (about 12 GB) instead:
    # cp = CETEMPublico.load(full=True)

    print(cp.tagged_sents())

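
Assuming ``tagged_sents()`` follows the usual NLTK corpus-reader convention (a sequence of sentences, each a list of ``(word, tag)`` tuples), its output can be consumed with plain Python. The sketch below counts tag frequencies over a hand-written stand-in list; ``tag_counts``, the tokens and the tag names are all hypothetical and not taken from ``cetem-publico`` or the corpus.

.. code-block:: python

    from collections import Counter
    from itertools import islice
    from typing import Iterable, List, Optional, Tuple

    # One sentence as (word, tag) pairs, following the usual NLTK convention.
    TaggedSent = List[Tuple[str, str]]


    def tag_counts(tagged_sents: Iterable[TaggedSent],
                   limit: Optional[int] = None) -> Counter:
        """Count tags over the first `limit` sentences (all of them if None).

        Sentences are consumed one at a time, so a lazy corpus view is
        never materialised as a full list by this function.
        """
        counts = Counter()
        for sent in islice(tagged_sents, limit):
            counts.update(tag for _word, tag in sent)
        return counts


    # Made-up stand-in for cp.tagged_sents(); tokens and tags are illustrative.
    sample = [
        [("O", "ART"), ("jornal", "N"), ("saiu", "V"), (".", "PU")],
        [("Um", "ART"), ("dia", "N"), ("novo", "ADJ"), (".", "PU")],
    ]
    print(tag_counts(sample).most_common(3))

With the real corpus you would pass ``cp.tagged_sents()`` instead of ``sample``, provided it returns that structure; the hypothetical ``limit`` parameter makes it easy to try the code on a few thousand sentences before running it over the full corpus.
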
Acknowledgements
----------------

This module only exists thanks to the Publico newspaper and the team responsible for the CETEMPublico corpus.

Bugs and stuff
--------------

Open a GitHub issue or, preferably, send me a pull request.

License
-------
MIT