https://github.com/andrefs/cetem_publico

Python WRAPPER for the CETEM Publico Corpus

cetem-publico
=============

``cetem-publico`` is a Python wrapper for the CETEMPublico corpus. It
takes care of downloading, storing and importing the corpus into NLTK.

**THIS IS STILL A WORK IN PROGRESS; THE API MAY BREAK WITHOUT WARNING.**

Installing
----------

Install and update using ``pip``:

.. code-block:: text

    pip install [--user] cetem-publico

A Simple Example
----------------

.. code-block:: python

    import CETEMPublico

    cp = CETEMPublico.load()  # loads a small 10KB sample
    # or
    cp = CETEMPublico.load(full=True)  # loads the full 12GB

    print(cp.tagged_sents())
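
Assuming ``tagged_sents()`` follows the usual NLTK corpus-reader convention (a sequence of sentences, each a list of ``(word, tag)`` tuples), its output can be processed with plain Python. The sketch below uses a small hard-coded sample in place of ``cp.tagged_sents()`` so it runs without downloading anything. The sample sentences, the tag names and the helpers ``tag_frequencies`` and ``words_with_tag`` are hypothetical and are not part of ``cetem-publico``.

.. code-block:: python

    from collections import Counter

    # Hypothetical sample in NLTK's tagged-sentence format: a list of sentences,
    # each a list of (word, tag) tuples. The tags are illustrative only and are
    # not necessarily the tagset used in CETEMPublico.
    sample_tagged_sents = [
        [("O", "ART"), ("gato", "N"), ("dorme", "V"), (".", "PU")],
        [("A", "ART"), ("casa", "N"), ("é", "V"), ("grande", "ADJ"), (".", "PU")],
    ]


    def tag_frequencies(tagged_sents):
        """Count how often each tag occurs across all tagged sentences."""
        # The generator expression avoids building an intermediate list.
        return Counter(tag for sent in tagged_sents for _word, tag in sent)


    def words_with_tag(tagged_sents, wanted_tag):
        """Return every word labelled with ``wanted_tag``, in corpus order."""
        return [word for sent in tagged_sents for word, tag in sent if tag == wanted_tag]


    if __name__ == "__main__":
        print(tag_frequencies(sample_tagged_sents).most_common(3))
        print(words_with_tag(sample_tagged_sents, "N"))

With the corpus loaded, the same helpers would presumably be called as ``tag_frequencies(cp.tagged_sents())``. Because they only iterate over the sentences, they do not need the whole corpus copied into an extra list first.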

Acknowledgements
----------------

This module only exists thanks to the *Publico* newspaper and the team responsible for the CETEMPublico corpus.

Bugs and stuff
--------------

Open a GitHub issue on the project repository (https://github.com/andrefs/cetem_publico) or, preferably, send me a pull request.

License
-------

MIT