https://github.com/scrapy-plugins/scrapy-monkeylearn
A Scrapy pipeline to categorize items using MonkeyLearn
- Host: GitHub
- URL: https://github.com/scrapy-plugins/scrapy-monkeylearn
- Owner: scrapy-plugins
- Created: 2015-03-03T15:00:34.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2017-04-28T12:51:05.000Z (about 9 years ago)
- Last Synced: 2025-04-07T08:02:05.217Z (about 1 year ago)
- Language: Python
- Homepage:
- Size: 43 KB
- Stars: 38
- Watchers: 5
- Forks: 13
- Open Issues: 0
Metadata Files:
- Readme: README.rst
scrapy-monkeylearn
==================
A `Scrapy`_ pipeline to categorize items using `MonkeyLearn`_.
Settings
--------
MONKEYLEARN_BATCH_SIZE
~~~~~~~~~~~~~~~~~~~~~~
The size of the item batches sent to MonkeyLearn.
Default: ``200``
Example:

.. code-block:: python

    MONKEYLEARN_BATCH_SIZE = 200

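To illustrate what a batch size of ``200`` means in practice, the sketch below groups a stream of items into lists of at most 200 elements. The ``batched`` helper is hypothetical and written only for this illustration; it is not taken from the scrapy-monkeylearn source, and the pipeline's actual batching logic may differ.

```python
from itertools import islice


def batched(items, batch_size=200):
    """Yield successive lists of at most ``batch_size`` items.

    Hypothetical helper for illustration only.
    """
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# 450 items with a batch size of 200 give two full batches and one partial one.
sizes = [len(batch) for batch in batched(range(450), batch_size=200)]
print(sizes)  # [200, 200, 50]
```

A smaller batch size means more batches for the same number of items; a larger one means fewer, bigger batches.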
MONKEYLEARN_MODULE
~~~~~~~~~~~~~~~~~~
The ID of the MonkeyLearn module to use.
Example:

.. code-block:: python

    MONKEYLEARN_MODULE = 'cl_oFKL5wft'

MONKEYLEARN_USE_SANDBOX
~~~~~~~~~~~~~~~~~~~~~~~
Whether to use the sandbox version of the module. This setting applies only when the module is a classifier.
Default: ``False``
Example:

.. code-block:: python

    MONKEYLEARN_USE_SANDBOX = True

MONKEYLEARN_TOKEN
~~~~~~~~~~~~~~~~~
The MonkeyLearn API authentication token.
Example:

.. code-block:: python

    MONKEYLEARN_TOKEN = 'TWFuIGlzIGRp...'

MONKEYLEARN_FIELD_TO_PROCESS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The item text field, or list of item text fields, to use for classification.
A comma-separated string of field names is also supported.
Example:

.. code-block:: python

    MONKEYLEARN_FIELD_TO_PROCESS = 'title'

.. code-block:: python

    MONKEYLEARN_FIELD_TO_PROCESS = ['title', 'description']

.. code-block:: python

    MONKEYLEARN_FIELD_TO_PROCESS = 'title,description'

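The three accepted forms all describe a list of field names. As a minimal sketch of that equivalence, the hypothetical ``normalize_fields`` function below converts each form into a plain list. It is an assumption made for illustration and is not the pipeline's own parsing code.

```python
def normalize_fields(value):
    """Return a list of field names from a string or a list.

    Hypothetical helper: accepts 'title', 'title,description',
    or ['title', 'description'].
    """
    if isinstance(value, str):
        # Split on commas and drop surrounding whitespace and empty parts.
        return [name.strip() for name in value.split(',') if name.strip()]
    return list(value)


print(normalize_fields('title'))                     # ['title']
print(normalize_fields(['title', 'description']))    # ['title', 'description']
print(normalize_fields('title,description'))         # ['title', 'description']
```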
MONKEYLEARN_FIELD_OUTPUT
~~~~~~~~~~~~~~~~~~~~~~~~
The field where the MonkeyLearn output will be stored.
Example:

.. code-block:: python

    MONKEYLEARN_FIELD_OUTPUT = 'categories'

An example value of the ``MONKEYLEARN_FIELD_OUTPUT`` field after classification is:

.. code-block:: python

    [{'label': 'English', 'probability': 0.321}]

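Downstream code, such as a later pipeline or an exporter, can read this field like any other item field. The sketch below assumes the output is a list of dictionaries with ``label`` and ``probability`` keys, as in the example value above; the ``top_label`` helper and the sample item are hypothetical.

```python
def top_label(categories):
    """Return the label with the highest probability, or None if empty.

    Hypothetical helper; assumes each entry has 'label' and 'probability'.
    """
    if not categories:
        return None
    best = max(categories, key=lambda entry: entry['probability'])
    return best['label']


# Sample item shaped like the example output above.
item = {
    'title': 'Hello world',
    'categories': [{'label': 'English', 'probability': 0.321}],
}
print(top_label(item['categories']))  # English
```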
Usage
-----
In your *settings.py* file, add the previously described settings and add ``MonkeyLearnPipeline`` to your pipelines, e.g.:

.. code-block:: python

    ITEM_PIPELINES = {
        'scrapy_monkeylearn.pipelines.MonkeyLearnPipeline': 100,
    }

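Taken together, a complete *settings.py* fragment might look like the sketch below. It only combines the example values from the sections above: the token is the same truncated placeholder and must be replaced with a real one, and the module ID is the example ID rather than a recommendation.

```python
# settings.py (sketch assembled from the example values in this README)

ITEM_PIPELINES = {
    'scrapy_monkeylearn.pipelines.MonkeyLearnPipeline': 100,
}

MONKEYLEARN_TOKEN = 'TWFuIGlzIGRp...'  # placeholder, replace with your own token
MONKEYLEARN_MODULE = 'cl_oFKL5wft'     # example module ID from this README
MONKEYLEARN_USE_SANDBOX = False        # documented default
MONKEYLEARN_BATCH_SIZE = 200           # documented default

# Item fields sent for classification, and the field that receives the result.
MONKEYLEARN_FIELD_TO_PROCESS = ['title', 'description']
MONKEYLEARN_FIELD_OUTPUT = 'categories'
```

With these values, items are expected to carry ``title`` and ``description`` text fields and to have room for a ``categories`` field that holds the classification output.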
License
-------
Copyright (c) 2015 `MonkeyLearn`_.
Released under the MIT license.
.. _Scrapy: http://scrapy.org/
.. _MonkeyLearn: http://www.monkeylearn.com/