https://github.com/thoth-station/selinon-worker
Selinon worker used for data gathering, data cleansing and experiments in project Thoth
- Host: GitHub
- URL: https://github.com/thoth-station/selinon-worker
- Owner: thoth-station
- License: gpl-3.0
- Archived: true
- Created: 2018-11-13T17:51:27.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2022-02-02T11:56:47.000Z (over 4 years ago)
- Last Synced: 2025-11-28T05:25:22.122Z (7 months ago)
- Topics: artificial-intelligence, thoth
- Language: Python
- Homepage:
- Size: 110 KB
- Stars: 1
- Watchers: 8
- Forks: 5
- Open Issues: 5
- Metadata Files:
  - Readme: README.rst
  - License: LICENSE
thoth-worker
------------
A worker implementation for running workflows - used for data gathering for
project Thoth.
The worker is an implementation of Selinon's worker; see the Selinon project and the Selinon docs for more info.
Visualizing flows
=================
To visualize the available flows, including which storage adapters are used by which flows and how tasks are grouped, issue the following command, which places SVG images into the ``fig/`` directory:

.. code-block:: console

  pipenv run selinon-cli -vvv plot --nodes-definition thoth/worker/config/nodes.yaml --flow-definitions thoth/worker/config/flows --output-dir fig
Running thoth-worker in the cluster
===================================
A namespace is assigned for running thoth-worker and the data aggregation part (ping someone from the Thoth team on the AICoE channel). The main interaction point is an API service.
Running thoth-worker locally
============================
All workflows are primarily designed to be run in the cluster, but Selinon offers a simple CLI to run flows locally.

Please configure AICoE gopass for password management (ping somebody on the AICoE channel for more info). You can then inject your passwords into your environment by running:

.. code-block:: console

  eval $(gopass show aicoe/thoth/ceph.sh)
  export THOTH_CEPH_BUCKET_PREFIX=data/thoth/$USER/
Please export ``THOTH_CEPH_BUCKET_PREFIX`` with your username as shown above so you do not clash with data used in the cluster - this way you will work with your own copy of data.
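If you launch the worker from a Python script rather than from a shell, the same per-user prefix can be set on the environment you pass to the subprocess. The following is a minimal sketch: the helper names ``bucket_prefix`` and ``worker_environment`` and the user name ``alice`` are hypothetical and not part of thoth-worker; only the variable name and the ``data/thoth/<user>/`` layout come from the command above.

```python
import os


def bucket_prefix(user):
    """Build the per-user prefix in the data/thoth/<user>/ layout used above."""
    return f"data/thoth/{user}/"


def worker_environment(user, base_env=None):
    """Return a copy of the environment with THOTH_CEPH_BUCKET_PREFIX set.

    Hypothetical helper: the returned dict could be passed as ``env=`` to
    ``subprocess.run`` when starting a flow.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["THOTH_CEPH_BUCKET_PREFIX"] = bucket_prefix(user)
    return env


# "alice" is a placeholder user name used only for illustration.
env = worker_environment("alice", base_env={})
print(env["THOTH_CEPH_BUCKET_PREFIX"])
```

Copying the environment instead of mutating ``os.environ`` keeps the prefix scoped to the worker process you start.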
Before running flows, install all the requirements:
.. code-block:: console

  pipenv install

Now you are able to run Selinon flows. To run the keywords gathering flow, issue the following command:

.. code-block:: console

  export PYTHONPATH=.
  pipenv run selinon-cli -vvv execute --nodes-definition thoth/worker/config/nodes.yaml --flow-definitions thoth/worker/config/flows --flow-name keywords
If you would like to run only some of the tasks in the defined flows, you can use selective flow runs (see the Selinon docs for more info):

.. code-block:: console

  pipenv run selinon-cli -vvv execute --nodes-definition thoth/worker/config/nodes.yaml --flow-definitions thoth/worker/config/flows --flow-name keywords --selective-task-names StackOverflowKeywordsAggregationTask
To see all the available flows, refer to the ``flows`` section in the ``thoth/worker/config/nodes.yaml`` file. It lists all the available flows together with a description and the arguments each flow expects (you can pass arguments to a CLI run via ``--node-args``; do not forget to use ``-j`` for JSON arguments).

It can also be useful to set ``--sleep-time`` to 0 for selinon-cli so that you do not wait for the scheduler to schedule flows in large flow runs.
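To show how these options fit together, here is a minimal sketch that assembles a selinon-cli invocation with JSON node arguments and a zero sleep time. The helper ``build_execute_command`` and the argument ``{"example": "value"}`` are hypothetical; check ``nodes.yaml`` for the arguments a given flow actually expects. The flags themselves are the ones mentioned above, and the sketch assumes ``-j`` is a plain switch that follows the ``--node-args`` value.

```python
import json
import shlex

NODES_DEFINITION = "thoth/worker/config/nodes.yaml"
FLOW_DEFINITIONS = "thoth/worker/config/flows"


def build_execute_command(flow_name, node_args=None, sleep_time=None):
    """Return a selinon-cli execute invocation as a list of arguments."""
    cmd = [
        "pipenv", "run", "selinon-cli", "-vvv", "execute",
        "--nodes-definition", NODES_DEFINITION,
        "--flow-definitions", FLOW_DEFINITIONS,
        "--flow-name", flow_name,
    ]
    if node_args is not None:
        # Serialize the arguments as JSON and add -j (assumed to be a switch)
        # so that selinon-cli treats the value as JSON.
        cmd += ["--node-args", json.dumps(node_args), "-j"]
    if sleep_time is not None:
        cmd += ["--sleep-time", str(sleep_time)]
    return cmd


# Hypothetical node arguments, used only to illustrate the quoting.
command = build_execute_command(
    "keywords", node_args={"example": "value"}, sleep_time=0
)
# shlex.join quotes the JSON payload so the line can be pasted into a shell.
print(shlex.join(command))
```

Building the command as a list and quoting it with ``shlex.join`` avoids hand-escaping the JSON payload; the same list can also be passed directly to ``subprocess.run``.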