{"id":22574880,"url":"https://github.com/thoth-station/investigator","last_synced_at":"2025-04-10T16:11:32.661Z","repository":{"id":37556651,"uuid":"251260226","full_name":"thoth-station/investigator","owner":"thoth-station","description":"Thoth investigator is a Kafka based component that consumes all messages produced by Thoth components.","archived":false,"fork":false,"pushed_at":"2023-10-18T00:40:14.000Z","size":2678,"stargazers_count":4,"open_issues_count":4,"forks_count":10,"subscribers_count":6,"default_branch":"master","last_synced_at":"2025-03-24T13:51:25.355Z","etag":null,"topics":["aicoe","hacktoberfest","thoth","thoth-investigator"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/thoth-station.png","metadata":{"files":{"readme":"README.rst","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-30T09:33:10.000Z","updated_at":"2022-08-11T17:23:13.000Z","dependencies_parsed_at":"2023-02-13T03:15:27.487Z","dependency_job_id":null,"html_url":"https://github.com/thoth-station/investigator","commit_stats":null,"previous_names":[],"tags_count":72,"template":false,"template_full_name":"thoth-station/template-project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thoth-station%2Finvestigator","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thoth-station%2Finvestigator/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thoth-station%2Finvestigator/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/thoth-station%2Finvestigator/manifests","owner_url":"https://repos.ecosyste.ms/ap
i/v1/hosts/GitHub/owners/thoth-station","download_url":"https://codeload.github.com/thoth-station/investigator/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248251020,"owners_count":21072685,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aicoe","hacktoberfest","thoth","thoth-investigator"],"created_at":"2024-12-08T03:08:21.117Z","updated_at":"2025-04-10T16:11:32.628Z","avatar_url":"https://github.com/thoth-station.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"Welcome to Thoth's investigator documentation\n---------------------------------------------\n\n.. image:: https://img.shields.io/github/v/tag/thoth-station/investigator?style=plastic\n  :target: https://github.com/thoth-station/investigator/releases\n  :alt: GitHub tag (latest by date)\n\n.. 
image:: https://quay.io/repository/thoth-station/investigator/status\n  :target: https://quay.io/repository/thoth-station/investigator?tab=tags\n  :alt: Quay - Build\n\nThoth's investigator is a Kafka-based component that consumes all messages produced by Thoth components.\n\nIt has a monitoring system in place that allows the Thoth team to see what is happening in Thoth in terms of Kafka, OpenShift, and Argo for the different components\nand to act when alarms are received.\n\nThis agent relies mainly on:\n\n* `thoth-messaging \u003chttps://github.com/thoth-station/messaging\u003e`__ to handle Kafka messages.\n\n* `thoth-common \u003chttps://github.com/thoth-station/common\u003e`__ to schedule Argo workflows.\n\n* `thoth-storages \u003chttps://github.com/thoth-station/storages\u003e`__ to set/verify content in the database.\n\n\nThis documentation corresponds to a component called \"investigator\". Sources can be\nfound on `GitHub \u003chttps://github.com/thoth-station/investigator\u003e`_.\n\nSee `thoth-station \u003chttps://thoth-station.ninja\u003e`_ website and `Thoth-Station\norganization on GitHub \u003chttps://github.com/thoth-station\u003e`_.\n\nGoals\n=====\n\n* Receive messages from different components and take action depending on the info about a package. (Consumer)\n\nEnvironment variables\n=====================\n\n**bold** indicates required, *italicized* indicates optional\n\nSee `thoth-messaging \u003chttps://github.com/thoth-station/messaging\u003e`__:\n\n* **KAFKA_BOOTSTRAP_SERVERS**: a comma-separated list of Kafka bootstrap servers.\n* *KAFKA_SECURITY_PROTOCOL*: specify what security protocol to use.\n\n  * *KAFKA_SSL_CERTIFICATE_LOCATION* (if security protocol is `SSL`).\n  * *KAFKA_SASL_USERNAME* and *KAFKA_SASL_PASSWORD* (if security protocol is `SASL`).\n\n* **KAFKA_CONSUMER_GROUP_ID**: specify the Kafka consumer group. If two consumers have the same group, message\n  partitions are split between them. 
You can have a number of consumers equal to the number of message partitions.\n* *KAFKA_CONSUMER_MAX_POLL_INTERVAL_MS*: a timeout in milliseconds. If the consumer does not poll for messages within this interval,\n  it throws an error. When blocking for workflow limits this should be set moderately high. The default value is `300000`.\n* **KAFKA_CONSUMER_ENABLE_AUTOCOMMIT**: This should be set to `False` so that we don't commit messages which have not\n  been fully processed yet. Investigator will handle committing messages.\n\n\nGit Services:\n\n* `THOTH_GITHUB_PRIVATE_TOKEN`: token for authenticating actions on GitHub repositories\n\n* `THOTH_GITLAB_PRIVATE_TOKEN`: token for authenticating actions on GitLab repositories\n\nEnforcing a workflow limit:\n\n* `ARGO_PENDING_SLEEP_TIME`: amount of time we wait between checking the number of workflows in progress\n\n* `ARGO_PENDING_WORKFLOW_LIMIT`: limit to enforce on Argo for the total number of pending workflows\n\nRetrying and Dealing with Exceptions:\n\n* *THOTH_INVESTIGATOR_MAX_RETRIES*: indicates the number of times that investigator should attempt to consume a message\n  before pausing topic consumption or acking a failed message (default = 5)\n\n* *THOTH_INVESTIGATOR_BACKOFF*: how long to wait before trying to consume a failed message again. 
This backoff strategy\nis linear (default = 0.5)\n\n  * for 0 \u003c i \u003c MAX_RETRIES, wait time before attempting to consume again is i * BACKOFF\n\n* *THOTH_INVESTIGATOR_ACK_ON_FAIL*:\n\n  * **if type is integer**: if the value is nonzero and max retries is reached, the message is acked and consumption\n    continues instead of pausing the topic's consumption\n\n  * **if type is list**: if a list is passed as the envvar, all topics with a base name in that list are acked\n    on failure\n\n\nService Paths\n=============\n\n* /metrics : exposes Prometheus metrics to be scraped\n\n* /_health : indicates that the web server can handle requests\n\n* /resume/{base_topic_name} : if a message fails and forces a topic to halt consumption, this endpoint can be used to manually resume consumption after the issues have been addressed\n\n\nKafka/Argo combination in Project Thoth\n========================================\n\nThoth relies on Kafka for message handling and on Argo workflows for services.\n\nSeveral types of messages are handled by investigator and different types of actions are performed. 
In particular, we can distinguish\ndifferent categories of messages in Thoth as described in the following sections.\n\nIncrease Thoth Knowledge\n=========================\n\nThe following messages are sent by different Thoth components:\n\n* `PackageReleasedMessage \u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/package_released/README.md\u003e`__.\n\n* `UnresolvedPackageMessage \u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/unresolved_package/README.md\u003e`__.\n\n* `UnrevsolvedPackageMessage \u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/unrevsolved_package/README.md\u003e`__.\n\n* `SIUnanalyzedPackageMessage \u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/si_unanalyzed_package/README.md\u003e`__.\n\n* `SolvedPackageMessage \u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/solved_package/README.md\u003e`__.\n\n* `CVEProvidedMessage \u003chttps://github.com/thoth-station/messaging/blob/master/thoth/messaging/cve_provided.py\u003e`__.\n\nMonitor Thoth results and knowledge\n===================================\n\nThe following message is sent by `advise reporter producer \u003chttps://github.com/thoth-station/advise-reporter\u003e`__ to show the use of recommendations across all Thoth integrations:\n\n* `AdviseJustificationMessage \u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/advise_justification/README.md\u003e`__.\n\nThe following messages are sent by `package update producer \u003chttps://github.com/thoth-station/package-update-job\u003e`__ to keep knowledge in the database up to date:\n\n* `HashMismatchMessage \u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/hash_mismatch/README.md\u003e`__.\n\n* `MissingPackageMessage 
\u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/missing_package/README.md\u003e`__\n\n* `MissingVersionMessage \u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/missing_version/README.md\u003e`__\n\n* `UpdateProvidesSourceDistroMessage \u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/update_provide_source_distro/README.md\u003e`__\n\nThe following message is sent by `solver \u003chttps://github.com/thoth-station/solver\u003e`__ when Thoth acquired all missing knowledge required to provide advice to a user (human or bot):\n\n* `AdviserReRunMessage \u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/advise_justification/README.md\u003e`__.\n\nTrigger User requests\n=====================\n\nThe following messages are sent by `User-API producer \u003chttps://github.com/thoth-station/user-api\u003e`__ when users (humans or bots)\ninteract with `Thoth integrations \u003chttps://github.com/thoth-station/adviser/blob/master/docs/source/integration.rst\u003e`__:\n\n* `AdviserTriggerMessage \u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/adviser_trigger/README.md\u003e`__.\n\n* `KebechetTriggerMessage \u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/kebechet_trigger/README.md\u003e`__\n\n* `PackageExtractTriggerMessage \u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/package_extract_trigger/README.md\u003e`__\n\n* `ProvenanceCheckerTriggerMessage \u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/provenance_checker_trigger/README.md\u003e`__\n\nThe following message is triggered internally to keep user repositories fresh when new Thoth knowledge is encountered:\n\n* `KebechetRunUrlTriggerMessage 
\u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/kebechet_run_url_trigger/README.md\u003e`__\n\n\nInvestigator scenarios description\n==================================\n\nThoth knowledge increase using investigator\n###########################################\n\n.. image:: https://raw.githubusercontent.com/thoth-station/investigator/master/thoth/investigator/images/IncreaseThothKnowledge.jpg\n   :align: center\n   :alt: Thoth knowledge increase using investigator.\n\nThe image above shows how Thoth keeps learning automatically using two fundamental components that produce messages described in this section:\n\n* `package release producer \u003chttps://github.com/thoth-station/package-releases-job\u003e`__ to acquire knowledge on newly released package versions from a certain index.\n\n* `graph-refresh producer \u003chttps://github.com/thoth-station/graph-refresh-job\u003e`__ to allow Thoth to continuously learn and keep the internal knowledge up to date.\n\nThoth self-learn on errors during knowledge acquisition\n########################################################\n\n.. 
image:: https://raw.githubusercontent.com/thoth-station/investigator/master/thoth/investigator/images/UpdateProvidesSourceDistro.jpg\n   :align: center\n   :alt: Thoth self-learn on errors during knowledge acquisition.\n\nThe image above shows how Thoth is able to self-learn and act on known errors during knowledge acquisition about security for a certain package:\n\n* If a package version from a certain index cannot be downloaded because the source distro is missing or the package is missing, the SI workflow sends a message\n  (`UpdateProvidesSourceDistroMessage \u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/update_provide_source_distro/README.md\u003e`__ or\n  `MissingVersionMessage \u003chttps://github.com/thoth-station/investigator/blob/master/thoth/investigator/missing_version/README.md\u003e`__ respectively).\n\n* Investigator consumes the messages and sets flags for those packages in the Thoth knowledge graph so that Thoth does not schedule security analysis\n  for that package again. (The image below shows the corresponding Grafana dashboard.)\n\n.. image:: https://raw.githubusercontent.com/thoth-station/investigator/master/thoth/investigator/images/SIAnalysisOverview.png\n   :align: center\n   :alt: Thoth SI Analysis monitoring.\n\nThoth self-heal when knowledge is missing in providing advice\n#################################################################\n\n.. 
image:: https://raw.githubusercontent.com/thoth-station/investigator/master/thoth/investigator/images/FailedAdviceAdviserReRun.jpg\n   :align: center\n   :alt: Thoth self-heal when knowledge is missing in providing advice.\n\nThe image above shows how Thoth is able to self-heal when knowledge is missing in providing advice:\n\n* When a user requests Thoth advice, but there is missing information to provide it, the adviser Argo workflow\n  will send a message to Kafka (`UnresolvedPackageMessage \u003chttps://github.com/thoth-station/messaging/blob/master/thoth/messaging/unresolved_package.py\u003e`__)\n  through one of its tasks, which depends on the `thoth-messaging \u003chttps://github.com/thoth-station/messaging\u003e`__ library.\n\n* Investigator will consume these event messages and schedule solver workflows accordingly so that Thoth can learn about missing information.\n\n* During the solver workflow two Kafka messages are sent out:\n\n  * `SolvedPackageMessage \u003chttps://github.com/thoth-station/messaging/blob/master/thoth/messaging/solved_package.py\u003e`__, used by investigator to schedule the next information that needs to be learned by Thoth, e.g. security information.\n  * `AdviserTriggerMessage \u003chttps://github.com/thoth-station/messaging/blob/master/thoth/messaging/adviser_trigger.py\u003e`__, which contains all information required by investigator to reschedule an adviser that previously failed.\n\n* The loop is closed once the adviser workflow re-run is successful in providing advice.\n\nThis self-learning data-driven pipeline with Argo and Kafka is fundamental for all Thoth integrations because it will make Thoth learn about new packages\nand keep its knowledge up to date with what users use in their software stacks.\n\nUsers interaction with Thoth services\n#####################################\n\n.. 
image:: https://raw.githubusercontent.com/thoth-station/investigator/master/thoth/investigator/images/UserAPIKafkaProducer.jpg\n   :align: center\n   :alt: Users interaction with Thoth services.\n\nThe image above explains what happens when a user of Thoth (human or bot) interacts with one of the Thoth integrations.\n\n\nDev Guide\n=========\n\nMost of the additions to this repository will entail adding new messages to process. That is what is documented\nhere. If you feel that any information is missing, please feel free to open an issue.\n\nFor each message there are two things you should implement:\n\n1. message processing\n2. consumer metrics\n\nCreate a new directory in thoth/investigator which looks like this:\n\n* message_name\n\n  * `__init__.py`\n  * investigate_\u003cmessage_name\u003e.py\n  * metrics_\u003cmessage_name\u003e.py\n  * `README.md` describing the message and what happens once consumed by investigator.\n\nMessage Parsing\n================\n\nThe implementation of this portion is highly specific to your own problem, so not much can be advised in terms of rules\nand regulations. In general, naming the function `parse_\u003cmessage_name\u003e_message` is best practice.  Make sure to include\nthe three basic metrics in your function:\n\n.. code-block:: python\n\n  @foo_exceptions.count_exceptions()\n  @foo_in_progress.track_inprogress()\n  def parse_foo_message(message):\n      # do stuff\n      foo_success.inc()\n\n  # \u003cmessage_name\u003e = foo\n\n\nConsumer Metrics\n================\n\nFor consumer metrics you should at least have the following three:\n\n* \u003cmessage_name\u003e_exceptions (Prometheus Counter)\n* \u003cmessage_name\u003e_success (Prometheus Counter)\n* \u003cmessage_name\u003e_in_progress (Prometheus Gauge)\n\nThese are extensions of the metrics in `thoth/investigator/metrics.py`\n\nThe following is an example of a basic metrics file for a message `foo`:\n\n.. 
code-block:: python\n\n  from ..metrics import in_progress, success, exceptions\n\n  foo_in_progress = in_progress.labels(message_type=\"foo\")\n  foo_success = success.labels(message_type=\"foo\")\n  foo_exceptions = exceptions.labels(message_type=\"foo\")\n\nYou can add metrics as you see fit, but if the metric is not specific only to your messages please move it to\nthoth/investigator/metrics.py and set the proper labels to differentiate between messages.\n\nOther additions\n================\n\n* `thoth/investigator/\u003cmessage_name\u003e/__init__.py`, please add the function for parsing messages\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthoth-station%2Finvestigator","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fthoth-station%2Finvestigator","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fthoth-station%2Finvestigator/lists"}