{"id":15287978,"url":"https://github.com/fkie-cad/logprep","last_synced_at":"2026-03-02T17:12:56.570Z","repository":{"id":36967808,"uuid":"364565857","full_name":"fkie-cad/Logprep","owner":"fkie-cad","description":"log data pre processing, generation and shipping in python","archived":false,"fork":false,"pushed_at":"2025-08-15T06:43:30.000Z","size":9955,"stargazers_count":33,"open_issues_count":15,"forks_count":8,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-08-15T08:28:19.177Z","etag":null,"topics":["etl","kafka","log","logdata","loggenerator","logshipper","opensearch","preprocessing","python","soar","sre"],"latest_commit_sha":null,"homepage":"https://logprep.readthedocs.io/en/latest/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"lgpl-2.1","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fkie-cad.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-05-05T12:16:23.000Z","updated_at":"2025-08-08T07:00:54.000Z","dependencies_parsed_at":"2023-10-02T09:03:33.945Z","dependency_job_id":"0bcbd138-2f33-4bd0-926d-67aa2fb467c1","html_url":"https://github.com/fkie-cad/Logprep","commit_stats":{"total_commits":601,"total_committers":22,"mean_commits":"27.318181818181817","dds":0.6156405990016639,"last_synced_commit":"8bbb38e676881a13187261f6cd6b833d92c8298d"},"previous_names":[],"tags_count":70,"template":false,"template_full_name":null,"purl":"pkg:github/fkie-cad/Logprep","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fkie-cad%2FLogprep","tags_url":"https://repos.ecosyste.ms/api/v1/hos
ts/GitHub/repositories/fkie-cad%2FLogprep/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fkie-cad%2FLogprep/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fkie-cad%2FLogprep/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fkie-cad","download_url":"https://codeload.github.com/fkie-cad/Logprep/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fkie-cad%2FLogprep/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271321440,"owners_count":24739472,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-20T02:00:09.606Z","response_time":69,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["etl","kafka","log","logdata","loggenerator","logshipper","opensearch","preprocessing","python","soar","sre"],"created_at":"2024-09-30T15:43:39.382Z","updated_at":"2026-02-03T17:09:25.207Z","avatar_url":"https://github.com/fkie-cad.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003ch1 align=\"center\"\u003eLogprep\u003c/h1\u003e\n\u003ch3 align=\"center\"\u003e\n\n![GitHub release (latest by date)](https://img.shields.io/github/v/release/fkie-cad/Logprep)\n![GitHub Workflow Status 
(branch)](https://img.shields.io/github/actions/workflow/status/fkie-cad/logprep/main.yml?branch=main)\n[![Documentation Status](https://readthedocs.org/projects/logprep/badge/?version=latest)](http://logprep.readthedocs.io/?badge=latest)\n![GitHub contributors](https://img.shields.io/github/contributors/fkie-cad/Logprep)\n\u003ca href=\"https://codecov.io/github/fkie-cad/Logprep\" target=\"_blank\"\u003e\n    \u003cimg src=\"https://img.shields.io/codecov/c/github/fkie-cad/Logprep?color=%2334D058\" alt=\"Coverage\"\u003e\n\u003c/a\u003e\n![GitHub Repo stars](https://img.shields.io/github/stars/fkie-cad/logprep?style=social)\n\u003c/h3\u003e\n\n## Introduction\n\nLogprep collects, processes and forwards log messages from various data sources.\nLog messages are read and written by so-called connectors.\nCurrently, connectors for Kafka, Opensearch, S3, HTTP and JSON(L) files exist.\n\nThe log messages are processed serially by a pipeline of processors,\nwhere each processor modifies an event that is being passed through.\nThe main idea is that each processor performs a simple task that is easy to carry out.\nOnce the log message has passed through all processors in the pipeline, the resulting\nmessage is sent to a configured output connector.\n\nLogprep is primarily designed to process log messages. 
Generally, Logprep can handle JSON messages,\nallowing further applications besides log handling.\n\n- [About Logprep](https://github.com/fkie-cad/Logprep/blob/main/README.md#about-logprep)\n- [Installation](https://logprep.readthedocs.io/en/latest/installation.html)\n- [Deployment Examples](https://logprep.readthedocs.io/en/latest/examples/index.html)\n- [Event Generation](https://logprep.readthedocs.io/en/latest/user_manual/execution.html#event-generation)\n- [Documentation](https://logprep.readthedocs.io/en/latest)\n- [Container signatures](https://github.com/fkie-cad/Logprep/blob/main/README.md#container-signatures)\n- [Container SBOM](https://github.com/fkie-cad/Logprep/blob/main/README.md#container-sbom)\n- [Contributing](https://github.com/fkie-cad/Logprep/blob/main/CONTRIBUTING.md)\n- [License](https://github.com/fkie-cad/Logprep/blob/main/LICENSE)\n- [Changelog](https://github.com/fkie-cad/Logprep/blob/main/CHANGELOG.md)\n\n## About Logprep\n\n### Pipelines\n\nLogprep processes incoming log messages with a configured pipeline that can be spawned\nmultiple times via multiprocessing.\nThe following chart shows a basic setup that represents this behaviour.\nThe pipeline consists of three processors: the `Dissector`, `Geo-IP Enricher` and the\n`Dropper`.\nEach pipeline runs concurrently and takes one event from its `Input Connector`.\nOnce the log message is fully processed, the result is forwarded to the `Output Connector`,\nafter which the pipeline takes the next message, repeating the processing cycle.\n\n```mermaid\nflowchart LR\nA1[Input\\nConnector] --\u003e B\nA2[Input\\nConnector] --\u003e C\nA3[Input\\nConnector] --\u003e D\nsubgraph Pipeline 1\nB[Dissector] --\u003e E[Geo-IP Enricher]\nE --\u003e F[Dropper]\nend\nsubgraph Pipeline 2\nC[Dissector] --\u003e G[Geo-IP Enricher]\nG --\u003e H[Dropper]\nend\nsubgraph Pipeline n\nD[Dissector] --\u003e I[Geo-IP Enricher]\nI --\u003e J[Dropper]\nend\nF --\u003e K1[Output\\nConnector]\nH --\u003e 
K2[Output\\nConnector]\nJ --\u003e K3[Output\\nConnector]\n```\n\n### Processors\n\nEvery processor has one simple task to fulfill.\nFor example, the `Dissector` can split up long message fields into multiple subfields\nto facilitate structural normalization.\nThe `Geo-IP Enricher` takes an IP address and adds its geolocation to the\nlog message, based on a configured geo-ip database.\nThe `Dropper` deletes fields from the log message.\n\nA detailed overview of all processors can be found in the\n[processor documentation](https://logprep.readthedocs.io/en/latest/configuration/processor.html).\n\nTo influence the behaviour of those processors, each can be configured with a set of rules.\nThese rules define two things.\nFirstly, they specify when the processor should process a log message,\nand secondly, they specify how to process the message:\nfor example, which fields should be deleted or for which IP address the geolocation should be\nretrieved.\n\n\n### Connectors\n\nConnectors are responsible for reading the input and writing the result to a desired output.\nThe main connectors that are currently used and implemented are a kafka-input-connector and a\nkafka-output-connector, which receive messages from a kafka-topic and write messages into a\nkafka-topic. Additionally, you can use the Opensearch output connector to ship the\nmessages directly to Opensearch after processing.\n\nThe details regarding the connectors can be found in the\n[input connector documentation](https://logprep.readthedocs.io/en/latest/configuration/input.html)\nand\n[output connector documentation](https://logprep.readthedocs.io/en/latest/configuration/output.html).\n\n### Configuration\n\nTo run Logprep, certain configurations have to be provided. Because Logprep is designed to run in a\ncontainerized environment like Kubernetes, these configurations can be provided via the filesystem or\nhttp. 
By providing the configuration via http, it is possible to control configuration changes via\na flexible http api. This enables Logprep to quickly adapt to changes in your environment.\n\nFirst, a general configuration describes the pipeline and the connectors.\nSecond, the processors need rules in order to process messages correctly.\n\nThe following yaml configuration shows an example configuration for the pipeline shown\nin the graph above:\n\n```yaml\nprocess_count: 3\ntimeout: 0.1\n\npipeline:\n  - dissector:\n      type: dissector\n      rules:\n        - https://your-api/dissector/\n        - rules/01_dissector/rules/\n  - geoip_enricher:\n      type: geoip_enricher\n      rules:\n        - https://your-api/geoip/\n        - rules/02_geoip_enricher/rules/\n      tree_config: artifacts/tree_config.json\n      db_path: artifacts/GeoDB.mmdb\n  - dropper:\n      type: dropper\n      rules:\n        - rules/03_dropper/rules/\n\ninput:\n  mykafka:\n    type: confluentkafka_input\n    bootstrapservers: [127.0.0.1:9092]\n    topic: consumer\n    group: cgroup\n    auto_commit: true\n    session_timeout: 6000\n    offset_reset_policy: smallest\noutput:\n  opensearch:\n    type: opensearch_output\n    hosts:\n        - 127.0.0.1:9200\n    default_index: default_index\n    error_index: error_index\n    message_backlog_size: 10000\n    timeout: 10000\n    max_retries:\n    user: the username\n    secret: the password\n    cert: /path/to/cert.crt\n```\n\nThe following yaml represents a dropper rule which, according to the previous configuration,\nshould be in the `rules/03_dropper/rules/` directory.\n\n```yaml\nfilter: \"message\"\ndrop:\n  - message\ndescription: \"Drops the message field\"\n```\n\nThe filter condition of this rule checks whether the field `message` exists in the log.\nIf it does, the dropper deletes this field from the log message.\n\nDetails about the rule language and how to write rules for the processors can be found in 
the\n[rule configuration documentation](https://logprep.readthedocs.io/en/latest/configuration/rules.html).\n\n## Documentation\n\nThe documentation for Logprep is online at https://logprep.readthedocs.io/en/latest/ or it can\nbe built locally via:\n\n```\nsudo apt install pandoc\nuv sync --frozen --extra doc\ncd ./doc/\nmake html\n```\n\nThe HTML documentation can then be found in `doc/_build/html/index.html`.\n\n## Container signatures\n\nFrom release 15 on, Logprep containers are signed using the\n[cosign](https://github.com/sigstore/cosign) tool.\nTo verify the container, you can copy the following public key into a file\n`logprep.pub`:\n\n```\n-----BEGIN PUBLIC KEY-----\nMFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEgkQXDi/N4TDFE2Ao0pulOFfbGm5g\nkVtARE+LJfSFI25BanOG9jaxxRGVt+Sa1KtQbMcy7Glxu0s7XgD9VFGjTA==\n-----END PUBLIC KEY-----\n```\n\nAnd use it to verify the signature:\n\n```\ncosign verify --key logprep.pub ghcr.io/fkie-cad/logprep:py3.11-latest\n```\n\nThe output should look like:\n\n```\nVerification for ghcr.io/fkie-cad/logprep:py3.11-latest --\nThe following checks were performed on each of these signatures:\n  - The cosign claims were validated\n  - Existence of the claims in the transparency log was verified offline\n  - The signatures were verified against the specified public key\n\n[{\"critical\":{\"identity\":{\"docker-reference\":\"ghcr.io/fkie-cad/logprep\"}, ...\n```\n\n## Container SBOM\n\nFrom release 15 on, Logprep container images are shipped with a generated SBOM.\nTo verify the attestation and extract the SBOM, use\n[cosign](https://github.com/sigstore/cosign) with:\n\n```\ncosign verify-attestation --key logprep.pub ghcr.io/fkie-cad/logprep:py3.11-latest | jq '.payload | @base64d | fromjson | .predicate | .Data | fromjson' \u003e sbom.json\n```\n\nThe output should look like:\n\n```\nVerification for ghcr.io/fkie-cad/logprep:py3.11-latest --\nThe following checks were performed on each of these signatures:\n  - The cosign claims were validated\n  
- Existence of the claims in the transparency log was verified offline\n  - The signatures were verified against the specified public key\n```\n\nFinally, you can view the extracted SBOM with:\n\n```\ncat sbom.json | jq\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffkie-cad%2Flogprep","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffkie-cad%2Flogprep","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffkie-cad%2Flogprep/lists"}