{"id":19174624,"url":"https://github.com/equalitie/deflect-analytics-ecosystem","last_synced_at":"2026-02-26T20:47:28.288Z","repository":{"id":56599597,"uuid":"263368660","full_name":"equalitie/deflect-analytics-ecosystem","owner":"equalitie","description":"A collection of Dockerfiles to be used with Baskerville","archived":false,"fork":false,"pushed_at":"2022-01-24T21:08:23.000Z","size":905,"stargazers_count":2,"open_issues_count":1,"forks_count":2,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-01-04T01:36:45.088Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Dockerfile","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/equalitie.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-05-12T14:58:49.000Z","updated_at":"2022-05-26T18:17:19.000Z","dependencies_parsed_at":"2022-08-15T21:40:37.250Z","dependency_job_id":null,"html_url":"https://github.com/equalitie/deflect-analytics-ecosystem","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/equalitie%2Fdeflect-analytics-ecosystem","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/equalitie%2Fdeflect-analytics-ecosystem/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/equalitie%2Fdeflect-analytics-ecosystem/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/equalitie%2Fdeflect-analytics-ecosystem/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/equalitie","download_url":"https://codeload
.github.com/equalitie/deflect-analytics-ecosystem/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240254182,"owners_count":19772386,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T10:18:33.255Z","updated_at":"2026-02-26T20:47:23.242Z","avatar_url":"https://github.com/equalitie.png","language":"Dockerfile","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Docker files for the  Deflect Analytics Ecosystem\n## Usage\n- Before running: \n    - for the kafka service: `export DOCKER_KAFKA_HOST=$(ipconfig getifaddr en0)` where `en0` is the name of the interface you are currently using.\n    - if you are using the baskerville container, make sure you have a `.env` file where `docker-compose.yaml` is. You can use the `dot_env_file` example file by renaming it to `.env` and\nchanging the variable values.\n- To run: `docker-compose up -d` to create and bring up the containers defined in `docker-compose.yaml`\n- To see the logs: `docker-compose logs \u003cname of your service that was defined in docker-compose file, e.g. 
baskerville\u003e`\n- To stop: in the `docker-compose.yaml` directory, run `docker-compose down` to stop and remove the containers.\n- To stop and remove all `docker` containers on the host (not only the ones defined here):\n```bash\ndocker stop $(docker ps -a -q)\ndocker rm $(docker ps -a -q)\n```\n- Useful URLs:\n    - Baskerville metrics exporter: http://localhost:8998\n    - Grafana: http://localhost:3000\n    - Prometheus: http://localhost:9090\n    - Prometheus Status: http://localhost:9090/targets (to check which services are up)\n    - Prometheus Push Gateway: http://localhost:9091\n    - Spark: http://localhost:4040 (available only when Baskerville is running)\n\n### Baskerville-specific usage:\nTo run the following pipelines, change the `command` entry in the `baskerville` section of `docker-compose.yaml`:\n- rawlog:\n    ```yaml\n     # -e is optional, used to export metrics\n     python ./main.py -c /app/baskerville/conf/baskerville.yaml rawlog -e -t\n    ```\n- kafka:\n    ```yaml\n     # -s starts a script that posts logs to Kafka, to simulate incoming log traffic\n     python ./main.py -c /app/baskerville/conf/baskerville.yaml kafka -e -s -t\n    ```\n- es:\n    ```yaml\n     # note that an elasticsearch service is not currently provided. 
There is a Dockerfile that can be used\n     # but with no sample data.\n     python ./main.py -c /app/baskerville/conf/baskerville.yaml es -e -t\n    ```\n  \n Note: `-t` adds a test model to the database.\n  \nAll the services defined here support [Baskerville](https://github.com/equalitie/baskerville), an Analytics Engine that leverages Machine Learning to identify anomalies in web traffic.\nComponents required per pipeline:\n\nFor Baskerville `kafka` you'll need:\n  - Kafka\n  - Zookeeper\n  - Postgres\n  - Prometheus  [optional]\n  - Grafana     [optional]\n\nFor Baskerville `rawlog` you'll need:\n  - Postgres\n  - Elasticsearch\n  - Prometheus  [optional]\n  - Grafana     [optional]\n\nFor Baskerville `es` you'll need:\n  - Postgres\n  - Elasticsearch\n  - Prometheus  [optional]\n  - Grafana     [optional]\n\n__An Elasticsearch service is not provided.__\n\nFor Baskerville `training` you'll need:\n  - Postgres\n  - Prometheus  [optional]\n  - Grafana     [optional]\n\n## The containers:\n- Postgres: a Postgres instance with the Timescale extension\n- Postgres Exporter (postgres-exporter): to monitor Postgres with Prometheus\n- Prometheus: To gather metrics about components\n- Prometheus Postgresql Adapter (prometheus-postgresql-adapter): To use Postgres as a backend for Prometheus\n- Prometheus Push Gateway (prometheus_push_gw): For short-lived metrics, like Spark metrics\n- Grafana: For metrics visualisation and alerts. Grafana comes preconfigured with two Datasources: Prometheus and Postgres. 
The Baskerville, Spark and Kafka-related dashboards are also pre-loaded.\n- Zookeeper: For managing the Kafka instance\n- Kafka: For log publishing/queuing so that Baskerville can consume and process the incoming weblogs (in practice, used to transport logs from the web servers to Baskerville)\n- Kafka Exporter: To monitor Kafka through Prometheus\n- Baskerville: the analytics engine\n\n### Grafana:\n- Kafka metrics:\n    - https://grafana.com/dashboards/721\n    - https://grafana.com/dashboards/5484\n- Postgres metrics:\n    - Two ways:\n        - Directly connect to a Postgres database and set up a dashboard with queries like:\n            ```sql\n             SELECT\n              $__time(request_sets.created_at),\n              count(id)\n            FROM\n              request_sets\n            GROUP BY target, time\n            \n            OR \n  \n            SELECT\n              $__time(request_sets.created_at),\n              count(id),\n              CAST(target AS TEXT) AS metric\n            FROM\n              request_sets\n            WHERE $__timeFilter(request_sets.created_at)\n            GROUP BY target, time\n            ORDER BY time\n            ```\n           to have an overview of what is going on in the data\n           \n           NOTE: Create a dedicated read-only user for Grafana so that destructive queries, like `DELETE FROM ...`, cannot be run from a dashboard:\n           ```sql\n             CREATE USER grafanareader WITH PASSWORD 'password';\n             GRANT USAGE ON SCHEMA schema TO grafanareader;\n             GRANT SELECT ON schema.table TO grafanareader;\n           ```\n          \n          More on this [here](https://github.com/grafana/grafana/blob/master/docs/sources/features/datasources/postgres.md)\n        - Set up a Postgres exporter to monitor requests, RAM, etc.\n\n\n\n\u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by/4.0/\"\u003e\n\u003cimg alt=\"Creative Commons Licence\" style=\"border-width:0\" 
src=\"https://i.creativecommons.org/l/by/4.0/80x15.png\" /\u003e\u003c/a\u003e\u003cbr /\u003e\nThis work is copyright (c) 2020, eQualit.ie inc., and is licensed under a \u003ca rel=\"license\" href=\"http://creativecommons.org/licenses/by/4.0/\"\u003eCreative Commons Attribution 4.0 International License\u003c/a\u003e.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fequalitie%2Fdeflect-analytics-ecosystem","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fequalitie%2Fdeflect-analytics-ecosystem","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fequalitie%2Fdeflect-analytics-ecosystem/lists"}