https://github.com/bieli/big-data-architecture-advisor

Kind of "expert system" for BigData architects roles ;-) For fun & learning!
advisor architecture expert-system rust

Host: GitHub
URL: https://github.com/bieli/big-data-architecture-advisor
Owner: bieli
License: bsd-3-clause
Created: 2025-12-16T22:49:54.000Z (7 months ago)
Default Branch: main
Last Pushed: 2025-12-16T23:45:51.000Z (7 months ago)
Last Synced: 2025-12-20T13:23:20.998Z (7 months ago)
Topics: advisor, architecture, expert-system, rust
Language: Rust
Homepage:
Size: 19.5 KB
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# big-data-architecture-advisor

A kind of "expert system" for Big Data architects ;-) For fun & learning!

## Motivation

Modern data systems are complex, distributed, and often difficult to design. Architects must balance multiple factors such as **data volume, workload type, SLA requirements, budget constraints, and observability needs**. Traditional decision-making is often ad hoc and opaque, making it hard to justify choices or reproduce them later.

This project was motivated by the need for a **transparent, rule-based expert assistant** that:

- Encodes architectural knowledge as declarative rules.

- Produces clear recommendations with explanations.

- Supports multiple criteria, including observability, cost, and compliance.

- Allows dynamic weighting of priorities (e.g., SLA vs. budget vs. observability).

- Provides reproducible, explainable decisions for engineers and architects.

## How It Works

The advisor is implemented in **Rust** using the [`crepe`](https://crates.io/crates/crepe) Datalog engine. Scenario facts are declared as `@input` relations, and recommendations and their explanations as `@output` relations:

```rust
crepe! {
    @input
    struct Volume(&'static str);

    @input
    struct Workload(&'static str);

    @input
    struct SLA(&'static str);

    @input
    struct Budget(&'static str);

    @input
    struct Observability(&'static str);

    @output
    struct Recommendation(&'static str, i32);

    @output
    struct Explanation(&'static str);

...
```

### Important points

1. **Inputs**: The user provides facts about their scenario:

   - Data Volume: `small`, `medium`, `large`

   - Workload Type: `batch`, `streaming`, `mixed`

   - SLA Requirement: `low`, `high`

   - Budget: `low`, `high`

   - Observability: `low`, `medium`, `high`

2. **Rules**: The system encodes architectural knowledge as declarative rules. Examples:

   - Large batch + low SLA + low budget → Hadoop/Spark

   - Streaming + high SLA → Kafka/Flink

   - Large mixed + high SLA + high budget → Lakehouse

   - High observability → Prometheus/Grafana, OpenTelemetry/Jaeger, lineage tools

3. **Scoring**: Each recommendation has a base score. The user can assign **weights** to criteria (SLA, budget, volume, workload, observability). Final scores are adjusted dynamically based on these weights.

4. **CLI Interface**: The program runs interactively:

   - Prompts the user for inputs.

   - Prompts for weights (or uses defaults).

   - Produces ranked recommendations.

   - By default, shows only the top suggestion; with the `--multi` flag, shows all ranked suggestions.

   - With the `--history` flag, prints past saved decisions.
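
The example rules listed under **Rules** can be sketched in plain Rust to make the strict matching logic concrete. This is an illustration only: the project itself expresses rules as `crepe` Datalog clauses, and the `Scenario` struct and `strict_recommendations` function below are hypothetical names that do not come from the repository.

```rust
/// Scenario facts, using the same string values the CLI prompts for.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, Copy)]
struct Scenario {
    volume: &'static str,
    workload: &'static str,
    sla: &'static str,
    budget: &'static str,
    observability: &'static str,
}

/// Returns every recommendation whose rule matches the scenario exactly.
/// Only the four example rules from the README are encoded here.
fn strict_recommendations(s: &Scenario) -> Vec<&'static str> {
    let mut out = Vec::new();

    // Large batch + low SLA + low budget -> Hadoop/Spark
    if s.volume == "large" && s.workload == "batch" && s.sla == "low" && s.budget == "low" {
        out.push("Hadoop/Spark");
    }
    // Streaming + high SLA -> Kafka/Flink
    if s.workload == "streaming" && s.sla == "high" {
        out.push("Kafka/Flink");
    }
    // Large mixed + high SLA + high budget -> Lakehouse
    if s.volume == "large" && s.workload == "mixed" && s.sla == "high" && s.budget == "high" {
        out.push("Lakehouse");
    }
    // High observability -> monitoring, tracing, and lineage tools
    if s.observability == "high" {
        out.push("Prometheus/Grafana");
        out.push("OpenTelemetry/Jaeger");
        out.push("Data lineage tools");
    }

    out
}
```

Calling `strict_recommendations` with a streaming, high-SLA, high-observability scenario returns Kafka/Flink followed by the three observability entries; a scenario that matches no rule returns an empty list. A Datalog engine derives the same facts declaratively, without the hand-written `if` chain.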

## Multi Mode (`--multi`)

By default, the advisor uses **strict rule-based matching**: only architectures that exactly fit the provided inputs (volume, workload, SLA, budget, observability) are recommended.

With the `--multi` flag, the advisor switches to **fuzzy similarity scoring**:

- Every architecture profile is evaluated against the user’s inputs.

- Exact matches contribute full points; `"any"` or partial matches contribute partial points.

- User-defined weights (SLA, budget, volume, workload, observability) influence the scoring.

- All architectures are ranked and displayed, not just the top match.

This mode is useful when:

- You want to explore **alternative architectures** beyond the strict best fit.

- You need to see **trade-offs** between SLA, cost, observability, and workload types.

- You’re experimenting with different weights to understand how priorities shift recommendations.
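
One possible shape for the fuzzy scoring described above is sketched below. The README does not state the actual formula, so everything here is an assumption: the `Profile` struct, the `fuzzy_score` and `rank` functions, and the point values (10 per weight unit for an exact match, 5 for `"any"`) are all hypothetical. Partial matches other than `"any"` are not modeled.

```rust
/// One architecture profile. "any" means the profile does not constrain
/// that criterion. (Hypothetical type, not taken from the repository.)
struct Profile {
    name: &'static str,
    base_score: i32,
    /// Order: volume, workload, SLA, budget, observability.
    criteria: [&'static str; 5],
}

/// Per-criterion weights, in the same order as `Profile::criteria`.
type Weights = [i32; 5];

// Assumed point values; the real advisor may use different ones.
const EXACT_POINTS: i32 = 10;
const ANY_POINTS: i32 = 5;

/// Base score plus weighted points for each criterion:
/// full points for an exact match, partial points for "any", none otherwise.
fn fuzzy_score(profile: &Profile, inputs: &[&str; 5], weights: &Weights) -> i32 {
    let mut score = profile.base_score;
    for i in 0..5 {
        let points = if profile.criteria[i] == inputs[i] {
            EXACT_POINTS
        } else if profile.criteria[i] == "any" {
            ANY_POINTS
        } else {
            0
        };
        score += points * weights[i];
    }
    score
}

/// Scores every profile and returns all of them, highest score first.
/// The sort is stable, so ties keep their original order.
fn rank(profiles: &[Profile], inputs: &[&str; 5], weights: &Weights) -> Vec<(&'static str, i32)> {
    let mut ranked: Vec<(&'static str, i32)> = profiles
        .iter()
        .map(|p| (p.name, fuzzy_score(p, inputs, weights)))
        .collect();
    ranked.sort_by(|a, b| b.1.cmp(&a.1));
    ranked
}
```

Because every profile receives a score, even poorly matching architectures appear in the ranking, which is what makes the trade-offs visible. Raising one weight increases the gap between profiles that match that criterion and those that do not.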

## Example Run

### Strict one-recommendation mode

```bash
$ cargo run
=== Big Data Architecture Advisor CLI ===
*** STRICT Rule-Based Mode (using Crepe library) ***
Enter data volume (small/medium/large):
medium
Enter workload type (batch/streaming/mixed):
mixed
Enter SLA requirement (low/high):
low
Enter budget (low/high):
high
Enter observability requirement (low/medium/high):
medium
Enter SLA weight (default 5):
2
Enter Budget weight (default 3):
3
Enter Volume weight (default 2):
1
Enter Workload weight (default 1):
1
Enter Observability weight (default 4):
4
=== Recommendations ===
-> Cloud-native Monitoring (CloudWatch/Stackdriver/Azure Monitor) (score: 82)
=== Explanations ===
- Cloud-native monitoring chosen for integrated observability in cloud ecosystems.
```
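
Each weight prompt in the run above shows a default (SLA 5, Budget 3, Volume 2, Workload 1, Observability 4). A minimal sketch of how such a prompt might fall back to its default is shown below; `parse_weight` and `prompt_weight` are hypothetical helpers, and treating any unparsable input as "use the default" is an assumption rather than documented behavior.

```rust
use std::io;

/// Parses a weight typed by the user. Empty or invalid input
/// (including negative numbers) falls back to `default`.
fn parse_weight(input: &str, default: u32) -> u32 {
    input.trim().parse().unwrap_or(default)
}

/// Prints a prompt in the style of the CLI transcript, reads one line
/// from stdin, and returns the parsed weight.
fn prompt_weight(label: &str, default: u32) -> io::Result<u32> {
    println!("Enter {} weight (default {}):", label, default);
    let mut line = String::new();
    io::stdin().read_line(&mut line)?;
    Ok(parse_weight(&line, default))
}
```

For example, `prompt_weight("SLA", 5)` would return the default of 5 when the user just presses Enter. Keeping the parsing in a separate pure function makes it easy to test without stdin.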

### Multi recommendations mode

```bash
$ cargo run -- --multi
=== Big Data Architecture Advisor CLI ===
*** MULTI Recommendations Mode ***
Enter data volume (small/medium/large):
large
Enter workload type (batch/streaming/mixed):
mixed
Enter SLA requirement (low/high):
low
Enter budget (low/high):
high
Enter observability requirement (low/medium/high):
medium
Enter SLA weight (default 5):
3
Enter Budget weight (default 3):
2
Enter Volume weight (default 2):
1
Enter Workload weight (default 1):
1
Enter Observability weight (default 4):
3
=== Recommendations ===
-> Cloud-native Monitoring (CloudWatch/Stackdriver/Azure Monitor) (score: 150)
-> Lakehouse (score: 135)
-> ELK Stack (Elasticsearch/Logstash/Kibana) (score: 125)
-> ML Platform (Kubeflow/MLflow) (score: 123)
-> Data Mesh (score: 120)
-> Monitoring Stack (Prometheus/Grafana) (score: 120)
-> Data Lineage Tools (OpenLineage/Marquez) (score: 120)
-> Distributed Tracing (OpenTelemetry/Jaeger) (score: 118)
-> Hybrid Cloud (score: 110)
-> Orchestration (Airflow/Prefect) (score: 110)
-> Alerting & Incident Response (PagerDuty/OpsGenie) (score: 110)
-> Data Warehouse (score: 110)
-> Kafka/Flink (score: 105)
-> Data Lake (S3/ADLS) (score: 105)
-> FinOps Dashboards (score: 103)
-> Hadoop/Spark (score: 100)
-> Governance Layer (Collibra/Alation) (score: 95)
-> ETL Pipelines (score: 85)
```
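
The `--multi` and `--history` flags could be detected with the standard library alone, as in the sketch below. The `Mode` enum and `mode_from_args` function are hypothetical, and giving `--history` precedence when both flags are present is an assumption; the README does not say how the program combines them.

```rust
/// The three ways the CLI can run, as described in this README.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
enum Mode {
    Strict,
    Multi,
    History,
}

/// Picks a mode from command-line arguments. Unknown arguments are ignored.
/// Assumption: `--history` wins if both flags are given.
fn mode_from_args<I, S>(args: I) -> Mode
where
    I: IntoIterator<Item = S>,
    S: AsRef<str>,
{
    let mut mode = Mode::Strict;
    for arg in args {
        match arg.as_ref() {
            "--history" => return Mode::History,
            "--multi" => mode = Mode::Multi,
            _ => {}
        }
    }
    mode
}
```

In a real program the arguments would come from `std::env::args().skip(1)`, so that `cargo run -- --multi` yields `Mode::Multi` and a plain `cargo run` yields `Mode::Strict`.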

## TODO list

- [ ] Add `--clear-history` flag.

- [ ] Extend rules for serverless, data fabric, AI pipelines.

- [ ] Add YAML/JSON config for batch runs.

- [ ] Integrate with web UI for interactive decision support.
