https://github.com/lucadibello/apache-storm-starter
Everything you need to start hacking on Apache Storm: preconfigured devcontainer, Gradle build, and a running example out of the box.
- Host: GitHub
- URL: https://github.com/lucadibello/apache-storm-starter
- Owner: lucadibello
- Created: 2025-09-22T15:02:18.000Z (9 months ago)
- Default Branch: main
- Last Pushed: 2025-11-13T00:20:11.000Z (7 months ago)
- Last Synced: 2025-11-13T02:29:42.729Z (7 months ago)
- Topics: apache-storm, gradle, java-21-lts, starter
- Language: Java
- Homepage:
- Size: 969 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
# apache-storm-starter
Everything you need to prototype Apache Storm topologies quickly: a Gradle build, a ready-to-use devcontainer, and a joke-driven word count example that proves the toolchain end to end.
## Example topology
- `RandomJokeSpout` (2 executors) reads the bundled `jokes.json` dataset and emits random jokes (id, category, rating, body).
- `SplitSentenceBolt` (3 executors) tokenizes each joke body into lowercase words.
- `WordCounterBolt` (3 executors) maintains per-word counters and emits the running total.
- `HistogramBolt` (single executor) collects the counts into a global histogram and writes a timestamped snapshot to `data/histogram.txt` every 5 seconds.
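As a rough illustration of what the split and count stages do, here is a plain-Java sketch with no Storm types. The class name `WordCountSketch` and the tokenization regex are assumptions made for this example; the repository's bolts may split and normalize text differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for the split and count stages; not the repository's code. */
final class WordCountSketch {
    // Running totals keyed by lowercase word (the role WordCounterBolt plays).
    private final Map<String, Long> counts = new HashMap<>();

    /** Assumed rule: lowercase, then split on anything that is not a letter, digit, or apostrophe. */
    static List<String> split(String jokeBody) {
        List<String> words = new ArrayList<>();
        for (String token : jokeBody.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}']+")) {
            if (!token.isEmpty()) { // a leading delimiter produces one empty token
                words.add(token);
            }
        }
        return words;
    }

    /** Increments the counter for one word and returns the new running total. */
    long count(String word) {
        return counts.merge(word, 1L, Long::sum);
    }

    public static void main(String[] args) {
        WordCountSketch counter = new WordCountSketch();
        for (String word : split("Why did the chicken cross the road?")) {
            System.out.println(word + " -> " + counter.count(word));
        }
    }
}
```

Returning the new total from `count` mirrors the "emits the running total" behaviour described above: each incoming word yields one updated count that a downstream stage can consume.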
`WordCountTopology` wires these components with shuffle and fields groupings and, in local mode, keeps the embedded Storm cluster alive for about one minute so you have time to inspect the output. Production mode can be toggled via the `STORM_PROD` environment variable or the `-Dstorm.prod` system property (both default to `false`).
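The difference between the two groupings can be modelled without Storm. The sketch below is a simplified model, not Storm's actual routing code: `GroupingSketch`, `fieldsTask`, and `shuffleTask` are hypothetical names, and it assumes one task per executor. The point is that fields-style routing is a pure function of the key, which is what lets a per-word counter see every occurrence of its words, while shuffle-style routing ignores the tuple's content.

```java
import java.util.concurrent.ThreadLocalRandom;

/** Simplified model of stream groupings; Storm's real implementation differs in detail. */
final class GroupingSketch {
    /** Fields-style routing: equal keys always map to the same task index. */
    static int fieldsTask(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    /** Shuffle-style routing: any task may receive the tuple, regardless of content. */
    static int shuffleTask(int numTasks) {
        return ThreadLocalRandom.current().nextInt(numTasks);
    }

    public static void main(String[] args) {
        int counterTasks = 3; // matches the 3 executors listed for WordCounterBolt
        for (String word : new String[] {"chicken", "road", "chicken"}) {
            System.out.println(word + " -> counter task " + fieldsTask(word, counterTasks));
        }
    }
}
```

Both occurrences of `chicken` land on the same task index, so a single counter instance owns that word's total.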
## Topology diagram

Dataset source:
## Run it locally
1. From the repo root, run `./gradlew run`.
2. Watch the console logs; each spout/bolt uses SLF4J to report the tuples it processes.
3. Open `data/histogram.txt` while the topology is running (or right after shutdown) to see the aggregated word frequencies.
Tip: remove `data/histogram.txt` between runs if you prefer a clean snapshot.
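For orientation, a snapshot file of this kind could be produced along the following lines. Everything about the layout here is an assumption: the header line, the `word count` row format, the sort order, and the choice to append rather than overwrite (suggested by the tip above, but not confirmed) are illustrative, and `HistogramSnapshotSketch` is a hypothetical name. The 5-second scheduling is left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;
import java.util.Comparator;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical snapshot writer; the real file format in the repository may differ. */
final class HistogramSnapshotSketch {
    /** Renders a timestamp header, then one "word count" line per entry, most frequent first. */
    static String render(Map<String, Long> counts, Instant takenAt) {
        String body = counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.comparingByKey())) // ties broken alphabetically
                .map(e -> e.getKey() + " " + e.getValue())
                .collect(Collectors.joining("\n"));
        return "# snapshot taken at " + takenAt + "\n" + body + "\n";
    }

    /** Appends one snapshot to the target file, creating the file and its directory if needed. */
    static void append(Path target, Map<String, Long> counts, Instant takenAt) throws IOException {
        Path parent = target.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        Files.writeString(target, render(counts, takenAt),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        append(Path.of("data", "histogram.txt"), Map.of("the", 3L, "cat", 1L), Instant.now());
    }
}
```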
To submit to Nimbus straight from the Gradle task instead of starting the embedded `LocalCluster`, use `STORM_PROD=true ./gradlew run`, `./gradlew run -Dstorm.prod=true`, or the explicit flag `./gradlew run --args='--prod'`.
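One way the three switches could be combined is sketched below. This is an assumption about the logic, not a copy of `WordCountTopology`: the class and method names are hypothetical, and the sketch simply treats any one of the switches as sufficient, with everything defaulting to `false`.

```java
import java.util.Arrays;

/** Hypothetical resolution of the production-mode switch; the repository may combine them differently. */
final class ProdModeSketch {
    /** Pure decision function: any one of the three switches enables production mode. */
    static boolean isProd(String envValue, String propValue, String[] args) {
        // Boolean.parseBoolean(null) is false, so unset values fall back to local mode.
        return Boolean.parseBoolean(envValue)
                || Boolean.parseBoolean(propValue)
                || Arrays.asList(args).contains("--prod");
    }

    /** Reads the real environment variable and system property named in this README. */
    static boolean isProd(String[] args) {
        return isProd(System.getenv("STORM_PROD"), System.getProperty("storm.prod"), args);
    }

    public static void main(String[] args) {
        System.out.println(isProd(args) ? "submitting to Nimbus" : "running in LocalCluster");
    }
}
```

Note that a `-D` flag on the Gradle command line sets the property on the Gradle JVM; whether it reaches the application JVM depends on the build script forwarding it, which this sketch does not cover.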
## Devcontainer tasks
> [!IMPORTANT]
> These commands rely on the [go-task](https://taskfile.dev/) runner. If it is not installed locally, either install it (`brew install go-task`, `scoop install task`, or download a binary from the releases page) or run them from within the devcontainer where it is preinstalled.
- `task devcontainer`: build, start, and attach to the devcontainer (runs build → up → attach).
- `task devcontainer-recreate`: force a teardown and rebuild from scratch.
- `task devcontainer-build`: build only.
- `task devcontainer-up`: start or reuse the container.
- `task devcontainer-attach`: exec into the container and attach to the tmux session.
- `task devcontainer-down`: stop and remove the container plus its volumes.
## Submit to a remote Storm cluster
1. Toggle production mode at runtime by exporting `STORM_PROD=true` **or** passing `-Dstorm.prod=true` when invoking the JVM/Gradle task (no code changes needed).
2. Build the fat jar: `./gradlew clean jar`. The artifact lands in `build/libs/apache-storm-starter.jar`, bundles your application dependencies, and relies on the Storm runtime provided by the cluster (Storm jars stay external to avoid resource clashes).
3. Enter the devcontainer (`task devcontainer` or `task devcontainer-attach`). It already ships with a Storm CLI configured via `/root/storm.yaml`, including Nimbus and ZooKeeper endpoints, so no extra flags are required.
4. Submit the topology from inside the container (remember to enable production mode, e.g. `STORM_PROD=true`):
```bash
STORM_PROD=true storm jar build/libs/apache-storm-starter.jar \
org.apache.storm.example.WordCountTopology \
WordCountTopology
```
Replace the last argument if you want a different topology name.
5. Monitor the deployment through the Storm UI (`http://:8080`) or the CLI (`storm list`). When you're done, stop it with `storm kill WordCountTopology` (or your chosen name).
If you need to submit from outside the devcontainer, copy both `build/libs/apache-storm-starter.jar` and the provided `conf/storm.yaml` to the target machine and adjust the hostnames to match your cluster.
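Since the topology name is passed as a program argument in step 4, the main class presumably picks it up from `args`. A minimal sketch of such a lookup follows; the helper name, the fallback to `WordCountTopology` when no name is given, and the rule of skipping `--` flags are all assumptions for illustration.

```java
/** Hypothetical argument handling for the topology name; not taken from the repository. */
final class TopologyNameSketch {
    static final String DEFAULT_NAME = "WordCountTopology";

    /** Returns the first argument that is not a "--flag", or the default name if there is none. */
    static String resolve(String[] args) {
        for (String arg : args) {
            if (!arg.startsWith("--")) {
                return arg;
            }
        }
        return DEFAULT_NAME;
    }

    public static void main(String[] args) {
        System.out.println("topology name: " + resolve(args));
    }
}
```

Whatever name is resolved at submit time is the one to pass to `storm kill` afterwards.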