= Kafka Connect AIS Source Connector

image:https://github.com/rmoff/kafka-connect-ais/actions/workflows/build.yml/badge.svg[Build]

NOTE: This is a proof of concept, built to explore using Claude Code to write a Kafka Connect connector from scratch. The connector has not been tested beyond the basic quickstart below.
Use at your own risk.

A Kafka Connect source connector that ingests live https://en.wikipedia.org/wiki/Automatic_identification_system[AIS] (Automatic Identification System) maritime data from TCP endpoints and produces structured records to Kafka.

It handles TCP connection management, NMEA sentence parsing, multi-sentence fragment assembly, and AIS message decoding within the connector itself.

== Architecture

image::docs/architecture.excalidraw.png[Architecture diagram]

== What it does

* Connects to AIS TCP data feeds (e.g., the Norwegian Coastal Administration's live feed at `153.44.253.27:5631`)
* Decodes all 27 AIS message types into structured fields (position, vessel identity, base station reports, aids to navigation, safety messages, etc.); with the default `decode.common.only=true`, binary and control types carry only the common fields plus `raw_nmea`
* Keys records by MMSI (vessel identifier) for natural partitioning
* Reconnects automatically with exponential backoff when the TCP connection drops
* Optionally routes messages to per-type topics (`.position`, `.static`, `.base_station`, etc.)

== Quickstart

You'll need Java 11+, Maven, Docker, and https://github.com/kcctl/kcctl[kcctl]. The consumer examples below also use `kcat` and `jq`.

Build the connector JAR:

[source,bash]
----
mvn clean package -DskipTests
----

Start Kafka, Schema Registry, and Kafka Connect:

[source,bash]
----
docker compose up -d --build
----

Point kcctl at the Connect worker:

[source,bash]
----
kcctl config set-context --cluster=http://localhost:8083 local
----

Wait for Connect to be ready, then check that the plugin is loaded:

[source,bash]
----
kcctl get plugins
----

You should see `net.rmoff.connect.ais.AisSourceConnector` in the list.

Create the connector:

[source,bash]
----
kcctl apply -f configs/connector-ais.json
----

Check that it's running:

[source,bash]
----
kcctl describe connector ais-source
----

Consume some records with https://github.com/edenhill/kcat[kcat]:

[source,bash]
----
kcat -b localhost:9092 -t ais -C -s value=avro -r http://localhost:8081 -c 5 | jq '.'
----

Or use the console consumer from the Schema Registry container:

[source,bash]
----
docker exec schema-registry kafka-avro-console-consumer \
  --bootstrap-server broker:29092 \
  --topic ais \
  --from-beginning \
  --max-messages 5 \
  --property schema.registry.url=http://localhost:8081
----

Shut it all down:

[source,bash]
----
kcctl delete connector ais-source
docker compose down
----

=== Per-type topics

Instead of a single topic, you can split messages into separate topics by category. Use `configs/connector-ais-per-type.json`:

[source,bash]
----
kcctl apply -f configs/connector-ais-per-type.json
----

This produces records on separate topics:

* `ais.position` -- vessel position reports (types 1, 2, 3, 9, 18, 19, 27)
* `ais.static` -- vessel identity and voyage data (types 5, 24)
* `ais.base_station` -- base station reports (types 4, 11)
* `ais.aton` -- aids to navigation (type 21)
* `ais.safety` -- safety messages (types 12, 14)
* `ais.binary` -- application-specific binary messages (types 6, 8, 25, 26)
* `ais.other` -- control/management (everything else)

NOTE: The connector does **not** create these topics.
On a cluster with auto-create enabled (the default in the bundled `docker-compose.yml`), they'll appear on first write. On clusters where auto-create is off -- including all Confluent Cloud clusters -- you have to create each per-type topic up front; otherwise the producer fails with `UNKNOWN_TOPIC_OR_PARTITION`.

=== Working with the data

Show live vessel positions:

[source,bash]
----
kcat -b localhost:9092 -t ais.position -C -s value=avro -r http://localhost:8081 -u | \
  jq -r '"\(.mmsi)  \(.latitude.double)  \(.longitude.double)  sog=\(.speed_over_ground.double)  \(.nav_status_text.string // "")"'
----

....
258006890  70.286049  23.470331  sog=0.0    Moored
257125290  66.395016  12.77815   sog=23.5   Under way using engine
259225000  69.63761   18.003028  sog=0.1    Engaged in fishing
257733500  62.340093  5.689755   sog=0.0    Under way sailing
....

Show vessel names and destinations:

[source,bash]
----
kcat -b localhost:9092 -t ais.static -C -s value=avro -r http://localhost:8081 -u | \
  jq -r '"\(.mmsi)  \(.ship_name.string // "-")  \(.ship_type_text.string // "-")  dest=\(.destination.string // "-")"'
----

....
257073700  STOLMASUND    Cargo                   dest=CH 16
257062790  TAIFUN        HSC                     dest=SALMAR TEKNISK
259027590  FROY MASTER   Dredging/underwater ops  dest=FISHFARMS
257275800  BJORNFJELL    Port tender             dest=NONVK
....

Show aids to navigation:

[source,bash]
----
kcat -b localhost:9092 -t ais.aton -C -s value=avro -r http://localhost:8081 -u | \
  jq -r '"\(.mmsi)  \(.aid_name.string // "-")  \(.latitude.double)  \(.longitude.double)  virtual=\(.virtual_aid.boolean)"'
----

....
992651014  V 10        58.399653  12.320668  virtual=true
992576420  TETRA SPAR  59.150701  5.013845   virtual=true
....

Check the registered Avro schemas:

[source,bash]
----
curl -s http://localhost:8081/subjects | jq .
----

== Deploy to Confluent Cloud Custom Connectors

The same shaded JAR runs as a https://docs.confluent.io/cloud/current/connectors/bring-your-connector/overview.html#install-custom-connectors-for-ccloud[Custom Connector for Confluent Cloud]. The flow is: build the JAR, upload it as a plugin, then create the connector against your Confluent Cloud (CC) cluster.

You'll need the https://docs.confluent.io/confluent-cli/current/overview.html[`confluent` CLI], logged in to your environment.

Build the JAR (same as for the local setup):

[source,bash]
----
mvn clean package -DskipTests
----

Upload it as a CC Custom Connector plugin. This returns a `ccp-XXXXXX` plugin ID; keep it for the connector config:

[source,bash]
----
confluent connect custom-plugin create kafka-connect-ais \
  --plugin-file target/kafka-connect-ais-0.1.0-SNAPSHOT.jar \
  --connector-class net.rmoff.connect.ais.AisSourceConnector \
  --connector-type source \
  --cloud aws
----

Pre-create the destination Kafka topic (Confluent Cloud clusters do not auto-create topics):

[source,bash]
----
confluent kafka topic create ais --partitions 6 \
  --cluster <lkc-...> --environment <env-...>
----

If you're using the per-type variant (`configs/confluent-cloud/connector-ais-per-type.json`), pre-create **all** seven per-type topics up front -- the connector will not create them itself:

[source,bash]
----
for t in ais.position ais.static ais.base_station ais.aton \
         ais.safety ais.binary ais.other; do
  confluent kafka topic create "$t" --partitions 6 \
    --cluster <lkc-...> --environment <env-...>
done
----

Edit `configs/confluent-cloud/connector-ais.json` (provided in this repo) and fill in the placeholders:

* `confluent.custom.plugin.id` -- the `ccp-XXXXXX` from the upload step
* `confluent.custom.connection.endpoints` -- the AIS source endpoint (see "Egress" below)
* `kafka.api.key` / `kafka.api.secret` -- a CC Kafka API key/secret for the target cluster

The Schema Registry URL,
credentials, and SR egress are all injected automatically by `confluent.custom.schema.registry.auto=true` -- see "Schema Registry" below.

Create the connector:

[source,bash]
----
confluent connect cluster create \
  --config-file configs/confluent-cloud/connector-ais.json \
  --cluster <lkc-...> --environment <env-...>
----

=== Custom Connector-specific config fields you will not guess

These are required for any Custom Connector and have no equivalent in self-managed Kafka Connect, so they're not in the Configuration table further down. The `configs/confluent-cloud/connector-ais.json` template includes all of them:

* `confluent.connector.type=CUSTOM` -- marks this as a Custom Connector deployment
* `confluent.custom.plugin.id=<ccp-XXXXXX>` -- the plugin ID returned by `confluent connect custom-plugin create`
* `confluent.custom.connection.endpoints=<host:port:PROTO>;...` -- egress allowlist (see next section)

=== Egress: only the AIS source endpoint

A Custom Connector for Confluent Cloud runs in a sandbox with no outbound network access by default. Every host it needs to reach must appear in `confluent.custom.connection.endpoints`.

For this connector, that's just the AIS source:

[source]
----
confluent.custom.connection.endpoints=153.44.253.27:5631:TCP
----

The Schema Registry egress is added to the allowlist for you by `confluent.custom.schema.registry.auto=true` (see next section) -- you do **not** need to list the `psrc-...` FQDN here.

The Confluent Cloud docs say egress endpoints must be FQDNs, but the Norwegian Coastal Administration publishes its AIS feed only as an IP literal (there is no PTR or forward DNS record). The IP literal **is** accepted by the CC control plane, and the egress is honored.

=== Schema Registry: let `schema.registry.auto` wire it up (lowercase `true`)

By default, this connector's record values are written as Avro, with their schemas held in Schema Registry.

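
The Schema Registry rules in this section (lowercase `true`, no manual SR settings alongside auto-mode, and the SR FQDN on the egress allowlist when wiring SR manually) are easy to get wrong, and mistakes only surface after deploy. As a sketch only, a pre-flight check along these lines could catch them before you run `confluent connect cluster create`. `SrConfigCheck` is a hypothetical helper, not part of this repo. It assumes the connector config has already been loaded into a `Map<String, String>` and covers only the rules stated in this section.

[source,java]
----
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-flight check for the CC Schema Registry wiring rules; not part of this repo. */
final class SrConfigCheck {
    static final String AUTO = "confluent.custom.schema.registry.auto";
    static final String ENDPOINTS = "confluent.custom.connection.endpoints";
    static final String SR_URL = "value.converter.schema.registry.url";
    static final List<String> MANUAL_SR_KEYS = List.of(
            SR_URL,
            "value.converter.basic.auth.credentials.source",
            "value.converter.basic.auth.user.info");

    private SrConfigCheck() {}

    /** Returns a list of problems; an empty list means none of these rules is broken. */
    static List<String> check(Map<String, String> cfg) {
        List<String> problems = new ArrayList<>();
        String auto = cfg.get(AUTO);
        boolean autoOn = "true".equals(auto); // case-sensitive on purpose

        if (auto != null && !autoOn && auto.equalsIgnoreCase("true")) {
            problems.add(AUTO + " must be lowercase \"true\" (got \"" + auto + "\")");
        }
        if (autoOn) {
            // Auto-mode rejects manually supplied SR settings.
            for (String key : MANUAL_SR_KEYS) {
                if (cfg.containsKey(key)) {
                    problems.add(key + " must not be set when " + AUTO + "=true");
                }
            }
        } else if (cfg.containsKey(SR_URL)) {
            // Manual wiring: the SR host must also be on the egress allowlist.
            String host = URI.create(cfg.get(SR_URL)).getHost();
            if (host == null || !cfg.getOrDefault(ENDPOINTS, "").contains(host + ":443:TCP")) {
                problems.add("manual SR wiring: add the SR host as <fqdn>:443:TCP to " + ENDPOINTS);
            }
        } else if (String.valueOf(cfg.get("value.converter")).endsWith("AvroConverter")) {
            // Neither auto-mode nor manual settings: no SR config will be present.
            problems.add("AvroConverter needs either " + AUTO + "=true or manual SR settings");
        }
        return problems;
    }
}
----

With `confluent.custom.schema.registry.auto=TRUE`, `SrConfigCheck.check(cfg)` would report the casing problem up front instead of letting the connector reach `RUNNING` and then fail.
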
The shipped template uses Confluent Cloud's auto-mode:

[source]
----
confluent.custom.schema.registry.auto=true
value.converter=io.confluent.connect.avro.AvroConverter
----

The platform injects the SR URL, credentials, and SR egress allowlist entry for you. Do **not** also set `value.converter.schema.registry.url`, `value.converter.basic.auth.credentials.source`, or `value.converter.basic.auth.user.info` -- auto-mode rejects them with `Unsupported connector config(s) with schema registry auto mode enabled`.

CAUTION: The value is **case-sensitive**. Use lowercase `"true"`. `"TRUE"` (which some Confluent docs examples show) is silently ignored: the connector reaches `RUNNING` but then fails on startup with `ConfigException: Missing required configuration "schema.registry.url"`, because no SR config gets injected.

If you'd rather wire SR manually (e.g. to use a non-default SR API key), omit `confluent.custom.schema.registry.auto`, set all four `value.converter*` settings shown below yourself, **and** add the SR FQDN to `confluent.custom.connection.endpoints`:

[source]
----
confluent.custom.connection.endpoints=\
  153.44.253.27:5631:TCP;\
  <psrc-XXXXXX>.<region>.<cloud>.confluent.cloud:443:TCP

value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=https://<psrc-XXXXXX>.<region>.<cloud>.confluent.cloud
value.converter.basic.auth.credentials.source=USER_INFO
value.converter.basic.auth.user.info=<SR_KEY>:<SR_SECRET>
----

If you don't want Avro or Schema Registry at all, swap the value converter and drop the SR config:

[source]
----
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=false
----

=== Debugging failed deploys: read the app-logs topic

The user-facing failure surface of a Custom Connector is minimal -- `confluent connect cluster list` will show `FAILED` with a generic "review the connector's
common logs in the Kafka topic `clcc-XXXXXX-app-logs` to debug the issue" message. That topic is the only useful signal.

To see what actually went wrong, consume from `<connector-id>-app-logs` (for example `clcc-XXXXXX-app-logs`; the topic is created in your own cluster on first connector launch) and filter for `level=ERROR`. The interesting payload is usually in the structured `exception.stacktrace` field of the log JSON, not in the `message` field. Example with kcat, reading from the start of the topic so that errors logged before you started consuming are included:

[source,bash]
----
kcat -b <bootstrap> -X security.protocol=SASL_SSL \
     -X sasl.mechanisms=PLAIN \
     -X sasl.username=<key> -X sasl.password=<secret> \
     -t clcc-XXXXXX-app-logs -C -o beginning -q | \
  jq -r 'select(.level=="ERROR") | "\(.timestamp)\n\(.message)\n\(.exception.stacktrace // "")\n---"'
----

== Configuration

[cols="1,1,1,3"]
|===
|Property |Type |Default |Description

|`ais.hosts`
|STRING
|_(required)_
|Comma-separated `host:port` pairs for AIS TCP endpoints

|`topic`
|STRING
|_(required)_
|Kafka topic name (or topic prefix when `topic.per.type=true`)

|`topic.per.type`
|BOOLEAN
|`false`
|Route messages to per-type topics: `<topic>.position`, `<topic>.static`, `<topic>.base_station`, `<topic>.safety`, `<topic>.aton`, `<topic>.binary`, `<topic>.other`

|`poll.timeout.ms`
|LONG
|`100`
|Maximum time (ms) to spend reading per `poll()` call

|`batch.max.size`
|INT
|`500`
|Maximum records returned per `poll()` batch

|`reconnect.backoff.initial.ms`
|LONG
|`1000`
|Initial reconnect delay (ms)

|`reconnect.backoff.max.ms`
|LONG
|`60000`
|Maximum reconnect delay (ms); the delay doubles after each failed attempt, up to this cap

|`fragment.timeout.ms`
|LONG
|`30000`
|Timeout (ms) for incomplete multi-sentence messages

|`decode.common.only`
|BOOLEAN
|`true`
|When true, only the most useful message types get full field decoding (position, identity, base station, safety, AtoN).
Binary and control types get only the common fields plus `raw_nmea`.
|===

== Output schema

Every record includes these common fields:

* `mmsi` (INT32) -- vessel identifier, also used as the record key
* `msg_type` (INT32) -- AIS message type number
* `receive_timestamp` (INT64) -- milliseconds since epoch, from the NMEA tag block
* `source_station` (STRING) -- receiving station ID, from the NMEA tag block
* `raw_nmea` (STRING) -- the original NMEA sentence(s), always present

Position reports (types 1, 2, 3, 18, 19, 27) add: `latitude`, `longitude`, `speed_over_ground`, `course_over_ground`, `true_heading`, `nav_status`, `nav_status_text`, `rate_of_turn`, `timestamp_second`.

Static/voyage data (types 5, 24) adds: `imo_number`, `callsign`, `ship_name`, `ship_type`, `ship_type_text`, `dimension_to_bow/stern/port/starboard`, `draught`, `destination`, `eta`.

See the source for the full field list per message type.

Invalid sentinel values (latitude 91.0, longitude 181.0, heading 511, etc.) are converted to `null`.

The Avro schema is registered automatically in Schema Registry.

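
To make the sentinel handling above concrete, here is a minimal Java sketch that converts raw AIS position-report fields to degrees and knots and maps the "not available" values to `null`. `AisSentinels` is a hypothetical class, not the connector's code (decoding is done with AisLib). The raw units and sentinel values are the ones defined for AIS position reports: latitude 91° and longitude 181° in 1/10000 minute, true heading 511, speed over ground 1023 in 0.1 knot, and course over ground 3600 in 0.1°.

[source,java]
----
/**
 * Hypothetical helper (not the connector's code): converts raw AIS position-report
 * fields to engineering units, returning null for the "not available" sentinels.
 */
final class AisSentinels {
    static final int LAT_NOT_AVAILABLE = 91 * 600_000;   // 91 degrees in 1/10000 minute
    static final int LON_NOT_AVAILABLE = 181 * 600_000;  // 181 degrees in 1/10000 minute
    static final int HEADING_NOT_AVAILABLE = 511;        // degrees
    static final int SOG_NOT_AVAILABLE = 1023;           // units of 0.1 knot
    static final int COG_NOT_AVAILABLE = 3600;           // units of 0.1 degree

    private AisSentinels() {}

    static Double latitudeDegrees(int raw) {
        return raw == LAT_NOT_AVAILABLE ? null : raw / 600_000.0;
    }

    static Double longitudeDegrees(int raw) {
        return raw == LON_NOT_AVAILABLE ? null : raw / 600_000.0;
    }

    static Integer trueHeading(int raw) {
        return raw == HEADING_NOT_AVAILABLE ? null : raw;
    }

    static Double speedOverGroundKnots(int raw) {
        return raw == SOG_NOT_AVAILABLE ? null : raw / 10.0;
    }

    static Double courseOverGroundDegrees(int raw) {
        return raw == COG_NOT_AVAILABLE ? null : raw / 10.0;
    }
}
----

Returning boxed `Double` and `Integer` lets `null` flow straight into optional fields, which fits the nullable unions (such as `latitude.double`) visible in the kcat output earlier.
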
In single-topic mode, you can inspect the registered value schema at http://localhost:8081/subjects/ais-value/versions/latest.

=== Subjects registered in Schema Registry

With the default `TopicNameStrategy`, each subject is named after its topic, with a `-value` (or `-key`) suffix.

In single-topic mode (`topic.per.type=false`, the default), one subject is registered:

* `ais-value` -- Avro record `net.rmoff.connect.ais.AisValue`, the flat schema containing every field across all message types

In per-type mode (`topic.per.type=true`), one subject is registered per category, each with its own narrower Avro record:

* `ais.position-value` -- `Position`
* `ais.static-value` -- `Static`
* `ais.base_station-value` -- `BaseStation`
* `ais.safety-value` -- `Safety`
* `ais.aton-value` -- `AtoN`
* `ais.binary-value` -- `Binary`
* `ais.other-value` -- `Other`

(Replace `ais` with whatever prefix you've set in the `topic` config.)

Records also have a key (the INT32 MMSI). Whether a `*-key` subject is registered depends on the key converter in your worker or connector config: the local `docker-compose.yml` here configures `AvroConverter` for keys (so you'll also see `ais-key` etc.), whereas the supplied Confluent Cloud templates use `StringConverter` for keys (so no key subjects are registered).

== Headers

Every record carries these Kafka headers, so you can route or filter records without deserializing the value:

* `ais.msg_type` -- the message type as a string (e.g., `"1"`, `"5"`, `"18"`)
* `ais.source_station` -- the receiving station ID

== AIS data sources

The default endpoint in the quickstart is the Norwegian Coastal Administration's public AIS feed. It streams roughly 15 messages per second of live vessel traffic around the Norwegian coast.

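
To look at the raw sentences before deploying the connector, a minimal Java sketch like the following can read a few lines straight from the feed. `RawAisFeed` is hypothetical and not part of this repo. It assumes the feed sends newline-terminated sentences, and it sets `SO_TIMEOUT` so that a quiet feed returns what it has instead of blocking forever, which is the same concern described under "How it works".

[source,java]
----
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical raw-feed reader for eyeballing AIS sentences; not part of this repo. */
final class RawAisFeed {
    private static final int CONNECT_TIMEOUT_MS = 10_000;

    private RawAisFeed() {}

    /**
     * Reads up to maxLines non-blank lines from host:port. SO_TIMEOUT bounds every
     * blocking read, so a quiet or stalled feed returns early instead of hanging.
     */
    static List<String> readLines(String host, int port, int maxLines, int readTimeoutMs)
            throws IOException {
        List<String> lines = new ArrayList<>();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), CONNECT_TIMEOUT_MS);
            socket.setSoTimeout(readTimeoutMs);
            BufferedReader in = new BufferedReader(new InputStreamReader(
                    socket.getInputStream(), StandardCharsets.US_ASCII));
            while (lines.size() < maxLines) {
                String line;
                try {
                    line = in.readLine(); // accepts both LF and CRLF line endings
                } catch (SocketTimeoutException e) {
                    break; // nothing arrived within readTimeoutMs
                }
                if (line == null) {
                    break; // the server closed the connection
                }
                if (!line.isBlank()) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }
}
----

For example, `RawAisFeed.readLines("153.44.253.27", 5631, 5, 5_000)` should return up to five lines. Each is an NMEA sentence, possibly with the tag block from which the connector reads `receive_timestamp` and `source_station`.
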
No authentication is required.

Other public AIS TCP feeds exist -- search for "AIS TCP feed" or check https://www.aishub.net/[AISHub].

== How it works

This is a TCP stream source connector, which is a bit different from the typical database/API source pattern:

* The TCP connection is opened in `start()` and persists across `poll()` calls
* `poll()` reads whatever's buffered on the socket, parses NMEA sentences, decodes AIS messages, and returns `SourceRecord`s
* There's no replay -- if the connector is down, messages sent during the downtime are lost, because the data source doesn't buffer them
* Reconnection with exponential backoff is the main failure-handling mechanism
* `SO_TIMEOUT` on the socket prevents `poll()` from blocking forever on a stalled connection
* `stop()` closes the socket from a different thread, which unblocks any pending read

The connector declares `ExactlyOnceSupport.SUPPORTED`. Because AIS NMEA sentences are independently parseable and records are keyed by MMSI, the per-poll transaction boundary is safe. Each task tracks a `{connection_epoch, message_count}` source offset per host. In practice this means "no duplicate writes within a connection", not end-to-end exactly-once, because the upstream TCP feed itself has no replay (messages during a connector outage are simply lost).

AIS message decoding is handled by https://github.com/dma-ais/AisLib[AisLib] from the Danish Maritime Authority.

== Building

Requires Java 11+ and Maven.

[source,bash]
----
mvn clean package
----

The build produces two usable artifacts:

* `target/kafka-connect-ais-0.1.0-SNAPSHOT.jar` -- the shaded fat JAR. Drop it into a Connect worker's `plugin.path` directory, or upload it to Confluent Cloud via `confluent connect custom-plugin create`.
* `target/components/packages/rmoff-kafka-connect-ais-0.1.0-SNAPSHOT.zip` -- a Confluent component archive with a `manifest.json` (produced by the `kafka-connect-maven-plugin`).
This is the canonical https://www.confluent.io/hub/[Confluent Hub] / Marketplace format and is also accepted by `confluent connect custom-plugin create --plugin-file ...`.

== Credits

Directed by @rmoff, built with https://claude.ai/[Claude Code] using the `kafka-connect` Claude Code skill from @rmoff.

AIS message decoding by https://github.com/dma-ais/AisLib[AisLib] (Danish Maritime Authority, Apache 2.0 license).