https://github.com/conduktor/kafka-connect-wikimedia
- Host: GitHub
- URL: https://github.com/conduktor/kafka-connect-wikimedia
- Owner: conduktor
- License: apache-2.0
- Created: 2022-03-14T11:55:34.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2022-03-14T12:21:14.000Z (over 4 years ago)
- Last Synced: 2023-03-04T00:48:36.078Z (over 3 years ago)
- Language: Java
- Size: 133 KB
- Stars: 17
- Watchers: 2
- Forks: 19
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-kafka-connect - conduktor/kafka-connect-wikimedia - Wikimedia recent changes stream source connector (Data Generators & Testing / FreeSWITCH)
# Kafka Connect Wikimedia
A Kafka Connect Source connector that streams real-time events from [Wikimedia's EventStreams](https://stream.wikimedia.org/v2/stream/recentchange) into Kafka topics.

## Features
- Streams Wikimedia recent changes (edits, new pages, etc.) in real-time
- Configurable reconnection handling
- Custom User-Agent header support (required by Wikimedia API)
- Dead letter queue support for error handling
## Installation
Download the JAR from [Releases](https://github.com/conduktor/kafka-connect-wikimedia/releases) and place it in your Kafka Connect plugins directory.
## Configuration
| Property | Required | Default | Description |
|----------|----------|---------|-------------|
| `topic` | Yes | - | Kafka topic to publish events to |
| `url` | Yes | - | Wikimedia EventStream URL |
| `reconnect.duration` | No | `3000` | Reconnection delay in milliseconds |
| `http.headers.user.agent` | No | `WikimediaKafkaConnector/1.0` | User-Agent header for HTTP requests |
### Example Configuration
```properties
name=wikimedia-source-connector
tasks.max=1
connector.class=io.conduktor.demos.kafka.connect.wikimedia.WikimediaConnector
topic=wikimedia.recentchange
url=https://stream.wikimedia.org/v2/stream/recentchange
http.headers.user.agent=MyApp/1.0 (contact@example.com)
```
See [connector/wikimedia.properties](connector/wikimedia.properties) for a complete example.
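The same settings are submitted as JSON when the connector is deployed through the Connect REST API. As a rough sketch, a properties file like the one above could be converted into that JSON body with a small shell helper. The function name `props_to_connector_json` is hypothetical and not part of this repository, and the helper assumes keys and values contain no double quotes or backslashes.

```bash
# Hypothetical helper (not part of this repository): convert a connector
# .properties file into the JSON body accepted by POST /connectors.
# Assumption: keys and values contain no double quotes or backslashes,
# so no JSON escaping is attempted.
props_to_connector_json() {
  awk '
    /^[ \t]*#/ { next }                    # skip comment lines
    /^[ \t]*$/ { next }                    # skip blank lines
    {
      eq = index($0, "=")
      if (eq == 0) next                    # ignore lines without a separator
      key = substr($0, 1, eq - 1)
      val = substr($0, eq + 1)             # keep any later "=" inside the value
      if (key == "name") { name = val; next }
      cfg = cfg (cfg == "" ? "" : ",") "\"" key "\":\"" val "\""
    }
    END { printf "{\"name\":\"%s\",\"config\":{%s}}\n", name, cfg }
  ' "$1"
}

# Possible usage (requires a running Connect worker):
#   props_to_connector_json connector/wikimedia.properties |
#     curl -X POST -H "Content-Type: application/json" -d @- http://localhost:8083/connectors
```

All values are emitted as JSON strings, which matches how the quick-start payload below passes `tasks.max`.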
## Building from Source
Requires Java 11+.
```bash
./gradlew shadowJar
```
Output: `build/libs/kafka-connect-wikimedia--all.jar`
## Local Development
A Docker Compose setup is provided for local testing.
### Quick Start
```bash
# Build the connector
./gradlew shadowJar

# Start Kafka + Connect
docker compose up -d

# Wait for Connect to be ready (~20s)
curl http://localhost:8083/connector-plugins | grep -i wikimedia

# Deploy the connector
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d '{
    "name": "wikimedia-source",
    "config": {
      "connector.class": "io.conduktor.demos.kafka.connect.wikimedia.WikimediaConnector",
      "tasks.max": "1",
      "topic": "wikimedia.recentchange",
      "url": "https://stream.wikimedia.org/v2/stream/recentchange"
    }
  }'

# Check connector status
curl http://localhost:8083/connectors/wikimedia-source/status

# Consume messages
docker exec kafka /opt/kafka/bin/kafka-console-consumer.sh \
  --bootstrap-server localhost:29092 \
  --topic wikimedia.recentchange \
  --from-beginning \
  --max-messages 5

# Cleanup
docker compose down
```
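The "wait ~20s" step above depends on how quickly Connect starts on a given machine. One possible alternative, sketched here under the assumption that `curl` and `grep` are available, is to poll the plugin list until the connector shows up. The helper names `retry_until` and `connect_has_plugin` are illustrative and not part of this repository.

```bash
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    "$@" && return 0
    i=$((i + 1))
    # Only pause when another attempt will follow
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# connect_has_plugin BASE_URL PATTERN
# Succeeds when the Connect plugin list contains PATTERN (case-insensitive).
connect_has_plugin() {
  curl -sf "$1/connector-plugins" | grep -qi "$2"
}

# Possible usage: up to 30 attempts, 2 seconds apart
#   retry_until 30 2 connect_has_plugin http://localhost:8083 wikimedia
```

Keeping the retry loop separate from the probe means the same loop can wrap the status check or any other readiness command.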
### Endpoints
| Service | URL |
|---------|-----|
| Kafka (external) | `localhost:9092` |
| Kafka Connect REST API | `http://localhost:8083` |
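The REST API endpoint can also be used for a scripted health check. The sketch below assumes the status response carries a `"state"` field for the connector and for each task, as the Kafka Connect REST API normally returns. The helper names `states_from_status` and `all_running` are hypothetical, and the parsing is a plain-text approximation, not a full JSON parser.

```bash
# states_from_status: read a connector status JSON document on stdin and
# print every "state" value (connector first, then tasks), one per line.
states_from_status() {
  grep -o '"state":[[:space:]]*"[A-Z_]*"' | sed 's/.*"\([A-Z_]*\)"$/\1/'
}

# all_running: succeed only if at least one state was found and every
# state is RUNNING.
all_running() {
  states=$(states_from_status)
  [ -n "$states" ] && ! printf '%s\n' "$states" | grep -qv '^RUNNING$'
}

# Possible usage (requires a running Connect worker):
#   curl -s http://localhost:8083/connectors/wikimedia-source/status | all_running \
#     && echo "connector and tasks are running"
```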
## Related Resources
- [Awesome Kafka Connect](https://github.com/conduktor/awesome-kafka-connect) - Curated list of Kafka Connect connectors
- [Wikimedia EventStreams](https://wikitech.wikimedia.org/wiki/Event_Platform/EventStreams) - Documentation
## License
Apache License 2.0
---
Built by [Conduktor](https://www.conduktor.io) - Making Kafka accessible to everyone.