Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/mneedham/kafka-connect-neo4j
- Host: GitHub
- URL: https://github.com/mneedham/kafka-connect-neo4j
- Owner: mneedham
- Created: 2019-05-28T17:31:56.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-02-01T19:37:55.000Z (almost 5 years ago)
- Last Synced: 2024-04-14T09:10:04.757Z (9 months ago)
- Size: 400 KB
- Stars: 11
- Watchers: 3
- Forks: 2
- Open Issues: 1
Metadata Files:
- Readme: README.adoc
README
= Kafka Connect - Neo4j
This repository shows how to load tweets collected with the twint library into Kafka, and then from Kafka into Neo4j using the Neo4j Kafka Connect sink.
== Launch Kafka Cluster + Neo4j
[source, bash]
----
git clone [email protected]:mneedham/kafka-connect-neo4j.git && cd kafka-connect-neo4j
docker-compose up
----

== Dependencies
[source, bash]
----
pip install --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint
pip install confluent-kafka[avro]
----
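
With the cluster running and the client libraries installed, a quick connectivity check confirms the broker is reachable before producing anything. This is a minimal sketch of my own (not part of this repository), assuming Kafka is exposed on `localhost:9092` as in the compose file:

[source, python]
----
# Broker connectivity check (assumption: Kafka listens on localhost:9092).
from confluent_kafka.admin import AdminClient

admin = AdminClient({'bootstrap.servers': 'localhost:9092'})

# list_topics() fetches cluster metadata, failing if no broker answers in time.
metadata = admin.list_topics(timeout=10)
print("Topics:", sorted(metadata.topics.keys()))
----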
== Loading Data into Kafka
[source, python]
----
import twint
import sys
import json

from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

# Avro schema for the message value: the tweet fields we want to keep.
value_schema_str = """
{
    "namespace": "my.test",
    "name": "value",
    "type": "record",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "tweet", "type": "string"},
        {"name": "datetime", "type": "long"},
        {"name": "username", "type": "string"},
        {"name": "user_id", "type": "long"},
        {"name": "hashtags", "type": {"type": "array", "items": "string"}}
    ]
}
"""key_schema_str = """
{
    "namespace": "my.test",
    "name": "key",
    "type": "record",
    "fields": [
        {"name": "name", "type": "string"}
    ]
}
"""kafka_broker = 'localhost:9092'
schema_registry = 'http://localhost:8081'value_schema = avro.loads(value_schema_str)
key_schema = avro.loads(key_schema_str)producer = AvroProducer({
'bootstrap.servers': kafka_broker,
'schema.registry.url': schema_registry
}, default_key_schema=key_schema, default_value_schema=value_schema)module = sys.modules["twint.storage.write"]
def Json(obj, config):
    tweet = obj.__dict__
    print(tweet)
    producer.produce(topic='tweets10', value=tweet, key={"name": "Key"})
    producer.flush()

module.Json = Json
c = twint.Config()
c.Search = "neo4j OR \"graph database\" OR \"graph databases\" OR graphdb OR graphconnect OR @neoquestions OR @Neo4jDE OR @Neo4jFr OR neotechnology"
c.Store_json = True
c.Custom["user"] = ["id", "tweet", "user_id", "username", "hashtags", "mentions"]
c.User_full = True
c.Output = "tweets.json"
c.Since = "2019-05-20"
c.Hide_output = True

twint.run.Search(c)
----
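
Before creating the sink, it's worth checking that tweets are actually reaching the topic. Below is a small consumer sketch of my own (not part of this repository), reusing the same confluent-kafka Avro client and the `tweets10` topic name from the producer above:

[source, python]
----
# Sanity-check consumer (my addition): reads a few Avro records from 'tweets10'.
from confluent_kafka.avro import AvroConsumer

consumer = AvroConsumer({
    'bootstrap.servers': 'localhost:9092',
    'schema.registry.url': 'http://localhost:8081',
    'group.id': 'tweet-sanity-check',
    'auto.offset.reset': 'earliest'
})
consumer.subscribe(['tweets10'])

for _ in range(5):
    msg = consumer.poll(10)
    if msg is None or msg.error():
        continue
    print(msg.value())  # the deserialised tweet record

consumer.close()
----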
== Create Neo4j Sink Connector
[source, bash]
----
curl -i -X POST -H "Accept:application/json" \
-H "Content-Type:application/json" http://localhost:8083/connectors/ \
-d '{
"name": "connect.sink.neo4j.tweets",
"config": {
"topics": "tweets10",
"connector.class": "streams.kafka.connect.sink.Neo4jSinkConnector",
"neo4j.server.uri": "bolt://neo4j:7687",
"neo4j.authentication.basic.username": "neo4j",
"neo4j.authentication.basic.password": "neo",
"neo4j.topic.cypher.tweets10": "WITH event AS data MERGE (t:Tweet {id: data.id}) SET t.text = data.tweet, t.createdAt = datetime({epochmillis:data.datetime}) MERGE (u:User {username: data.username}) SET u.id = data.user_id MERGE (u)-[:POSTED]->(t) FOREACH (ht IN data.hashtags | MERGE (hashtag:HashTag {value: ht}) MERGE (t)-[:HAS_HASHTAG]->(hashtag))"
}
}'
----
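
If the POST succeeds, Kafka Connect exposes the connector and task states via its standard REST API (`/connectors/<name>/status`). A quick check of my own, assuming the `requests` library is available:

[source, python]
----
# Check connector health (assumption: Connect's REST API on localhost:8083).
import requests

status = requests.get(
    "http://localhost:8083/connectors/connect.sink.neo4j.tweets/status").json()

print(status["connector"]["state"])      # expect RUNNING
for task in status["tasks"]:
    print(task["id"], task["state"])     # expect RUNNING for each task
----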
== Browse Data in Neo4j
Navigate to `http://localhost:7474` and paste the following into the query pane:
[source, cypher]
----
MATCH path = (u:User)-[:POSTED]->(t:Tweet)-[:HAS_HASHTAG]->(ht)
RETURN path
LIMIT 100
----

image::graph.png[]
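
The same data can also be pulled programmatically. Here is a sketch of my own using the official neo4j Python driver (an extra dependency, `pip install neo4j`), with the credentials the sink connector is configured with (`neo4j`/`neo`):

[source, python]
----
# Query the loaded graph from Python (my addition, not part of the repository).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "neo"))

with driver.session() as session:
    result = session.run("""
        MATCH (u:User)-[:POSTED]->(t:Tweet)-[:HAS_HASHTAG]->(ht:HashTag)
        RETURN u.username AS user, collect(ht.value) AS hashtags
        LIMIT 10
    """)
    for record in result:
        print(record["user"], record["hashtags"])

driver.close()
----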