{"id":13491491,"url":"https://github.com/twitter/GraphJet","last_synced_at":"2025-03-28T08:33:16.707Z","repository":{"id":10945226,"uuid":"53354223","full_name":"twitter/GraphJet","owner":"twitter","description":"GraphJet is a real-time graph processing library.","archived":false,"fork":false,"pushed_at":"2023-04-10T11:24:19.000Z","size":866,"stargazers_count":714,"open_issues_count":25,"forks_count":111,"subscribers_count":39,"default_branch":"master","last_synced_at":"2024-10-31T05:34:48.601Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/twitter.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null}},"created_at":"2016-03-07T19:51:18.000Z","updated_at":"2024-10-29T00:42:11.000Z","dependencies_parsed_at":"2024-01-16T09:18:42.951Z","dependency_job_id":null,"html_url":"https://github.com/twitter/GraphJet","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twitter%2FGraphJet","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twitter%2FGraphJet/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twitter%2FGraphJet/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/twitter%2FGraphJet/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/twitter","download_url":"https://codeload.github.com/twitter/GraphJet/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","ki
nd":"github","repositories_count":245996754,"owners_count":20707310,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-07-31T19:00:57.463Z","updated_at":"2025-03-28T08:33:16.076Z","avatar_url":"https://github.com/twitter.png","language":"Java","funding_links":[],"categories":["Java","Open Source Projects","Candidate Generators"],"sub_categories":["Data Structures, Systems and Frameworks","GraphJet"],"readme":"# GraphJet\n\n[![Build Status](https://travis-ci.org/twitter/GraphJet.svg?branch=master)](https://travis-ci.org/twitter/GraphJet)\n\nGraphJet is a real-time graph processing library written in Java that\nmaintains a full graph index over a sliding time window in memory on a\nsingle server. This index supports a variety of graph algorithms\nincluding personalized recommendation algorithms based on\ncollaborative filtering. These algorithms power a variety of real-time\nrecommendation services within Twitter, notably content (tweets/URLs)\nrecommendations that require collaborative filtering over a\nheterogeneous, rapidly evolving graph.\n\nGraphJet is able to support rapid ingestion of edges in an evolving\ngraph while concurrently serving lookup queries through a combination\nof compact edge encoding and a dynamic memory allocation scheme. Each\nGraphJet server can ingest up to one million graph edges per second,\nand in steady state, computes up to 500 recommendations per second,\nwhich translates into several million edge read operations per\nsecond. 
More information about the internals of GraphJet can be found\nin the\n[VLDB'16 paper](http://www.vldb.org/pvldb/vol9/p1281-sharma.pdf).\n\n# Quick Start and Example\n\nAfter cloning the repo, build as follows (for the impatient, use option `-DskipTests` to skip tests):\n\n```\n$ mvn package install\n```\n\nGraphJet includes a demo that reads from the Twitter public sample stream using the [Twitter4j library](http://twitter4j.org/en/) and maintains two separate in-memory bipartite graphs:\n\n+ A bipartite graph of user-tweet interactions. The left-hand side vertices represent users, the right-hand side vertices represent tweets, and the edges represent tweet posts and retweets.\n+ A bipartite graph of tweet-hashtag contents. The left-hand side vertices represent tweets, the right-hand side vertices represent hashtags, and the edges represent content association (e.g., a tweet contains a hashtag).\n\nTo run the demo, create a file called `twitter4j.properties` in the GraphJet base directory with your Twitter credentials (replace `xxxx` with actual credentials):\n\n```\noauth.consumerKey=xxxx\noauth.consumerSecret=xxxx\noauth.accessToken=xxxx\noauth.accessTokenSecret=xxxx\n```\n\nTo obtain the credentials, see the [documentation on obtaining Twitter OAuth tokens](https://dev.twitter.com/oauth/overview/application-owner-access-tokens). The public sample stream is available to registered users; see the [documentation about Twitter streaming APIs](https://dev.twitter.com/streaming/overview) for more details.\n\nOnce you've built GraphJet, start the demo as follows:\n\n```\n$ mvn exec:java -pl graphjet-demo -Dexec.mainClass=com.twitter.graphjet.demo.TwitterStreamReader\n```\n\nOnce the demo starts up, it begins ingesting the Twitter public sample stream. 
The program will print out a sequence of status messages indicating the internal state of the user-tweet graph and the tweet-hashtag graph.\n\nYou can interact with the graph via a REST API, running on port 8888 by default; use ` -Dexec.args=\"-port xxxx\"` to specify a different port.\n\nThe following calls are available to query the state of the in-memory bipartite graph of user-tweet interactions:\n\n+ `userTweetGraph/topTweets`: queries for the top tweets in terms of interactions (retweets). Use parameter `k` to specify number of results to return (default ten). Sample invocation:\n\n```\ncurl http://localhost:8888/userTweetGraph/topTweets?k=5\n```\n\n+ `userTweetGraph/topUsers`: queries for the top users in terms of interactions (retweets).  Use parameter `k` to specify number of results to return (default ten). Sample invocation:\n\n```\ncurl http://localhost:8888/userTweetGraph/topUsers?k=5\n```\n\n+ `userTweetGraphEdges/tweets`: queries for the edges incident to a particular tweet in the user-tweet graph, i.e., users who have interacted with the tweet. Use parameter `id` to specify tweetId (e.g., from `userTweetGraph/topTweets` above). Sample invocation:\n\n```\ncurl http://localhost:8888/userTweetGraphEdges/tweets?id=xxx\n```\n\n+ `userTweetGraphEdges/users`: queries for the edges incident to a particular user in the user-tweet graph, i.e., tweets the user interacted with. Use parameter `id` to specify userId (e.g., from `userTweetGraph/topUsers` above). Sample invocation:\n\n```\ncurl http://localhost:8888/userTweetGraphEdges/users?id=xxx\n```\n\nThe following calls are available to query the state of the in-memory bipartite graph of tweet-hashtag contents:\n\n+ `tweetHashtagGraph/topTweets`: queries for the top tweets in terms of hashtags. Use parameter `k` to specify number of results to return (default ten). 
Sample invocation:\n\n```\ncurl http://localhost:8888/tweetHashtagGraph/topTweets?k=5\n```\n\n+ `tweetHashtagGraph/topHashtags`: queries for the top hashtags in terms of tweets.  Use parameter `k` to specify number of results to return (default ten). Sample invocation:\n\n```\ncurl http://localhost:8888/tweetHashtagGraph/topHashtags?k=5\n```\n\n+ `tweetHashtagGraphEdges/tweets`: queries for the edges incident to a particular tweet in the tweet-hashtag graph, i.e., hashtags contained in the tweet. Use parameter `id` to specify tweetId (e.g., from `tweetHashtagGraph/topTweets` above). Sample invocation:\n\n```\ncurl http://localhost:8888/tweetHashtagGraphEdges/tweets?id=xxx\n```\n\n+ `tweetHashtagGraphEdges/hashtags`: queries for the edges incident to a particular hashtag in the tweet-hashtag graph, i.e., tweets the given hashtag is contained in. Use parameter `id` to specify hashtagId (e.g., from `tweetHashtagGraph/topHashtags` above). Sample invocation:\n\n```\ncurl http://localhost:8888/tweetHashtagGraphEdges/hashtags?id=xxx\n```\n\nThe demo program illustrates collaborative filtering via similarity\nqueries on the tweet-hashtag graph. Note that the demo does not offer\npersonalized recommendation algorithms on the user-tweet graph (as are\ndeployed inside Twitter) because the public sample stream API is too\nsparse in terms of interactions to give good results. The following\nendpoint for similarity queries offers related hashtags given an input\nhashtag:\n\n+ `similarHashtags`: computes hashtags similar to the input hashtag based on real-time data. Use parameter `hashtag` to specify the hashtag (e.g., from `tweetHashtagGraph/topHashtags` above) and parameter `k` to specify the number of results to return. 
Sample invocation:\n\n```\ncurl 'http://localhost:8888/similarHashtags?hashtag=trump\u0026k=10'\n```\n\n# License\n\nCopyright 2016 Twitter, Inc.\n\nLicensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftwitter%2FGraphJet","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftwitter%2FGraphJet","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftwitter%2FGraphJet/lists"}