{"id":13628111,"url":"https://github.com/claws-lab/jodie","last_synced_at":"2025-04-17T00:33:16.945Z","repository":{"id":36150254,"uuid":"193973813","full_name":"claws-lab/jodie","owner":"claws-lab","description":"A PyTorch implementation of ACM SIGKDD 2019 paper \"Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks\"","archived":false,"fork":false,"pushed_at":"2024-05-03T19:46:34.000Z","size":35,"stargazers_count":340,"open_issues_count":16,"forks_count":81,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-05-21T17:21:08.716Z","etag":null,"topics":["dynamic-networks","embedding-trajectories","embeddings","kdd2019","machine-learning","network-embedding","representation-learning","temporal-network"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/claws-lab.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-06-26T20:33:35.000Z","updated_at":"2024-08-01T22:36:02.420Z","dependencies_parsed_at":"2024-08-01T22:51:49.507Z","dependency_job_id":null,"html_url":"https://github.com/claws-lab/jodie","commit_stats":null,"previous_names":["srijankr/jodie"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/claws-lab%2Fjodie","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/claws-lab%2Fjodie/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/claws-lab%2Fjodie/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/claws-lab%2Fjodie/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/claws-lab","download_url":"https://codeload.github.com/claws-lab/jodie/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223735267,"owners_count":17194073,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dynamic-networks","embedding-trajectories","embeddings","kdd2019","machine-learning","network-embedding","representation-learning","temporal-network"],"created_at":"2024-08-01T22:00:45.791Z","updated_at":"2024-11-08T18:31:22.443Z","avatar_url":"https://github.com/claws-lab.png","language":"Python","funding_links":[],"categories":["Uncategorized"],"sub_categories":["Uncategorized"],"readme":"## Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks (ACM SIGKDD 2019)\n#### Authors: [Srijan Kumar](http://cs.stanford.edu/~srijan), Xikun Zhang, [Jure Leskovec](https://cs.stanford.edu/people/jure/)\n\u003c!--#### [Project website with links to the datasets](http://snap.stanford.edu/jodie/)--\u003e\n#### [Link to the paper](https://cs.stanford.edu/~srijan/pubs/jodie-kdd2019.pdf)\n#### [Link to the slides](https://cs.stanford.edu/~srijan/pubs/jodie-kdd2019-slides.pdf)\n#### [Brief video explanation](https://www.youtube.com/watch?v=ItBmU8681j0)\n\n### Introduction\nTemporal networks are ubiquitous in e-commerce (users clicking, purchasing, saving items), social networks (users talking with one another and interacting with content), finance (transactions between users and 
merchants), and education (students taking courses). In all domains, the entities (users, items, content) can be represented as nodes and their interactions as edges. \n\n**JODIE** is a representation learning framework for all nodes in temporal networks. Given a sequence of node actions, JODIE learns a dynamic embedding trajectory for every node (as opposed to a static embedding). These trajectories are useful for downstream machine learning tasks, such as link prediction, node classification, and clustering. JODIE is fast and accurately predicts future interactions and detects anomalies.\n\nIn the paper, JODIE is used for two broad categories of tasks:\n1. **Temporal Link Prediction**: Which two nodes will interact next? Example applications are recommender systems and modeling network evolution.\n2. **Temporal Node Classification**: When does the state of a node change from normal to abnormal? Example applications are anomaly detection, ban prediction, dropout and churn prediction, and fraud and account compromise detection.\n\n### Motivation \nTemporal networks provide an expressive language to represent time-evolving and dynamic interactions between nodes. Consider, for example, users interacting with items (clicking, purchasing, viewing). Representation learning provides a powerful tool to model and reason about networks. However, as networks evolve over time, a single (static) embedding becomes insufficient to represent the changing behavior of the entities and the dynamics of the network.\n\n![JODIE at work](http://snap.stanford.edu/jodie/jodie-example.png)\n\nJODIE is a representation learning framework that embeds every node in a Euclidean space and models the evolution of each node as an embedding trajectory in this space. JODIE learns to forecast the embedding trajectories into the future to make predictions about the entities and their interactions. These trajectories can be trained for downstream tasks, such as recommendation and prediction. 
JODIE is scalable to large networks by employing a novel data batching algorithm, called T-Batch, that creates batches of independent edges that can be processed simultaneously.\n\nIf you make use of this code, the JODIE algorithm, the T-Batch algorithm, or the datasets in your work, please cite the following paper:\n```\n @inproceedings{kumar2019predicting,\n\ttitle={Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks},\n\tauthor={Kumar, Srijan and Zhang, Xikun and Leskovec, Jure},\n\tbooktitle={Proceedings of the 25th ACM SIGKDD international conference on Knowledge discovery and data mining},\n\tyear={2019},\n\torganization={ACM}\n }\n```\n\n### Short Video Explanation of JODIE (External Link to YouTube)\n\n[![JODIE short video](https://cs.stanford.edu/~srijan/img/jodie-thumbnail-youtube.png)](https://www.youtube.com/watch?v=ItBmU8681j0)\n\n\n### Datasets \nLinks to datasets used in the paper:\n- [Reddit](http://snap.stanford.edu/jodie/reddit.csv)\n- [Wikipedia](http://snap.stanford.edu/jodie/wikipedia.csv)\n- [LastFM](http://snap.stanford.edu/jodie/lastfm.csv)\n- [MOOC](http://snap.stanford.edu/jodie/mooc.csv)\n\n\n### Dataset format\n\nThe networks are stored under the `data/` folder, one file per network. The filename should be `\u003cnetwork\u003e.csv`.\n\nThe network should be in the following format:\n- One line per interaction/edge.\n- Each line should be: *user, item, timestamp, state label, comma-separated array of features*.\n- The first line is a header row. \n- *User* and *item* fields can be alphanumeric.\n- *Timestamp* should be numeric (not a datetime string).\n- *State label* should be 1 whenever the user state changes, 0 otherwise. If there are no state labels, use 0 for all interactions.\n- *Feature list* can be as long as desired, but it must have at least one dimension. 
If there are no features, use 0 for all interactions.\n\nFor example, the first few lines of a dataset can be:\n```\nuser,item,timestamp,state_label,comma_separated_list_of_features\n0,0,0.0,0,0.1,0.3,10.7\n2,1,6.0,0,0.2,0.4,0.6\n5,0,41.0,0,0.1,15.0,0.6\n3,2,49.0,1,100.7,0.8,0.9\n```\n\n\n### Code setup and Requirements\n\nThe code requires recent versions of PyTorch, NumPy, scikit-learn, tqdm, and gpustat. You can install all the required packages using the following command:\n```\n    $ pip install -r requirements.txt\n```\n\nTo initialize the directories needed to store data and outputs, use the following commands. This will create the `data/`, `saved_models/`, and `results/` directories.\n```\n    $ chmod +x initialize.sh\n    $ ./initialize.sh\n```\n\nTo download the datasets used in the paper, use the following commands. This will download four datasets into the `data/` directory: `reddit.csv`, `wikipedia.csv`, `mooc.csv`, and `lastfm.csv`.\n```\n    $ chmod +x download_data.sh\n    $ ./download_data.sh\n```\n\n### Running the JODIE code\n\nTo train the JODIE model on the `data/\u003cnetwork\u003e.csv` dataset, use the following command. This will save a model for every epoch in the `saved_models/\u003cnetwork\u003e/` directory.\n```\n   $ python jodie.py --network \u003cnetwork\u003e --model jodie --epochs 50\n```\n\n`jodie.py` accepts the following command-line arguments:\n1. `--network`: this is the name of the dataset file in the `data/` directory. The file should be named `\u003cnetwork\u003e.csv` and follow the dataset format described above. This is a required argument. \n2. `--model`: this is the name of the model, which is also used as the name of the file in which the model is saved in the `saved_models/` directory. Default value: jodie.\n3. `--gpu`: this is the ID of the GPU on which the model is run. Default value: -1 (runs on the GPU with the most free memory).\n4. `--epochs`: this is the number of epochs for which the model is trained. Default value: 50.\n5. 
`--embedding_dim`: this is the number of dimensions of the dynamic embedding. Default value: 128.\n6. `--train_proportion`: this is the fraction of interactions (from the beginning) that are used for training. The next 10% are used for validation and the next 10% for testing. Default value: 0.8.\n7. `--state_change`: this is a boolean input indicating whether the model is trained with state change prediction in addition to interaction prediction. Default value: True.\n\n### Evaluate the model\n\n#### Interaction prediction\n\nTo evaluate the performance of the model on the interaction prediction task, use the following commands. They evaluate the model saved at every epoch and output the final test performance. \n```\n    $ chmod +x evaluate_all_epochs.sh\n    $ ./evaluate_all_epochs.sh \u003cnetwork\u003e interaction\n```\n\nTo evaluate the interaction prediction performance of the model saved at **only one epoch**, use the following command. This will output the performance numbers to the `results/interaction_prediction_\u003cnetwork\u003e.txt` file.\n```\n    $ python evaluate_interaction_prediction.py --network \u003cnetwork\u003e --model jodie --epoch 49\n```\n\nThe script `get_final_performance_numbers.py` reads the per-epoch outputs stored in the `results/` folder and finds the epoch with the best validation performance. \n\n#### State change prediction\n\nTo evaluate the performance of the model on the state change prediction task, use the following commands. They evaluate the model saved at every epoch and output the final test performance. \n```\n    $ chmod +x evaluate_all_epochs.sh\n    $ ./evaluate_all_epochs.sh \u003cnetwork\u003e state\n```\nTo evaluate the state change prediction performance of the model saved at **only one epoch**, use the following command. 
This will output the performance numbers to the `results/state_change_prediction_\u003cnetwork\u003e.txt` file.\n```\n   $ python evaluate_state_change_prediction.py --network \u003cnetwork\u003e --model jodie --epoch 49\n```\n\n### Run the T-Batch code\n\nTo create T-Batches of a temporal network, use the following command. This will save the T-Batches to the `results/tbatches_\u003cnetwork\u003e.csv` file. Note that the entire input file is converted to T-Batches. To convert only the training data, provide a file that contains only the training interactions. \n\n```\n   $ python tbatch.py --network \u003cnetwork\u003e\n```\n\n`tbatch.py` accepts the following command-line argument:\n1. `--network`: this is the name of the dataset file in the `data/` directory. The file should be named `\u003cnetwork\u003e.csv` and follow the dataset format described above. This is a required argument. \n\n\n### References \n*Predicting Dynamic Embedding Trajectory in Temporal Interaction Networks*. Srijan Kumar, Xikun Zhang, Jure Leskovec. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2019. \n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclaws-lab%2Fjodie","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fclaws-lab%2Fjodie","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fclaws-lab%2Fjodie/lists"}