{"id":11563743,"url":"https://github.com/alasdairtran/radflow","last_synced_at":"2026-02-12T09:12:00.753Z","repository":{"id":41584286,"uuid":"226467329","full_name":"alasdairtran/radflow","owner":"alasdairtran","description":"[TheWebConf 2021] Radflow: A Recurrent, Aggregated, and Decomposable Model for Networks of Time Series","archived":false,"fork":false,"pushed_at":"2023-03-05T20:29:02.000Z","size":2052,"stargazers_count":32,"open_issues_count":5,"forks_count":3,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-22T07:40:15.502Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alasdairtran.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-12-07T06:34:28.000Z","updated_at":"2024-08-06T06:02:49.000Z","dependencies_parsed_at":"2023-01-24T12:30:48.225Z","dependency_job_id":null,"html_url":"https://github.com/alasdairtran/radflow","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alasdairtran%2Fradflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alasdairtran%2Fradflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alasdairtran%2Fradflow/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alasdairtran%2Fradflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alasdairtran","download_url":"https://codeload.github.com/alasdairtran/radflow/tar.gz/refs/heads/master","host":{"name":"
GitHub","url":"https://github.com","kind":"github","repositories_count":235139112,"owners_count":18942110,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-06-23T05:59:19.977Z","updated_at":"2025-10-03T14:31:00.039Z","avatar_url":"https://github.com/alasdairtran.png","language":"Python","funding_links":[],"categories":["Time Series"],"sub_categories":["Network Services_Other"],"readme":"# Radflow: A Recurrent, Aggregated, and Decomposable Model for Networks of Time Series\n\n![Teaser](figures/teaser.png)\n\nThis repository contains the code to reproduce the results in our TheWebConf\n2021 paper [Radflow: A Recurrent, Aggregated, and Decomposable Model for\nNetworks of Time Series](https://arxiv.org/abs/2102.07289). We propose a new\nmodel for networks of time series that influence each other. Graph structures\namong time series are found in diverse domains, such as web traffic influenced\nby hyperlinks, product sales influenced by recommendation, or urban transport\nvolume influenced by road networks and weather. There has been recent progress\nin graph modeling and in time series forecasting, respectively, but an\nexpressive and scalable approach for a network of series does not yet exist.\n\nWe introduce Radflow, a novel model that embodies three key ideas: a recurrent\nneural network to obtain node embeddings that depend on time, the aggregation\nof the flow of influence from neighboring nodes with multi-head attention, and\nthe multi-layer decomposition of time series. 
Radflow naturally takes into\naccount dynamic networks where nodes and edges change over time, and it can be\nused for prediction and data imputation tasks. On real-world datasets ranging\nfrom a few hundred to a few hundred thousand nodes, we observe that Radflow\nvariants are the best performing model across a wide range of settings. The\nrecurrent component in Radflow also outperforms N-BEATS, the state-of-the-art\ntime series model. We show that Radflow can learn different trends and seasonal\npatterns, that it is robust to missing nodes and edges, and that correlated\ntemporal patterns among network neighbors reflect influence strength.\n\nWe curate WikiTraffic, the largest dynamic network of time series with 366K\nnodes and 22M time-dependent links spanning five years. This dataset provides\nan open benchmark for developing models in this area, with applications that\ninclude optimizing resources for the web. More broadly, Radflow has the\npotential to improve forecasts in correlated time series networks such as the\nstock market, and impute missing measurements in geographically dispersed\nnetworks of natural phenomena.\n\nPlease cite with the following BibTeX:\n\n```raw\n@InProceedings{Tran2021Radflow,\n  author = {Tran, Alasdair and Mathews, Alexander and Ong, Cheng Soon and Xie, Lexing},\n  title = {Radflow: A Recurrent, Aggregated, and Decomposable Model for Networks of Time Series},\n  year = {2021},\n  publisher = {Association for Computing Machinery},\n  url = {https://doi.org/10.1145/3442381.3449945},\n  booktitle = {Proceedings of The Web Conference 2021}\n}\n```\n\n## Getting Started\n\n```sh\nconda env create -f conda.yaml\nconda activate radflow\npython -m ipykernel install --user --name radflow --display-name \"radflow\"\npython setup.py develop\n\n# Install apex\ngit submodule init lib/apex \u0026\u0026 git submodule update --init lib/apex\ncd lib/apex\npip install -v --no-cache-dir --global-option=\"--pyprof\" --global-option=\"--cpp_ext\" 
--global-option=\"--cuda_ext\" ./\ncd ../..\n\n# Install PyTorch Geometric\npip install -U torch-scatter==latest+cu102 -f https://pytorch-geometric.com/whl/torch-1.6.0.html\npip install -U torch-sparse==latest+cu102 -f https://pytorch-geometric.com/whl/torch-1.6.0.html\npip install -U torch-cluster==latest+cu102 -f https://pytorch-geometric.com/whl/torch-1.6.0.html\npip install -U torch-spline-conv==latest+cu102 -f https://pytorch-geometric.com/whl/torch-1.6.0.html\npip install -U torch-geometric\n```\n\n## Preparing the Datasets\n\nYou can download all the pre-processed datasets and pretrained models as\nfollows:\n\n```sh\n# Datasets (70GB)\nwget --continue https://object-store.rc.nectar.org.au/v1/AUTH_c0e4d64401cf433fb0260d211c3f23f8/radflow/data.tar.gz\ntar -zxvf data.tar.gz\n\n# Pretrained models and results (123GB)\nwget --continue https://object-store.rc.nectar.org.au/v1/AUTH_c0e4d64401cf433fb0260d211c3f23f8/radflow/expt.tar.gz\ntar -zxvf expt.tar.gz\n```\n\nOr if the archives are too big, you can also browse the individual directories\nand files at: https://cloudstor.aarnet.edu.au/plus/s/wQbOswE7qi50mci\n\nFinally, the steps provided below are for constructing the dataset from\nscratch:\n\n```sh\n# Start an empty mongodb database\nmongod --bind_ip_all --dbpath data/mongodb --wiredTigerCacheSizeGB 10\n\n# Process vevo data\npython scripts/prepare_vevo_network.py\n\n# Download wiki dump. This takes about three days.\npython scripts/download_wikidump.py\n\n# With wiki dump, we get 668 files. Each file has on average 290M lines.\n# If we use a single thread (no parallelization), it takes between 3-7 hours\n# to go through each file. The following scripts construct a mongo database\n# for the entire wiki graph. 
This takes about 40 hours.\npython scripts/extract_graph.py --dump /data4/u4921817/radflow/data/wikidump --host dijkstra --n-jobs 24 --total 232 --split 0 # braun\npython scripts/extract_graph.py --dump /data4/u4921817/radflow/data/wikidump --host dijkstra --n-jobs 20 --total 232 --split 1 # cray\npython scripts/extract_graph.py --dump /data4/u4921817/radflow/data/wikidump --host dijkstra --n-jobs 20 --total 232 --split 2 # cray\n\n# Remove duplicate titles. Generate a cache title2pageid.pkl that maps\n# the title to the original page id. We also reindex the page IDs, taking 3h.\n# We end up with 17,380,550 unique IDs/titles.\npython scripts/extract_graph.py --reindex\n\n# Get page view counts directly from wiki API. Takes around 3 days.\npython scripts/get_traffic.py -m localhost -b 0 -t 3 # dijkstra\npython scripts/get_traffic.py -m dijkstra -b 1 -t 3 # cray\npython scripts/get_traffic.py -m dijkstra -b 2 -t 3 # braun\n\n# Store wiki graph in hdf5\npython scripts/extract_wiki_subgraph.py\n\ndocker build -t alasdairtran/radflow .\ndocker push alasdairtran/radflow\n\n# On the server with GPU\ndocker build -t alasdairtran/radflow .\ndocker run -p 44192:44192 --ipc=host -v $HOME/projects/phd/radflow:/radflow alasdairtran/radflow\n\n# On the client, find internal IP address\nhostname -I\n\n# Back up databases\nmongodump --db wiki2 --host=localhost --port=27017 --gzip --archive=data/mongobackups/wiki-2020-09-17.gz\nmongodump --db vevo --host=localhost --port=27017 --gzip --archive=data/mongobackups/vevo-2020-09-17.gz\n```\n\n## Data Description\n\nFor the WikiTraffic dataset, the most important file is `data/wiki/wiki.h5df`. Two of the keys\nin that file are:\n\n* `views` of shape `(366802, 1827)`: Each row contains the view count of a wiki page over 1827 days,\n\n* `edges` of shape `(366802, 1827, max_edges)`: Each row contains the neighbors of a page on each day,\n\nfrom which we can reconstruct the entire graph. 
The structure of `data/vevo/vevo.h5df` is the same.\n\n## Training\n\n```sh\n# Some experiments don't utilize the whole GPU, so we can run many parallel\n# experiments on the same GPU.\n# When using MPS it is recommended to use EXCLUSIVE_PROCESS mode to ensure that\n# only a single MPS server is using the GPU, which provides additional insurance that the\n# MPS server is the single point of arbitration between all CUDA processes for that GPU.\n# Setting this does not persist across reboot\nsudo nvidia-smi -i 0,1 -c EXCLUSIVE_PROCESS\nCUDA_VISIBLE_DEVICES=0,1 \\\n    CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps \\\n    CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log \\\n    nvidia-cuda-mps-control -f\n\nexport CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps\nexport CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-log\n\n# Naive baselines (no training is needed)\nCUDA_VISIBLE_DEVICES= radflow evaluate expt/pure_time_series/vevo/01_copying_previous_day/config.yaml\nCUDA_VISIBLE_DEVICES= radflow evaluate expt/pure_time_series/vevo/02_copying_previous_week/config.yaml\n\n# Example training and evaluation\nCUDA_VISIBLE_DEVICES=1 radflow train expt/network_aggregation/vevo_dynamic/imputation/one_hop/15_radflow/config.yaml -f\nCUDA_VISIBLE_DEVICES=1 radflow evaluate expt/network_aggregation/vevo_dynamic/imputation/one_hop/15_radflow/config.yaml -m expt/network_aggregation/vevo_dynamic/imputation/one_hop/15_radflow/serialization/best.th\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falasdairtran%2Fradflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falasdairtran%2Fradflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falasdairtran%2Fradflow/lists"}