# Awesome-Video-Generation
A curated list of awesome work (currently 257 papers) on video generation, video representation learning, and related topics (such as RL). Feel free to contribute, or email me if I've missed your paper off the list :]

Papers are ordered by year (newest to oldest). Each entry links to the paper and, where available, to the GitHub repo.

## 2020

Disentangling multiple features in video sequences using Gaussian processes in variational autoencoders. Bhagat, Uppal, Yin, Lim https://arxiv.org/abs/2001.02408

Generative adversarial networks for spatio-temporal data: a survey. Gao, Xue, Shao, Zhao, Qin, Prabowo, Rahaman, Salim https://arxiv.org/pdf/2008.08903.pdf

Deep state-space generative model for correlated time-to-event predictions. Xue, Zhou, Du, Dai, Xu, Zhang, Cui https://dl.acm.org/doi/abs/10.1145/3394486.3403206

Toward discriminating and synthesizing motion traces using deep probabilistic generative models. Zhou, Liu, Zhang, Trajcevski https://ieeexplore.ieee.org/abstract/document/9165954/

Sample-efficient robot motion learning using Gaussian process latent variable models. Delgado-Guerrero, Colome, Torras http://www.iri.upc.edu/files/scidoc/2320-Sample-efficient-robot-motion-learning-using-Gaussian-process-latent-variable-models.pdf

Sequence prediction using spectral RNNs. Wolter, Gall, Yao https://www.researchgate.net/profile/Moritz_Wolter2/publication/329705630_Sequence_Prediction_using_Spectral_RNNs/links/5f36b9d892851cd302f44a57/Sequence-Prediction-using-Spectral-RNNs.pdf

Self-supervised video representation learning by pace prediction. Wang, Jiao, Liu https://arxiv.org/pdf/2008.05861.pdf

RhyRNN: Rhythmic RNN for recognizing events in long and complex videos. Yu, Li, Li http://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123550137.pdf

4D forecasting: sequential forecasting of 100,000 points. Weng, Wang, Levine, Kitani, Rhinehart http://www.xinshuoweng.com/papers/SPF2_eccvw/camera_ready.pdf

Multimodal deep generative models for trajectory prediction: a conditional variational autoencoder approach. Ivanovic, Leung, Schmerling, Pavone https://arxiv.org/pdf/2008.03880.pdf

Memory-augmented dense predictive coding for video representation learning. Han, Xie, Zisserman https://arxiv.org/pdf/2008.01065.pdf

SeCo: exploring sequence supervision for unsupervised representation learning. Yao, Zhang, Qiu, Pan, Mei https://arxiv.org/pdf/2008.00975.pdf

PDE-driven spatiotemporal disentanglement. Dona, Franceschi, Lamprier, Gallinari https://arxiv.org/pdf/2008.01352.pdf

Dynamics generalization via information bottleneck in deep reinforcement learning. Lu, Lee, Abbeel, Tiomkin https://arxiv.org/pdf/2008.00614.pdf

Latent space roadmap for visual action planning. Lippi, Poklukar, Welle, Varava, Yin, Marino, Kragic https://rss2020vlrrm.github.io/papers/3_CameraReadySubmission_RSS_workshop_latent_space_roadmap.pdf

Weakly-supervised learning of human dynamics. Zell, Rosenhahn, Wandt https://arxiv.org/pdf/2007.08969.pdf

Deep variational Luenberger-type observer for stochastic video prediction. Wang, Zhou, Yan, Yao, Liu, Ma, Lu https://arxiv.org/pdf/2003.00835.pdf

NewtonianVAE: proportional control and goal identification from pixels via physical latent spaces. Jaques, Burke, Hospedales https://arxiv.org/pdf/2006.01959.pdf

Constrained variational autoencoder for improving EEG based speech recognition systems. Krishna, Tran, Carnahan, Tewfik https://arxiv.org/pdf/2006.02902.pdf

Latent video transformer. Rakhimov, Volkhonskiy https://arxiv.org/pdf/2006.10704.pdf

Beyond exploding and vanishing gradients: analysing RNN training using attractors and smoothness. Ribeiro, Tiels, Aguirre, Schon http://proceedings.mlr.press/v108/ribeiro20a/ribeiro20a.pdf

Towards recurrent autoregressive flow models. Mern, Morales, Kochenderfer https://arxiv.org/pdf/2006.10096.pdf

Learning to combine top-down and bottom-up signals in recurrent neural networks with attention over modules. Mittal, Lamb, Goyal, Voleti et al. https://www.cs.colorado.edu/~mozer/Research/Selected%20Publications/reprints/Mittaletal2020.pdf

Unmasking the inductive biases of unsupervised object representations for video sequences. Weis, Chitta, Sharma et al. https://arxiv.org/pdf/2006.07034.pdf

G3AN: disentangling appearance and motion for video generation. Wang, Bilinski, Bermond, Dantcheva http://openaccess.thecvf.com/content_CVPR_2020/papers/Wang_G3AN_Disentangling_Appearance_and_Motion_for_Video_Generation_CVPR_2020_paper.pdf

Learning dynamic relationships for 3D human motion prediction. Cui, Sun, Yang http://openaccess.thecvf.com/content_CVPR_2020/papers/Cui_Learning_Dynamic_Relationships_for_3D_Human_Motion_Prediction_CVPR_2020_paper.pdf

Joint training of variational auto-encoder and latent energy-based model. Han, Nijkamp, Zhou, Pang, Zhu, Wu http://openaccess.thecvf.com/content_CVPR_2020/papers/Han_Joint_Training_of_Variational_Auto-Encoder_and_Latent_Energy-Based_Model_CVPR_2020_paper.pdf

Learning invariant representations for reinforcement learning without reconstruction. Zhang, McAllister, Calandra, Gal, Levine https://arxiv.org/pdf/2006.10742.pdf

Variational inference for sequential data with future likelihood estimates. Kim, Jang, Yang, Kim http://ailab.kaist.ac.kr/papers/pdfs/KJYK2020.pdf

Video prediction via example guidance. Xu, Xu, Ni, Yang, Darrell https://arxiv.org/pdf/2007.01738.pdf

Hierarchical patch VAE-GAN: generating diverse videos from a single sample. Gur, Benaim, Wolf https://arxiv.org/pdf/2006.12226.pdf

Dynamic facial expression generation on Hilbert Hypersphere with conditional Wasserstein Generative adversarial nets. Otberdout, Daoudi, Kacem, Ballihi, Berretti https://arxiv.org/abs/1907.10087

HAF-SVG: hierarchical stochastic video generation with aligned features. Lin, Yuan, Li https://www.ijcai.org/Proceedings/2020/0138.pdf

Improving generative imagination in object-centric world models. Lin, Wu, Peri, Fu, Jiang, Ahn https://proceedings.icml.cc/static/paper_files/icml/2020/4995-Paper.pdf

Deep generative video compression with temporal autoregressive transforms. Yang, Yang, Marino, Yang, Mandt https://joelouismarino.github.io/files/papers/2020/seq_flows_compression/seq_flows_compression.pdf

Spatially structured recurrent modules. Rahaman, Goyal, Gondal, Wuthrich, Bauer, Sharma, Bengio, Scholkopf https://arxiv.org/pdf/2007.06533.pdf

Unsupervised object-centric video generation and decomposition in 3D. Henderson, Lampert https://arxiv.org/pdf/2007.06705.pdf

Planning from images with deep latent Gaussian process dynamics. Bosch, Achterhold, Leal-Taixe, Stuckler https://arxiv.org/pdf/2005.03770.pdf

Planning to explore via self-supervised world models. Sekar, Rybkin, Daniilidis, Abbeel, Hafner, Pathak https://arxiv.org/pdf/2005.05960.pdf

Mutual information maximization for robust plannable representations. Ding, Clavera, Abbeel https://arxiv.org/pdf/2005.08114.pdf

Supervised contrastive learning. Khosla, Teterwak, Wang, Sarna https://arxiv.org/pdf/2004.11362.pdf

Blind source extraction based on multi-channel variational autoencoder and x-vector-based speaker selection trained with data augmentation. Gu, Liao, Lu https://arxiv.org/pdf/2005.07976.pdf

BiERU: bidirectional emotional recurrent unit for conversational sentiment analysis. Li, Shao, Ji, Cambria https://arxiv.org/pdf/2006.00492.pdf

S3VAE: self-supervised sequential VAE for representation disentanglement and data generation. Zhu, Min, Kadav, Graf https://arxiv.org/pdf/2005.11437.pdf

Probably approximately correct vision-based planning using motion primitives. Veer, Majumdar https://arxiv.org/abs/2002.12852

MoVi: a large multipurpose motion and video dataset. Ghorbani, Mahdaviani, Thaler, Kording, Cook, Blohm, Troje https://arxiv.org/abs/2003.01888

Temporal convolutional attention-based network for sequence modeling. Hao, Wang, Xia, Shen, Zhao https://arxiv.org/abs/2002.12530

Neuroevolution of self-interpretable agents. Tang, Nguyen, Ha https://arxiv.org/abs/2003.08165

Attentional adversarial variational video generation via decomposing motion and content. Talafha, Rekabdar, Ekenna, Mousas https://ieeexplore.ieee.org/document/9031476

Imputer: sequence modelling via imputation and dynamic programming. Chan, Saharia, Hinton, Norouzi, Jaitly https://arxiv.org/abs/2002.08926

Variational conditioning of deep recurrent networks for modeling complex motion dynamics. Buckchash, Raman https://ieeexplore.ieee.org/document/9055015?denied=

Training of deep neural networks for the generation of dynamic movement primitives. Pahic, Ridge, Gams, Morimoto, Ude https://www.sciencedirect.com/science/article/pii/S0893608020301301

PreCNet: next frame video prediction based on predictive coding. Straka, Svoboda, Hoffmann https://arxiv.org/pdf/2004.14878.pdf

Dimensionality reduction of movement primitives in parameter space. Tosatto, Stadtmuller, Peters https://arxiv.org/abs/2003.02634

Disentangling physical dynamics from unknown factors for unsupervised video prediction. Le Guen, Thome https://arxiv.org/abs/2003.01460

A real-robot dataset for assessing transferability of learned dynamics models. Agudelo-Espana, Zadaianchuk, Wenk, Garg, Akpo et al https://www.is.mpg.de/uploads_file/attachment/attachment/589/ICRA20_1157_FI.pdf

Hierarchical decomposition of nonlinear dynamics and control for system identification and policy distillation. Abdulsamad, Peters https://arxiv.org/pdf/2005.01432.pdf

Occlusion resistant learning of intuitive physics from videos. Riochet, Sivic, Laptev, Dupoux https://arxiv.org/pdf/2005.00069.pdf

Scalable learning in latent state sequence models. Aicher https://digital.lib.washington.edu/researchworks/bitstream/handle/1773/45550/Aicher_washington_0250E_21152.pdf?sequence=1

How useful is self-supervised pretraining for visual tasks? Newell, Deng https://arxiv.org/pdf/2003.14323.pdf

q-VAE for disentangled representation learning and latent dynamical systems. Kobayashi https://arxiv.org/pdf/2003.01852.pdf

Variational recurrent models for solving partially observable control tasks. Han, Doya, Tani https://openreview.net/forum?id=r1lL4a4tDB

Stochastic latent residual video prediction. Franceschi, Delasalles, Chen, Lamprier, Gallinari https://arxiv.org/pdf/2002.09219.pdf https://sites.google.com/view/srvp

Disentangled speech embeddings using cross-modal self-supervision. Nagrani, Chung, Albanie, Zisserman https://arxiv.org/abs/2002.08742

TwoStreamVAN: improving motion modeling in video generation. Sun, Xu, Saenko https://arxiv.org/abs/1812.01037

Variational hyper RNN for sequence modeling. Deng, Cao, Chang, Sigal, Mori, Brubaker https://arxiv.org/abs/2002.10501

Exploring spatial-temporal multi-frequency analysis for high-fidelity and temporal-consistency video prediction. Jin, Hu, Tang, Niu, Shi, Han, Li https://arxiv.org/abs/2002.09905

## 2019

Representing closed transformation paths in encoded network latent space. Connor, Rozell https://arxiv.org/pdf/1912.02644.pdf

Animating arbitrary objects via deep motion transfer. Siarohin, Lathuiliere, Tulyakov, Ricci, Sebe https://arxiv.org/abs/1812.08861

Feedback recurrent autoencoder. Yang, Sautiere, Ryu, Cohen https://arxiv.org/abs/1911.04018

First order motion model for image animation. Siarohin, Lathuiliere, Tulyakov, Ricci, Sebe https://papers.nips.cc/paper/8935-first-order-motion-model-for-image-animation

Point-to-point video generation. Wang, Cheng, Lin, Chen, Sun https://arxiv.org/pdf/1904.02912.pdf

Learning deep controllable and structured representations for image synthesis, structured prediction and beyond. Yan https://deepblue.lib.umich.edu/handle/2027.42/153334

Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics. Raffin, Hill, Traore, Lesort, Diaz-Rodriguez, Filliat https://arxiv.org/abs/1901.08651

Task-Conditioned variational autoencoders for learning movement primitives. Noseworthy, Paul, Roy, Park, Roy https://groups.csail.mit.edu/rrg/papers/noseworthy_corl_19.pdf

Spatio-temporal alignments: optimal transport through space and time. Janati, Cuturi, Gramfort https://arxiv.org/pdf/1910.03860.pdf

Action Genome: actions as composition of spatio-temporal scene graphs. Ji, Krishna, Fei-Fei, Niebles https://arxiv.org/pdf/1912.06992.pdf

Video-to-video translation for visual speech synthesis. Doukas, Sharmanska, Zafeiriou https://arxiv.org/pdf/1905.12043.pdf

Predictive coding, variational autoencoders, and biological connections. Marino https://openreview.net/pdf?id=SyeumQYUUH

Single Headed Attention RNN: stop thinking with your head. Merity https://arxiv.org/pdf/1911.11423.pdf

Hamiltonian neural networks. Greydanus, Dzamba, Yosinski https://arxiv.org/pdf/1906.01563.pdf https://github.com/greydanus/hamiltonian-nn

Learning what you can do before doing anything. Rybkin, Pertsch, Derpanis, Daniilidis, Jaegle https://openreview.net/pdf?id=SylPMnR9Ym https://daniilidis-group.github.io/learned_action_spaces

Deep Lagrangian networks: using physics as model prior for deep learning. Lutter, Ritter, Peters https://arxiv.org/pdf/1907.04490.pdf

A general framework for structured learning of mechanical systems. Gupta, Menda, Manchester, Kochenderfer https://arxiv.org/pdf/1902.08705.pdf https://github.com/sisl/machamodlearn

Learning predictive models from observation and interaction. Schmeckpeper, Xie, Rybkin, Tian, Daniilidis, Levine, Finn https://arxiv.org/pdf/1912.12773.pdf

A multigrid method for efficiently training video models. Wu, Girshick, He, Feichtenhofer, Krahenbuhl https://arxiv.org/pdf/1912.00998.pdf

Deep variational Koopman models: inferring Koopman observations for uncertainty-aware dynamics modeling and control. Morton, Witherden, Kochenderfer https://arxiv.org/pdf/1902.09742.pdf

Symplectic ODE-NET: learning Hamiltonian dynamics with control. Zhong, Dey, Chakraborty https://arxiv.org/pdf/1909.12077.pdf

Hamiltonian graph networks with ODE integrators. Sanchez-Gonzalez, Bapst, Cranmer, Battaglia https://arxiv.org/pdf/1909.12790.pdf

Neural ordinary differential equations. Chen, Rubanova, Bettencourt, Duvenaud https://arxiv.org/pdf/1806.07366.pdf https://github.com/rtqichen/torchdiffeq

Variational autoencoder trajectory primitives and discrete latent codes. Osa, Ikemoto https://arxiv.org/pdf/1912.04063.pdf

Newton vs the machine: solving the chaotic three-body problem using deep neural networks. Breen, Foley, Boekholt, Zwart https://arxiv.org/pdf/1910.07291.pdf

Learning dynamical systems from partial observations. Ayed, de Bezenac, Pajot, Brajard, Gallinari https://arxiv.org/pdf/1902.11136.pdf

GP-VAE: deep probabilistic time series imputation. Fortuin, Baranchuk, Ratsch, Mandt https://arxiv.org/pdf/1907.04155.pdf https://github.com/ratschlab/GP-VAE

Ghost hunting in the nonlinear dynamic machine. Butner, Munion, Baucom, Wong https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0226572

Faster attend-infer-repeat with tractable probabilistic models. Stelzner, Peharz, Kersting http://proceedings.mlr.press/v97/stelzner19a/stelzner19a.pdf https://github.com/stelzner/supair

Tree-structured recurrent switching linear dynamical systems for multi-scale modeling. Nassar, Linderman, Bugallo, Park https://arxiv.org/pdf/1811.12386.pdf

DynaNet: neural Kalman dynamical model for motion estimation and prediction. Chen, Lu, Wang, Trigoni, Markham https://arxiv.org/pdf/1908.03918.pdf

Disentangled behavioral representations. Dezfouli, Ashtiani, Ghattas, Nock, Dayan, Ong https://papers.nips.cc/paper/8497-disentangled-behavioural-representations.pdf

Structured object-aware physics prediction for video modeling and planning. Kossen, Stelzner, Hussing, Voelcker, Kersting https://arxiv.org/pdf/1910.02425.pdf https://github.com/jlko/STOVE

Recurrent attentive neural process for sequential data. Qin, Zhu, Qin, Wang, Zhao https://arxiv.org/pdf/1910.09323.pdf https://kasparmartens.rbind.io/post/np/

DeepMDP: learning continuous latent space models for representation learning. Gelada, Kumar, Buckman, Nachum, Bellemare https://arxiv.org/pdf/1906.02736.pdf

Genesis: generative scene inference and sampling with object-centric latent representations. Engelcke, Kosiorek, Jones, Posner https://arxiv.org/pdf/1907.13052.pdf https://github.com/applied-ai-lab/genesis

Deep conservation: a latent dynamics model for exact satisfaction of physical conservation laws. Lee, Carlberg https://arxiv.org/pdf/1909.09754.pdf

Switching linear dynamics for variational Bayes filtering. Becker-Ehmck, Peters, van der Smagt https://arxiv.org/pdf/1905.12434.pdf

Approximate Bayesian inference in spatial environments. Mirchev, Kayalibay, Soelch, van der Smagt, Bayer https://arxiv.org/pdf/1805.07206.pdf

beta-DVBF: learning state-space models for control from high dimensional observations. Das, Karl, Becker-Ehmck, van der Smagt https://arxiv.org/pdf/1911.00756.pdf

SSA-GAN: End-to-end time-lapse video generation with spatial self-attention. Horita, Yanai http://img.cs.uec.ac.jp/pub/conf19/191126horita_0.pdf

Learning energy-based spatial-temporal generative convnets for dynamic patterns. Xie, Zhu, Wu https://arxiv.org/pdf/1909.11975.pdf http://www.stat.ucla.edu/~jxie/STGConvNet/STGConvNet.html

Multiplicative interactions and where to find them. Anon https://openreview.net/pdf?id=rylnK6VtDH

Time-series generative adversarial networks. Yoon, Jarrett, van der Schaar https://papers.nips.cc/paper/8789-time-series-generative-adversarial-networks.pdf

Explaining and interpreting LSTMs. Arras, Arjona-Medina, Widrich, Montavon, Gillhofer, Muller, Hochreiter, Samek https://arxiv.org/pdf/1909.12114.pdf

Gating revisited: deep multi-layer RNNs that can be trained. Turkoglu, D'Aronco, Wegner, Schindler https://arxiv.org/pdf/1911.11033.pdf

Re-examination of the role of latent variables in sequence modeling. Lai, Dai, Yang, Yoo https://arxiv.org/pdf/1902.01388.pdf

Improving sequential latent variable models with autoregressive flows. Marino, Chen, He, Mandt https://openreview.net/pdf?id=HklvmlrKPB

Learning stable and predictive structures in kinetic systems: benefits of a causal approach. Pfister, Bauer, Peters https://arxiv.org/pdf/1810.11776.pdf

Learning to disentangle latent physical factors for video prediction. Zhu, Munderloh, Rosenhahn, Stuckler https://link.springer.com/chapter/10.1007/978-3-030-33676-9_42

Adversarial video generation on complex datasets. Clark, Donahue, Simonyan https://arxiv.org/pdf/1907.06571.pdf

Learning to predict without looking ahead: world models without forward prediction. Freeman, Metz, Ha https://arxiv.org/pdf/1910.13038.pdf

Learning video representations using contrastive bidirectional transformer. Sun, Baradel, Murphy, Schmid https://arxiv.org/pdf/1906.05743.pdf

STCN: stochastic temporal convolutional networks. Aksan, Hilliges https://arxiv.org/pdf/1902.06568.pdf https://ait.ethz.ch/projects/2019/stcn/

Zero-shot generation of human-object interaction videos. Nawhal, Zhai, Lehrmann, Sigal https://arxiv.org/pdf/1912.02401.pdf http://www.sfu.ca/~mnawhal/projects/zs_hoi_generation.html

Learning a generative model for multi-step human-object interactions from videos. Wang, Pirk, Yumer, Kim, Sener, Sridhar, Guibas http://www.pirk.info/papers/Wang.etal-2019-LearningInteractions.pdf http://www.pirk.info/projects/learning_interactions/index.html

Dream to control: learning behaviors by latent imagination. Hafner, Lillicrap, Ba, Norouzi https://arxiv.org/pdf/1912.01603.pdf

Multistage attention network for multivariate time series prediction. Hu, Zheng https://www.sciencedirect.com/science/article/abs/pii/S0925231219316625

Predicting video-frames using encoder-convLSTM combination. Mukherjee, Ghosh, Ghosh, Kumar, Roy https://ieeexplore.ieee.org/document/8682158

A variational auto-encoder model for stochastic point processes. Mehrasa, Jyothi, Durand, He, Sigal, Mori https://arxiv.org/pdf/1904.03273.pdf

Unsupervised speech representation learning using WaveNet encoders. Chorowski, Weiss, Bengio, van den Oord https://arxiv.org/pdf/1901.08810.pdf

Local aggregation for unsupervised learning of visual embeddings. Zhuang, Zhai, Yamins http://openaccess.thecvf.com/content_ICCV_2019/papers/Zhuang_Local_Aggregation_for_Unsupervised_Learning_of_Visual_Embeddings_ICCV_2019_paper.pdf

Hamiltonian generative networks. Toth, Rezende, Jaegle, Racaniere, Botev, Higgins https://arxiv.org/pdf/1909.13789.pdf

VideoBERT: a joint model for video and language representation learning. Sun, Myers, Vondrick, Murphy, Schmid https://arxiv.org/pdf/1904.01766.pdf

Video representation learning by dense predictive coding. Han, Xie, Zisserman http://openaccess.thecvf.com/content_ICCVW_2019/papers/HVU/Han_Video_Representation_Learning_by_Dense_Predictive_Coding_ICCVW_2019_paper.pdf https://github.com/TengdaHan/DPC

Unsupervised state representation learning in Atari. Anand, Racah, Ozair, Bengio, Cote, Hjelm https://arxiv.org/pdf/1906.08226.pdf

Temporal cycle-consistency learning. Dwibedi, Aytar, Tompson, Sermanet, Zisserman http://openaccess.thecvf.com/content_CVPR_2019/papers/Dwibedi_Temporal_Cycle-Consistency_Learning_CVPR_2019_paper.pdf

Self-supervised learning by cross-modal audio-video clustering. Alwassel, Mahajan, Torresani, Ghanem, Tran https://arxiv.org/pdf/1911.12667.pdf

Human action recognition with deep temporal pyramids. Mazari, Sahbi https://arxiv.org/pdf/1905.00745.pdf

Evolving losses for unlabeled video representation learning. Piergiovanni, Angelova, Ryoo https://arxiv.org/pdf/1906.03248.pdf

MoGlow: probabilistic and controllable motion synthesis using normalizing flows. Henter, Alexanderson, Beskow https://arxiv.org/pdf/1905.06598.pdf https://www.youtube.com/watch?v=lYhJnDBWyeo

High fidelity video prediction with large stochastic recurrent neural networks. Villegas, Pathak, Kannan, Erhan, Le, Lee https://arxiv.org/pdf/1911.01655.pdf https://sites.google.com/view/videopredictioncapacity

Spatiotemporal pyramid network for video action recognition. Wang, Long, Wan, Yu https://arxiv.org/pdf/1903.01038.pdf

Attentive temporal pyramid network for dynamic scene classification. Huang, Cao, Zhen, Han https://www.aaai.org/ojs/index.php/AAAI/article/view/5184

Disentangling video with independent prediction. Whitney, Fergus https://arxiv.org/pdf/1901.05590.pdf

Disentangling state space representations. Miladinovic, Gondal, Scholkopf, Buhmann, Bauer https://arxiv.org/pdf/1906.03255.pdf

Cycle-SUM: cycle-consistent adversarial LSTM networks for unsupervised video summarization. Yuan, Tay, Li, Zhou, Feng https://arxiv.org/pdf/1904.08265.pdf

Unsupervised learning from video with deep neural embeddings. Zhuang, Andonian, Yamins https://arxiv.org/pdf/1905.11954.pdf

Scaling and benchmarking self-supervised visual representation learning. Goyal, Mahajan, Gupta, Misra https://arxiv.org/pdf/1905.01235.pdf

Self-supervised visual feature learning with deep neural networks: a survey. Jing, Tian https://arxiv.org/pdf/1902.06162.pdf

Unsupervised learning of object structure and dynamics from videos. Minderer, Sun, Villegas, Cole, Murphy, Lee https://arxiv.org/pdf/1906.07889.pdf

Learning correspondence from the cycle-consistency of time. Wang, Jabri, Efros https://arxiv.org/pdf/1903.07593.pdf https://ajabri.github.io/timecycle/

DistInit: learning video representations without a single labeled video. Girdhar, Tran, Torresani, Ramanan https://arxiv.org/pdf/1901.09244.pdf

VideoFlow: a flow-based generative model for video. Kumar, Babaeizadeh, Erhan, Finn, Levine, Dinh, Kingma https://arxiv.org/pdf/1903.01434.pdf (code available in the tensor2tensor library)

Learning latent dynamics for planning from pixels. Hafner, Lillicrap, Fischer, Villegas, Ha, Lee, Davidson https://arxiv.org/pdf/1811.04551.pdf https://github.com/google-research/planet

View-LSTM: novel-view video synthesis through view decomposition. Lakhal, Lanz, Cavallaro http://openaccess.thecvf.com/content_ICCV_2019/papers/Lakhal_View-LSTM_Novel-View_Video_Synthesis_Through_View_Decomposition_ICCV_2019_paper.pdf

Likelihood contribution based multi-scale architecture for generative flows. Das, Abbeel, Spanos https://arxiv.org/pdf/1908.01686.pdf

Adaptive online planning for continual lifelong learning. Lu, Mordatch, Abbeel https://arxiv.org/pdf/1912.01188.pdf

Exploiting video sequences for unsupervised disentangling in generative adversarial networks. Tuesca, Uzal https://arxiv.org/pdf/1910.11104.pdf

Memory in memory: a predictive neural network for learning higher-order non-stationarity from spatiotemporal dynamics. Wang, Zhang, Zhu, Long, Wang, Yu https://arxiv.org/pdf/1811.07490.pdf

Improved conditional VRNNs for video prediction. Castrejon, Ballas, Courville https://arxiv.org/pdf/1904.12165.pdf

Temporal difference variational auto-encoder. Gregor, Papamakarios, Besse, Buesing, Weber https://arxiv.org/pdf/1806.03107.pdf

Time-agnostic prediction: predicting predictable video frames. Jayaraman, Ebert, Efros, Levine https://arxiv.org/pdf/1808.07784.pdf https://sites.google.com/view/ta-pred

Variational tracking and prediction with generative disentangled state-space models. Akhundov, Soelch, Bayer, van der Smagt https://arxiv.org/pdf/1910.06205.pdf

Self-supervised spatiotemporal learning via video clip order prediction. Xu, Xiao, Zhao, Shao, Xie, Zhuang https://pdfs.semanticscholar.org/558a/eb7aa38cfcf8dd9951bfd24cf77972bd09aa.pdf https://github.com/xudejing/VCOP

Self-supervised spatio-temporal representation learning for videos by predicting motion and appearance statistics. Wang, Jiao, Bao, He, Liu, Liu http://openaccess.thecvf.com/content_CVPR_2019/papers/Wang_Self-Supervised_Spatio-Temporal_Representation_Learning_for_Videos_by_Predicting_Motion_and_CVPR_2019_paper.pdf

Spatio-temporal associative representation for video person re-identification. Wu, Zhu, Gong http://www.eecs.qmul.ac.uk/~sgg/papers/WuEtAl_BMVC2019.pdf

Object segmentation using pixel-wise adversarial loss. Durall, Pfreundt, Kothe, Keuper https://arxiv.org/pdf/1909.10341.pdf

## 2018

The dreaming variational autoencoder for reinforcement learning environments. Andersen, Goodwin, Granmo https://arxiv.org/pdf/1810.01112v1.pdf

MT-VAE: Learning Motion Transformations to Generate Multimodal Human Dynamics. Yan, Rastogi, Villegas, Sunkavalli, Shechtman, Hadap, Yumer, Lee http://openaccess.thecvf.com/content_ECCV_2018/html/Xinchen_Yan_Generating_Multimodal_Human_ECCV_2018_paper.html

Deep learning for universal linear embeddings of nonlinear dynamics. Lusch, Kutz, Brunton https://www.nature.com/articles/s41467-018-07210-0

Variational attention for sequence-to-sequence models. Bahuleyan, Mou, Vechtomova, Poupart https://arxiv.org/pdf/1712.08207.pdf https://github.com/variational-attention/tf-var-attention

Understanding image motion with group representations. Jaegle, Phillips, Ippolito, Daniilidis https://openreview.net/forum?id=SJLlmG-AZ

Relational neural expectation maximization: unsupervised discovery of objects and their interactions. van Steenkiste, Chang, Greff, Schmidhuber https://arxiv.org/pdf/1802.10353.pdf https://sites.google.com/view/r-nem-gifs https://github.com/sjoerdvansteenkiste/Relational-NEM

A general method for amortizing variational filtering. Marino, Cvitkovic, Yue https://arxiv.org/pdf/1811.05090.pdf https://github.com/joelouismarino/amortized-variational-filtering

Deep learning for physical processes: incorporating prior scientific knowledge de Bezenac, Pajot, Gallinari https://arxiv.org/pdf/1711.07970.pdf https://github.com/emited/flow

Probabilistic recurrent state-space models. Doerr, Daniel, Schiegg, Nguyen-Tuong, Schaal, Toussaint, Trimpe https://arxiv.org/pdf/1801.10395.pdf https://github.com/boschresearch/PR-SSM

TGANv2: efficient training of large models for video generation with multiple subsampling layers. Saito, Saito https://arxiv.org/abs/1811.09245

Towards high resolution video generation with progressive growing of sliced Wasserstein GANs. Acharya, Huang, Paudel, Gool https://arxiv.org/abs/1810.02419

Representation learning with contrastive predictive coding. van den Oord, Li, Vinyals https://arxiv.org/pdf/1807.03748.pdf

Deconfounding reinforcement learning in observational settings. Lu, Scholkopf, Hernandez-Lobato https://arxiv.org/pdf/1812.10576.pdf

Flow-grounded spatial-temporal video prediction from still images. Li, Fang, Yang, Wang, Lu, Yang https://arxiv.org/pdf/1807.09755.pdf

Adaptive skip intervals: temporal abstractions for recurrent dynamical models. Neitz, Parascandolo, Bauer, Scholkopf https://arxiv.org/pdf/1808.04768.pdf

Disentangled sequential autoencoder. Li, Mandt https://arxiv.org/abs/1803.02991 https://github.com/yatindandi/Disentangled-Sequential-Autoencoder

Video jigsaw: unsupervised learning of spatiotemporal context for video action recognition. Ahsan, Madhok, Essa https://arxiv.org/pdf/1808.07507.pdf

Iterative reorganization with weak spatial constraints: solving arbitrary jigsaw puzzles for unsupervised representation learning. Wei, Xie, Ren, Xia, Su, Liu, Tian, Yuille https://arxiv.org/pdf/1812.00329.pdf

Stochastic adversarial video prediction. Lee, Zhang, Ebert, Abbeel, Finn, Levine https://arxiv.org/pdf/1804.01523.pdf https://alexlee-gk.github.io/video_prediction/

Stochastic variational video prediction. Babaeizadeh, Finn, Erhan, Campbell, Levine https://arxiv.org/pdf/1710.11252.pdf https://github.com/alexlee-gk/video_prediction

Folded recurrent neural networks for future video prediction. Oliu, Selva, Escalera https://arxiv.org/pdf/1712.00311.pdf

PredRNN++: Towards a resolution of the deep-in-time dilemma in spatiotemporal predictive learning. Wang, Gao, Long, Wang, Yu https://arxiv.org/pdf/1804.06300.pdf https://github.com/Yunbo426/predrnn-pp

Stochastic video generation with a learned prior. Denton, Fergus https://arxiv.org/pdf/1802.07687.pdf https://sites.google.com/view/svglp

Unsupervised learning from videos using temporal coherency deep networks. Redondo-Cabrera, Lopez-Sastre https://arxiv.org/pdf/1801.08100.pdf

Time-contrastive networks: self-supervised learning from video. Sermanet, Lynch, Chebotar, Hsu, Jang, Schaal, Levine https://arxiv.org/pdf/1704.06888.pdf

Learning to decompose and disentangle representations for video prediction. Hsieh, Liu, Huang, Fei-Fei, Niebles https://arxiv.org/pdf/1806.04166.pdf https://github.com/jthsieh/DDPAE-video-prediction

Probabilistic video generation using holistic attribute control. He, Lehrmann, Marino, Mori, Sigal https://arxiv.org/pdf/1803.08085.pdf

Interpretable intuitive physics model. Ye, Wang, Davidson, Gupta https://arxiv.org/pdf/1808.10002.pdf https://github.com/tianye95/interpretable-intuitive-physics-model

Video synthesis from a single image and motion stroke. Hu, Walchli, Portenier, Zwicker, Favaro https://arxiv.org/pdf/1812.01874.pdf

Graph networks as learnable physics engines for inference and control. Sanchez-Gonzalez, Heess, Springenberg, Merel, Riedmiller, Hadsell, Battaglia https://arxiv.org/pdf/1806.01242.pdf https://drive.google.com/file/d/14eYTWoH15T53a7qejvCkDLItOOE9Ve7S/view

Deep dynamical modeling and control of unsteady fluid flows. Morton, Witherden, Jameson, Kochenderfer https://arxiv.org/pdf/1805.07472.pdf https://github.com/sisl/deep_flow_control

Sequential attend, infer, repeat: generative modelling of moving objects. Kosiorek, Kim, Posner, Teh https://arxiv.org/pdf/1806.01794.pdf https://github.com/akosiorek/sqair https://www.youtube.com/watch?v=-IUNQgSLE0c&feature=youtu.be

Learning to generate time-lapse videos using multi-stage dynamic generative adversarial networks. Xiong, Luo, Ma, Liu, Luo https://arxiv.org/pdf/1709.07592.pdf

Integrated accounts of behavioral and neuroimaging data using flexible recurrent neural network models. Dezfouli, Morris, Ramos, Dayan, Balleine https://papers.nips.cc/paper/7677-integrated-accounts-of-behavioral-and-neuroimaging-data-using-flexible-recurrent-neural-network-models.pdf

## 2017

Autoregressive attention for parallel sequence modeling. Laird, Irvin https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1174/reports/2755456.pdf

Physics informed deep learning: data-driven solutions of nonlinear partial differential equations. Raissi, Perdikaris, Karniadakis https://arxiv.org/pdf/1711.10561.pdf https://github.com/maziarraissi/PINNs

Unsupervised real-time control through variational empowerment. Karl, Soelch, Becker-Ehmck, Benbouzid, van der Smagt, Bayer https://arxiv.org/pdf/1710.05101.pdf https://github.com/tessavdheiden/Empowerment

z-forcing: training stochastic recurrent networks. Goyal, Sordoni, Cote, Ke, Bengio https://arxiv.org/abs/1711.05411 https://github.com/ujjax/z-forcing

View synthesis by appearance flow. Zhou, Tulsiani, Sun, Malik, Efros https://arxiv.org/pdf/1605.03557.pdf

Learning to see physics via visual de-animation. Wu, Lu, Kohli, Freeman, Tenenbaum https://jiajunwu.com/papers/vda_nips.pdf https://github.com/pulkitag/pyphy-engine

Deep predictive coding networks for video prediction and unsupervised learning. Lotter, Kreiman, Cox https://arxiv.org/pdf/1605.08104.pdf

The predictron: end-to-end learning and planning. Silver, van Hasselt, Hessel, Schaul, Guez, Harley, Dulac-Arnold, Reichert, Rabinowitz, Barreto, Degris https://arxiv.org/pdf/1612.08810.pdf

Recurrent ladder networks. Premont-Schwarz, Ilin, Hao, Rasmus, Boney, Valpola https://arxiv.org/pdf/1707.09219.pdf

A disentangled recognition and nonlinear dynamics model for unsupervised learning. Fraccaro, Kamronn, Paquet, Winther https://arxiv.org/pdf/1710.05741.pdf

MoCoGAN: decomposing motion and content for video generation. Tulyakov, Liu, Yang, Kautz https://arxiv.org/pdf/1707.04993.pdf

Temporal generative adversarial nets with singular value clipping. Saito, Matsumoto, Saito https://arxiv.org/pdf/1611.06624.pdf

Multi-task self-supervised visual learning. Doersch, Zisserman https://arxiv.org/pdf/1708.07860.pdf

Prediction under uncertainty with error-encoding networks. Henaff, Zhao, LeCun https://arxiv.org/pdf/1711.04994.pdf https://github.com/mbhenaff/EEN

Unsupervised learning of disentangled representations from video. Denton, Birodkar https://papers.nips.cc/paper/7028-unsupervised-learning-of-disentangled-representations-from-video.pdf https://github.com/ap229997/DRNET

Self-supervised visual planning with temporal skip connections. Ebert, Finn, Lee, Levine https://arxiv.org/pdf/1710.05268.pdf

Unsupervised learning of disentangled and interpretable representations from sequential data. Hsu, Zhang, Glass https://papers.nips.cc/paper/6784-unsupervised-learning-of-disentangled-and-interpretable-representations-from-sequential-data.pdf https://github.com/wnhsu/FactorizedHierarchicalVAE https://github.com/wnhsu/ScalableFHVAE

Decomposing motion and content for natural video sequence prediction. Villegas, Yang, Hong, Lin, Lee https://arxiv.org/pdf/1706.08033.pdf

Unsupervised video summarization with adversarial LSTM networks. Mahasseni, Lam, Todorovic http://web.engr.oregonstate.edu/~sinisa/research/publications/cvpr17_summarization.pdf

Deep variational Bayes filters: unsupervised learning of state space models from raw data. Karl, Soelch, Bayer, van der Smagt https://arxiv.org/pdf/1605.06432.pdf

A compositional object-based approach to learning physical dynamics. Chang, Ullman, Torralba, Tenenbaum https://arxiv.org/pdf/1612.00341.pdf https://github.com/mbchang/dynamics

Bayesian learning and inference in recurrent switching linear dynamical systems. Linderman, Johnson, Miller, Adams, Blei, Paninski http://proceedings.mlr.press/v54/linderman17a/linderman17a.pdf https://github.com/slinderman/recurrent-slds

SE3-Nets: learning rigid body motion using deep neural networks. Byravan, Fox https://arxiv.org/pdf/1606.02378.pdf

## 2016

Beyond temporal pooling: recurrence and temporal convolutions for gesture recognition in video. Pigou, van den Oord, Dieleman, Van Herreweghe, Dambre https://arxiv.org/abs/1506.01911

Dynamic filter networks. De Brabandere, Jia, Tuytelaars, Van Gool https://arxiv.org/pdf/1605.09673.pdf

Dynamic movement primitives in latent space of time-dependent variational autoencoders. Chen, Karl, van der Smagt https://ieeexplore.ieee.org/document/7803340

Learning physical intuition of block towers by example. Lerer, Gross, Fergus https://arxiv.org/pdf/1603.01312.pdf

Structured inference networks for nonlinear state space models. Krishnan, Shalit, Sontag https://arxiv.org/pdf/1609.09869.pdf https://github.com/clinicalml/structuredinference

A recurrent latent variable model for sequential data. Chung, Kastner, Dinh, Goel, Courville, Bengio https://arxiv.org/pdf/1506.02216.pdf https://github.com/jych/nips2015_vrnn

Recognizing micro-actions and reactions from paired egocentric videos. Yonetani, Kitani, Sato http://www.cs.cmu.edu/~kkitani/pdf/YKS-CVPR16.pdf

Anticipating visual representations from unlabeled video. Vondrick, Pirsiavash, Torralba https://www.zpascal.net/cvpr2016/Vondrick_Anticipating_Visual_Representations_CVPR_2016_paper.pdf https://github.com/chiawen/activity-anticipation

Deep multi-scale video prediction beyond mean square error. Mathieu, Couprie, LeCun https://arxiv.org/pdf/1511.05440.pdf

Generating videos with scene dynamics. Vondrick, Pirsiavash, Torralba https://papers.nips.cc/paper/6194-generating-videos-with-scene-dynamics.pdf

Disentangling space and time in video with hierarchical variational auto-encoders. Grathwohl, Wilson https://arxiv.org/pdf/1612.04440.pdf

Understanding visual concepts with continuation learning. Whitney, Chang, Kulkarni, Tenenbaum https://arxiv.org/pdf/1602.06822.pdf

Contextual RNN-GANs for abstract reasoning diagram generation. Ghosh, Kulharia, Mukerjee, Namboodiri, Bansal https://arxiv.org/pdf/1609.09444.pdf

Interaction networks for learning about objects, relations and physics. Battaglia, Pascanu, Lai, Rezende, Kavukcuoglu https://arxiv.org/pdf/1612.00222.pdf https://github.com/jsikyoon/Interaction-networks_tensorflow https://github.com/higgsfield/interaction_network_pytorch https://github.com/ToruOwO/InteractionNetwork-pytorch
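
The core interaction-network idea (a shared pairwise relation function whose outputs are summed per receiving object, then fed to an object-wise update function) can be sketched in a few lines of numpy. The random linear maps below are illustrative stand-ins for the paper's learned MLPs, not the authors' model:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, E = 5, 4, 8                        # objects, state dim, effect dim

# Stand-in "learned" functions: random linear relation and update maps.
W_rel = rng.normal(size=(2 * D, E))      # relation: (sender, receiver) -> effect
W_upd = rng.normal(size=(D + E, D))      # update: (state, summed effects) -> next state

def interaction_step(x):
    """One message-passing step over a fully connected object graph."""
    effects = np.zeros((len(x), E))
    for i in range(len(x)):              # receiver
        for j in range(len(x)):          # sender
            if i != j:
                pair = np.concatenate([x[j], x[i]])
                effects[i] += pair @ W_rel   # sum incoming effects per object
    return np.concatenate([x, effects], axis=1) @ W_upd

x = rng.normal(size=(N, D))
x_next = interaction_step(x)             # next-state prediction, shape (N, D)
```

Because the relation and update weights are shared across all object pairs and the effects are summed, the step is permutation-equivariant: relabeling the objects just relabels the predictions.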

An uncertain future: forecasting from static images using variational autoencoders. Walker, Doersch, Gupta, Hebert https://arxiv.org/pdf/1606.07873.pdf

Unsupervised learning for physical interaction through video prediction. Finn, Goodfellow, Levine https://arxiv.org/pdf/1605.07157.pdf

Sequential neural models with stochastic layers. Fraccaro, Sonderby, Paquet, Winther https://arxiv.org/pdf/1605.07571.pdf https://github.com/marcofraccaro/srnn

Learning visual predictive models of physics for playing billiards. Fragkiadaki, Agrawal, Levine, Malik https://arxiv.org/pdf/1511.07404.pdf

Attend, infer, repeat: fast scene understanding with generative models. Eslami, Heess, Weber, Tassa, Szepesvari, Kavukcuoglu, Hinton https://arxiv.org/pdf/1603.08575.pdf http://akosiorek.github.io/ml/2017/09/03/implementing-air.html https://github.com/akosiorek/attend_infer_repeat

Synthesizing robotic handwriting motion by learning from human demonstrations. Yin, Alves-Oliveira, Melo, Billard, Paiva https://pdfs.semanticscholar.org/951e/14dbef0036fddbecb51f1577dd77c9cd2cf3.pdf?_ga=2.78226524.958697415.1583668154-397935340.1548854421

## 2015

Learning stochastic recurrent networks. Bayer, Osendorfer https://arxiv.org/pdf/1411.7610.pdf https://github.com/durner/STORN-keras

Deep Kalman Filters. Krishnan, Shalit, Sontag https://arxiv.org/pdf/1511.05121.pdf https://github.com/k920049/Deep-Kalman-Filter

Unsupervised learning of visual representations using videos. Wang, Gupta https://arxiv.org/pdf/1505.00687.pdf

Embed to control: a locally linear latent dynamics model for control from raw images. Watter, Springenberg, Riedmiller, Boedecker https://arxiv.org/pdf/1506.07365.pdf https://github.com/ericjang/e2c

## 2014

Seeing the arrow of time. Pickup, Pan, Wei, Shih, Zhang, Zisserman, Scholkopf, Freeman https://www.robots.ox.ac.uk/~vgg/publications/2014/Pickup14/pickup14.pdf

## 2012

Activity Forecasting. Kitani, Ziebart, Bagnell, Hebert http://www.cs.cmu.edu/~kkitani/pdf/KZBH-ECCV12.pdf

## 2006

Information flows in causal networks. Ay, Polani https://sfi-edu.s3.amazonaws.com/sfi-edu/production/uploads/sfi-com/dev/uploads/filer/45/5f/455fd460-b6b0-4008-9de1-825a5e2b9523/06-05-014.pdf

## 2002

Slow feature analysis. Wiskott, Sejnowski http://www.cnbc.cmu.edu/~tai/readings/learning/wiskott_sejnowski_2002.pdf
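
As a concrete illustration of what slow feature analysis optimizes, here is a minimal linear SFA in numpy (whiten the signal, then keep the directions whose temporal derivative has the smallest variance). This is a sketch of the standard linear algorithm, not the authors' code:

```python
import numpy as np

def slow_feature_analysis(x, n_features=1):
    """Linear SFA on a multivariate time series x of shape (T, D).

    Returns a (D, n_features) projection W such that y = (x - mean) @ W
    has unit variance per component and minimal mean squared
    temporal derivative (slowest components first).
    """
    # 1. Center and whiten so every projection has unit variance.
    x = x - x.mean(axis=0)
    evals, evecs = np.linalg.eigh(np.cov(x, rowvar=False))
    whitener = evecs / np.sqrt(evals)        # scale columns to unit variance
    z = x @ whitener

    # 2. Finite-difference temporal derivative of the whitened signal.
    z_dot = np.diff(z, axis=0)

    # 3. Slowest directions = eigenvectors of the derivative covariance
    #    with the SMALLEST eigenvalues (eigh sorts ascending).
    _, d_evecs = np.linalg.eigh(np.cov(z_dot, rowvar=False))
    return whitener @ d_evecs[:, :n_features]
```

For example, on a mixture of a slow and a fast sinusoid, the first slow feature recovers the slow source up to sign and scale.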

## n.d.

Learning variational latent dynamics: towards model-based imitation and control. Yin, Melo, Billard, Paiva https://pdfs.semanticscholar.org/40af/a07f86a6f7c3ec2e4e02665073b1e19652bc.pdf