Object Representations for Learning and Reasoning

Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS)

December 11, 2020, Virtual Workshop


Learning Long-term Visual Dynamics with Region Proposal Interaction Networks

  • Haozhi Qi, Xiaolong Wang, Deepak Pathak, Yi Ma, and Jitendra Malik


Learning long-term dynamics models is the key to understanding physical common sense. Most existing approaches to learning dynamics from visual input sidestep long-term prediction by resorting to rapid re-planning with short-term models. This not only requires such models to be highly accurate but also limits them to tasks where an agent can continuously obtain feedback and take action at each step until completion. In this paper, we aim to leverage ideas from success stories in visual recognition to build object representations that capture inter-object and object-environment interactions over a long range. To this end, we propose Region Proposal Interaction Networks (RPIN), which reason about each object's trajectory in a latent region-proposal feature space. Our approach outperforms prior methods by a significant margin, both in prediction quality and in its ability to plan for downstream tasks, and it also generalizes well to novel environments. Results are available at https://sites.google.com/view/orlr-workshop-rpin.
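To make the idea of reasoning over per-object latent states more concrete, here is a minimal sketch of a generic interaction-network step (in the style of Battaglia et al.'s Interaction Networks) rolled out autoregressively over many steps. It is not RPIN's architecture. RPIN operates on region-proposal features extracted from images, which this sketch omits entirely: each object is assumed to be a plain feature vector, the weights are random stand-ins for learned parameters, and all names (`InteractionStep`, `rollout`, `make_weights`) are hypothetical.

```python
import math
import random
from typing import List

Vector = List[float]


def matvec(weights: List[Vector], x: Vector) -> Vector:
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def make_weights(rows: int, cols: int, rng: random.Random, scale: float = 0.1) -> List[Vector]:
    """Hypothetical small random weights standing in for learned parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class InteractionStep:
    """One generic interaction-network step over per-object latent features (hypothetical)."""

    def __init__(self, dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.dim = dim
        # Relation module: maps the concatenation [h_i, h_j] to a pairwise effect.
        self.w_rel = make_weights(dim, 2 * dim, rng)
        # Object module: maps [h_i, summed effects] to an update of h_i.
        self.w_obj = make_weights(dim, 2 * dim, rng)

    def __call__(self, states: List[Vector]) -> List[Vector]:
        new_states = []
        for i, h_i in enumerate(states):
            # Sum the pairwise effects exerted on object i by every other object j.
            effect = [0.0] * self.dim
            for j, h_j in enumerate(states):
                if i == j:
                    continue
                e_ij = [math.tanh(v) for v in matvec(self.w_rel, h_i + h_j)]  # list concat
                effect = [a + b for a, b in zip(effect, e_ij)]
            # Residual update: next latent state = current state + predicted delta.
            delta = [math.tanh(v) for v in matvec(self.w_obj, h_i + effect)]
            new_states.append([a + b for a, b in zip(h_i, delta)])
        return new_states


def rollout(step: InteractionStep, states: List[Vector], horizon: int) -> List[List[Vector]]:
    """Predict `horizon` future steps by feeding each prediction back in as input."""
    trajectory = []
    for _ in range(horizon):
        states = step(states)
        trajectory.append(states)
    return trajectory


if __name__ == "__main__":
    step = InteractionStep(dim=4)
    init = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
    traj = rollout(step, init, horizon=20)
    print(len(traj), len(traj[0]), len(traj[0][0]))  # 20 3 4
```

The rollout contrasts with the short-horizon re-planning the abstract describes: the model consumes its own predictions for many steps without new observations, so any error in the pairwise and per-object updates compounds over the horizon.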