Object Representations for Learning and Reasoning

Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS)

December 11, 2020, Virtual Workshop


Dynamic Regions Graph Neural Networks for Spatio-Temporal Reasoning

Iulia Duta, Andrei L Nicolicioiu, and Marius Leordeanu

Abstract

Graph Neural Networks are well suited to capturing latent interactions in the spatio-temporal domain (e.g. videos), but when an explicit structure is not available, it is not obvious which atomic elements should be represented as nodes. For video processing, we design nodes that are clearly localized in space, with an inductive bias for modeling the relations between instances. Current works use external object detectors or fixed regions to extract graph nodes, whereas we propose a module that generates the region associated with each node dynamically, without explicit object-level supervision. Constructing these localized, adaptive nodes biases our model towards object-centric representations, and we show that this improves the modeling of visual interactions. By relying on a few localized nodes, our method learns to focus on salient regions, leading to a more explainable model. Our model achieves superior results on video classification tasks involving instance interactions.
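
To make the idea of region-based graph nodes concrete, the following is a minimal sketch of pooling a feature map into one node vector per region. It is an illustration under stated assumptions, not the authors' implementation: the triangular kernel, the function names (`axis_kernel`, `pool_region`, `extract_nodes`) and the region parameters (`cx`, `cy`, `half_w`, `half_h`) are all hypothetical. In a trained model the region parameters would presumably be predicted by a learned module rather than fixed by hand as they are here.

```python
def axis_kernel(center, half_size, length):
    """Triangular weights along one axis (an assumed kernel shape).

    The weight peaks at `center` and reaches zero at a distance of
    `half_size + 1`, so it varies smoothly with both parameters.
    """
    return [max(0.0, half_size + 1.0 - abs(i - center)) for i in range(length)]


def pool_region(features, cx, cy, half_w, half_h):
    """Pool an H x W x C feature map (nested lists) into one C-dim node vector.

    The 2D mask is the outer product of two 1D kernels, normalized to sum
    to 1, so the node is a weighted average of the features in the region.
    """
    height, width = len(features), len(features[0])
    channels = len(features[0][0])
    ky = axis_kernel(cy, half_h, height)
    kx = axis_kernel(cx, half_w, width)
    total = sum(ky) * sum(kx)
    node = [0.0] * channels
    if total == 0.0:
        # The region lies entirely outside the feature map.
        return node
    for y in range(height):
        for x in range(width):
            weight = ky[y] * kx[x] / total
            if weight == 0.0:
                continue
            for c in range(channels):
                node[c] += weight * features[y][x][c]
    return node


def extract_nodes(features, regions):
    """Return one node vector per (cx, cy, half_w, half_h) region."""
    return [pool_region(features, *region) for region in regions]


# Toy 4 x 4 feature map with one channel whose value is the column index.
features = [[[float(x)] for x in range(4)] for _ in range(4)]

# Two hand-picked regions: a single cell, and a wider centered window.
regions = [(1.0, 1.0, 0.0, 0.0), (1.5, 1.5, 1.0, 1.0)]
nodes = extract_nodes(features, regions)
```

Because each mask weight is a piecewise-linear function of the region's center and size, a gradient-based learner could in principle adjust the regions themselves, which is one way nodes might be made adaptive without object-level supervision. The resulting node vectors would then be passed to a graph neural network for message passing.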