Object Representations for Learning and Reasoning
Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS)
December 11-12, 2020, Virtual Workshop
Recent advances in deep reinforcement learning and robotics have enabled agents to achieve superhuman performance on a variety of challenging games [1-4] and learn robotic skills [5-7]. While these results are very promising, several open problems remain. In order to function in real-world environments, learned policies must be robust to input perturbations and able to rapidly generalize or adapt to novel situations. Moreover, to collaborate and live with humans in these environments, the goals and actions of embodied agents must be interpretable and compatible with human representations of knowledge. Hence, it is natural to study how humans perceive, learn, and plan so successfully, in order to build agents that are equally successful.
There is much evidence to suggest that objects are a core level of abstraction at which humans perceive and understand the world [8,9]. Objects have the potential to provide a compact, causal, robust, and generalizable representation of the world. They may be used effectively in a variety of important learning and control tasks, including learning environment models, decomposing tasks into subgoals, and learning task- or situation-dependent object affordances. Recently, there have been many advances in scene representation, allowing scenes to be represented by their constituent objects rather than at the level of pixels [10-14]. While these works have shown promising results, there is still a lack of agreement on how best to represent objects, how to learn those representations, and how best to leverage them in agent training.
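To make the contrast between pixel-level and object-level scene representations concrete, the sketch below shows one illustrative (not standard) way an object-centric representation might be structured; the field layout and dimensions are assumptions for the example, and slot-based models such as those cited above [10-14] each choose their own factorization.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ObjectSlot:
    """One object in a scene: a compact latent code plus explicit pose.

    This layout is purely illustrative; real object-centric models differ
    in how they factor appearance, position, and scale.
    """
    appearance: np.ndarray  # latent vector encoding shape/texture (32-d here)
    position: np.ndarray    # 2-D image-plane location, normalized to [0, 1]
    scale: float            # relative object size

# A 64x64 RGB frame as raw pixels: thousands of values, none of which
# individually corresponds to a causal unit of the scene.
pixels = np.zeros((64, 64, 3), dtype=np.float32)

# The same scene as objects: a handful of slots an agent can reason over,
# e.g. for subgoal decomposition or affordance learning.
scene = [
    ObjectSlot(appearance=np.random.randn(32),
               position=np.array([0.2, 0.7]), scale=0.10),
    ObjectSlot(appearance=np.random.randn(32),
               position=np.array([0.6, 0.3]), scale=0.25),
]

pixel_dims = pixels.size
object_dims = sum(s.appearance.size + s.position.size + 1 for s in scene)
print(pixel_dims, object_dims)  # the object representation is far more compact
```

The point of the sketch is only that the object-level description is compact and structured: each slot is a candidate causal unit, whereas the pixel array is not.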
In this workshop we seek to build a consensus on what object representations should be by engaging with researchers from developmental psychology. Furthermore, we aim to define concrete tasks and capabilities that agents building on top of such abstract representations of the world should succeed at. We will discuss how object representations may be learned through presentations by invited experts in unsupervised and supervised object representation learning. Finally, we will start conversations on new frontiers in object learning, both through a panel and speaker series and through a broader call to the community for research on applications of object representations.
Workshop Goals and Outcomes
- To host some of the world’s top developmental psychologists, roboticists, and machine learning researchers in our set of talks and panels. Objects are a primary concept in leading theories in developmental psychology of how young children explore and learn about the physical world. It has also been shown that objects are useful abstractions in designing machine learning algorithms for embodied agents. We hope such an opportunity will enable AI researchers to learn from cognitive scientists about core properties of objects that can be translated into inductive biases for algorithms, and cognitive scientists to learn from AI researchers about the challenges of representing objects from real data.
- To define what an object representation is in terms of what it should do. Specifically, we seek to create a consensus on the types of environments and tasks that will best explore the advantages and disadvantages of differing approaches to building object representations.
- To develop an understanding of key challenges for applying object representations to real-world systems. We facilitate this discussion by inviting experts on robotic control to talk about constraints, requirements, and use cases for real-world perception. More generally, by viewing object perception and representation as part of a larger system, we will facilitate a discussion about synergistic ways in which perception and control can interact. To that end, we will host a panel that includes experts from these respective domains.
- To provide a hub for the object learning community and a venue for object representation and learning research focusing on a broader set of questions, including:
- How can object representations be learned? This remains an open question. Many recent papers have explored it, often in simplified environments, but it is still unclear how object representations of realistic 3D environments may be learned quickly.
- Applications of object representations to language, explainability, and other areas. The close alignment between object representations and human perception and understanding makes them well suited to research areas that involve human interaction or data. In particular, we are excited to see research on intersections of grounded NLP and objects [15], and on agent/robot interfaces and explainability using objects [16].
References
[1] Silver, David, et al. "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm."
[2] Berner, Christopher, et al. "Dota 2 with Large Scale Deep Reinforcement Learning."
[3] Vinyals, Oriol, et al. "AlphaStar: Mastering the Real-Time Strategy Game StarCraft II."
[4] Mnih, Volodymyr, et al. "Playing Atari with Deep Reinforcement Learning."
[5] Andrychowicz, Marcin, et al. "Learning Dexterous In-Hand Manipulation."
[6] Kalashnikov, Dmitry, et al. "QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation."
[7] Zeng, Andy, et al. "Learning Synergies Between Pushing and Grasping with Self-Supervised Deep Reinforcement Learning."
[8] Spelke, Elizabeth. "Principles of Object Perception."
[9] Baillargeon, Renée. "Physical Reasoning in Infancy."
[10] Goel, Vikash, et al. "Unsupervised Video Object Segmentation for Deep Reinforcement Learning."
[11] Greff, Klaus, et al. "Multi-Object Representation Learning with Iterative Variational Inference."
[12] Anand, Ankesh, et al. "Unsupervised State Representation Learning in Atari."
[13] Kulkarni, Tejas, et al. "Unsupervised Learning of Object Keypoints for Perception and Control."
[14] Lin, Zhixuan, et al. "SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition."
[15] Bisk, Yonatan, et al. "Experience Grounds Language."
[16] Shridhar, Mohit, and David Hsu. "Interactive Visual Grounding of Referring Expressions for Human-Robot Interaction."