Learning to Ground Multi-Agent Communication
with Autoencoders

Toru Lin1
Minyoung Huh1
Chris Stauffer2
Sernam Lim2
Phillip Isola1

1MIT CSAIL
2Facebook AI

[code]

[arxiv]

[paper]


Abstract

Communication requires having a common language, a lingua franca, between agents. This language could emerge via a consensus process but it may require many generations of trial and error. Alternatively, the lingua franca can be given by the environment, where agents ground their language in representations of the observed world. We demonstrate a simple way to ground language in learned representations, which facilitates decentralized multi-agent communication and coordination. We find that a standard representation learning algorithm -- autoencoding -- is sufficient for arriving at a grounded common language. When agents broadcast these representations, they learn to understand and respond to each other's utterances, and achieve surprisingly strong task performance across a variety of multi-agent communication environments.

illustration by AIEKU


Model and Environments

Model Overview

The overall schematic of our multi-agent system.
All agents share the same individual model architecture, but each agent is trained independently to auto-encode its own observation and to use the learned representation for communication. At each time step, each agent observes an image representation of the environment as well as the messages broadcast by other agents at the previous time step. The image pixels are processed by an Image Encoder; the broadcast messages are processed by a Message Encoder; the image features and message features are concatenated and passed through a Policy Network to predict the next action. The image features are also used to generate the next communication message via the Communication Autoencoder.
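As a concrete illustration, the sketch below shows how one agent might be assembled following this description. It is a minimal PyTorch sketch only: the layer sizes, module names, and choice of reconstruction target are illustrative assumptions rather than the exact architecture used in the paper.

    # Minimal PyTorch sketch of a single agent (illustrative assumptions only;
    # layer sizes and the reconstruction target are not the paper's exact choices).
    import torch
    import torch.nn as nn

    class Agent(nn.Module):
        def __init__(self, n_actions, msg_dim=10, feat_dim=64, n_other_agents=1):
            super().__init__()
            # Image Encoder: raw pixel observation -> feature vector.
            self.image_encoder = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
                nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, feat_dim),
            )
            # Message Encoder: messages broadcast at the previous time step.
            self.message_encoder = nn.Sequential(
                nn.Linear(msg_dim * n_other_agents, feat_dim), nn.ReLU(),
            )
            # Policy Network: concatenated image + message features -> action logits.
            self.policy = nn.Sequential(
                nn.Linear(feat_dim * 2, feat_dim), nn.ReLU(),
                nn.Linear(feat_dim, n_actions),
            )
            # Communication Autoencoder: image features -> message -> reconstruction.
            self.comm_encoder = nn.Linear(feat_dim, msg_dim)
            self.comm_decoder = nn.Linear(msg_dim, feat_dim)

        def forward(self, obs, prev_messages):
            img_feat = self.image_encoder(obs)
            msg_feat = self.message_encoder(prev_messages)
            action_logits = self.policy(torch.cat([img_feat, msg_feat], dim=-1))
            message = self.comm_encoder(img_feat)   # broadcast at the next time step
            recon = self.comm_decoder(message)      # reconstruction for the autoencoding loss
            ae_loss = nn.functional.mse_loss(recon, img_feat.detach())
            return action_logits, message, ae_loss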


CIFAR Game Environment

We design a two-agent CIFAR Game following the setup of the Multi-Step MNIST Game in [18], but with the CIFAR-10 dataset instead.


MarlGrid Environment

We introduce two new grid environments, RedBlueDoors and FindGoal, adapted from the GridWorld environment.




Video



Results

Comparison with Baselines

Our proposed model consistently outperforms baselines across various multi-agent reinforcement learning (MARL) environments.


Communication clusters

We observed that the communication message clusters correspond to various meaningful phases throughout each task.
In the visualization of messages from the RedBlueDoors environment, the purple cluster of communication symbols corresponds to when no door is visible to either agent, and the light green cluster corresponds to when the red door has been opened.


Policy entropy with communication

We visualize entropies of action policies throughout each MarlGrid task (lower is better). Agents trained with our proposed grounding transmit messages that are effectively used by other agents.



Supplemental Results

The following demo videos compare ae-comm with ae-rl-comm, no-comm, and rl-comm agents in the MarlGrid environments.

Please use the dropdown menu to select from a list of 10 examples for each environment.


RedBlueDoors

A reward of 1 is given to both agents if and only if the red door is opened first and then the blue door. If the blue door is opened first, no reward is given and the episode ends immediately.
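For clarity, the reward rule above can be sketched as a small function; the name and the boolean bookkeeping are illustrative assumptions, not MarlGrid's actual API.

    def red_blue_doors_reward(red_open, blue_open):
        # Illustrative sketch of the rule described above; the caller is assumed
        # to track the boolean door states (not MarlGrid's actual code).
        if blue_open and not red_open:
            return 0.0, True    # blue door opened first: no reward, episode ends
        if red_open and blue_open:
            return 1.0, True    # red door first, then blue: both agents get reward 1
        return 0.0, False       # otherwise the episode continues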






FindGoal

Each agent receives a reward of 1 when it reaches the goal, and an additional reward of 1 when all 3 agents reach the goal within the maximum episode length.
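A corresponding sketch of the per-agent, per-step FindGoal reward, again with illustrative names rather than MarlGrid's actual code:

    def find_goal_reward(reached_goal_this_step, all_agents_reached, within_episode_limit):
        # Reward for one agent at one time step, following the description above.
        reward = 1.0 if reached_goal_this_step else 0.0
        if all_agents_reached and within_episode_limit:
            reward += 1.0       # bonus when all 3 agents reach the goal in time
        return reward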





Citation

    @misc{lin2021learning,
        title={Learning to Ground Multi-Agent Communication with Autoencoders},
        author={Toru Lin and Minyoung Huh and Chris Stauffer and Ser-Nam Lim and Phillip Isola},
        year={2021},
        eprint={2110.15349},
        archivePrefix={arXiv},
        primaryClass={cs.LG}
    }


Acknowledgements

We sincerely thank all the anonymous reviewers for their extensive discussions on and valuable contributions to this paper. We thank Lucy Chai and Xiang Fu for helpful comments on the manuscript. We thank Jakob Foerster for providing inspiring advice on multi-agent training. Additionally, TL would like to thank Sophie and Sofia; MH would like to thank Sally, Leo and Mila; PI would like to thank Moxie and Momo. This work was supported by a grant from Facebook.


Website template edited from Colorful Colorization