Learning to Ground Multi-Agent Communication with Autoencoders

Toru Lin | Minyoung Huh | Chris Stauffer | Ser-Nam Lim | Phillip Isola

[code] | [arxiv] | [paper]

illustration by AIEKU

The overall schematic of our multi-agent system.
All agents share the same model architecture, but each agent is trained independently to autoencode its own observations and uses the learned representation for communication. At each time step, each agent observes an image representation of the environment as well as the messages broadcast by other agents during the previous time step. The image pixels are processed by an Image Encoder; the broadcast messages are processed by a Message Encoder; the image features and the message features are concatenated and passed through a Policy Network to predict the next action. The image features are also used to generate the next communication message via the Communication Autoencoder.
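Below is a minimal PyTorch sketch of the per-agent architecture in the schematic. All layer sizes, the message dimension, and the choice of reconstructing image features (rather than raw pixels) are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class Agent(nn.Module):
    """One agent: image encoder, message encoder, policy head,
    and a communication autoencoder that produces the next message."""

    def __init__(self, msg_dim=10, n_actions=7, feat_dim=64):
        super().__init__()
        # Image Encoder: pixels -> image features
        self.image_encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(),
        )
        # Message Encoder: messages broadcast at the previous step -> message features
        self.message_encoder = nn.Sequential(
            nn.Linear(msg_dim, feat_dim), nn.ReLU(),
        )
        # Policy Network: concatenated features -> action logits
        self.policy = nn.Linear(2 * feat_dim, n_actions)
        # Communication Autoencoder: image features -> message -> reconstruction.
        # Reconstructing image features here is a simplification for brevity.
        self.comm_encoder = nn.Linear(feat_dim, msg_dim)
        self.comm_decoder = nn.Linear(msg_dim, feat_dim)

    def forward(self, image, incoming_msg):
        img_feat = self.image_encoder(image)
        msg_feat = self.message_encoder(incoming_msg)
        action_logits = self.policy(torch.cat([img_feat, msg_feat], dim=-1))
        out_msg = self.comm_encoder(img_feat)  # message to broadcast next step
        recon = self.comm_decoder(out_msg)     # input to the autoencoding loss
        return action_logits, out_msg, recon, img_feat
```

During training, a reconstruction loss on `recon` (e.g., MSE against the image features) would ground the messages in observations, alongside the usual reinforcement learning objective.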
We design a two-agent CIFAR Game following the setup of the Multi-Step MNIST Game in [18], but with the CIFAR-10 dataset instead.

We introduce two new grid environments adapted from GridWorld.

Our proposed model consistently outperforms baselines across various multi-agent reinforcement learning (MARL) environments.

We observe that the communication message clusters correspond to meaningful phases of each task.
In the visualization of messages from the RedBlueDoors environment, the communication symbols in the purple cluster correspond to time steps when neither agent can see a door, and those in the light green cluster correspond to when the red door has been opened.
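For illustration, here is a hedged sketch of one way such a visualization could be produced with scikit-learn: embed logged message vectors with t-SNE and color them by k-means cluster. The cluster count and the placeholder `messages` array are assumptions, not the exact procedure used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Placeholder for message vectors logged over many episodes.
messages = np.random.randn(2000, 10)

# Cluster the messages, then project them to 2D for plotting.
labels = KMeans(n_clusters=6, n_init=10).fit_predict(messages)
coords = TSNE(n_components=2, perplexity=30).fit_transform(messages)

plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="tab10", s=5)
plt.title("Communication messages, colored by cluster")
plt.show()
```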

We visualize the entropy of each agent's action policy throughout each MarlGrid task (lower is better). Agents with our proposed grounding transmit messages that are effectively used by other agents.
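A minimal sketch of the metric being plotted, assuming a categorical policy over discrete actions; `action_logits` is a hypothetical tensor of policy logits, not a variable from the paper's code.

```python
import torch
import torch.nn.functional as F

def policy_entropy(action_logits: torch.Tensor) -> torch.Tensor:
    """Mean Shannon entropy H(pi) = -sum_a pi(a) log pi(a) over a batch of logits."""
    log_probs = F.log_softmax(action_logits, dim=-1)
    probs = log_probs.exp()
    return -(probs * log_probs).sum(dim=-1).mean()

entropy = policy_entropy(torch.randn(32, 7))  # e.g., 7 discrete actions
```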
@misc{lin2021learning,
  title={Learning to Ground Multi-Agent Communication with Autoencoders},
  author={Toru Lin and Minyoung Huh and Chris Stauffer and Ser-Nam Lim and Phillip Isola},
  year={2021},
  eprint={2110.15349},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}