FlexiBiT: Flexible Inference in Sequential Decision Problems via Bidirectional Transformers

Published in ICLR GPL Workshop 2022, 2022

Micah Carroll, Jessy Lin, Orr Paradise, Raluca Georgescu, Mingfei Sun, David Bignell, Stephanie Milani, Katja Hofmann, Matthew Hausknecht, Anca Dragan, Sam Devlin

Abstract: Randomly masking sub-portions of sentences has been a very successful approach in training natural language processing models for a variety of tasks. In this work, we observe that the same idea also applies naturally to sequential decision making, where many traditional tasks like behavior cloning, offline RL, inverse dynamics, or planning correspond to different sequence maskings. We introduce the FlexiBiT framework, which makes it possible to flexibly specify models that can be trained on many different sequential decision making tasks. Experimentally, we show that we can train a single FlexiBiT model to perform all tasks with performance similar to or better than specialized models, and that such performance can be further improved by fine-tuning this general model on the task of interest.
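As a rough illustration of the observation that different tasks correspond to different sequence maskings, the sketch below writes several of the tasks named in the abstract as masks over a short trajectory. It is a hypothetical sketch, not the authors' implementation: the per-timestep token layout (state, action, return-to-go), every function name, and the specific mask chosen for each task (in particular reading "planning" as goal-conditioned infilling and "offline RL" as return-conditioned prediction) are assumptions made for illustration.

```python
import random
from typing import Dict, List

# Hypothetical token layout (an assumption, not taken from the paper): every
# timestep t contributes a state token, an action token and a return-to-go
# ("rtg") token. In a mask, True means the token is given to the model as
# input; False means it is replaced by a mask token, and the task's target
# is read off one or more of those masked positions.
MODALITIES = ("state", "action", "rtg")

Mask = Dict[str, List[bool]]


def empty_mask(horizon: int) -> Mask:
    """Every token hidden."""
    return {m: [False] * horizon for m in MODALITIES}


def behavior_cloning(horizon: int, t: int) -> Mask:
    """Predict a_t from states s_0..s_t and earlier actions a_0..a_{t-1}."""
    mask = empty_mask(horizon)
    for i in range(t + 1):
        mask["state"][i] = True
    for i in range(t):
        mask["action"][i] = True
    return mask


def offline_rl(horizon: int, t: int) -> Mask:
    """Assumed return-conditioned formulation: like behavior cloning, but the
    returns-to-go up to t are also visible."""
    mask = behavior_cloning(horizon, t)
    for i in range(t + 1):
        mask["rtg"][i] = True
    return mask


def inverse_dynamics(horizon: int, t: int) -> Mask:
    """Predict a_t from the surrounding states s_t and s_{t+1}."""
    if not 0 <= t < horizon - 1:
        raise ValueError("inverse dynamics needs s_{t+1}, so t must be < horizon - 1")
    mask = empty_mask(horizon)
    mask["state"][t] = True
    mask["state"][t + 1] = True
    return mask


def goal_conditioned_planning(horizon: int) -> Mask:
    """Assumed reading of 'planning': fill in the intermediate states and all
    actions given the start state s_0 and a goal state s_{T-1}."""
    mask = empty_mask(horizon)
    mask["state"][0] = True
    mask["state"][horizon - 1] = True
    return mask


def random_mask(horizon: int, p_visible: float, rng: random.Random) -> Mask:
    """BERT-style random masking: each token is independently visible with
    probability p_visible."""
    return {m: [rng.random() < p_visible for _ in range(horizon)] for m in MODALITIES}


def render(mask: Mask) -> str:
    """One row per modality; 'x' = visible input, '.' = masked."""
    return "\n".join(
        f"{m:>6}: " + "".join("x" if v else "." for v in mask[m]) for m in MODALITIES
    )


if __name__ == "__main__":
    print(render(behavior_cloning(5, 2)))
    print()
    print(render(inverse_dynamics(5, 2)))
```

Running the script prints, for behavior cloning at t = 2 over five steps, `xxx..` for states and `xx...` for actions, with every return-to-go masked. In this encoding, switching tasks changes only the mask, not the model, and sampling from `random_mask` during training is a BERT-style objective that exposes one model to many such patterns at once.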