









1 Max Planck Institute for Biological Cybernetics, 72076 Tübingen, Germany
2 Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139
3 Department of Psychology, Harvard University, Cambridge, MA 02138
* franziska.braendle@tuebingen.mpg.de, Max Planck Ring 8, 72076 Tübingen
+ These authors contributed equally to this work.
Studies of human exploration frequently cast people as serendipitously stumbling upon good options. Yet these studies may not capture the richness of exploration strategies that people exhibit in more complex environments. We study human behavior in a large data set of 29,493 players of the richly-structured online game “Little Alchemy 2”. In this game, players start with four elements, which they can combine to create up to 720 complex objects. We find that players are driven to create objects that empower them to create even more objects. We find that this drive for empowerment is eliminated when people play a version of the game that lacks recognizable semantics, indicating that they use their knowledge about the world to guide their exploration. Our results suggest that the drive for empowerment may be a potent source of intrinsic motivation in richly structured domains, particularly those that lack explicit reward signals.
Exploration, the act of seeking out potentially useful information, is prevalent in our everyday lives. From choosing a restaurant to finding a suitable workplace, we need to explore our options to be able to make good decisions. In all of these scenarios, a fundamental tension exists between exploring unknown options and exploiting known ones. An algorithmic account of human exploration must explain both what to explore and when to explore.

Psychologists and neuroscientists have extensively studied human exploration in simple and highly controlled multi-armed bandit tasks^1,2. In these tasks, participants choose between a set of options (“arms”), each associated with an unknown reward distribution, with the goal of maximizing rewards by repeatedly sampling arms and collecting the resulting rewards. Ideal agents should explore by combining the immediate reward and the value of information of each action; they can do so by thinking through all possible future actions and calculating how much reward would increase if more knowledge about the reward distributions were collected. However, such optimal exploration strategies are computationally intractable. Researchers have therefore focused on the heuristic exploration strategies that humans might employ^3,4. Some evidence suggests that people use sophisticated uncertainty-based heuristics^5–7.

We propose that human exploration strategies are richer than what has previously been described. In particular, we believe that current models of human exploration do not capture the intrinsically motivated exploration strategies observed in the real world^8–10. As an example, consider how children play with their environment, curiously trying out new things in order to understand and learn about the world, or how scientists explore and arbitrate between different hypotheses to advance our collective knowledge. In many of these settings, direct rewards are very sparse, and it is often not even clear what the reward is. Yet people spend time on such activities anyway; these preferences reflect intrinsic exploratory drives. Current laboratory tasks are not rich enough to study these types of behaviors quantitatively. We therefore propose to study human exploration in more complex and richly-structured environments.

One such environment is the online game “Little Alchemy 2”, in which players start out with four basic elements: water, fire, earth, and air. Guided by their intuitive semantic understanding, players combine two elements at a time, which sometimes produces new elements. Each created element is added to an inventory for use in future combinations (see Fig. 1a). The combination results are semantically meaningful (e.g., combining water with fire produces steam) and can lead to increasingly complex elements, such as humans (see Fig. 1b). Game play is not random: people selectively choose which elements to combine, and thereby follow particular paths through the vast state space of element inventories. Importantly, players do not receive any extrinsic rewards during the game, yet may play for several hours. Thus, we believe that “Little Alchemy 2” offers a better and more realistic testbed for investigating intrinsic exploration strategies than many current laboratory tasks. In the current paper, we analyze a large data set of 29,493 players who collectively produced more than 4 million trials.
We show that players' exploration behavior is best described by an exploration-as-empowerment model that we propose in this paper, with an additional contribution of uncertainty-guided exploration. Uncertainty-guided exploration is a well-known strategy that can be formalized as the tendency to combine elements that have not been used frequently before. Exploration as empowerment is a novel description of human exploration that can be formalized as the attempt to create elements that can be used to create even more elements. This is similar to how scientists explore: they seek insights that enable further insights and, in turn, further exploration. Using two simpler versions of the “Little Alchemy 2” testbed, we show that our previous results can be replicated in an experimental setting and that the effect of empowerment on participants' exploration strategies vanishes when we remove the semantics of the game. These results push our understanding of human exploration beyond simple strategies in simple tasks and towards the rich repertoire of intrinsic exploration strategies found in rich environments.
Previous studies on human exploration have coalesced around two strategies: random and directed exploration. Both use uncertainty about the available options to guide exploration behavior but differ in how uncertainty is assumed to guide behavior^1. Whereas directed exploration applies an information bonus to seek out options with higher uncertainty, random exploration predicts that choice stochasticity increases with higher uncertainty across all available options. While earlier studies did not produce consistent empirical evidence for uncertainty-guided exploration in human decision making (e.g.,^3,11), recent studies have provided converging evidence in favor of such strategies^4,6,12,13. What many of the previous studies on human exploration have in common is that they used the fairly simple paradigm of multi-armed bandits and only collected data from a small number of participants. Although these tasks have contributed to a deeper understanding of human exploration behavior, their simplicity might have masked more sophisticated strategies that people could apply in richer settings. Indeed, the strategies humans can employ in exploration tasks, and which can be found empirically, are clearly limited by the complexity of the experimental paradigms used^8. The study of empowerment, for example, requires a change of influence on future options, which cannot be assessed in multi-armed bandits without changing rewards or dynamic states, as well as an intuitive sense of which actions can be empowering, for example which objects in a game can be combined. To set the stage for our analyses of people's playing behavior, we first describe the “Little Alchemy 2” game in more detail before explaining the algorithmic ideas behind uncertainty-guided exploration and exploration as empowerment.
A Quintessential Game of Exploration
In the present work, we look at the game “Little Alchemy 2”, created and released by Jakub Koziol in 2017. By August 2021, the game had been downloaded over ten million times^14. The idea of the game is simple: players start with an inventory of only four elements: earth, fire, water, and air. Players create new elements by combining two existing elements at a time. The resulting elements are added permanently to the inventory and can be used from then onward (see Fig. 1a). Successful combinations and their results are semantically meaningful. For example, combining fire and earth produces lava, which can be combined with sea to create primordial soup. These can be the first steps towards creating life and, eventually, human in the further course of the game (see Fig. 1b). “Little Alchemy 2” offers a total of 720 elements, ranging from basic items like energy or glass to extremely specific elements like cookie dough or Frankenstein's Monster. Between these elements, there are 3,452 combinations (out of 259,560 possible ones, see SI) that successfully create other elements. We believe that “Little Alchemy 2” is a quintessential game of exploration because players do not play for rewards but instead are intrinsically motivated to explore the game tree and create new elements. It offers a rich and semantically meaningful structure, which probes humans' intuitions about the combinability of its elements. Similar games have been used as a paradigm to study artificial agents' commonsense knowledge when trained on natural language corpora^15.
Uncertainty-Guided Exploration
How can and should people explore element combinations in “Little Alchemy 2”? We compare two different strategies in terms of how well they describe players' behavior in the game: uncertainty-guided exploration and exploration as empowerment. One class of heuristics uses one's uncertainty about different options to guide exploration behavior. For example, one simple way to implement uncertainty-guided exploration is to assume an uncertainty bonus that encourages the sampling of options that have not been sampled frequently in the past. Models of human exploration using this type of uncertainty guidance have been very prolific, describing behavior in simple multi-armed bandits^5, bandits with correlational structure^16, as well as real-world decision-making problems^17. Uncertainty-guided exploration therefore constitutes a good candidate model to describe human exploration in more complex paradigms as well. In “Little Alchemy 2”, an uncertainty-guided strategy would correspond to tracking how often one has used particular elements before and then preferentially using the elements that have not been used frequently. It can be formalized as

U_e = \sqrt{\log(T) / t_e}   (1)

where T is the total number of trials so far and t_e is the number of times element e has been chosen; the value of a combination is the sum of the two elements' values.
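To make this concrete, here is a minimal Python sketch of Equation (1); the dictionary of usage counts and the guard for never-used elements are our own illustrative assumptions, not part of the original implementation.

import math

def uncertainty_value(total_trials: int, times_used: int) -> float:
    """U_e = sqrt(log(T) / t_e): rarely used elements get higher values."""
    # Guard against division by zero for never-used elements (our assumption).
    return math.sqrt(math.log(total_trials) / max(times_used, 1))

def combination_uncertainty(total_trials: int, counts: dict, elem_a: str, elem_b: str) -> float:
    """The value of a combination is the sum of its two elements' values."""
    return (uncertainty_value(total_trials, counts.get(elem_a, 0))
            + uncertainty_value(total_trials, counts.get(elem_b, 0)))

counts = {"water": 5, "fire": 2, "earth": 1, "air": 1}
print(combination_uncertainty(10, counts, "water", "fire"))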
Further details are provided in the Materials and Methods section, as well as in the SI Appendix. In the first section of our Results, we present the online data set and some descriptive analyses. In the following two sections, we show that humans incorporate empowerment values into their behavior and examine the performance of different models playing the game. Afterwards, we address people's intuitive semantic understanding of the game and how an approximation of this understanding can be integrated into the empowerment model. In the following section, we show that humans use a mixture of exploration as empowerment and uncertainty-guided exploration when playing the game. Finally, we extend our results by gathering two similar data sets from online experiments, omitting the semantic structure of the game in one of them.
Online Game Data
We collected data from anonymous online players of the game over a duration of three weeks, resulting in a data set of 29,493 players who tried over four million combinations. From each player, we know the whole course of their gameplay, that is, the order of tried combinations, starting with the basic inventory of four elements. Players played for an average of 158 trials and discovered an average of 51 elements (see Fig. 1c; mean number of trials = 158.06, 95% CI [150.12, 165.99]; mean number of elements = 50.91, 95% CI [50.03, 51.78]). 563 players even played for longer than 1,000 trials, with 16 of them playing over 10,000 trials. 3,206 players managed to build an inventory with more than 100 elements; 9 players managed to find all 720 possible elements.
Drivers of Exploration Behavior
What strategies do humans use to explore the space of possible elements? Players very frequently used elements immediately after creating them (see Fig. 2a). We therefore further analyzed what drove players to use a new element right after it had been created. The idea behind this analysis was that if people have good intuitions and care about empowerment, then they should immediately use empowering elements as soon as they have been created. This analysis showed that players preferred to use an element immediately after creating it if the element had a higher empowerment value, i.e., the actual number of elements it could lead to. We assessed the size of the effect by comparing players' choices with simulated random performance, which revealed a meaningful difference (human: Kendall's τ = .43, p < .001, 95% CI [.38, .47]; random: τ = .23, p < .001, 95% CI [.18, .27]). This suggests that people incorporate the empowerment value of the different elements in their decision to immediately use a newly-created element. Another aspect of people's playing behavior is the point in time at which they stop playing the game. What motivates players to continue combining elements? We analyzed whether continuation of play is more influenced by the recent creation of successful combinations or by the recent creation of empowering elements. We regressed the value of the previous two rounds (number of successes in the success model and sum of empowerment values in the empowerment model) onto players' decision to continue in the current round. We found that the empowerment value of discovered elements had a positive effect on continuation of play (β = 0.41, z = 44.08, p < .001), while the success value did not (if anything, it had a negative effect; β = −0.30, z = −32.76, p < .001; see Fig. 2b). This means that players' decisions to continue the game were mostly influenced by how empowering recently created elements were.
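For illustration, the following hedged Python sketch shows the shape of this rank-correlation analysis using scipy; the two arrays are hypothetical stand-ins for the real per-element data.

from scipy.stats import kendalltau

# Hypothetical data: true number of offspring of each newly created element,
# and whether the player used it on the very next trial (1) or not (0).
empowerment = [12, 3, 0, 45, 7, 1]
used_immediately = [1, 0, 0, 1, 1, 0]

tau, p_value = kendalltau(empowerment, used_immediately)
print(f"Kendall's tau = {tau:.2f}, p = {p_value:.3g}")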
Figure 2. Empowerment results. a: Percentage of newly-discovered elements used immediately, depending on their empowerment value, i.e. how many elements they can produce. Players were more likely to immediately use more empowering elements than would be expected under random performance. b: Probability of continuing the game. While the empowerment value of recently discovered elements had a positive influence on participants' probability of continuing the game, the success of combinations did not. c: Performance of different models playing “Little Alchemy 2”. The uncertainty model performed marginally better than chance, while the empowerment model performed better than humans. The oracle model indicates the performance of an optimal agent. d: Model comparison. A combination of empowerment and uncertainty described human behavior in “Little Alchemy 2” best.
Model performance in playing the game
We assessed the performance of different models by letting them play the game from the beginning. We tested the performance of the empowerment approach by creating a model based on the empowerment values of the underlying game tree and compared this model to a random choice model, an oracle model, and an uncertainty-based exploration model. The random model picks the elements of the next combination randomly from the current inventory. The oracle model knows the actual game tree and chooses combinations that always result in the discovery of a new element, thus simulating the behavior of a perfect agent. The uncertainty model picks elements based on how often they have been used so far (see Equation 1): the more often an element has been chosen, the less likely it is to be chosen again. The empowerment model bases its decisions on the empowerment values of the possible combinations. The values of the latter two models were converted into probabilities using a softmax function before a combination was selected according to these probabilities. Each model also had a perfect memory, i.e. it never tried past combinations again. We ran each model 1,000 times over 200 trials. In Figure 2c, we plot the average inventory size over time, comparing the models to human players. The oracle model and the empowerment model outperformed human players. This was expected because both of these models knew the underlying game tree, while people did not. The uncertainty-based and the random model performed worse than humans. Since human performance fell between the uncertainty and the empowerment model, it is conceivable that players were using a mixed strategy, similar to other theories of human learning and decision making^6,21.
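The following Python sketch illustrates one such simulated run under stated assumptions: game_tree is a hypothetical dictionary mapping unordered element pairs to their results, value_fn is the scoring function of the model being simulated (a constant function recovers the random model), and the softmax temperature of 0.1 matches the value reported in the SI.

import itertools
import numpy as np

def softmax(values, temperature=0.1):
    """Turn combination values into choice probabilities."""
    v = np.asarray(values, dtype=float) / temperature
    v -= v.max()  # subtract the max for numerical stability
    p = np.exp(v)
    return p / p.sum()

def play(game_tree, value_fn, n_trials=200, seed=0):
    """Simulate one run of a model with perfect memory."""
    rng = np.random.default_rng(seed)
    inventory = {"water", "fire", "earth", "air"}
    tried = set()
    for _ in range(n_trials):
        # Perfect memory: only consider combinations not tried before.
        candidates = [c for c in itertools.combinations_with_replacement(sorted(inventory), 2)
                      if c not in tried]
        if not candidates:
            break
        probs = softmax([value_fn(a, b) for a, b in candidates])
        a, b = candidates[rng.choice(len(candidates), p=probs)]
        tried.add((a, b))
        inventory |= set(game_tree.get(frozenset({a, b}), ()))
    return len(inventory)

Averaging the final inventory sizes over 1,000 seeded runs would reproduce the kind of learning curves plotted in Figure 2c.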
Approximating the intuitive semantics of the game
We believe that players have an intuitive understanding of which elements are combinable and empowering and which are not, which we operationalized in our empowerment model. The empowerment model must be based on a game tree to calculate the values of the different combinations. As we wanted to compare how the different models describe human behavior, we had to decide on a reasonable semantic basis for the empowerment model. Since humans do not know the true game tree, we did not match the true underlying game tree to players' decisions. Instead, we had to capture people's intuitive understanding of which elements can be combined and which cannot. Clearly, it would not be feasible to ask players about their intuitions for all possible 720 × 720 element combinations (see also SI). Thus, we decided to approximate human intuitions by approximating their semantic understanding using neural networks trained on parts of the game tree.

We used a word representation model of vector embeddings pre-trained on a large English language corpus of Wikipedia articles^22. Similar models have been used to model human judgement and decision making in other domains^23. The elements' word vectors were used as inputs to two feedforward neural networks. The first model was a link prediction model, which predicted which element combinations were likely to succeed. The second model was an element prediction model, which assigned probabilities to each of the 720 possible elements, stating how likely the respective element was to result from the given combination. Both neural networks were trained on sub-parts of the true game tree, dividing the possible 259,560 combinations into a training, validation, and test data set. We used 10-fold cross-validation such that each element combination was part of a test set at least once. For all further analyses, we only used the predicted probabilities of combinations that were not part of the training set.

These probabilities were used to create new empowerment values for the different combinations by multiplying each combination's success probability with the outcome probabilities of all elements times their specific empowerment values (see SI for more details). Thus, combinations that had a high probability of succeeding and a high probability of resulting in an element that can create more elements in the future had a high empowerment value, as predicted by the two models in unison. We validated this approach extensively by matching the models' predictions with people's intuitions in an additional experiment reported in the SI.
Regression analysis
We compared how well the different models described the actual behavior of all players. Because players' inventories grow over time, it becomes difficult to compare across all possible choices. We therefore used a simpler method to compare the different models' predictions: we created a data set comparing the value, according to the current model, of a combination chosen by a player with the value of a randomly-sampled combination that the player could have chosen but did not. We then fed the differences between these two value predictions into a logistic mixed-effects regression^6, allowing us to regress model predictions onto participants' choices (see SI for further details, including multiple recovery analyses). We compared several models of players' gameplay while also controlling for the number of trials different players played. We found that the model best describing choices was a combination of empowerment (β = 0.38, z = 368.86, p < .001) and uncertainty-guided (β = 0.22, z = 204.64, p < .001) exploration (see Fig. 2d; see SI for the full model comparison). Importantly, the effect of empowerment was larger than the effect of uncertainty-guided exploration (β = 0.118, z = 85.95, p < .001). Although there was a positive effect of uncertainty-guided exploration, our model recovery results showed that we cannot fully distinguish between uncertainty-guided and random exploration. Thus, participants were driven mostly by the attempt to create elements that empowered them to create even more elements.
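A simplified sketch of this procedure is shown below; it uses synthetic value differences and a plain logistic regression from scikit-learn in place of the mixed-effects model, so it illustrates only the structure of the analysis, not the actual estimates.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000  # number of chosen/alternative combination pairs (synthetic)

# Synthetic model values; in the real analysis these come from the
# empowerment and uncertainty models' predictions for each combination.
chosen_emp, alt_emp = rng.normal(1.0, 1.0, n), rng.normal(0.0, 1.0, n)
chosen_unc, alt_unc = rng.normal(0.5, 1.0, n), rng.normal(0.0, 1.0, n)

# Randomly flip which option is coded first so the outcome is balanced:
# y = 1 means the first option of the pair was the one the player chose.
flip = rng.random(n) < 0.5
sign = np.where(flip, -1.0, 1.0)
X = np.column_stack([sign * (chosen_emp - alt_emp), sign * (chosen_unc - alt_unc)])
y = (~flip).astype(int)

print(LogisticRegression().fit(X, y).coef_)  # [empowerment, uncertainty] weights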
Regression analysis for experimental data
We conducted a regression analysis similar to the one for the online data. However, we combined the “Tiny Alchemy” and “Tiny Pixels” data sets and included a variable indicating which version a player played. We again included the number of trials and the interactions of model predictions with the number of trials in our regression analysis to account for the unequal lengths of the data sets. For “Tiny Alchemy”, players were best explained by a combination of exploration as empowerment (β = 0.322, z = −31.568, p < .001) and uncertainty-guided exploration (β = 0.100, z = 9.789, p < .001), with the effect of empowerment being stronger than the effect of uncertainty-guided exploration (β = 0.275, z = 19.910, p < .001). For “Tiny Pixels”, players' choices were only significantly influenced by uncertainty (β = 0.080, z = 4.798, p < .001) but not by empowerment (β = 0.010, z = 0.610, p = 0.543; see Fig. 3d), leading to a stronger effect of uncertainty-guided exploration than of empowerment (β = −0.214, z = −9.315, p < .001). These results further strengthen our claim that rich environments are necessary to study complex exploration strategies such as empowerment, and that players' exploration looks more like what has frequently been found in traditional multi-armed bandit paradigms when the rich structure of the game is removed.
We have studied human exploration in a richly-structured online game using a large data set of human playing behavior. Beyond uncertainty-guided exploration, we were able to show that people take into account how empowering the outcomes of their actions are and that this intuition strongly guides their exploration behavior. Detailed computational modeling showed that these patterns could be captured quantitatively. Our results suggest that people use richly-structured and semantically meaningful exploration strategies in the game, resembling strategies observed in the real world such as children's playing behavior or scientific methods of discovery.

Of course, our current empowerment model is just one formalization of exploration in richly-structured environments such as games. Even though we found our model to match well with both people's actual game playing behavior and their intuitions in a validation experiment, other strategies of exploration could also be assessed using our data. One such strategy is powerplay^24, which attempts to train one's model of the world as much as possible and would predict that players not only create elements to empower themselves but also to learn more about the game mechanics in general. Another strategy is goal-conditioned exploration^25, i.e., setting oneself goals to accomplish within the game. For example, in “Little Alchemy 2”, having the goal of creating a solar cell would probably lead to a different exploration path through the game tree than the goal of creating chicken soup. There have also been several other studies on both the algorithmic^26–28 and behavioral^29,30 underpinnings of more sophisticated exploration strategies. In future investigations, we will attempt to compare more elaborate strategies of exploration using our data.

Relatedly, our current model does not incorporate any learning of the underlying structure and solely focuses on the exploration aspect of people's play. We believe that this is a good first step towards understanding exploration as empowerment because people likely already have detailed intuitions about the different element pairs before the start of the game. Nonetheless, one of our future goals is to build models that update their intuitions while playing the game and thereby simulate people's learning progress.

Another concern is the fact that the underlying semantic structure of the game tree was designed by just one person, the creator of the game. Thus, one could argue that the game might not tell us much about people's general intuitions and exploration behavior. We do not believe this to be the case, for two reasons. First, we were able to show in our validation study that these intuitions are shared among players. Second, games are generally designed to be natural for people, meaning they have to be learnable and are calibrated to people's intuitive theories in the first place. Thus, we believe that games such as “Little Alchemy” can be used to reverse-engineer people's intuitive semantics, which can then be used to model human exploration.

Finally, even though players participated in the online game “Little Alchemy 2” without any external rewards, participants in our experimental versions of the game, “Tiny Alchemy” and “Tiny Pixels”, were rewarded for generating new elements. This means that participants exploring the game intrinsically behaved similarly to participants who took part in our online experiments for monetary rewards.
We used the experimental versions of the game to establish that stripping away the game's semantics changed participants' exploration strategies, after having already established intrinsic exploration strategies using data from the online game. Nonetheless, in future studies, we would like to further disentangle the effects of rewards on players' behavior, for example by also removing the semantics of the online game. Taken together, our results advance our understanding of human intrinsic exploration behavior and extend current research paradigms by using a large, complex, and richly-structured data set from an online game. One implication of our results is that empowerment, or other more elaborate exploration strategies, may often drive people's decisions but be masked by the simple paradigms used in research on human exploration. More sophisticated strategies may simply not be detectable in easier paradigms, or may look like simpler strategies, such as uncertainty-guided exploration, when studied in reduced forms. Thus, we believe that our work demonstrates that using games as experimental paradigms can increase the complexity, robustness, and ecological validity of psychological research.
We investigated the exploration behavior of 29,493 players in the richly-structured online game “Little Alchemy 2”. We found that players' behavior was mostly driven by the attempt to empower themselves, i.e., they created elements that can lead to more elements over time. Using two additional games, we replicated our results in a controlled setting and showed that participants resorted to simpler exploration strategies when the semantic structure of the game was removed. Our results point to the necessity of using more complex experimental paradigms to study elaborate strategies of human exploration. We hope that our findings and model are a first step towards empowering our own theories of human exploration.
“Little Alchemy 2” Data Set
The “Little Alchemy 2” data set was gathered over a duration of three weeks, from June 1st to June 21st, 2019, with the help of the game's developer. For all our analyses, we only included players who started to play the game within that time period, and we filtered out all repeated trials. This led to a data set containing 29,493 players who tried 4,691,033 combinations in total. All included players consented to their anonymized data being used for scientific purposes.
“Tiny Alchemy” and “Tiny Pixels”
For the experimental versions of the game, i.e. “Tiny Alchemy” and “Tiny Pixels”, we recreated the game using the game tree of “Little Alchemy 1”. Whereas players of “Tiny Alchemy” played the game with normal element pictures and names, “Tiny Pixels” used element pictures with randomly positioned pixels and unrecognizable yet distinct names (see Fig. 4). Players were paid $0.10 for every discovered element and could play for as long as they wanted, up to 2 hours. We recruited participants from Amazon's Mechanical Turk: 97 participants for “Tiny Alchemy” and 98 participants for “Tiny Pixels”. All experiments were approved by Harvard's Institutional Review Board.
Figure 4. Experiment setup of “Tiny Alchemy” and “Tiny Pixels”. “Tiny Alchemy” is an experimental version of “Little Alchemy 2”. “Tiny Pixels” is based on “Tiny Alchemy” but does not contain any semantic information.
Empowerment Model
The empowerment model is based on how many distinct elements an element can produce in combination with any other element. To calculate this value, we needed to simulate human intuitions about the game tree. This process consisted of four steps: pre-processing the words by turning them into word vectors, predicting the link probability, predicting the resulting element, and the actual calculation step. First, we used a pre-trained word representation model of word vector embeddings called fasttext^22. This gave us a 300-dimensional word vector for each element. For each combination, we then concatenated the vectors of the two involved elements into a single combination vector.
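As an illustration of this pre-processing step, the sketch below uses the official fasttext Python bindings; the model file name is a placeholder for a Wikipedia-trained embedding model.

import numpy as np
import fasttext

# Placeholder path: any 300-d Wikipedia-trained fasttext model would do here.
model = fasttext.load_model("wiki.en.bin")

def combination_vector(elem_a: str, elem_b: str) -> np.ndarray:
    """Concatenate the two 300-d element vectors into one 600-d combination vector."""
    return np.concatenate([model.get_word_vector(elem_a),
                           model.get_word_vector(elem_b)])

vec = combination_vector("water", "fire")  # shape: (600,)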
The “Little Alchemy 2” dataset
The original “Little Alchemy 2” data set was collected on the game's online platform during a period of three weeks, from June 1st to June 21st, 2019. The game, and therefore also our dataset, has some additional features that were not mentioned in the main text. First, a combination of two elements in “Little Alchemy 2” can, in rare cases, create more than one element (up to three). In this case, all resulting elements are added to the player's inventory. Second, elements can be final, i.e. they cannot create other elements and are not added to the inventory, which is indicated to the player. There is also an encyclopedia where players can look up all elements they have created so far. Third, elements can be depleted, i.e. the player has found all successful combinations with that element, which eliminates it from the usable inventory (though it can still be found in the encyclopedia). Finally, some elements are added to the inventory once a player fulfills additional requirements. For example, “metal” is added once the player's inventory contains 50 elements. There are nine of these “unlockable” elements.
The initial and unfiltered dataset consisted of 32,177 players and 6,692,132 trials in total. We did not have access to the full play history of some of these players because they had started playing the game before data collection commenced. We removed these players from all of our analyses. We also removed all duplicate trials because it was unclear why players repeated an element combination. The final data set consisted of 29,493 players and 4,691,033 trials in total.
Tiny Datasets
In the “Tiny Alchemy” and “Tiny Pixels” datasets, we eliminated the complications of the original dataset by not telling players whether an element was final or depleted and by leaving such elements in the usable inventory. We also did not implement “unlockable” elements. The experiment was based on the game tree of “Little Alchemy 1”, the predecessor of “Little Alchemy 2”. To simplify the game further, we changed the game tree so that each combination could produce only one resulting element, omitting any additional results that existed in the original game tree. Whereas players of “Tiny Alchemy” played the game with normal element pictures and names, “Tiny Pixels” used randomly-sampled, pixelated element pictures and unrecognizable yet distinct names. The game interface let participants select two elements at a time for an “experiment”. Once participants wanted to submit a combination, they could click on a “Let's find out” button. If the experiment created a new element, then this element was
added to their inventory. If the experiment did not create a new element, a red cross appeared as the outcome, and participants were told that the experiment was unsuccessful. In both versions, players were paid $0.10 for every discovered element and could play for as long as they wanted, up to 2 hours. We recruited participants from Amazon's Mechanical Turk: 97 participants for “Tiny Alchemy” and 98 participants for “Tiny Pixels”. All experiments were approved by Harvard's Institutional Review Board. Participants were required to be from North America and to have a past HIT completion rate of at least 99% with a minimum of 100 completed HITs. Participants were only allowed to take part in one of our experiments.
Approximating the game tree
For our model comparison, we approximated players' intuitions about the different elements. As explained in the Methods section of the main article, we used a combination of pre-learned word vectors and two feedforward neural networks, trained on the true game tree, to determine the probability that the combination of two elements is successful and the probability of each element being the result. We used fasttext, a pre-trained word representation model of word vector embeddings, to get a word vector for every element. We chose a version of fasttext that was pre-trained on a large English language corpus of Wikipedia articles. We used fasttext because it can create word vectors for words that were not seen during training by using subword information. This was important because the elements of “Little Alchemy 2” can be very specific, like double rainbow, and were probably not part of the training set of the vector embeddings. Fasttext returns default vectors of dimension 300, which can be reduced to lower dimensions; we compared a model using 300-dimensional word vectors with one using 100-dimensional vectors. As each element can be combined with every other element including itself, but the order of the elements in a combination does not matter, there are 259,560 combinations in total. These combinations were used as the inputs to the two neural networks. To approximate human intuition, we only wanted to use predictions for combinations that were not part of the training set. Thus, we split all combinations into three datasets: training, validation, and test. We used 10-fold cross-validation such that each combination was part of a test set at least once.
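A minimal sketch of this splitting scheme, assuming scikit-learn's KFold (the further split of each training portion into training and validation sets is omitted here):

import numpy as np
from sklearn.model_selection import KFold

combination_indices = np.arange(259_560)  # one index per unordered combination
for fold, (train_idx, test_idx) in enumerate(
        KFold(n_splits=10, shuffle=True, random_state=0).split(combination_indices)):
    # Train the networks on train_idx; keep predictions only for the
    # held-out test_idx combinations, so every combination is predicted
    # out-of-sample at least once.
    pass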
Figure 5. Structure of the link prediction model. Element combinations of the initial dataset were pre-processed, resulting in a combination vector. This vector was used as the input for a feedforward neural network with one hidden layer. The network returned a link probability, i.e. the probability that the element combination is successful and creates an element as output.
The first neural network is the link prediction model, which predicts whether or not a combination is successful in creating a new element (see Figure 5). As the dataset was unbalanced (only 1.27% of all combinations successfully create an element), we decided to use random oversampling for our training data. After the oversampling, the word vectors of the two elements in each combination were concatenated, in both orders, for use as input to the neural network. Additionally, the data was normalized (based on the training dataset). The feedforward neural network consisted of an input layer of 600 or 200 neurons, depending on the model (two times the input vector dimensions, as each concatenated vector consisted of two elements), a hidden layer of 16 neurons (with a rectified linear unit as activation function), and an output layer of 1 neuron (with a sigmoid activation function). It was trained over 100 epochs with 30 steps per epoch and a batch size of 32. We used a binary cross-entropy loss function and the Adam optimization algorithm to adjust the weights (with a learning rate of 0.001). We compared two models, one using 100-dimensional vectors and one using 300-dimensional vectors, and trained both until they converged. The model with a vector size of 300 dimensions performed better than the model with a vector size of 100 dimensions (see Table S1). We therefore chose the model using 300-dimensional vectors as the final link prediction model. While the model performed well on the loss (see Figure 6), we observed a high false positive rate on the actual data (precision of 6.32%). However, we think that this approximates human intuition well, as players tried out many unsuccessful combinations (only 33.62% of all combination trials by humans were successful), implying that they overestimated the probability of links.
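The description above translates into the following Keras sketch; the layer sizes and training settings are taken from the text, while everything else (framework choice included) is our assumption about one plausible implementation.

from tensorflow import keras

def build_link_model(input_dim: int = 600) -> keras.Model:
    """Concatenated word vectors in, link probability out."""
    model = keras.Sequential([
        keras.Input(shape=(input_dim,)),
        keras.layers.Dense(16, activation="relu"),    # hidden layer of 16 neurons
        keras.layers.Dense(1, activation="sigmoid"),  # output: P(link)
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
                  loss="binary_crossentropy")
    return model

# X: normalized 600-d combination vectors (after random oversampling of the
# rare successful combinations); y: 1/0 success labels.
# build_link_model().fit(X, y, batch_size=32, epochs=100)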
Figure 8. Predicted ranks of the element prediction model with 300 dimensions for the training and the test set as histogram and as density estimation (blue line).
Empowerment Model
The empowerment model is based on the concept that the more distinct elements a combination's resulting element can lead to, the more empowering it is. We therefore needed to base the model on a game tree, which can either be the true game tree, in which case the empowerment value of a combination is simply the number of distinct elements the resulting element can create, or our recreated version. To calculate the value from our recreated version, we used the predictions of the two learned neural networks, according to the following formula:
E(e_{c_{A,B}}) = P(\text{link}_{c_{A,B}}) \times \sum_{i=0}^{720} P(\text{result}_{c_{A,B}} = i) \times E(e_i)   (2)
The expected empowerment value E(e_{c_{A,B}}) of the combination c_{A,B} was composed of three parts: the link probability P(link_{c_{A,B}}), which was calculated by the first neural network; the individual probabilities of each element being the result of the combination, P(result_{c_{A,B}} = i); and the individual empowerment value of each potential result, E(e_i). The individual empowerment values were estimated by looking at how many successful combinations (P(link_{c_{i,j}}) ≥ 0.5) each element was part of and how many distinct elements it was expected to produce (assuming the element with the highest probability P(result_{c_{i,j}}) of each combination to be the resulting element). At each trial, the empowerment value E(e_{c_{A,B}}) of each available combination was transformed into a probability of choosing this particular combination at that time point by using a soft-maximization function with a temperature value of 0.1 (see section “Learning Curves”). The distribution of empowerment values in the true game tree and in our recreated version can be seen in Fig. 9. There was a substantial correlation between the model's predicted empowerment values and the true empowerment values, r = .83, p < .001.
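In code, Equation (2) reduces to a dot product; the three arguments below are hypothetical containers for the two networks' predictions and the estimated per-element empowerment values.

import numpy as np

def expected_empowerment(link_prob: float,
                         result_probs: np.ndarray,         # shape (720,): P(result = i)
                         emp_values: np.ndarray) -> float:  # shape (720,): E(e_i)
    """E(e_{c_{A,B}}) = P(link_{c_{A,B}}) * sum_i P(result = i) * E(e_i)."""
    return float(link_prob * np.dot(result_probs, emp_values))

These expected values would then be passed through the same temperature-0.1 soft-maximization described here to obtain choice probabilities.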
Figure 9. Correlation between the predicted empowerment values according to the approximated game tree and the true empowerment values according to the true game tree of “Little Alchemy 2”.
Uncertainty model
The uncertainty model preferentially picks combinations of elements that have been used less often in the past.
U_e = \sqrt{\frac{\log(T)}{t_e}}

It assigns a value to each element e based on the total number of trials so far, T, and the number of times the element has been chosen, t_e. The total value of a combination is the sum of the individual values of the two elements. These values are again transformed into probabilities at each timestep by using a soft-maximization function with a temperature value of 0.1 (see section “Learning Curves”).
Baseline model
The baseline model picks combinations at random. At each time point, it chooses a random combination of two elements from the current inventory.
Success-Only or Empowerment?
Since our model formalizing exploration as empowerment consists of two components, a link prediction and an element prediction component, one might ask whether the link prediction model alone might already be enough to explain human behavior in “Little Alchemy 2”. This would correspond to players only caring about whether or not a combination can successfully create new elements. To test this assumption, we first assessed whether there was more to people's choices of element combinations. We therefore first regressed the predictions of only the link prediction model onto players' choices. This showed a significantly positive effect of the link prediction component (β = 0.379, p < .001). Yet adding the predictions of the empowerment model to the regression model further improved the overall model fit (χ²(2) = 6314.6, p < .001; β = 0.113, p < .001). This served as a first indicator that players used more than just a “success-only” model to solve the task at hand. We also repeated this analysis only for players who played at least 100 trials, using only their first 100 attempts. We did this for two reasons. First, we wanted to get a cleaner estimate of the two effects without having to correct for the different number of trials per player. Second, we wanted to remove the players who only tried out a few combinations and then stopped playing, because the two models were less distinguishable for these players. Regressing both the link prediction model's and the full empowerment model's predictions onto participants' choices revealed a small effect of the link prediction model (β = 0.015, p < .001) and a large effect of the empowerment model (β = 0.160, p < .001). Therefore, we concluded that players were not only driven by the attempt to successfully create new elements but also tried to create elements that empowered them.
Immediate Usage of Elements
To analyze the immediate usage of newly created elements, we excluded all final elements, as players could not use them after creation. Additionally, we filtered out the four basic elements (water, earth, fire, air), as players had them in their inventory from the start. As a comparison, we simulated a dataset of 1,000 players playing 1,000 trials each according to our random model. In “Tiny Alchemy” and “Tiny Pixels”, we did not tell participants if an element was final, so they were still able to use such elements after creation. Therefore, we only needed to exclude the four basic elements. As “Tiny Pixels” and “Tiny Alchemy” are based on the same game tree, we only created one random dataset for comparison. We used Kendall's τ throughout for the correlation analyses.
Probability of Continuing
We analyzed the probability of players continuing the game based on their previous two trials. We considered a success and an empowerment model. To calculate the value of the success model, we assigned 1 to trials in which the player discovered a new element and 0 to trials in which the player did not. We then averaged the values of the previous two trials to get the success value for the current trial and normalized the resulting vector before entering it into the regression. For the empowerment model, we assigned 0 to trials in which the player did not discover an element, and the empowerment value of the element according to the true game tree if the player did. We again averaged the values of the previous two rounds and normalized the vector to get the value of the current round. We regressed these values onto the player's decision to continue the game: 1 if the player continued the game after this trial and 0 if the player stopped.
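A small sketch of this predictor construction (z-scoring is our reading of “normalized”; the per-trial arrays are hypothetical):

import numpy as np

successes = np.array([0, 1, 0, 0, 1, 1], dtype=float)      # 1 = new element found
empowerments = np.array([0, 12, 0, 0, 3, 7], dtype=float)  # true-tree values, 0 on failures

def lagged_mean(v):
    """Mean of the previous two trials; undefined for the first two trials."""
    out = np.full(len(v), np.nan)
    out[2:] = (v[1:-1] + v[:-2]) / 2
    return out

def zscore(x):
    return (x - np.nanmean(x)) / np.nanstd(x)

success_pred = zscore(lagged_mean(successes))
empowerment_pred = zscore(lagged_mean(empowerments))
# These predictors are then regressed onto a 1/0 "continued playing" outcome.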
Learning Curves
The learning curves of the different models were created by letting each model run over 200 steps 1,000 times and averaging the inventory sizes across runs. All models used a full memory, so they only picked combinations that had not been picked before. In Figure 2c, the oracle and the empowerment model were based on the true game tree. The oracle model only chose combinations that resulted in the discovery of a new element, simulating a perfect agent.
Participants were paid for participation, and the experiment took 25.8 minutes on average. The experiment was divided into three parts. In the first part, participants played the game “Tiny Alchemy” for 20 trials to get acquainted with the mechanics of the game. In the second part, participants always saw two random elements and had to indicate, on a scale from 1 (Definitely Not) to 7 (Absolutely), how likely they thought the two elements were to create a new element. No participant ever saw a combination that they had tried out during the first part of the experiment. We chose a subset of 1,000 possible combinations with a wide range of predicted probabilities of success according to our link prediction model, and aimed for a balance between successful and unsuccessful combinations according to the true game tree. Complete balancing was not possible because some ranges of the predicted probabilities did not include many successful combinations. For each participant, we randomly sampled 20 out of these 1,000 combinations. After rating the possible success of a combination, four elements appeared underneath the combination, and participants had to choose which of the four shown elements they thought could result, under the assumption that the two current elements (the ones they had just rated) would create an element. The second part contained 20 trials in total. The four possible elements consisted of the element with the highest probability according to our approximated game tree (rank 1), one with a high probability (randomly sampled from ranks 2-30), one with a medium probability (randomly sampled from ranks 31-150), and one with a low probability (randomly sampled from ranks 151-540). Finally, in the last part, participants rated different elements based on how useful they thought they were in the game, again on a scale from 1 (Not at all useful) to 7 (Very useful) and over 20 trials; we told them beforehand that an element could be more useful if it could be successfully combined with multiple other elements. For each participant, 20 elements were sampled from all 540 possible elements so as to cover a wide range of predicted empowerment values.

We were first interested in checking whether our link prediction model captured parts of people's intuitions about the possible success of element combinations. We therefore regressed the z-transformed predicted probabilities onto participants' ratings of how likely they thought the two elements would create a new element in a mixed-effects regression, while also entering a random intercept over participants. This analysis revealed a significant fixed effect of our model's predicted probability on participants' ratings (β = 0.132, t = 3.784, p < .001), and our model explained a significant amount of variance in participants' ratings (pseudo-r² = 0.336). Notably, even for cases in which the actual element combination did not produce an element according to the true underlying game tree, our model still matched well with participants' ratings (β = 0.310, t = 6.334, p < .001, pseudo-r² = 0.414). Thus, our link prediction model corresponds well with people's intuitions about the probability of success for a given combination.

Next, we analyzed which of the 4 candidate elements participants chose as the element they thought would likely result from the given combination. The average predicted probability of the elements participants chose was much higher than for all other provided elements (0.41 vs. 0.19, t(204) = 12.82, p < .001, d = 1.79). This was even true for element combinations that did not actually produce an element according to the rules of the game (0.39 vs. 0.21, t(204) = 9.61, p < .001, d = 1.34). We also regressed the (z-transformed) predicted probabilities onto whether or not a participant chose an element using a mixed-effects logistic regression. This showed a strong effect of the predicted probabilities on participants' choices (β = 0.882, z = 10.16, p < .001, pseudo-r² = 0.270). This effect remained even when controlling for combinations that did not lead to elements within the game (β = 0.711, z = 12.173, p < .001, pseudo-r² = 0.172). Thus, our element prediction model corresponds well with people's intuitions about which element could result from a given combination.

Finally, we looked at how participants' ratings of an element's usefulness related to our model's predicted empowerment values. For this, we regressed both the predicted and the actual empowerment values onto participants' ratings in a mixed-effects regression while adding a random intercept over participants. Since both the predictions and the true empowerment values were not normally distributed (most elements have only a low empowerment value), we logarithmized the predictors and then z-transformed them before performing the regression. The results showed a significant effect of the predicted empowerment values (β = 0.146, t = 2.70, p < .001) but not of the true empowerment values (β = −0.027, t = −0.49, p = .617) on participants' ratings. These results indicate that both the link prediction and the empowerment components of our model relate meaningfully to participants' judgements about corresponding properties of the game. Thus, we conclude that our model's components correspond well with people's intuitions about the game.
Full Regression Results
We report the full regression results for “Little Alchemy 2” as well as for the two lab versions, “Tiny Alchemy” and “Tiny Pixels”.
Table 3. Full regression results for the model comparison run on “Tiny Alchemy” and “Tiny Pixels”.

                      Estimate   Std. Error   z value    Pr(>|z|)
Empowerment           0.38       0.00         368.86     < .001
Uncertainty           0.22       0.00         204.64     < .001
Trial                 -0.00      0.00         -0.02      0.98
Uncertainty × Trial   -0.61      0.00         -211.53    < .001