Insights from the Application of Temporal Difference Learning in Models of Foraging

Jan Teichmann

Chalmers Conferences, 9th European Conference on Mathematical and Theoretical Biology

Last modified: 2014-03-28

Abstract

Predators have to secure a high energy intake in the face of changing and uncertain environments. Through the evolution of predator-prey interactions, manifold mechanisms have emerged to avoid predation. So-called secondary defences commonly involve the possession of toxins or deterrent substances which are not directly observable by predators. However, many defended species use conspicuous signals as warning flags in combination with their secondary defences (aposematism). The field of aposematism has seen renewed interest in the role of the predator and the details of the predator’s aversive learning process. Because the predator is the selective agent, its aversive learning is an important aspect of predator avoidance.

We present an experience-based aversive learning model of foraging behaviour in uncertain environments [1]. In particular, we use Q-learning as a model-free implementation of Temporal Difference learning, motivated by growing evidence for neural correlates in natural reinforcement settings. We gain insights into how aversive learning influences foraging in uncertain environments and discuss similarities and differences to the net energy maximisation approach of classical optimal foraging theory.

In our model the predator has the choice of including an aposematic prey in its diet or foraging on alternative food sources. We show how the predator’s foraging behaviour and energy intake depend on the toxicity of the defended prey and the presence of Batesian mimics. We introduce the precondition of exploration of the action space for successful aversion formation and show how it predicts foraging behaviour in the presence of conflicting rewards, which is conditionally suboptimal in a fixed environment but allows better adaptation in changing environments. We present fitness distributions of learning predators compared to a primary mutation strategy in a set of different environmental conditions.
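
As an illustration of the kind of learner described above, the following is a minimal sketch of a Q-learning forager, not the model from [1]. It assumes a single-state environment with two actions (attack the aposematic prey, or forage on the alternative), epsilon-greedy exploration, and made-up reward values; the function name run_forager and all parameters (toxicity, mimic_fraction, alpha, gamma, epsilon) are hypothetical choices for this example.

```python
import random


def run_forager(toxicity=2.0, mimic_fraction=0.0, alpha=0.1, gamma=0.5,
                epsilon=0.1, steps=5000, seed=0):
    """Single-state Q-learning forager (illustrative; all values are assumptions)."""
    rng = random.Random(seed)
    # Action 0: attack the aposematic prey. Action 1: forage on the alternative.
    q = [0.0, 0.0]
    total_energy = 0.0
    for _ in range(steps):
        # Epsilon-greedy choice: explore with probability epsilon.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = 0 if q[0] > q[1] else 1
        if action == 0:
            # A Batesian mimic is palatable; a defended model costs `toxicity`.
            is_mimic = rng.random() < mimic_fraction
            reward = 1.0 if is_mimic else 1.0 - toxicity
        else:
            # Assumed fixed payoff of the alternative food source.
            reward = 0.5
        # Q-learning update; the successor state is the same single state.
        q[action] += alpha * (reward + gamma * max(q) - q[action])
        total_energy += reward
    return q, total_energy


if __name__ == "__main__":
    for m in (0.0, 0.5, 1.0):
        q, energy = run_forager(mimic_fraction=m)
        print(f"mimic_fraction={m}: Q={q}, energy={energy:.1f}")
```

With these assumed payoffs, the occasional exploratory attacks are what allow the forager to form (or lose) its aversion: without exploration, the value of attacking the aposematic prey would never be updated, at the cost of some energy lost to sampling toxic prey.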

[1] Jan Teichmann et al. “The application of temporal difference learning in optimal diet models.” Journal of Theoretical Biology, 340 (2014): 11-16.