Action Detection for Wildlife Monitoring with Camera Traps Based on Segmentation with Filtering of Tracklets (SWIFT) and Mask-Guided Action Recognition (MAROON)

Research output: Contribution to journal › Research article › Contributed › peer-review

Contributors

  • Frank Schindler, University of Bonn (Author)
  • Volker Steinhage, University of Bonn (Author)
  • Suzanne T.S. van Beeck Calkoen, Chair of Forest Zoology, Bavarian Forest National Park, University of Göttingen (Author)
  • Marco Heurich, Bavarian Forest National Park, University of Freiburg, Inland Norway University of Applied Sciences (Author)

Abstract

Behavioral analysis of animals in the wild plays an important role in ecological research and conservation and has mostly been performed manually by researchers. We introduce an action detection approach that automates this process by detecting animals and performing action recognition on the detected animals in camera trap videos. Our action detection approach is based on SWIFT (segmentation with filtering of tracklets), which we have previously shown to successfully detect and track animals in wildlife videos, and on MAROON (mask-guided action recognition), an action recognition network that we introduce here. The basic ideas of MAROON are the exploitation of the instance masks detected by SWIFT and a triple-stream network. The instance masks enable more accurate action recognition, especially when multiple animals appear in a video at the same time. The triple-stream approach extracts features for both the motion and the appearance of the animal. We evaluate the quality of our action recognition on two self-generated datasets, one from an animal enclosure and one from the wild. These datasets contain videos of red deer, fallow deer and roe deer, recorded during both day and night. MAROON improves action recognition accuracy over other state-of-the-art approaches by an average of 10 percentage points across all analyzed datasets and achieves an accuracy of (Formula presented.) on the Rolandseck Daylight dataset, in which 11 different action classes occur. Our action detection system makes it possible to drastically reduce the manual work of ecologists while gaining new insights through standardized results.
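The abstract states that MAROON exploits the instance masks from SWIFT so that recognition can focus on one animal even when several share the frame, but it does not describe how the masks enter the network. The following is only a minimal sketch under an assumed interpretation: pixels outside one animal's mask are replaced by a background value, and the clip is cropped to the box covering that animal's mask in all frames. Frames are plain nested lists of grayscale values, and the names `apply_instance_mask`, `mask_bounding_box` and `crop_clip` are hypothetical, not part of the authors' code.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical types: a grayscale frame is a list of pixel rows, and a
# binary instance mask marks the pixels of one tracked animal.
Frame = List[List[float]]
Mask = List[List[bool]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def apply_instance_mask(frames: Sequence[Frame], masks: Sequence[Mask],
                        background: float = 0.0) -> List[Frame]:
    """Replace every pixel outside the animal's mask with `background`."""
    if len(frames) != len(masks):
        raise ValueError("expected exactly one mask per frame")
    masked: List[Frame] = []
    for frame, mask in zip(frames, masks):
        if len(frame) != len(mask) or any(len(r) != len(m) for r, m in zip(frame, mask)):
            raise ValueError("frame and mask must have the same shape")
        masked.append([[px if keep else background for px, keep in zip(row, mrow)]
                       for row, mrow in zip(frame, mask)])
    return masked


def mask_bounding_box(masks: Sequence[Mask]) -> Optional[Box]:
    """Smallest box containing the animal's mask pixels in any frame of the clip."""
    coords = [(r, c) for mask in masks
              for r, mrow in enumerate(mask)
              for c, keep in enumerate(mrow) if keep]
    if not coords:
        return None  # the animal is never visible in this clip
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


def crop_clip(frames: Sequence[Frame], box: Box) -> List[Frame]:
    """Crop every frame to the same box so the clip stays temporally aligned."""
    top, left, bottom, right = box
    return [[row[left:right + 1] for row in frame[top:bottom + 1]] for frame in frames]


# Usage: one 4x4 frame showing two animals; the mask selects the top-left one.
frames = [[[1, 2, 0, 0],
           [3, 4, 0, 0],
           [0, 0, 5, 6],
           [0, 0, 7, 8]]]
mask_a = [[[True, True, False, False],
           [True, True, False, False],
           [False, False, False, False],
           [False, False, False, False]]]
box = mask_bounding_box(mask_a)                      # (0, 0, 1, 1)
clip_a = crop_clip(apply_instance_mask(frames, mask_a), box)
print(clip_a)                                        # [[[1, 2], [3, 4]]]
```

In this sketch, suppressing the background rather than only cropping is what separates animals whose boxes overlap: a crop alone would still contain the neighboring animal's pixels, while the mask removes them.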

Details

Original language: English
Article number: 514
Number of pages: 17
Journal: Applied Sciences (Switzerland)
Volume: 14 (2024)
Issue number: 2
Publication status: Published, 6 Jan 2024
Peer-reviewed: Yes

Keywords

  • action detection for deer, deep learning, mask-supported action recognition, triple-stream convolutional neural network, video instance segmentation, wildlife monitoring