Real-time Tracking and Pose Estimation of Guillemots

Thesis Project created by Matilda Hanes English

Studying animals and their behaviour furthers, in many respects, our knowledge and understanding of the environment. Current techniques for observing animals in the wild often rely on some form of camera setup, e.g. camera traps, to collect data. Although this is a good way of observing wildlife, it still requires many hours of analysis. This study explores the use of machine learning for extracting information from wildlife video recordings of guillemots (a type of seabird). The study focuses on two main areas: object tracking and animal pose estimation. Tracking allows the observer to link information captured in consecutive snapshots of the data and, in combination with pose estimation, enables analysis of animal behaviour from videos.

Within target tracking, machine learning models are often used for re-identification after losing track of an object for a short period of time. Training these models requires annotated data with ID labels for each object, and introducing machine learning during inference often decreases computational speed. To limit the need for manual data annotation and to increase the computational speed of the model, we explore two tracking techniques that do not use machine learning, SORT and ByteTrack, which rely on statistical predictions and velocity estimates of detections from a detector model. Furthermore, model evaluation requires annotated data for each consecutive snapshot in a video segment; we therefore develop a methodology for semi-automatic track generation, which is used to generate ground truth tracks.
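
The association step that such trackers rely on can be illustrated with a short Python sketch. It is a simplification made under stated assumptions and not the implementation used in the thesis: each track box is shifted by a fixed per-frame velocity (standing in for the Kalman filter prediction in SORT), and tracks are matched to detections greedily by intersection over union (standing in for optimal assignment). The function names and the threshold of 0.3 are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def predict(box, velocity):
    """Shift a box by its estimated per-frame velocity (vx, vy)."""
    vx, vy = velocity
    return (box[0] + vx, box[1] + vy, box[2] + vx, box[3] + vy)


def associate(tracks, detections, iou_threshold=0.3):
    """Greedily match predicted track boxes to new detections by IoU.

    tracks: dict mapping track_id -> (box, velocity)
    detections: list of boxes from the detector for the current frame
    Returns (matches, unmatched), where matches maps track_id -> detection
    index and unmatched lists detection indices not claimed by any track.
    """
    candidates = []
    for track_id, (box, velocity) in tracks.items():
        predicted = predict(box, velocity)
        for det_idx, det in enumerate(detections):
            score = iou(predicted, det)
            if score >= iou_threshold:
                candidates.append((score, track_id, det_idx))

    # Highest-overlap pairs are accepted first; each track and each
    # detection can be used at most once.
    candidates.sort(reverse=True)
    matches, used = {}, set()
    for score, track_id, det_idx in candidates:
        if track_id in matches or det_idx in used:
            continue
        matches[track_id] = det_idx
        used.add(det_idx)

    unmatched = [i for i in range(len(detections)) if i not in used]
    return matches, unmatched
```

In this sketch, unmatched detections would start new tracks, while tracks left without a match would be kept alive for a few frames before being dropped. No trained re-identification model is involved at any point.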

For animal pose estimation, we contribute a data set containing approximately 10 000 annotated key-points of guillemots. The data were collected from 9 different videos, recorded from two stations at Stora Karlsö, Bonden5 and Farallon3. This data set is used for training a multi-animal pose estimation model with the DeepLabCut framework. Furthermore, we provide a new approach for mapping key-points to individual birds using region proposals from a YOLO detector model in combination with part affinity fields. Our results indicate that assembling individuals with YOLO performs better than relying on part affinity fields alone.
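
The idea of using detector boxes to group key-points into individuals can be sketched in Python as follows. This is only an illustration under assumptions, not the method of the thesis: it ignores part affinity fields entirely, assigns each key-point to the box that contains it, and resolves overlaps by choosing the box with the nearest centre. The function name and the key-point labels used in the example are hypothetical.

```python
def assign_keypoints(keypoints, boxes):
    """Group key-points by the detection box that contains them.

    keypoints: list of (label, x, y) tuples from the pose model
    boxes: list of (x1, y1, x2, y2) boxes from the detector
    Returns one dict per box mapping label -> (x, y). Key-points that fall
    outside every box are dropped. If a box receives the same label twice,
    the first key-point encountered is kept.
    """
    individuals = [dict() for _ in boxes]
    for label, x, y in keypoints:
        best_idx, best_dist = None, float("inf")
        for idx, (x1, y1, x2, y2) in enumerate(boxes):
            if not (x1 <= x <= x2 and y1 <= y <= y2):
                continue  # key-point lies outside this box
            # Where boxes overlap, prefer the box whose centre is closest.
            cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
            dist = (x - cx) ** 2 + (y - cy) ** 2
            if dist < best_dist:
                best_idx, best_dist = idx, dist
        if best_idx is not None and label not in individuals[best_idx]:
            individuals[best_idx][label] = (x, y)
    return individuals


# Example with two overlapping boxes and hypothetical labels.
boxes = [(0, 0, 10, 10), (8, 0, 20, 10)]
keypoints = [("head", 2, 2), ("tail", 9, 5), ("head", 15, 3), ("beak", 30, 30)]
birds = assign_keypoints(keypoints, boxes)
```

In this example the "tail" key-point lies inside both boxes and is assigned to the first, whose centre is closer, while the "beak" key-point falls outside both boxes and is discarded.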

In conclusion, we present a real-time approach for acquiring high-accuracy tracks of guillemots from wildlife video data. By combining tracking with pose estimation, rather than only producing tracks of the birds' positions, it would be possible to follow their movement patterns in detail. We believe that this is highly applicable to the study of animal behaviour, and that utilising machine learning could decrease the time spent on manual analysis.


Do you want to learn more about this project? Please contact us at

Matilda Hanes,

Shreyash Kad,


Engineering, Research & Development
CNN, DNN, Image Analysis