Self-Supervised LIDAR-Transformer for Automotive Application

Thesis Project by Johan Jaxing
Self-supervision has improved performance in various domains, such as natural language processing and 2D computer vision, by enabling the use of unannotated data during pre-training. Leveraging unannotated data is desirable because annotating data is time-consuming and expensive. Within the automotive LIDAR domain, however, the approach remains largely unexplored.

Inspired by the success of self-supervision in the 2D image domain, this research investigates how self-supervision can be translated to the 3D domain, specifically to LIDAR point clouds. Downstream task performance is evaluated on 3D object detection (OD). We extend the (originally supervised) Single-stride Sparse Transformer (SST) architecture, as it provides state-of-the-art (SOTA) performance on several Waymo Open Dataset benchmarks. Specifically, we implemented a masking strategy for pre-training, aimed at helping the model learn point cloud representations.
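The thesis does not spell out the masking mechanics in this summary, but a common formulation (as in masked autoencoders) is to hide a random fraction of the voxel/pillar tokens and train the model to reconstruct the hidden content from the visible ones. The sketch below shows only the token-masking step under that assumption; the function name, the 70% mask ratio, and the token layout are illustrative, not taken from the thesis.

```python
import numpy as np

def mask_voxel_tokens(tokens, mask_ratio=0.7, rng=None):
    """Randomly split voxel tokens into visible and masked sets (MAE-style sketch).

    tokens     : (N, D) array, one feature row per non-empty voxel.
    mask_ratio : fraction of tokens hidden from the encoder.
    Returns (visible_tokens, masked_indices, mask) where `mask` is a
    boolean array over the N tokens marking which ones were hidden.
    """
    rng = rng or np.random.default_rng()
    n = tokens.shape[0]
    n_mask = int(round(n * mask_ratio))

    # Shuffle token indices, take the first n_mask as the masked set.
    perm = rng.permutation(n)
    masked_idx = perm[:n_mask]
    visible_idx = np.sort(perm[n_mask:])

    mask = np.zeros(n, dtype=bool)
    mask[masked_idx] = True
    return tokens[visible_idx], masked_idx, mask
```

During pre-training, only the visible tokens would be fed to the encoder, and a lightweight decoder would be trained to reconstruct the features (or point statistics) of the masked voxels; that reconstruction loss is what drives representation learning before the OD fine-tuning stage.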

Our results show that the proposed self-supervised pre-training increases the model's 3D OD performance compared to training without pre-training. On the nuScenes dataset with intensity omitted, mAP rises from 49.08 to 51.95 and NDS from 0.6075 to 0.6216; with intensity included, mAP increases from 53.39 to 55.14 and NDS from 0.6295 to 0.6400. When pre-training on data from the same dataset, the performance gain grows as the ratio of labelled data to all data decreases. The gain follows a logarithmic trend, indicating that model performance increases logarithmically as more unlabelled data is added.


DNN, Machine Learning, Self/Unsupervised, Transformer