Shelf-Supervised Cross-Modal Pre-Training for 3D Object Detection

1 IIIT Delhi   2 Carnegie Mellon University   3 Georgia Institute of Technology   * Equal contribution





Figure: CM3D overview.



Abstract

State-of-the-art 3D object detectors are often trained on massive labeled datasets. However, annotating 3D bounding boxes remains expensive and time-consuming, especially for LiDAR. We propose a shelf-supervised approach that leverages off-the-shelf image foundation models to pre-train 3D detectors. Our method generates zero-shot 3D bounding boxes from paired RGB and LiDAR data, which improves detection accuracy in semi-supervised settings. We demonstrate the effectiveness of our approach on the nuScenes and Waymo Open Dataset (WOD) benchmarks, where it significantly outperforms prior self-supervised methods.
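
To make the idea of lifting image-level predictions into 3D concrete, here is a minimal, generic sketch. It is an illustration only and not the CM3D pipeline: it assumes the LiDAR points are already expressed in the camera frame, a pinhole camera with intrinsics `K`, and a boolean instance mask from some image model. The function name `lift_mask_to_box` and the axis-aligned box fit are hypothetical choices made for brevity.

```python
import numpy as np

def lift_mask_to_box(points_cam, K, mask):
    """Fit an axis-aligned 3D box to LiDAR points that project into a 2D mask.

    points_cam: (N, 3) points in the camera frame (z pointing forward). Assumption.
    K:          (3, 3) pinhole intrinsics. Assumption.
    mask:       (H, W) boolean instance mask from an image model. Assumption.
    Returns (center, size) as two length-3 arrays, or None if no point hits the mask.
    """
    h, w = mask.shape

    # Keep only points in front of the camera.
    pts = points_cam[points_cam[:, 2] > 0]

    # Pinhole projection to pixel coordinates.
    uvw = pts @ K.T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)

    # Discard projections that fall outside the image.
    in_image = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    pts, u, v = pts[in_image], u[in_image], v[in_image]

    # Select the points whose pixel lies inside the mask.
    hits = pts[mask[v, u]]
    if len(hits) == 0:
        return None

    # Axis-aligned box around the selected points (no orientation estimate).
    lo, hi = hits.min(axis=0), hits.max(axis=0)
    return (lo + hi) / 2, hi - lo


K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
mask = np.zeros((100, 100), dtype=bool)
mask[40:60, 40:60] = True
points = np.array([[0.0, 0.0, 10.0],   # projects inside the mask
                   [0.5, 0.0, 10.0],   # projects inside the mask
                   [0.0, 0.5, 12.0],   # projects inside the mask
                   [4.0, 0.0, 10.0],   # projects outside the mask
                   [0.0, 0.0, -5.0]])  # behind the camera
center, size = lift_mask_to_box(points, K, mask)
```

A practical system would also need to remove background points that leak into the mask and to estimate box orientation; this sketch omits both.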


Figure: Our pseudo-labeling pipeline.


Results

In the zero-shot setting, our method significantly outperforms the only prior method, SAM3D, in both mAP and NDS on the nuScenes dataset.

Method        mAP     NDS
CM3D (Ours)   23.0%   22.1%
SAM3D          1.7%    2.4%



In the semi-supervised setting, our method also outperforms prior methods in both mAP and NDS when only 5%, 10%, or 20% of the labeled training data is available.

LiDAR-only models

Training Data   Method                         mAP ↑   NDS ↑
5%              CenterPoint + PRC              38.2    46.0
5%              CenterPoint + CM3D (Ours)      46.3    48.8
10%             CenterPoint + PRC              44.1    53.1
10%             CenterPoint + CM3D (Ours)      51.0    56.3
20%             CenterPoint + PRC              49.5    58.9
20%             CenterPoint + CM3D (Ours)      54.5    59.0

LiDAR + RGB models

Training Data   Method                         mAP ↑   NDS ↑
5%              CALICO                         41.7    47.9
5%              BEVFusion + CM3D (Ours)        51.3    52.5
10%             CALICO                         50.0    53.9
10%             BEVFusion + CM3D (Ours)        53.3    56.5
20%             CALICO                         54.8    59.5
20%             BEVFusion + CM3D (Ours)        56.2    60.2
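
The size of the improvement at each label fraction can be read off the semi-supervised results with simple subtraction. The short script below does that arithmetic; the numbers are copied from the results above, while the dictionary layout and the helper name `absolute_gains` are only illustrative.

```python
# Each entry: labeled-data percentage -> ((baseline mAP, baseline NDS), (CM3D mAP, CM3D NDS)).
# Values are copied from the semi-supervised results above.
lidar_only = {   # baseline: CenterPoint + PRC, ours: CenterPoint + CM3D
    5:  ((38.2, 46.0), (46.3, 48.8)),
    10: ((44.1, 53.1), (51.0, 56.3)),
    20: ((49.5, 58.9), (54.5, 59.0)),
}
lidar_rgb = {    # baseline: CALICO, ours: BEVFusion + CM3D
    5:  ((41.7, 47.9), (51.3, 52.5)),
    10: ((50.0, 53.9), (53.3, 56.5)),
    20: ((54.8, 59.5), (56.2, 60.2)),
}

def absolute_gains(table):
    """Return {percentage: (mAP gain, NDS gain)} in absolute points, rounded to 0.1."""
    return {
        pct: (round(ours[0] - base[0], 1), round(ours[1] - base[1], 1))
        for pct, (base, ours) in table.items()
    }

lidar_only_gains = absolute_gains(lidar_only)
lidar_rgb_gains = absolute_gains(lidar_rgb)

for name, gains in [("LiDAR-only", lidar_only_gains), ("LiDAR + RGB", lidar_rgb_gains)]:
    for pct, (d_map, d_nds) in gains.items():
        print(f"{name:12s} {pct:>2d}%  mAP +{d_map:.1f}  NDS +{d_nds:.1f}")
```

In both model families the mAP gain is largest at 5% labeled data (+8.1 for LiDAR-only, +9.6 for LiDAR + RGB) and shrinks as more labels become available (+5.0 and +1.4 at 20%).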



Visuals

Figure: CM3D visuals.




Citation

@inproceedings{
    khurana2024shelfsupervised,
    title={Shelf-Supervised Multi-Modal Pre-Training for 3D Object Detection},
    author={Mehar Khurana and Neehar Peri and James Hays and Deva Ramanan},
    booktitle={8th Annual Conference on Robot Learning},
    year={2024},
    url={https://openreview.net/forum?id=eeoX7tCoK2}
}