ODIN is a single transformer-based model that performs instance and semantic segmentation in both 2D and 3D by alternating between within-view 2D fusion and cross-view 3D fusion.
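
The alternation between the two fusion stages can be illustrated with a toy sketch. This is not ODIN's implementation: plain averaging stands in for attention, each feature is a single scalar, each view is a 1D row of pixels, and the names `fuse_2d`, `fuse_3d`, `alternate_fusion`, `num_blocks`, and `k` are hypothetical. Passing `points=None` to skip the 3D stage is this sketch's own way of handling a single image with no 3D coordinates.

```python
from math import dist


def fuse_2d(view):
    """Within-view fusion: average each pixel feature with its adjacent pixels in the same view."""
    fused = []
    for i in range(len(view)):
        window = view[max(0, i - 1): i + 2]  # left neighbour, self, right neighbour
        fused.append(sum(window) / len(window))
    return fused


def fuse_3d(views, points, k=2):
    """Cross-view fusion: average each feature with its k nearest 3D points from any view."""
    flat_feats = [f for view in views for f in view]
    flat_pts = [p for pts in points for p in pts]
    fused = []
    for p in flat_pts:
        # The nearest set includes the point itself (distance 0).
        order = sorted(range(len(flat_pts)), key=lambda j: dist(p, flat_pts[j]))
        nearest = order[:k]
        fused.append(sum(flat_feats[j] for j in nearest) / len(nearest))
    # Split the flat list back into one list per view.
    out, start = [], 0
    for view in views:
        out.append(fused[start:start + len(view)])
        start += len(view)
    return out


def alternate_fusion(views, points=None, num_blocks=2, k=2):
    """Alternate 2D (within-view) and 3D (cross-view) fusion for num_blocks rounds."""
    for _ in range(num_blocks):
        views = [fuse_2d(v) for v in views]
        if points is not None:  # assumption: no 3D coordinates means 2D-only input
            views = fuse_3d(views, points, k)
    return views


# Two views of two pixels each; the first pixel of each view lands on nearly the same 3D point.
views = [[0.0, 2.0], [4.0, 6.0]]
points = [[(0, 0, 0), (1, 0, 0)], [(0, 0, 0.1), (5, 0, 0)]]
fused_3d = alternate_fusion(views, points, num_blocks=1)
fused_2d = alternate_fusion([[0.0, 2.0]], points=None, num_blocks=1)
```

In the 3D case, information crosses views only through the `fuse_3d` step, while `fuse_2d` never looks outside a single view. The same function handles a lone image by running the 2D stage alone, which is the sense in which a single model can serve both settings.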