Unifying Appearance Codes and Bilateral Grids for Driving Scene Gaussian Splatting

1AIR, Tsinghua University 2Beihang University 3Nanyang Technological University 4Shanghai Jiao Tong University 5Eastern Institute of Technology, Ningbo 6Tongji University 7The Chinese University of Hong Kong 8The Chinese University of Hong Kong, Shenzhen 9University of Trento 10Zhejiang University 11Lightwheel AI 12LeddarTech

TL;DR: we introduce Multi-Scale Bilateral Grids, which unify appearance codes and bilateral grids and significantly improve geometric accuracy in dynamic, decoupled autonomous driving scene reconstruction.

Abstract

Neural rendering techniques, including NeRF and Gaussian Splatting (GS), rely on photometric consistency to produce high-quality reconstructions. However, in real-world scenarios, it is challenging to guarantee perfect photometric consistency in acquired images. Appearance codes have been widely used to address this issue, but their modeling capability is limited, as a single code is applied to the entire image. Recently, the bilateral grid was introduced to perform pixel-wise color mapping, but it is difficult to optimize and constrain effectively. In this paper, we propose a novel multi-scale bilateral grid that unifies appearance codes and bilateral grids. We demonstrate that this approach significantly improves geometric accuracy in dynamic, decoupled autonomous driving scene reconstruction, outperforming both appearance codes and bilateral grids. Accurate geometry is especially important for autonomous driving, where it underpins obstacle avoidance and control. Our method shows strong results across four datasets: Waymo, NuScenes, Argoverse, and PandaSet. We further demonstrate that the improvement in geometry is driven by the multi-scale bilateral grid, which effectively reduces floaters caused by photometric inconsistency.
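
To make the bilateral-grid idea concrete, here is a minimal PyTorch sketch of the slicing operation: a learnable 3D grid stores per-cell 3x4 affine color transforms, and each pixel of a rendered image is corrected by the transform sampled at its (x, y, luminance) coordinate. The function name, grid layout, and all shapes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def slice_bilateral_grid(grid, image, guide):
    """Sample per-pixel 3x4 affine color transforms from a bilateral grid.

    grid:  (B, 12, D, Gh, Gw) learnable grid of affine transforms (assumed layout)
    image: (B, 3, H, W) rendered image to be color-corrected
    guide: (B, 1, H, W) luminance map in [0, 1], used as the grid's depth axis
    """
    B, _, H, W = image.shape
    # Normalized (x, y) sampling coordinates in [-1, 1] for grid_sample.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H, device=image.device),
        torch.linspace(-1, 1, W, device=image.device),
        indexing="ij",
    )
    xy = torch.stack([xs, ys], dim=-1).expand(B, H, W, 2)
    z = guide.squeeze(1).unsqueeze(-1) * 2 - 1            # map [0, 1] -> [-1, 1]
    coords = torch.cat([xy, z], dim=-1).unsqueeze(1)      # (B, 1, H, W, 3)

    # Trilinear slice: each pixel fetches a 12-vector (one 3x4 affine transform).
    affine = F.grid_sample(grid, coords, align_corners=True)  # (B, 12, 1, H, W)
    affine = affine.squeeze(2).view(B, 3, 4, H, W)

    # Apply the per-pixel affine transform: out = A @ rgb + b.
    rgb = image.unsqueeze(1)                              # (B, 1, 3, H, W)
    return (affine[:, :, :3] * rgb).sum(2) + affine[:, :, 3]
```

Note the unification in the limit: a grid with a single spatial cell applies one global transform to every pixel, which is exactly the behavior of an appearance code.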



Framework

We unify appearance codes with multi-scale bilateral grids. First, a coarse rendering is obtained from a Gaussian scene graph. This rendering is then processed by our multi-scale bilateral grids, which perform detailed per-pixel color modeling guided by a luminance-based map through slice and fusion operations, as sketched below.
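
The following sketch (reusing the slice_bilateral_grid helper from above) illustrates one plausible reading of this pipeline: grids at several resolutions are each sliced with a luminance guide and the results are fused. The coarsest 1x1x1 grid degenerates to a global appearance code, while finer grids add per-pixel detail. The scale choices, the Rec. 601 luminance weights, and the mean fusion rule are our assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MultiScaleBilateralGrid(nn.Module):
    """Hypothetical pyramid of bilateral grids: the 1x1x1 level acts as a
    global appearance code; finer levels refine colors per pixel."""

    def __init__(self, scales=((1, 1, 1), (4, 8, 8), (8, 16, 16))):
        super().__init__()
        # Initialize every cell to the identity affine transform [I | 0].
        identity = torch.cat([torch.eye(3), torch.zeros(3, 1)], 1).reshape(12, 1, 1, 1)
        self.grids = nn.ParameterList(
            nn.Parameter(identity.repeat(1, d, gh, gw).unsqueeze(0))
            for d, gh, gw in scales
        )

    def forward(self, image):
        # Luminance of the coarse rendering guides the slicing.
        w = image.new_tensor([0.299, 0.587, 0.114]).view(1, 3, 1, 1)
        guide = (image * w).sum(1, keepdim=True).clamp(0, 1)
        # Slice each scale and fuse the corrected images (here: a simple mean).
        outs = [
            slice_bilateral_grid(
                g.expand(image.shape[0], -1, -1, -1, -1), image, guide
            )
            for g in self.grids
        ]
        return torch.stack(outs).mean(0)

# Usage: correct a batch of coarse renderings from the Gaussian scene graph.
model = MultiScaleBilateralGrid()
rendered = torch.rand(2, 3, 64, 96)   # stand-in for rendered images
corrected = model(rendered)           # (2, 3, 64, 96)
```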