Fused-Planes architecture and training framework. We learn a set of Fused-Planes \(\mathcal{T} = \{T_i\}\) in the latent space of an autoencoder with encoder \(E_\phi\) and decoder \(D_\psi\). Fused-Planes therefore render latent images \(\tilde{z}_{i,j}\) at reduced resolution, which enables faster rendering and training. Each Fused-Plane \(T_i\) is split into a micro plane \(T_i^\mathrm{mic}\), which captures scene-specific information, and a macro plane \(T_i^\mathrm{mac}\), computed as a weighted sum over \(M\) shared base planes \(\mathcal{B}\) with weights \(W_i\). The shared planes \(\mathcal{B}\) capture structure common across scenes. To learn our set of Fused-Planes, we first train a subset of micro planes \(\mathcal{T}_1^\mathrm{mic}\), their corresponding weights \(W_i\), and the base planes \(\mathcal{B}\), jointly with the encoder \(E_\phi\) and decoder \(D_\psi\). We then learn the remaining scenes by training the micro planes \(\mathcal{T}_2^\mathrm{mic}\) and their corresponding weights \(W_i\) while fine-tuning \(\mathcal{B}\) and \(D_\psi\).
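As a rough illustration of the micro/macro decomposition described above, the following pure-Python sketch builds one Fused-Plane from a micro plane, a weight vector \(W_i\), and \(M\) shared base planes. It is not the authors' implementation. The function names (`macro_plane`, `fused_plane`), the nested-list layout `[channel][row][column]`, and in particular the choice to join the micro and macro parts by concatenating them along the channel axis are assumptions made for this example; the text above only states that each plane is split into the two parts.

```python
def macro_plane(weights, base_planes):
    """Weighted sum over the M shared base planes: sum_k w_k * B_k.

    weights:     list of M floats (W_i for one scene)
    base_planes: list of M planes, each indexed as [channel][row][column]
    """
    channels = len(base_planes[0])
    height = len(base_planes[0][0])
    width = len(base_planes[0][0][0])
    return [
        [
            [
                sum(w * b[c][y][x] for w, b in zip(weights, base_planes))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for c in range(channels)
    ]


def fused_plane(micro, weights, base_planes):
    """Assemble T_i from its micro and macro parts.

    ASSUMPTION: the two parts are concatenated along the channel axis.
    """
    return micro + macro_plane(weights, base_planes)


# Toy example: M = 2 base planes, one channel each, 2x2 resolution.
B = [
    [[[1.0, 0.0], [0.0, 1.0]]],
    [[[0.0, 2.0], [2.0, 0.0]]],
]
W_i = [0.5, 0.25]                    # per-scene weights over the base planes
T_mic = [[[9.0, 9.0], [9.0, 9.0]]]   # per-scene micro plane (one channel)

T_mac = macro_plane(W_i, B)
T_i = fused_plane(T_mic, W_i, B)
print(len(T_i))  # number of channels: 2
```

In this layout only `T_mic` and `W_i` are specific to a scene, while `B` is shared by all scenes, which is what lets the shared planes carry structure common across scenes. A practical implementation would use tensors and learn these quantities by gradient descent.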
Overview: NeRF methods for MSIG. Comparison of resource costs and rendering quality across recent works when training a scene. Circle size represents NVS quality. Our method has the lowest training time and memory footprint among all planar representations while maintaining similar rendering quality. Fused-Planes-ULW has the lowest memory requirement.
@article{fused-planes,
title={{Fused-Planes: Improving Planar Representations for Learning Large Sets of 3D Scenes}},
author={Karim Kassab and Antoine Schnepf and Jean-Yves Franceschi and Laurent Caraffa and Flavian Vasile and Jeremie Mary and Andrew Comport and Valérie Gouet-Brunet},
journal={arXiv preprint arXiv:2410.23742},
year={2025}
}