Method overview. A set of Fused-Planes \(\{T_i\}\) reconstructs a class of 3D objects \(\{O_i\}\) from their ground-truth (GT) views \(\{x_{i,j}\}\), where \(i\) and \(j\) respectively denote the object and view indices. For clarity, only one Fused-Planes is shown. (a) Each Fused-Planes \(T_i\) is formed from a micro plane \(T_i^\mathrm{mic}\), which captures object-specific information, and a macro plane \(T_i^\mathrm{mac}\), computed as a weighted summation, with object-specific weights \(W_i\), over a set of shared base planes \(\mathcal{B}\). This base captures class-level information, such as structural similarities across objects. (b) View synthesis is performed in the latent space of an autoencoder (\(E_\phi\), \(D_\psi\)) via classical volume rendering. The rendered latent image \(\tilde{z}_{i,j}\) (low resolution) is decoded to obtain the output RGB view (high resolution). (c) The Fused-Planes components (i.e., \(T_i^\mathrm{mic}\), \(\mathcal{B}\), \(W_i\)) and the autoencoder are supervised with three reconstructive losses.
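To make parts (a) and (b) concrete, here is a minimal PyTorch sketch of one Fused-Planes representation. All names, tensor shapes, and sizes (e.g. `FusedPlanes`, `shared_base`, the channel and resolution dimensions) are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedPlanes(nn.Module):
    """Sketch of one Fused-Planes T_i: a micro plane plus a macro plane
    obtained as a weighted sum over a shared base B (assumed shapes)."""

    def __init__(self, shared_base: torch.Tensor):
        super().__init__()
        n_base, n_planes, c, r, _ = shared_base.shape   # B: (n_base, 3, C, R, R)
        self.register_buffer("base", shared_base)        # shared across the class
        self.weights = nn.Parameter(torch.randn(n_base) / n_base)  # W_i
        self.micro = nn.Parameter(torch.zeros(n_planes, c, r, r))  # T_i^mic

    def planes(self) -> torch.Tensor:
        # T_i^mac: weighted summation over the shared base planes B
        macro = torch.einsum("b,bpchw->pchw", self.weights, self.base)
        return self.micro + macro                        # fused tri-plane T_i

    def query(self, pts: torch.Tensor) -> torch.Tensor:
        # Tri-plane lookup: project 3D points (assumed in [-1, 1]^3) onto
        # the XY/XZ/YZ planes, bilinearly sample, and sum the three features.
        planes = self.planes()                           # (3, C, R, R)
        proj = torch.stack([pts[..., [0, 1]], pts[..., [0, 2]], pts[..., [1, 2]]])
        grid = proj.view(3, -1, 1, 2)                    # grid_sample wants (N, H, W, 2)
        feats = F.grid_sample(planes, grid, align_corners=True)  # (3, C, M, 1)
        return feats.sum(0).squeeze(-1).t()              # (M, C) point features

# Hypothetical usage:
base = torch.randn(16, 3, 32, 64, 64)    # shared base of 16 planes
t_i = FusedPlanes(base)
pts = torch.rand(1024, 3) * 2 - 1         # 3D samples in [-1, 1]^3
features = t_i.query(pts)                 # (1024, 32), input to the renderer
```

Volume-rendering such queried features along camera rays would produce the low-resolution latent image \(\tilde{z}_{i,j}\), which the decoder \(D_\psi\) then maps to the high-resolution RGB view.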
Resource costs overview. To reconstruct a large class of objects, one could consider three options: training many per-scene models (e.g., INGP, 3DGS, or planar methods), using a multi-scene method (e.g., CodeNeRF), or using Fused-Planes. Our method achieves the lowest per-object training time and memory footprint among all planar representations, while maintaining comparable rendering quality. Circle sizes represent novel view synthesis (NVS) quality.
@inproceedings{fused-planes,
  title={{Fused-Planes: Why Train a Thousand Tri-Planes When You Can Share?}},
  author={Karim Kassab and Antoine Schnepf and Jean-Yves Franceschi and Laurent Caraffa and Flavian Vasile and Jeremie Mary and Andrew Comport and Valérie Gouet-Brunet},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=bAG7lS1AUL}
}