Fused-Planes: Why Train a Thousand Tri-Planes When You Can Share?

ICLR 2026


\(^\dagger\)Equal contribution
1 Criteo AI Lab, Paris, France
2 LASTIG, Université Gustave Eiffel, IGN-ENSG, F-94160 Saint-Mandé
3 Université Côte d'Azur, CNRS, I3S, France

Abstract

Tri-Planar NeRFs enable the application of powerful 2D vision models to 3D tasks by representing 3D objects with 2D planar structures. This has made them the prevailing choice for modeling large collections of 3D objects. However, training Tri-Planes over such large collections is computationally intensive and remains largely inefficient: current approaches train one Tri-Plane per object independently, thereby overlooking structural similarities across large classes of objects. In response, we introduce Fused-Planes, a novel object representation that improves the resource efficiency of Tri-Planes when reconstructing object classes, while retaining the same planar structure. Our approach explicitly captures structural similarities across objects through a latent space and a set of globally shared base planes. Each individual Fused-Planes is then represented as a decomposition over these base planes, augmented with object-specific features. Fused-Planes achieve state-of-the-art efficiency among planar representations, demonstrating \(7.2 \times\) faster training and a \(3.2 \times\) lower memory footprint than Tri-Planes while maintaining rendering quality. An ultra-lightweight variant further cuts per-object memory usage by \(1875 \times\) with minimal quality loss.

Method




Method overview. A set of Fused-Planes \(\{T_i\}\) reconstructs a class of 3D objects \(\{O_i\}\) from their GT views \(\{x_{i,j}\}\), where \(i\) and \(j\) respectively denote the object and view indices. For clarity, only one Fused-Planes is shown. (a) Each Fused-Planes \(T_i\) is formed from a micro plane \(T_i^\mathrm{mic}\), which captures object-specific information, and a macro plane \(T_i^\mathrm{mac}\), computed as a weighted summation over a set of shared base planes \(\mathcal{B}\) with object-specific weights \(W_i\). This base captures class-level information such as structural similarities across objects. (b) View synthesis is performed in the latent space of an autoencoder (\(E_\phi\), \(D_\psi\)) via classical volume rendering. The rendered latent image \(\tilde{z}_{i,j}\) (low resolution) is decoded to obtain the output RGB view (high resolution). (c) The Fused-Planes components (i.e., \(T_i^\mathrm{mic}\), \(\mathcal{B}\), \(W_i\)) and the autoencoder are supervised with three reconstructive losses.
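To make part (a) concrete, below is a minimal PyTorch-style sketch of how a Fused-Planes representation could be assembled and queried. The class name, tensor shapes, number of base planes K, the additive fusion of micro and macro planes, and the summed tri-planar aggregation are all illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedPlanes(nn.Module):
    """Sketch of a Fused-Planes representation for num_objects objects.

    Assumptions: K shared base planes, 3 axis-aligned planes per object
    (tri-planar layout), feature dimension C, spatial resolution R, and
    an additive fusion of the micro and macro planes.
    """

    def __init__(self, num_objects, K=50, C=32, R=128):
        super().__init__()
        # Shared base planes B: class-level structure, common to all objects.
        # Layout: K bases x 3 planes (XY, XZ, YZ) x C channels x R x R.
        self.base_planes = nn.Parameter(0.1 * torch.randn(K, 3, C, R, R))
        # Per-object mixing weights W_i over the base planes.
        self.W = nn.Parameter(0.1 * torch.randn(num_objects, K))
        # Per-object micro planes T_i^mic: object-specific details.
        self.micro = nn.Parameter(0.1 * torch.randn(num_objects, 3, C, R, R))

    def forward(self, i):
        """Assemble the Fused-Planes T_i for object i."""
        # Macro plane T_i^mac: weighted summation over the shared base planes.
        macro = torch.einsum('k,kpchw->pchw', self.W[i], self.base_planes)
        # Fuse macro and micro planes (additive fusion is an assumption here).
        return macro + self.micro[i]

    def sample_features(self, i, xyz):
        """Standard tri-planar lookup for N points xyz in [-1, 1]^3."""
        planes = self.forward(i)  # (3, C, R, R)
        # Project each point onto the XY, XZ, and YZ planes.
        coords = torch.stack([xyz[:, [0, 1]], xyz[:, [0, 2]], xyz[:, [1, 2]]])
        feats = F.grid_sample(planes, coords.unsqueeze(2), align_corners=True)
        # (3, C, N, 1) -> sum the three planes' features -> (N, C).
        return feats.squeeze(-1).sum(dim=0).t()

In the full pipeline of (b), these sampled features would be decoded into latent densities and colors for volume rendering, producing the low-resolution latent image \(\tilde{z}_{i,j}\) that the decoder \(D_\psi\) turns into the final high-resolution RGB view.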

Results

Resource Costs




Resource costs overview. To reconstruct a large class of objects, one would consider three options: many per-scene models (e.g. INGP, 3DGS, or planar methods), a multi-scene method (e.g. CodeNeRF), or Fused-Planes. Our method achieves the lowest per-object training time and memory footprint among all planar representations, while maintaining similar rendering quality. Circle sizes represent novel view synthesis (NVS) quality.



Comparison with classical Tri-Planes

ShapeNet Cars Scenes

[Side-by-side comparison: Fused-Planes vs. Tri-Planes renderings.]

Basel Faces Scenes

[Side-by-side comparison: Fused-Planes vs. Tri-Planes renderings.]

BibTeX


      @inproceedings{fused-planes,
        title={{Fused-Planes: Why Train a Thousand Tri-Planes When You Can Share?}},
        author={Karim Kassab and Antoine Schnepf and Jean-Yves Franceschi and Laurent Caraffa and Flavian Vasile and Jeremie Mary and Andrew Comport and Valérie Gouet-Brunet},
        booktitle={The Fourteenth International Conference on Learning Representations},
        year={2026},
        url={https://openreview.net/forum?id=bAG7lS1AUL}
      }