Below we provide an interactive comparison of the point maps predicted by π³ (pre-trained vs. fine-tuned) against the ground-truth point maps on five randomly selected scenes from the UnSceneRecon test set. VGGT and WorldMirror results are omitted due to size constraints; static visualizations are provided in Figure 4 of the accompanying supplementary PDF. The grey bar shows the scene name and the REC, in meters, of both models (pre-trained and fine-tuned).
As illustrated below, the output quality of the dense prediction head is not compromised by fine-tuning, even though our alignment scheme does not directly supervise these dense predictions.
Controls: