Raster2Seq

Polygon Sequence Generation for Floorplan Reconstruction

Cornell University

SIGGRAPH 2026


We illustrate results on held-out CubiCasa5K test samples (left) and on real-world WAFFLE floorplan images (right).

CubiCasa5K Semantic Categories

Outdoor, Kitchen, Living room, Bedroom, Bath, Entry, Storage, Garage, Undefined
Door, Window


TL;DR: We introduce Raster2Seq, an approach that transforms rasterized floorplan images into a vectorized format using a labeled polygon sequence representation.


Abstract

Reconstructing a structured vector-graphics representation from a rasterized floorplan image is typically an important prerequisite for computational tasks involving floorplans, such as automated understanding or CAD workflows. However, existing techniques struggle to faithfully generate the structure and semantics conveyed by complex floorplans that depict large indoor spaces with many rooms and a varying number of polygon corners. To this end, we propose Raster2Seq, which frames floorplan reconstruction as a sequence-to-sequence task in which floorplan elements, such as rooms, windows, and doors, are represented as labeled polygon sequences that jointly encode geometry and semantics. Our approach introduces an autoregressive decoder that learns to predict the next corner conditioned on image features and previously generated corners, using guidance from learnable anchors. These anchors represent spatial coordinates in image space, allowing the attention mechanism to focus effectively on informative image regions. Because generation is autoregressive, the output length is flexible, which lets our method efficiently handle complex floorplans with numerous rooms and diverse polygon structures. Our method achieves state-of-the-art performance on standard benchmarks such as Structured3D, CubiCasa5K, and Raster2Graph, while also demonstrating strong generalization to more challenging datasets such as WAFFLE, which contains diverse room structures and complex geometric variations.


Qualitative Results

Each example shows a pair of images: the input image and the output reconstruction generated by Raster2Seq. Results are grouped by dataset (Structured3D-B, CubiCasa5K, and Raster2Graph), followed by qualitative comparisons on real-world Internet floorplans from WAFFLE.
*Structured3D-B denotes our binary raster version of Structured3D, constructed from ground-truth annotations to resemble standard floorplan drawings rather than the density-map inputs used in the original dataset.
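As a rough illustration of turning polygon annotations into a binary floorplan raster, the sketch below draws 1-pixel polygon outlines with Bresenham's line algorithm. It assumes each room is a list of integer (x, y) pixel corners; the function names are hypothetical, and the actual Structured3D-B rendering (line widths, door and window symbols, resolution) is not described here.

```python
def draw_line(grid, x0, y0, x1, y1):
    """Set every pixel on the segment (x0, y0)-(x1, y1) using Bresenham's algorithm."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        grid[y0][x0] = 1
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy


def rasterize_outlines(polygons, width, height):
    """Return a height x width binary image (list of rows) with all polygon edges drawn."""
    grid = [[0] * width for _ in range(height)]
    for poly in polygons:
        for i in range(len(poly)):
            (xa, ya), (xb, yb) = poly[i], poly[(i + 1) % len(poly)]  # closing edge included
            draw_line(grid, xa, ya, xb, yb)
    return grid


img = rasterize_outlines([[(1, 1), (6, 1), (6, 4), (1, 4)]], width=8, height=6)
for row in img:
    print("".join("#" if v else "." for v in row))
```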

Structured3D-B semantic categories: Living room, Kitchen, Bedroom, Bathroom, Balcony, Corridor, Dining room, Study, Studio, Store room, Garden, Laundry room, Office, Basement, Garage, Misc
Door, Window
Structured3D-B examples (input image and output reconstruction): samples 3250, 3253, 3268, 3274, 3277, and 3301.
CubiCasa5K semantic categories: Outdoor, Kitchen, Living room, Bedroom, Bath, Entry, Storage, Garage, Undefined
Door, Window
CubiCasa5K examples (input image and output reconstruction): samples 6028, 6170, 6197, 6251, 6261, and 6265.
Raster2Graph semantic categories: Unknown, Living room, Kitchen, Bedroom, Bathroom, Restroom, Balcony, Closet, Corridor, Washing room, PS, Outside
Raster2Graph examples (input image and output reconstruction): samples 010332, 010335, 010338, 010339, 010340, and 010341.

Qualitative comparison with RoomFormer on unseen WAFFLE floorplan images; both models are trained on CubiCasa5K. Our model generalizes better to real-world Internet data.

Each WAFFLE example shows the input image, the RoomFormer reconstruction, and the Raster2Seq reconstruction for: Church of Saint James the Greater in Rovny, Teltow Canal Power Station, Church of Saint Nicholas, Imkerhaus, Palais du Louvre, and Palmer Mansion.

Please refer to our Interactive Visualization for more qualitative comparisons.


Quantitative Results

Results on Standard Benchmarks

Quantitative comparison on Structured3D, CubiCasa5K, and Raster2Graph datasets, evaluating F1 scores across geometric predictions (Room, Corner, Angle) and semantic predictions (Room Semantic, Window & Door).

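We compare performance on the raster-to-vector conversion task across three datasets. Overall, our method achieves state-of-the-art performance: it obtains the best Room, Corner, and Room Semantic F1 on all three datasets, and the best Angle and Window & Door F1 on Structured3D-B. It trails slightly on Angle F1 for CubiCasa5K (37.4 vs. 38.0 for FRI-Net) and Raster2Graph (66.6 vs. 67.3 for Raster2Graph), and on Window & Door F1 for CubiCasa5K (77.8 vs. 78.5 for RoomFormer).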

Note that not all models produce semantic predictions, and the Raster2Graph dataset does not include Window & Door annotations. The Raster2Graph model can only be evaluated on its own dataset because it requires per-corner neighboring room-class annotations.
F1 scores per dataset ("-" = not available):
Method        Room   Corner  Angle  Room Semantic  Window & Door
Structured3D-B
HEAT          94.7   84.5    79.6   -              -
PolyRoom      98.9   96.0    91.9   -              -
FRI-Net       96.5   85.4    83.3   -              -
RoomFormer    95.1   91.7    83.2   74.2           94.1
Ours          99.6   98.3    92.7   76.9           98.5
CubiCasa5K
HEAT          78.2   53.7    32.3   -              -
PolyRoom      54.1   37.1    23.0   -              -
FRI-Net       77.1   50.8    38.0   -              -
RoomFormer    83.5   55.5    34.1   63.0           78.5
Ours          88.7   59.4    37.4   63.8           77.8
Raster2Graph
HEAT          95.9   79.7    50.9   -              -
PolyRoom      56.9   42.4    23.8   -              -
FRI-Net       91.5   72.3    52.8   -              -
RoomFormer    91.9   74.5    51.1   79.5           -
Raster2Graph  95.0   78.3    67.3   83.4           -
Ours          97.0   80.3    66.6   85.1           -
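To make the F1 columns concrete, here is a sketch of a corner-level F1 score. It assumes greedy one-to-one matching in which a predicted corner is a true positive if an unmatched ground-truth corner lies within a pixel distance threshold; the 10-pixel default and the greedy matching order are assumptions for illustration, not the exact evaluation protocol behind the table.

```python
import math


def corner_f1(pred, gt, thresh=10.0):
    """F1 for corner predictions with greedy one-to-one matching.

    pred, gt: lists of (x, y) corners. Each predicted corner consumes the
    nearest still-unmatched ground-truth corner within `thresh` pixels.
    """
    unmatched = list(gt)
    tp = 0
    for p in pred:
        best, best_d = None, thresh
        for g in unmatched:
            d = math.dist(p, g)
            if d <= best_d:
                best, best_d = g, d
        if best is not None:
            unmatched.remove(best)  # each GT corner can be matched once
            tp += 1
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gt) if gt else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gt = [(0, 0), (100, 0), (100, 100), (0, 100)]
pred = [(2, 1), (98, 3), (100, 60)]  # two corners close to GT, one too far
print(corner_f1(pred, gt))
```

Here precision is 2/3 and recall is 2/4, so F1 is 4/7. Room and Angle F1 follow the same precision/recall pattern with different matching criteria.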

Model Robustness to Floorplan Complexity

We report the Room F1 performance of RoomFormer, FRI-Net, and our model across varying numbers of polygons and corners on the Structured3D-B and CubiCasa5K datasets. Our method consistently demonstrates greater robustness as floorplan complexity increases.

Performance vs. floorplan complexity, as approximated by the total number of polygons (left) and the total number of corners (right), on Structured3D-B (top) and CubiCasa5K (bottom). Our approach yields larger gains as floorplan complexity increases.
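One way to produce such a breakdown is to group per-sample Room F1 scores by a complexity measure. The sketch below assumes each floorplan is a list of polygons (each a list of corners); the function names and bin lower bounds are hypothetical, and the scores in the usage lines are toy placeholders, not results.

```python
from collections import defaultdict
from statistics import mean


def complexity(floorplan):
    """Return (number of polygons, total number of corners) for one floorplan."""
    return len(floorplan), sum(len(polygon) for polygon in floorplan)


def mean_score_by_bin(samples, bin_starts, measure="corners"):
    """Average per-sample scores within complexity bins.

    samples: iterable of (floorplan, score) pairs.
    bin_starts: ascending lower bounds; a value falls into the bin with the
    largest lower bound not exceeding it (values below the first bound go
    into the first bin).
    """
    bins = defaultdict(list)
    for floorplan, score in samples:
        n_polygons, n_corners = complexity(floorplan)
        value = n_corners if measure == "corners" else n_polygons
        start = max((b for b in bin_starts if b <= value), default=bin_starts[0])
        bins[start].append(score)
    return {start: mean(scores) for start, scores in sorted(bins.items())}


square = [(0, 0), (1, 0), (1, 1), (0, 1)]
toy = [([square], 0.9), ([square] * 2, 0.8), ([square] * 5, 0.6)]  # toy scores
print(mean_score_by_bin(toy, bin_starts=[0, 8, 16]))
```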

Model Generalization

We perform a cross-evaluation experiment across different train-test dataset configurations, using the metrics reported above: Room F1 for the CubiCasa5K and Raster2Graph datasets, and IoU for WAFFLE. The cross-evaluation heatmaps show performance for each evaluation dataset (rows) and training dataset (columns), with hotter colors denoting higher performance.
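For the WAFFLE column, a pixel-level IoU between rasterized prediction and ground-truth masks can be computed as sketched below. This assumes both are equally sized binary masks; how the masks are produced for WAFFLE (for example per room or for the whole footprint) is not specified here, and scoring two empty masks as 1.0 is a convention chosen for the sketch.

```python
def mask_iou(pred_mask, gt_mask):
    """Pixel-level IoU between two equally sized binary masks (lists of 0/1 rows)."""
    intersection = union = 0
    for pred_row, gt_row in zip(pred_mask, gt_mask):
        for p, g in zip(pred_row, gt_row):
            intersection += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Convention for this sketch: two empty masks agree perfectly.
    return intersection / union if union else 1.0


pred = [[1, 1, 0],
        [1, 1, 0],
        [0, 0, 0]]
gt = [[0, 1, 1],
      [0, 1, 1],
      [0, 0, 0]]
print(mask_iou(pred, gt))  # 2 overlapping pixels out of 6 covered pixels
```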



How does it work?


Existing methods predict all structural floorplan elements simultaneously, which makes it difficult for them to faithfully reconstruct the structure and semantics of complex floorplans. Our key observation, motivating our framework design, is that floorplan elements can be effectively modeled as a sequence, directly capturing both spatial structure and semantic attributes. This allows us to decompose floorplan reconstruction into interpretable prediction steps that mirror the natural CAD design workflow, while naturally handling variable-length polygons.


📜 Labeled corner sequence representation. Each polygon is represented as a sequence of labeled corners — spatial coordinates paired with semantic labels (rooms, windows, doors) — and polygons are sorted left-to-right across the floorplan. This representation naturally accommodates inputs and outputs of variable lengths.


🔗 Anchor-based autoregressive decoder. The core of our framework predicts the next labeled corner by fusing image features and previously generated corners, guided by learnable anchors that steer attention toward informative image regions for efficient handling of complex floorplans.


🏷️ Token-level semantic supervision. A per-corner semantic classification loss applied to individual corner embeddings preserves semantic fidelity throughout autoregressive generation.
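A minimal sketch of the labeled corner sequence follows. It assumes each polygon is given as a (label, corners) pair, that "left-to-right" means sorting by each polygon's leftmost corner (ties broken by y), and that polygons are separated by a <SEP> token with a final <EOS> token. The token layout and the <EOS> token are illustrative assumptions, not the released implementation.

```python
SEP, EOS = "<SEP>", "<EOS>"


def serialize(polygons):
    """Flatten labeled polygons into a single token sequence.

    polygons: list of (label, [(x, y), ...]) pairs.
    Each corner becomes an (x, y, label) token. Polygons are ordered left to
    right by their leftmost corner (tuple comparison breaks ties by y) and
    separated by SEP; the sequence ends with EOS.
    """
    ordered = sorted(polygons, key=lambda poly: min(poly[1]))
    tokens = []
    for i, (label, corners) in enumerate(ordered):
        if i > 0:
            tokens.append(SEP)
        tokens.extend((x, y, label) for x, y in corners)
    tokens.append(EOS)
    return tokens


plan = [
    ("bedroom", [(50, 0), (90, 0), (90, 40), (50, 40)]),
    ("kitchen", [(0, 0), (40, 0), (40, 40), (0, 40)]),
    ("door", [(40, 10), (50, 10), (50, 20), (40, 20)]),
]
tokens = serialize(plan)
print(tokens)
```

In this sketch every corner token carries its polygon's label, so per-corner targets for a token-level semantic classification loss can be read directly off the sequence, e.g. `[t[2] for t in tokens if isinstance(t, tuple)]`.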


Raster2Seq System

Given a rasterized floorplan image (left), Raster2Seq converts it into a vectorized representation as a labeled polygon sequence, with polygons delimited by special <SEP> tokens. The core component is an anchor-based autoregressive decoder that predicts the next token from image features (\(f_\text{img}\)), learnable anchors (\(v_\text{anc}\)), and previously generated tokens. Above, we visualize the first two predicted labeled polygons (in orange and pink, respectively).
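The decoding procedure can be sketched as a greedy loop that appends one token at a time and then splits the result at <SEP> tokens. Here `predict_next` is a hypothetical stand-in for the anchor-based decoder: it receives image features, anchors, and the generated prefix, but the usage below replays a fixed script instead of running a trained model. The <EOS> stop token and the `max_len` cap are assumptions.

```python
SEP, EOS = "<SEP>", "<EOS>"


def greedy_decode(predict_next, image_feats, anchors, max_len=512):
    """Autoregressively generate tokens until EOS or max_len tokens.

    predict_next(image_feats, anchors, prefix) -> next token is a placeholder
    for a decoder conditioned on image features, learnable anchors, and the
    previously generated tokens.
    """
    prefix = []
    while len(prefix) < max_len:
        token = predict_next(image_feats, anchors, prefix)
        prefix.append(token)
        if token == EOS:
            break
    return prefix


def split_polygons(tokens):
    """Group corner tokens into polygons, using SEP and EOS as delimiters."""
    polygons, current = [], []
    for token in tokens:
        if token in (SEP, EOS):
            if current:
                polygons.append(current)
            current = []
        else:
            current.append(token)
    if current:  # tolerate sequences cut off by max_len
        polygons.append(current)
    return polygons


# Toy usage: a scripted stub instead of a trained model.
script = [(0, 0, "kitchen"), (40, 0, "kitchen"), (40, 40, "kitchen"), SEP,
          (40, 10, "door"), (50, 10, "door"), (50, 20, "door"), EOS]
stub = lambda feats, anchors, prefix: script[len(prefix)]
tokens = greedy_decode(stub, image_feats=None, anchors=None)
polygons = split_polygons(tokens)
```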


BibTeX

@inproceedings{phung2026raster2seq,
  title = {Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction},
  author = {Phung, Hao and Averbuch-Elor, Hadar},
  booktitle={Special Interest Group on Computer Graphics and Interactive Techniques Conference Conference Papers},
  year = {2026},
}