Abstract

Symmetry plays a vital role in understanding structural patterns, aiding object recognition and scene interpretation. This paper focuses on rotation symmetry, where objects remain unchanged when rotated around a central axis, requiring detection of rotation centers and supporting vertices. Traditional methods relied on hand-crafted feature matching, while recent segmentation models based on convolutional neural networks (CNNs) detect rotation centers but struggle with 3D geometric consistency due to viewpoint distortions. To overcome this, we propose a model that directly predicts rotation centers and vertices in 3D space and projects the results back to 2D while preserving structural integrity. By incorporating a vertex reconstruction stage enforcing 3D geometric priors—such as equal side lengths and interior angles—our model enhances robustness and accuracy. Experiments on the DENDI dataset show superior performance in rotation axis detection and validate the impact of 3D priors through ablation studies.

Motivation

Figure 1. Rotation symmetry detection models and results. (a) 3D detection baseline model without geometric priors, and (b) its qualitative results. (c) Our 3D detection model with geometric priors, and (d) its corresponding qualitative results. The results highlight the benefits of incorporating 3D geometric constraints.

Proposed Method

Figure 2. Overall pipeline. The input image is processed through a backbone and transformer encoder with camera queries. The detection head predicts the 3D rotation center, seed vertex, rotation axis, and symmetry group. The seed vertex is then duplicated according to the predicted symmetry group before the 3D coordinates are projected to 2D.

Feature Learning

Figure 3. Camera Cross Attention. The 3D reference point grids in camera coordinates are projected onto image coordinates to query the backbone image features.

The rotation symmetry detector predicts rotation centers and vertices in 3D camera coordinates. To transform backbone features from image coordinates to camera coordinates, we introduce camera queries—a set of grid-shaped learnable parameters denoted as \( \mathbf{Q} \in \mathbb{R}^{C \times N_x \times N_y} \). Here, \( N_x \) and \( N_y \) represent the spatial dimensions along the \( x \)- and \( y \)-axes, while \( C \) is the embedding dimension. Each query \( \mathbf{Q}_q \in \mathbb{R}^{C} \), located at \( \mathbf{p}_q \), corresponds to a grid cell in the camera’s local coordinate space, covering a predefined range along the \( x \)- and \( y \)-axes. Given an input feature map \( \mathbf{F} \in \mathbb{R}^{C \times H \times W} \), let \( q \) index a query in \( \mathbf{Q} \) with its 2D reference point \( \mathbf{p}_q \). The camera cross attention is computed as:

\[ \text{CCA}(\mathbf{Q}, q, \mathbf{F}) = \sum^{N_\mathrm{ref}}_{i=1} {\mathrm{Deform}}(\mathbf{Q}_q, \mathcal{P}(\mathbf{p}_q, z_i), \mathbf{F}) \] \[ \mathcal{P}(\mathbf{p}_q, z_i) = \begin{pmatrix} f & 0 & c_x \\ 0 & f & c_y \end{pmatrix} \begin{pmatrix} \frac{\mathbf{p}_{q,x}}{z_i} \\ \frac{\mathbf{p}_{q,y}}{z_i} \\ 1 \end{pmatrix} \]

where \( \mathrm{Deform} \) denotes deformable attention [Zhu et al.], \( f \) is the focal length, and \( (c_x, c_y) \) is the principal point. For each \( x \)-\( y \) position, \( N_\mathrm{ref} \) depth values along the \( z \)-axis generate 3D reference points, which are projected to 2D to sample image features.
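To make the sampling procedure concrete, the following NumPy sketch illustrates the projection \( \mathcal{P}(\mathbf{p}_q, z_i) \): a single camera-space query position, paired with \( N_\mathrm{ref} \) hypothesized depths, is mapped to the 2D image locations at which deformable attention samples the backbone features. The function and variable names (project_reference_points, focal, cx, cy) are illustrative, not taken from our implementation.

    import numpy as np

    def project_reference_points(p_q, depths, focal, cx, cy):
        # Pinhole projection of a camera-space grid position (x, y)
        # onto the image plane at several hypothesized depths z_i.
        x, y = p_q
        points_2d = []
        for z in depths:                       # N_ref depth values along the z-axis
            u = focal * (x / z) + cx           # image x-coordinate
            v = focal * (y / z) + cy           # image y-coordinate
            points_2d.append((u, v))
        return np.array(points_2d)             # (N_ref, 2) sampling locations

    # Example: one query cell at (0.5, -0.2) in camera coordinates,
    # with four reference depths between 1 m and 4 m.
    ref_2d = project_reference_points((0.5, -0.2), depths=[1.0, 2.0, 3.0, 4.0],
                                      focal=500.0, cx=320.0, cy=240.0)

Each projected point serves as one sampling location for the deformable attention term, and the \( N_\mathrm{ref} \) attended features are summed to update the corresponding camera query.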

Detection Head

Each query is processed by the transformer decoder and then passed to a classification branch and a regression branch. The classification branch predicts the rotation symmetry group \( g \), which defines the order of symmetry \( N \). The regression branch outputs four quantities that define the 3D geometric structure:

\[ \begin{bmatrix} \mathbf{c}^\top & \mathbf{s}^\top & \mathbf{a}^\top & \beta \end{bmatrix}^\top, \]

where \( \mathbf{c} \in \mathbb{R}^3 \) is the 3D center coordinate, \( \mathbf{s} \in \mathbb{R}^3 \) is the 3D seed coordinate (a vertex on the polygon boundary), \( \mathbf{a} \in \mathbb{R}^3 \) is the 3D axis vector defining the rotation axis, and \( \beta \) is an angle bias for initial alignment in certain shapes (e.g., rectangles). These parameters define the spatial structure and orientation needed to construct the polygon in 3D space. To position vertices according to the predicted rotation symmetry group in 3D space, each vertex \( \mathbf{v}_k \) is computed by rotating a seed point \( \mathbf{s} \) around a rotation axis vector \( \mathbf{a} \), centered at the rotation center \( \mathbf{c} \). The axis \( \mathbf{a} \) is normalized to a unit vector. Each rotation vertex is given by \( \mathbf{v}_k = \mathbf{r}_k + \mathbf{c} \), where the rotated vector \( \mathbf{r}_k \) is calculated using Rodrigues' rotation formula:

\[ \mathbf{r}_k = \mathbf{r} \cos \theta_k + (\mathbf{a} \times \mathbf{r}) \sin \theta_k + \mathbf{a} (\mathbf{a} \cdot \mathbf{r}) (1 - \cos \theta_k), \]

with the initial radial vector \( \mathbf{r} = \mathbf{s} - \mathbf{c} \), and rotation angle \( \theta_k = \frac{2\pi k}{N} \). Given the predicted symmetry group order \( N \), we generate \( N \) vertices by setting \( k = 1, 2, \ldots, N \). The predicted 3D points are then projected into 2D.
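As a minimal sketch of this vertex generation step, the code below rotates the seed around the predicted axis with Rodrigues' formula and projects the resulting vertices to 2D with the same pinhole model as in the feature learning stage. The angle bias \( \beta \) is omitted for simplicity, and helper names such as rotate_vertices are illustrative rather than from our implementation.

    import numpy as np

    def rotate_vertices(c, s, a, N):
        # Generate N vertices by rotating the seed s around the unit axis a,
        # centered at the rotation center c (Rodrigues' rotation formula).
        c, s, a = (np.asarray(v, dtype=float) for v in (c, s, a))
        a = a / np.linalg.norm(a)              # normalize the rotation axis
        r = s - c                              # initial radial vector
        vertices = []
        for k in range(1, N + 1):
            theta = 2.0 * np.pi * k / N        # rotation angle for vertex k
            r_k = (r * np.cos(theta)
                   + np.cross(a, r) * np.sin(theta)
                   + a * np.dot(a, r) * (1.0 - np.cos(theta)))
            vertices.append(r_k + c)           # v_k = r_k + c
        return np.array(vertices)              # (N, 3) vertices in camera coordinates

    def project_to_image(vertices, focal, cx, cy):
        # Pinhole projection of 3D camera-space vertices to 2D image coordinates.
        u = focal * vertices[:, 0] / vertices[:, 2] + cx
        v = focal * vertices[:, 1] / vertices[:, 2] + cy
        return np.stack([u, v], axis=1)

    # Example: an order-4 polygon centered 3 m in front of the camera,
    # rotating about the optical (z-) axis.
    verts_3d = rotate_vertices(c=[0.0, 0.0, 3.0], s=[0.5, 0.0, 3.0],
                               a=[0.0, 0.0, 1.0], N=4)
    verts_2d = project_to_image(verts_3d, focal=500.0, cx=320.0, cy=240.0)

Because every vertex is produced by the same rigid rotation of a single seed, the reconstructed polygon has equal side lengths and interior angles by construction, which is the 3D geometric prior enforced before projecting to 2D.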

Quantitative Results

Ablation Study

Table 1. Rotation vertex detection results on DENDI.
Method         3D query/pred.   vertex recon.   mAP
2D baseline    –                –               24.7
3D baseline    ✓                –               23.5
Ours           ✓                ✓               30.6

Comparison with the state-of-the-art

Table 2. Rotation symmetry detection results on DENDI.
Method                       Prediction      Max F1-score
EquiSym [Seo et al., 2022]   segmentation    22.5
Ours                         detection       33.2

Qualitative Results

Figure 4. Qualitative comparison of rotation vertex detection results on the DENDI dataset. Each set of four columns displays the ground truth and the results of the 2D baseline, the 3D baseline, and our method. Only polygon predictions with all true-positive vertices are marked (green).

Figure 5. Qualitative results of rotation center detection on the DENDI dataset. Each set of three columns shows the ground truth, EquiSym [Seo et al., 2022], and our method. Our detection-based model allows for analysis of individual symmetries.

Acknowledgements

This work was supported by the Samsung Electronics AI Center and also by the IITP grants (RS-2022-II220290: Visual Intelligence for Space-Time Understanding and Generation (50%), RS-2021-II212068: AI Innovation Hub (45%), RS-2019-II191906: Artificial Intelligence Graduate School Program at POSTECH (5%)) funded by the Korea government (MSIT).

Paper

Leveraging 3D Geometric Priors in 2D Rotation Symmetry Detection
Ahyun Seo, Minsu Cho
CVPR, 2025
[arXiv] [Bibtex] [Code]