Papers
arxiv:2604.24040

Improving Robustness of Tabular Retrieval via Representational Stability

Published on Apr 27
· Submitted by
Kushal Raj Bhandari
on Apr 28
Abstract

Transformer-based table retrieval systems flatten structured tables into token sequences, making retrieval sensitive to the choice of serialization even when table semantics remain unchanged. We show that semantically equivalent serializations, such as CSV, TSV, HTML, Markdown, and DDL, can produce substantially different embeddings and retrieval results across multiple benchmarks and retriever families. To address this instability, we treat serialization embeddings as noisy views of a shared semantic signal and use their centroid as a canonical target representation. We show that centroid averaging suppresses format-specific variation and can recover the semantic content common to different serializations when format-induced shifts differ across tables. Empirically, centroid representations outrank individual formats in aggregate pairwise comparisons across MPNet, BGE-M3, ReasonIR, and SPLADE. We further introduce a lightweight residual bottleneck adapter on top of a frozen encoder that maps single-serialization embeddings towards centroid targets while preserving variance and enforcing covariance regularization. The adapter improves robustness for several dense retrievers, though gains are model-dependent and weaker for sparse lexical retrieval. These results identify serialization sensitivity as a major source of retrieval variance and show the promise of post hoc geometric correction for serialization-invariant table retrieval. Our code, datasets, and models are available at https://github.com/KBhandari11/Centroid-Aligned-Table-Retrieval.

Community

Paper author Paper submitter

Transformer retrievers flatten tables into token sequences, and the choice of serialization format (CSV, HTML, DDL, Markdown, and so on) produces substantially different embeddings even when the underlying table data stays identical. This paper quantifies this instability across four retriever families and three benchmarks, then proposes averaging embeddings across serialization formats to compute a stable centroid representation. A lightweight residual bottleneck adapter then learns to approximate that centroid from a single serialization at inference time, keeping the base encoder frozen throughout.
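The centroid construction described above can be sketched in a few lines. Here `encode` is a hypothetical stand-in for any frozen dense encoder's embedding function, not an API from the paper's released code:

```python
import numpy as np

def centroid_embedding(serializations, encode):
    """Average one table's embeddings across serialization formats.

    `serializations` maps a format name (e.g. "csv", "html") to the
    serialized table text; `encode` is a hypothetical stand-in for a
    frozen dense encoder returning a unit-norm vector per string.
    """
    embs = np.stack([encode(text) for text in serializations.values()])
    centroid = embs.mean(axis=0)
    # Re-normalize so the centroid is usable as a cosine-similarity target.
    return centroid / np.linalg.norm(centroid)
```

Note that at inference time the method does not average formats on the fly; the adapter instead learns to map a single-format embedding toward this centroid target.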

Key findings:

  • Serialization format acts as a first-order retrieval variable, with Recall@1 swings as large as 0.26 on NQ-Tables for a single retriever across formats
  • Centroid averaging outranks every individual serialization format in aggregate pairwise comparisons across all tested models
  • The residual adapter delivers meaningful gains for dense retrievers, particularly MPNet and ReasonIR on syntactically heavy or structurally perturbed serializations
  • The subset adapter, trained only on WTQ and WikiSQL, transfers partially to unseen NQ-Tables data, suggesting the learned correction captures format-structural patterns rather than dataset-specific ones
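A minimal sketch of a residual bottleneck adapter with a VICReg-style objective, assuming an MSE alignment term toward the centroid targets plus variance-preservation and off-diagonal covariance penalties; the dimensions, loss weights, and initialization below are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, b_dim = 768, 64  # embedding dim and bottleneck dim (illustrative)

# Residual bottleneck adapter: x + up(relu(down(x))); the base encoder
# producing x stays frozen, only W_down / W_up would be trained.
W_down = rng.normal(0.0, 0.02, (d, b_dim))
W_up = rng.normal(0.0, 0.02, (b_dim, d))

def adapter(x):
    return x + np.maximum(x @ W_down, 0.0) @ W_up

def loss(x, centroids, lam_var=1.0, lam_cov=0.04):
    """Align adapted embeddings with centroid targets while keeping
    per-dimension variance and penalizing off-diagonal covariance
    (VICReg-style terms; the paper's exact weighting is not reproduced)."""
    z = adapter(x)
    align = ((z - centroids) ** 2).mean()
    zc = z - z.mean(axis=0)
    var = np.mean(np.maximum(0.0, 1.0 - np.sqrt(zc.var(axis=0) + 1e-4)))
    cov = (zc.T @ zc) / (len(z) - 1)
    off_diag = (cov ** 2).sum() - (np.diag(cov) ** 2).sum()
    return align + lam_var * var + lam_cov * off_diag / d
```

Because the correction is a small residual on top of the frozen embedding, a near-zero initialization leaves retrieval behavior unchanged at the start of training.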




Models citing this paper: 8
