Model Information

This model is based on BERT and is fine-tuned with a regression head to predict the "formulaicness" of a text, that is, how closely a generated sentence mirrors the structure of its input. It was developed for a case study in logic-to-text generation, so it may not work well on other types of sentences.

Model Details

  • Authors: Eduardo Calò, Guanyi Chen, Elias Stengel-Eskin, Albert Gatt, Kees van Deemter
  • Main Affiliation: Utrecht University
  • GitHub Repository: Formulaicness
  • Paper: Incorporating Formulaicness in the Automatic Evaluation of Naturalness: A Case Study in Logic-to-Text Generation
  • Contact: e.calo@uu.nl

Usage Example

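# === Imports ===
import torch
import torch.nn as nn
from transformers import BertModel, BertPreTrainedModel, BertTokenizer

# === Load tokenizer ===
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")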

# === BERT regression model ===
class BertRegressionModel(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.bert = BertModel(config)
        self.regressor = nn.Linear(config.hidden_size, 1)
        self.init_weights()

    def forward(self, input_ids, attention_mask=None, labels=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled_output = outputs.pooler_output
        logits = self.regressor(pooled_output).squeeze(-1)
        loss = None
        if labels is not None:
            loss_fct = nn.MSELoss()
            loss = loss_fct(logits, labels)
        return {"loss": loss, "logits": logits}
    
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = BertRegressionModel.from_pretrained("eduardo-calo/formulaicness")
model.to(device)

def predict_formulaicness(text: str) -> float:
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    # Only pass input_ids and attention_mask to the model
    inputs = {key: inputs[key].to(device) for key in ['input_ids', 'attention_mask']}
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs['logits'].item()

sents = [
    "for all x x is a cube",
    "for all x if x is a cube and y is a tetrahedron then x is to the right of y",
    "some primes are even",
    "if a is a cube and b is a tetrahedron then a is to the right of b",
    "no cube is to the right of a tetrahedron",
    "In case of a cube and a tetrahedron, the cube is to the right of the tetrahedron.",
    "F and E are beautiful letters.",
    "The cat sat on the mat.",
    ]

for sent in sents:
    # The regression head returns an unbounded scalar score, not a probability.
    score = predict_formulaicness(sent)
    print(f"Formulaicness score: {score:.2f}  |  {sent}")
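To compare several candidate sentences, it can help to sort them by predicted score. The following is a minimal sketch, not part of the released code: `rank_by_score` is a hypothetical helper that accepts any scoring callable, such as `predict_formulaicness` from the usage example. The `toy_score` function is an invented stand-in that lets the sketch run without downloading the model; its numbers are not model outputs. Treating a higher score as "more formulaic" is an assumption here.

```python
from typing import Callable, List, Tuple


def rank_by_score(
    sentences: List[str], score_fn: Callable[[str], float]
) -> List[Tuple[str, float]]:
    """Score each sentence once and return (sentence, score) pairs, highest score first."""
    scored = [(sent, score_fn(sent)) for sent in sentences]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(text: str) -> float:
    # Hypothetical stand-in scorer (NOT the model): the fraction of
    # whitespace-separated tokens that are logical keywords.
    keywords = {"for", "all", "if", "then", "and"}
    tokens = text.lower().split()
    return sum(tok in keywords for tok in tokens) / len(tokens) if tokens else 0.0


if __name__ == "__main__":
    candidates = ["for all x x is a cube", "The cat sat on the mat."]
    # Swap toy_score for predict_formulaicness to rank with the real model.
    for sent, score in rank_by_score(candidates, toy_score):
        print(f"{score:.2f}\t{sent}")
```

Passing the scorer as an argument keeps the ranking logic independent of how the score is computed, so the same helper works with the model, a cached lookup, or a test stub.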

Citation

If you find this work helpful or use any artifact coming from it, please cite our paper as follows:

@inproceedings{calo-etal-2025-incorporating,
    title = "Incorporating Formulaicness in the Automatic Evaluation of Naturalness: A Case Study in Logic-to-Text Generation",
    author = "Cal{\`o}, Eduardo  and
      Chen, Guanyi  and
      Stengel-Eskin, Elias  and
      Gatt, Albert  and
      van Deemter, Kees",
    editor = "Flek, Lucie  and
      Narayan, Shashi  and
      Phương, L{\^e} Hồng  and
      Pei, Jiahuan",
    booktitle = "Proceedings of the 18th International Natural Language Generation Conference",
    month = oct,
    year = "2025",
    address = "Hanoi, Vietnam",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.inlg-main.21/",
    pages = "352--365",
    abstract = "Data-to-text natural language generation (NLG) models may produce outputs that closely mirror the structure of their input. We introduce formulaicness as a measure of the output-to-input structural resemblance, proposing it as an enhancement for reference-less naturalness evaluation. Focusing on logic-to-text generation, we construct a dataset and train a regressor to predict formulaicness scores. We collect human judgments on naturalness and examine how incorporating formulaicness into existing metrics affects alignment with these judgments."
}