🍷 DistilBERT Wine Variety Classifier
A fine-tuned DistilBERT model that predicts the grape variety of a wine from its tasting description. It assigns each description to one of 32 variety classes.
Performance
| Epoch | Eval Accuracy | Eval F1 (weighted) |
|---|---|---|
| 1 | 69.7% | 68.4% |
| 2 | 72.8% | 72.0% |
| 3 | 73.8% | 73.0% |
| 4 | 75.4% | 74.8% |
| 5 | 75.8% | 75.3% |
Note: 32-class classification from text alone is a challenging task. A uniform random guess would be correct about 3% of the time (1/32 ≈ 3.1%).
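The two reported metrics can diverge when classes are imbalanced, so a small sketch of how they are usually computed may help when reading the table. This is a minimal pure-Python illustration, assuming "weighted" means per-class F1 averaged by class support (the usual convention); the toy labels are made up and are not taken from the evaluation set.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for label, n in support.items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = n - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += f1 * n
    return total / len(y_true)

# Toy example (hypothetical labels, for illustration only)
y_true = ["Merlot", "Merlot", "Merlot", "Syrah", "Riesling"]
y_pred = ["Merlot", "Merlot", "Syrah", "Syrah", "Riesling"]

acc = accuracy(y_true, y_pred)
f1 = weighted_f1(y_true, y_pred)
random_baseline = 1 / 32  # uniform guess over 32 classes

print(f"accuracy={acc:.3f} weighted_f1={f1:.3f} baseline={random_baseline:.3%}")
```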
Supported Varieties (32 classes)
Barbera, Bordeaux-style Red Blend, Bordeaux-style White Blend, Cabernet Franc, Cabernet Sauvignon, Champagne Blend, Chardonnay, Gewürztraminer, Glera, Grenache, Malbec, Merlot, Nebbiolo, Petite Sirah, Pinot Grigio, Pinot Gris, Pinot Noir, Portuguese Red, Red Blend, Rhône-style Red Blend, Riesling, Rosé, Sangiovese, Sangiovese Grosso, Sauvignon Blanc, Sparkling Blend, Syrah, Tempranillo, Tempranillo Blend, Viognier, White Blend, Zinfandel
Usage
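from transformers import pipeline

# top_k=5 returns the five highest-scoring varieties
classifier = pipeline("text-classification", model="victor/distilbert-wine-variety", top_k=5)
description = "Aromas of dark cherry, plum and cedar lead to a full-bodied palate with firm tannins and flavors of blackberry, vanilla and tobacco."
results = classifier(description)
# Some transformers versions wrap single-input results in an outer list
if results and isinstance(results[0], list):
    results = results[0]
for r in results:
    print(f"{r['label']}: {r['score']:.1%}")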
Example Output
Cabernet Sauvignon: 45.2%
Red Blend: 18.7%
Merlot: 12.3%
Syrah: 8.1%
Zinfandel: 5.4%
Training Details
- Base model: distilbert/distilbert-base-uncased (66M parameters)
- Dataset: GroNLP/ik-nlp-22_winemag (70K wine reviews from Wine Enthusiast magazine)
- Epochs: 5
- Batch size: 32
- Learning rate: 2e-5 (linear decay with 10% warmup)
- Max sequence length: 256 tokens
- Hardware: NVIDIA T4 GPU
- Training time: ~13 minutes
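The learning-rate entry above describes a schedule that ramps up over the first 10% of steps and then decays linearly to zero. The sketch below illustrates that shape in plain Python, assuming it behaves like the common linear warmup and decay schedule (for example `get_linear_schedule_with_warmup` in transformers). The function name and the total step count are hypothetical; the actual number of training steps is not stated here.

```python
def linear_warmup_decay_lr(step, total_steps, peak_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at a given step: linear ramp to peak_lr, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: scale linearly from peak_lr down to 0 at total_steps
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 10_000  # hypothetical value, chosen only to make the numbers round

for step in (0, 500, 1_000, 5_500, 10_000):
    print(step, linear_warmup_decay_lr(step, TOTAL_STEPS))
```

The peak of 2e-5 is reached at the end of warmup (step 1,000 in this illustration) and the rate returns to zero at the final step.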
Dataset
The model was trained on wine reviews from Wine Enthusiast magazine, containing expert descriptions of wines with flavor notes, aromas, and mouthfeel characteristics. The text descriptions encode subtle grape-specific vocabulary that the model learns to associate with each variety.
Limitations
- The model works best on English wine review text in the style of wine critics
- Some varieties that share flavor profiles (e.g., Pinot Grigio and Pinot Gris, which are two names for the same grape) are inherently hard to distinguish
- Blend categories (Red Blend, White Blend) are catch-all classes and are harder to predict
- The model has only seen wines from the 8 countries in the dataset (US, France, Italy, Spain, Argentina, Germany, Portugal, Australia)
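One possible way to soften the Pinot Grigio / Pinot Gris ambiguity noted above is to merge near-equivalent labels after prediction and sum their scores. The helper below is a hypothetical post-processing sketch, not part of the model or the transformers API; the alias mapping and the sample scores are invented for illustration. Summing is most meaningful when scores for all labels are available, rather than only the top five.

```python
def merge_labels(predictions, aliases):
    """Sum scores of labels that map to the same canonical name, then re-rank."""
    merged = {}
    for p in predictions:
        label = aliases.get(p["label"], p["label"])
        merged[label] = merged.get(label, 0.0) + p["score"]
    ranked = [{"label": label, "score": score} for label, score in merged.items()]
    return sorted(ranked, key=lambda d: d["score"], reverse=True)

# Hypothetical alias table: treat the two names as one class
ALIASES = {"Pinot Grigio": "Pinot Gris"}

# Made-up scores in the same shape as the pipeline's list of dicts
sample = [
    {"label": "Chardonnay", "score": 0.40},
    {"label": "Pinot Grigio", "score": 0.35},
    {"label": "Pinot Gris", "score": 0.20},
    {"label": "Riesling", "score": 0.05},
]

merged = merge_labels(sample, ALIASES)
for m in merged:
    print(f"{m['label']}: {m['score']:.1%}")
```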
License
MIT