arxiv:2408.05337

VACoDe: Visual Augmented Contrastive Decoding

Published on Jul 26, 2024

Authors:

Abstract

VACoDe, a visual augmented contrastive decoding method, adaptively selects optimal image augmentations based on task-specific contrast metrics to reduce hallucinations in vision-language models.

AI-generated summary

Despite the astonishing performance of recent Large Vision-Language Models (LVLMs), these models often generate inaccurate responses. To address this issue, previous studies have focused on mitigating hallucinations by employing contrastive decoding (CD) with augmented images, which amplifies the contrast with the original image. However, these methods have limitations, including reliance on a single augmentation, which is restrictive for certain tasks, as well as the high cost of using external knowledge. In this study, we address these limitations by exploring how to utilize multiple image augmentations. Through extensive experiments, we observed that different augmentations produce varying levels of contrast depending on the task. Based on this observation, we introduce a novel method called VACoDe, Visual Augmented Contrastive Decoding. This method adaptively selects the augmentation with the highest contrast for each task using the proposed softmax distance metric. Our empirical tests show that \alg outperforms previous methods and improves output quality in various vision-language tasks. Additionally, VACoDe can be universally applied across different model types and sizes without additional training or the use of external models and data.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2408.05337

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2408.05337 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2408.05337 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2408.05337 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.