Florence-2 Community


AI & ML interests

This organization contains the official transformers implementation of the Florence-2 model by Microsoft.


Organization Card


This organization hosts the official transformers-converted checkpoints of Microsoft's Florence-2 model. Try the model itself here. This integration makes Florence-2 usable with the libraries and APIs of the Hugging Face ecosystem.

Florence-2 is an advanced vision foundation model that uses a prompt-based approach to handle a wide range of vision and vision-language tasks. It interprets simple text prompts to perform tasks such as captioning, object detection, and segmentation. It is trained on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images, to learn these tasks jointly. The model's sequence-to-sequence architecture lets it perform well in both zero-shot and fine-tuned settings, making it a competitive vision foundation model.
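To make the prompt-based approach concrete, here is a minimal sketch of how a task prompt might be assembled. The task tokens are the ones listed on the upstream Microsoft Florence-2 model card; whether every one of them behaves identically with the converted checkpoints is an assumption. The `TASK_TOKENS` mapping, the `TASKS_WITH_TEXT` set, and the `build_prompt` helper are hypothetical names introduced for illustration and are not part of transformers.

```python
from typing import Optional

# Task tokens as listed on the upstream Microsoft Florence-2 model card.
# Assumption: the converted checkpoints accept the same tokens.
TASK_TOKENS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "referring_segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
    "ocr": "<OCR>",
}

# Tasks whose prompt is the task token followed by free text.
TASKS_WITH_TEXT = {"phrase_grounding", "referring_segmentation"}


def build_prompt(task: str, text: Optional[str] = None) -> str:
    """Return the prompt string for a task (hypothetical helper)."""
    if task not in TASK_TOKENS:
        raise KeyError(f"unknown task: {task!r}")
    token = TASK_TOKENS[task]
    if task in TASKS_WITH_TEXT:
        if not text:
            raise ValueError(f"task {task!r} needs accompanying text")
        # The upstream card concatenates token and text with no separator.
        return token + text
    if text:
        raise ValueError(f"task {task!r} takes no accompanying text")
    return token


print(build_prompt("object_detection"))
print(build_prompt("phrase_grounding", "a green car"))
```

The resulting string is what would be passed as `text=` to the processor; only the task token changes between tasks, while the model and the generation call stay the same.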

Resources and Technical Documentation:

Model	Model size	Model Description
Florence-2-base[HF]	0.23B	Pretrained model with FLD-5B
Florence-2-large[HF]	0.77B	Pretrained model with FLD-5B
Florence-2-base-ft[HF]	0.23B	Fine-tuned model on a collection of downstream tasks
Florence-2-large-ft[HF]	0.77B	Fine-tuned model on a collection of downstream tasks

Use the code below to get started with the model.

import torch
import requests
from PIL import Image
from transformers import AutoProcessor, Florence2ForConditionalGeneration


# Load the fine-tuned base checkpoint in bfloat16 and let transformers place it on the available device.
model = Florence2ForConditionalGeneration.from_pretrained(
    "florence-community/Florence-2-base-ft",
    dtype=torch.bfloat16,
    device_map="auto",
)
processor = AutoProcessor.from_pretrained("florence-community/Florence-2-base-ft")

# Download a sample image and make sure it is RGB.
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg?download=true"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# "<OD>" is the task prompt for object detection.
task_prompt = "<OD>"
inputs = processor(text=task_prompt, images=image, return_tensors="pt").to(model.device, torch.bfloat16)

# Generate the output token sequence with beam search.
generated_ids = model.generate(
    **inputs,
    max_new_tokens=1024,
    num_beams=3,
)
# Keep special tokens: the post-processing step needs them to parse the output.
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

# Convert the generated text into a task-specific result, scaled to the image size (width, height).
image_size = image.size
parsed_answer = processor.post_process_generation(generated_text, task=task_prompt, image_size=image_size)

print(parsed_answer)
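Once the result is parsed, a natural next step is to pair each detected label with its box. The sketch below assumes the `<OD>` result has the shape shown on the upstream Florence-2 model card: a dictionary keyed by the task prompt, holding parallel `bboxes` (as `[x1, y1, x2, y2]` in pixels) and `labels` lists. The helper names `pair_detections` and `box_area` are hypothetical, and the example values are made up for illustration rather than real model output.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]


def pair_detections(parsed_answer: Dict, task: str = "<OD>") -> List[Tuple[str, Box]]:
    """Pair each label with its (x1, y1, x2, y2) box.

    Assumes parsed_answer[task] holds parallel "labels" and "bboxes" lists.
    """
    result = parsed_answer[task]
    boxes = [tuple(box) for box in result["bboxes"]]
    return list(zip(result["labels"], boxes))


def box_area(box: Box) -> float:
    """Area of a box in square pixels; degenerate boxes give 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


# Made-up values with the assumed shape, not real model output.
example = {
    "<OD>": {
        "bboxes": [[10.0, 20.0, 110.0, 220.0], [0.0, 0.0, 50.0, 40.0]],
        "labels": ["car", "wheel"],
    }
}

for label, box in pair_detections(example):
    print(label, box, box_area(box))
```

Passing the real `parsed_answer` in place of `example` should work as long as the output follows the assumed shape; other task prompts return differently shaped results and would need their own handling.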
