Florence-2 Community

AI & ML interests

This organization hosts the official transformers implementation of the Florence-2 model by Microsoft.

Recent Activity

fcakyon authored a paper about 6 hours ago

SenBen: Sensitive Scene Graphs for Explainable Content Moderation

ducviet00 new activity 7 months ago

florence-community/Florence-2-base: How were the models converted?

fcakyon new activity 8 months ago

florence-community/Florence-2-large: Thanks for converting these models!

authored a paper about 6 hours ago

SenBen: Sensitive Scene Graphs for Explainable Content Moderation

Paper • 2604.08819 • Published 27 days ago • 1

posted an update about 10 hours ago

Let me introduce you to our CVPR 2026 paper!

Today's content moderation systems give you a label: safe or unsafe. They don't tell you what triggered the decision, who is involved, or where in the image it happens. That opacity hurts auditing, breaks adaptation across platforms, and frustrates the human review that responsible deployment demands.

We built SenBen to fix this. It is the first large-scale scene graph benchmark designed specifically for sensitive content moderation:

- 13,999 annotated frames from 157 movies
- Visual Genome-style scene graphs with bounding boxes, attributes, and predicates
- Affective state attributes (pain, fear, aggression, distress) so the model captures not just what is in the frame, but what it means
- 16 safety tags across 5 categories, the broadest taxonomy of any dataset of this kind
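To make the annotation layout above concrete, here is a minimal sketch of what one annotated frame could look like as a data structure. Every class name, field name, and the example `"violence"` tag are hypothetical illustrations, not the actual SenBen schema. Only the ingredients (bounding boxes, attributes, predicates, the four affective states, and frame-level safety tags) come from the post.

```python
from dataclasses import dataclass, field

# The four affective states named in the post.
AFFECTIVE_STATES = ("pain", "fear", "aggression", "distress")


@dataclass
class SceneObject:
    """One grounded object in a frame (hypothetical field names)."""
    object_id: int
    label: str
    bbox: tuple[float, float, float, float]  # assumed (x, y, width, height) in pixels
    attributes: list[str] = field(default_factory=list)


@dataclass
class Relationship:
    """A directed subject -> predicate -> object edge between two objects."""
    subject_id: int
    predicate: str
    object_id: int


@dataclass
class FrameAnnotation:
    """Scene graph plus frame-level safety tags (hypothetical layout)."""
    frame_id: str
    objects: list[SceneObject]
    relationships: list[Relationship]
    safety_tags: list[str] = field(default_factory=list)

    def triples(self) -> list[tuple[str, str, str]]:
        """Resolve object ids to labels: (subject, predicate, object)."""
        labels = {o.object_id: o.label for o in self.objects}
        return [
            (labels[r.subject_id], r.predicate, labels[r.object_id])
            for r in self.relationships
        ]

    def affective_objects(self) -> list[tuple[str, str]]:
        """List (label, state) pairs for attributes that are affective states."""
        return [
            (o.label, a)
            for o in self.objects
            for a in o.attributes
            if a in AFFECTIVE_STATES
        ]


# Made-up example frame; the values are illustrative, not taken from the dataset.
frame = FrameAnnotation(
    frame_id="example_0001",
    objects=[
        SceneObject(1, "person", (40.0, 60.0, 120.0, 300.0), ["fear"]),
        SceneObject(2, "person", (220.0, 50.0, 130.0, 310.0), ["aggression"]),
        SceneObject(3, "knife", (250.0, 180.0, 40.0, 15.0)),
    ],
    relationships=[
        Relationship(2, "holding", 3),
        Relationship(2, "threatening", 1),
    ],
    safety_tags=["violence"],  # hypothetical tag name
)

print(frame.triples())
print(frame.affective_objects())
```

A structure like this is what lets a moderation decision point to what triggered it (the triples), who is involved (the objects), and where it happens (the boxes), instead of returning only a safe/unsafe label.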

A small model that beats much bigger ones:

We distilled a frontier VLM into a compact 241M parameter student built on Florence-2.

On grounded scene graph metrics, the 241M student beats every evaluated VLM except Gemini, and every commercial safety API. It also wins on object detection and captioning across the entire model zoo. It runs at 733 ms per frame on 1.2 GB VRAM, which is 7.6 times faster than the next-best local VLM at zero per-frame cost. The whole benchmark, from dataset creation through all baseline evaluations, is reproducible for under $350.
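The speed figures above can be unpacked with a little arithmetic. The sketch below assumes that "7.6 times faster" means the next-best local VLM takes 7.6 times as long per frame, and that frames are processed one at a time on a single GPU. The function names are illustrative, and the derived numbers are implications of the quoted figures, not results reported in the post.

```python
# Figures quoted in the post.
STUDENT_LATENCY_MS = 733.0   # per-frame latency of the 241M student
SPEEDUP = 7.6                # quoted speedup over the next-best local VLM
NUM_FRAMES = 13_999          # annotated frames in the benchmark


def frames_per_second(latency_ms: float) -> float:
    """Throughput implied by a per-frame latency, assuming sequential processing."""
    return 1000.0 / latency_ms


def implied_baseline_latency_ms(student_ms: float, speedup: float) -> float:
    """Per-frame latency of the slower model if it is `speedup` times slower."""
    return student_ms * speedup


def total_hours(num_frames: int, latency_ms: float) -> float:
    """Wall-clock hours to process `num_frames` frames one after another."""
    return num_frames * latency_ms / 1000.0 / 3600.0


baseline_ms = implied_baseline_latency_ms(STUDENT_LATENCY_MS, SPEEDUP)

print(f"student throughput: {frames_per_second(STUDENT_LATENCY_MS):.2f} frames/s")
print(f"implied next-best latency: {baseline_ms:.1f} ms/frame")
print(f"student, full benchmark: {total_hours(NUM_FRAMES, STUDENT_LATENCY_MS):.2f} h")
print(f"next-best, full benchmark: {total_hours(NUM_FRAMES, baseline_ms):.2f} h")
```

Under these assumptions the student processes about 1.36 frames per second, and the next-best local VLM would need roughly 5.6 seconds per frame.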

Project: https://senben.kim/
Paper: SenBen: Sensitive Scene Graphs for Explainable Content Moderation (2604.08819)
Dataset: fcakyon/senben
Code (soon): https://github.com/fcakyon/senben

posted an update 7 months ago

deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient vision-token-to-performance ratio
> covers 100 languages

4 replies

in florence-community/Florence-2-base 7 months ago

How were the models converted?

#1 opened 7 months ago by

posted an update 8 months ago

large AI labs open-sourced a ton of models last week 🔥
here are a few picks, find even more here: merve/sep-16-releases-68d13ea4c547f02f95842f05 🤝
> IBM released a new Docling model with 258M params based on Granite (Apache 2.0) 📝 ibm-granite/granite-docling-258M
> Xiaomi released 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, open Nano Banana 🍌 (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer use models (3B/7B/32B) with the dataset 💻 OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan Longcat released thinking version of LongCat-Flash 💭 meituan-longcat/LongCat-Flash-Thinking

2 replies

posted an update 8 months ago

IBM just released a small Swiss Army knife for document models: granite-docling-258M on Hugging Face 🔥

> not only a document converter: it can also do document question answering and understands multiple languages 🤯
> best part: released with Apache 2.0 license 👏 use it with your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! 🤗
> built on SigLIP2 & granite-165M

model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo 💗

posted an update 8 months ago

a ton of image/video generation models and LLMs from big labs 🔥

> Meta released facebook/mobilellm-r1-68c4597b104fac45f28f448e, smol LLMs for on-device use 💬
> Tencent released tencent/SRPO, high res image generation model and tencent/POINTS-Reader, cutting edge OCR 📝
> ByteDance released bytedance-research/HuMo, video generation from any input ⏯️

find more models, datasets, demos here merve/sep-11-releases-68c7dbfa26bea8cd921fa0ac

in florence-community/Florence-2-large 8 months ago

Thanks for converting these models!

#1 opened 8 months ago by

posted an update 8 months ago

fan-favorite vision LM Florence-2 is now officially supported in transformers 🤗

find all the models in the florence-community org 🫡

updated a Space 8 months ago

README

updated 2 models 8 months ago

florence-community/Florence-2-base

Image-Text-to-Text • 0.2B • Updated Sep 11, 2025 • 26.9k • 5

florence-community/Florence-2-base-ft

Image-Text-to-Text • 0.2B • Updated Sep 11, 2025 • 3.97k • 2

published a Space 8 months ago

README

posted an update 8 months ago

past week was great for open LLMs 🔥 merve/sep-1-releases-68bede0e729c12597eefd050

> Google released google/embeddinggemma-300m, new embedding model with 300M params
> new update to Kimi-K2 just landed moonshotai/Kimi-K2-Instruct-0905 😍
> OpenBMB released a new version of MiniCPM with 8B params openbmb/MiniCPM4.1-8B

also soooo many Qwen-Image & Kontext LoRAs dropped!

posted an update 8 months ago

upgrade your transformers 🔥
it comes with insanely capable models like merve/sam2-66ac9deac6fca3bc5482fe30, microsoft/kosmos-2.5, and more 🫡
I built a notebook you can run with free Colab T4 to walk through the API for new models 🙋🏻‍♀️ merve/smol-vision

fine-tuning will follow soon!

posted an update 8 months ago

large AI labs have dropped so many open models last week 🔥 don't miss out on them

→ Apple released on-device vision LMs apple/fastvlm-68ac97b9cd5cacefdd04872e & apple/mobileclip2-68ac947dcb035c54bcd20c47
→ OpenGVLab released InternVL3.5, 32 new vision LMs with one based on gpt-oss! (OS) OpenGVLab/internvl35-68ac87bd52ebe953485927fb
→ MSFT released a killer small TTS model (OS) microsoft/VibeVoice-1.5B

find more here: https://huggingface.co/collections/merve/august-29-releases-68b5a3754cfb8abf59e2b486

1 reply

posted an update 8 months ago

first vision language model built off openai/gpt-oss-20b just dropped! 🔥

InternVL3.5 comes with 32 models 🤯 pre-trained, fine-tuned, aligned in various sizes OpenGVLab/internvl35-68ac87bd52ebe953485927fb
comes with gpt-oss or Qwen3 for the LLM part ⤵️

1 reply

posted an update 9 months ago

GPT-4.1-mini level model right on your iPhone 🤯

openbmb/MiniCPM-V-4 is only 4B while surpassing GPT-4.1-mini in vision benchmarks 🔥

allows commercial use as well!

posted an update 9 months ago

we're all sleeping on this OCR model rednote-hilab/dots.ocr 🔥

dots.ocr is a new 3B model with sota performance, support for 100 languages & allowing commercial use! 🤯

single e2e model to extract images and convert tables, formulas, and more into markdown 📝
try it MohamedRashad/Dots-OCR

posted an update 9 months ago

massive releases and tons of FLUX.1 Krea LoRAs this past week!
here are some of the picks, find more models in the collection 🫡 merve/releases-august-2-6890c14248203522b7d0267f

LLMs 💬
> Tencent dropped tencent/Hunyuan-7B-Instruct
> Qwen released Qwen/Qwen3-Coder-30B-A3B-Instruct, 30B MoE with 3B active params for coding (OS)

vision/multimodal
> RedNote released rednote-hilab/dots.ocr - 3B OCR model (OS)
> Cohere released CohereLabs/command-a-vision-07-2025 - 112B (dense!) VLM for 6 languages
> StepFun-AI shipped stepfun-ai/step3 - 321B MoE VLM (OS)
> Skywork shipped Skywork/Skywork-UniPic-1.5B - new any-to-any model (image+text → image+text) (OS)