papers_ESG
Enhancing Retrieval for ESGLLM via ESG-CID -- A Disclosure Content Index Finetuning Dataset for Mapping GRI and ESRS
Paper • 2503.10674 • Published • 3
ESGenius: Benchmarking LLMs on Environmental, Social, and Governance (ESG) and Sustainability Knowledge
Paper • 2506.01646 • Published • 2
Statements: Universal Information Extraction from Tables with Large Language Models for ESG KPIs
Paper • 2406.19102 • Published • 1
Improving Retrieval for RAG based Question Answering Models on Financial Documents
Paper • 2404.07221 • Published
Guille Pérez-Torró
guishe
AI & ML interests
Information Retrieval, Few-Shot Learning, Named Entity Recognition, Named Entity Disambiguation, Semantic Search, Aspect-based Sentiment Analysis
Recent Activity
liked a Space 13 days ago: HuggingFaceH4/on-policy-distillation
liked a Space 13 days ago: HuggingFaceTB/trl-distillation-trainer
upvoted an article 18 days ago: From GRPO to DAPO and GSPO: What, Why, and How
Multimodal Embeddings
Reasoning LLMs
ReRanker Encoder-only Models
jinaai/jina-reranker-v2-base-multilingual
Text Ranking • 0.3B • Updated • 1.79M • 352
BAAI/bge-reranker-v2-m3
Text Classification • 0.6B • Updated • 16.2M • 1.06k
mixedbread-ai/mxbai-rerank-base-v1
Text Ranking • 0.2B • Updated • 474k • 46
mixedbread-ai/mxbai-rerank-large-v1
Text Ranking • Updated • 68.4k • 142
Embedding Encoder-only Models
BAAI/bge-m3
Sentence Similarity • Updated • 31.3M • 3.16k
BAAI/bge-large-en-v1.5
Feature Extraction • 0.3B • Updated • 14.6M • 689
nomic-ai/nomic-embed-text-v1.5
Sentence Similarity • 0.1B • Updated • 17.9M • 857
mixedbread-ai/mxbai-embed-large-v1
Feature Extraction • 0.3B • Updated • 5.84M • 811
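Embedding models like these map text to dense vectors, and semantic search commonly ranks documents by cosine similarity between the query vector and each document vector. Below is a minimal sketch of that ranking step only, assuming the vectors have already been computed by a model; the toy 3-dimensional vectors, the document ids and the `cosine` and `top_k` helpers are hypothetical stand-ins, not part of any listed model's API.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Score every document against the query and keep the k most similar."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy vectors: real embeddings (e.g. from BAAI/bge-m3) have hundreds
# of dimensions and are produced by the model, not written by hand.
docs = {
    "emissions_report": [0.9, 0.1, 0.0],
    "board_governance": [0.1, 0.9, 0.1],
    "water_usage": [0.7, 0.0, 0.3],
}
query = [1.0, 0.0, 0.1]
print(top_k(query, docs))
```

In a retrieve-then-rerank setup, the short list returned by a step like `top_k` is what a reranker from the previous collection would then rescore.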
Summarization
Falconsai/text_summarization
Summarization • 60.5M • Updated • 119k • 297
knkarthick/MEETING-SUMMARY-BART-LARGE-XSUM-SAMSUM-DIALOGSUM
Summarization • 0.4B • Updated • 117 • 13
knkarthick/MEETING-SUMMARY-BART-LARGE-XSUM-SAMSUM-DIALOGSUM-AMI
Summarization • 0.4B • Updated • 12 • 17
BEE-spoke-data/pegasus-x-base-synthsumm_open-16k
Summarization • 0.3B • Updated • 432 • 3
Instruct LLMs
instruction-pretrain/finance-Llama3-8B
Text Generation • 8B • Updated • 500 • 76
EmergentMethods/Phi-3-mini-4k-instruct-graph
Text Generation • 4B • Updated • 48
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
Text Generation • 71B • Updated • 16.2k • 2.07k
unsloth/Llama-3.2-3B-Instruct-unsloth-bnb-4bit
Text Generation • 3B • Updated • 69.4k • 10
Multi-Vector Embedding Models
LLM-as-a-Judge
AtlaAI/Selene-1-Mini-Llama-3.1-8B
Text Generation • 8B • Updated • 5.42k • 104
AtlaAI/Selene-1-Mini-Llama-3.1-8B-Q4_K_M-GGUF
Text Generation • 8B • Updated • 129 • 8
flowaicom/Flow-Judge-v0.1
Text Generation • 4B • Updated • 15.9k • 71
prometheus-eval/prometheus-7b-v2.0
Text Generation • 7B • Updated • 45.9k • 108
Zero-Shot Entailment Models
MoritzLaurer/bge-m3-zeroshot-v2.0
Zero-Shot Classification • 0.6B • Updated • 67.7k • 63
MoritzLaurer/deberta-v3-large-zeroshot-v2.0
Zero-Shot Classification • 0.4B • Updated • 106k • 129
MoritzLaurer/ModernBERT-base-zeroshot-v2.0
Text Classification • 0.1B • Updated • 926 • 19
MoritzLaurer/ModernBERT-large-zeroshot-v2.0
Text Classification • 0.4B • Updated • 16.6k • 66
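Zero-shot entailment models classify text by turning each candidate label into an NLI hypothesis and scoring how strongly the input text entails it. The sketch below covers only the label-scoring step, assuming the per-label entailment logits have already been produced by such a model; the hypothesis template, the labels, the logit values and the helper names are illustrative assumptions. Taking a softmax of the entailment logits across labels corresponds to single-label classification, where exactly one label is expected to apply.

```python
import math


def build_hypotheses(labels: list[str], template: str = "This text is about {}.") -> list[str]:
    """Turn each candidate label into an NLI hypothesis (template is an assumption)."""
    return [template.format(label) for label in labels]


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_scores(entailment_logits: list[float], labels: list[str]) -> list[tuple[str, float]]:
    """Single-label mode: normalise entailment logits across labels, best first."""
    probs = softmax(entailment_logits)
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


labels = ["climate", "governance", "labour rights"]
hypotheses = build_hypotheses(labels)
# Hypothetical entailment logits for the (premise, hypothesis) pairs; in practice
# each value would come from a forward pass of an NLI model.
logits = [2.0, 0.5, -1.0]
ranking = zero_shot_scores(logits, labels)
print(hypotheses)
print(ranking)
```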
NER Encoder-only Models
This collection gathers several NER models: either versions fine-tuned for specific tasks or generic backbone models ready to be fine-tuned.
guishe/nuner-v1_orgs
Token Classification • 0.1B • Updated • 7.03k • 2
guishe/span-marker-generic-ner-v1-fewnerd-fine-super
Token Classification • 0.1B • Updated • 3.1k • 13
protectai/guishe-nuner-v1_orgs-onnx
Token Classification • Updated • 1.68k
guishe/nuner-v1_fewnerd_fine_super
Token Classification • 0.1B • Updated • 4
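Token classification models emit one label per token, and those labels usually need to be grouped into entity spans before use. Below is a minimal sketch of that grouping step, assuming a BIO tagging scheme (the actual label set of each model above may differ, for example IO tags or span-level outputs); the `bio_to_spans` helper, the tokens and the tags are hypothetical examples, not output from any of the listed models.

```python
def bio_to_spans(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans: list[tuple[str, str]] = []
    current_type: str | None = None
    current_tokens: list[str] = []

    def close() -> None:
        # Flush the span being built, if any.
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A new entity starts (a stray I- tag is treated leniently as a start).
            close()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the current entity.
            current_tokens.append(token)
        else:
            # "O" ends any open entity.
            close()
            current_type, current_tokens = None, []
    close()
    return spans


tokens = ["Banco", "Santander", "reported", "emissions", "in", "Spain"]
tags = ["B-ORG", "I-ORG", "O", "O", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
```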
Small LLMs
microsoft/Phi-3-mini-4k-instruct
Text Generation • 4B • Updated • 587k • 1.44k
google/gemma-2-2b-it
Text Generation • 3B • Updated • 388k • 1.41k
nvidia/Nemotron-Mini-4B-Instruct
Text Generation • Updated • 389k • 184
HuggingFaceTB/SmolLM2-360M-Instruct
Text Generation • 0.4B • Updated • 245k • 197
Part-of-Speech Tagging