Exaone3.5-7.8B_ReST_V0_Quantized

This model is a fine-tuned and AWQ-quantized version of EXAONE 3.5 7.8B (Instruct), optimized for efficient inference and structured text generation.

Overview

  • Base Model: EXAONE 3.5 7.8B (Instruct)
  • Fine-tuning: Supervised fine-tuning on domain-specific data
  • Quantization: 4-bit AWQ
  • Inference: Optimized for vLLM
  • Context Length: up to 32K tokens

Model Details

  • Architecture: ExaoneForCausalLM
  • Hidden Size: 4096
  • Layers: 32
  • Attention Heads: 32
  • Max Position Embeddings: 32768
  • Quantization: 4-bit AWQ
  • Torch dtype: float16

Intended Use

  • Instruction-based text generation
  • Structured output generation (JSON)
  • LLM-based data pipelines
  • RAG systems
  • Efficient inference

Example Usage

from vllm import LLM, SamplingParams

llm = LLM(
    model="cococoomo/Exaone3.5-7.8B_ReST_V0_Quantized",
    quantization="AWQ",
)

sampling_params = SamplingParams(
    temperature=0.2,
    top_p=0.8,
    max_tokens=1024,
)

outputs = llm.generate(["Your prompt here"], sampling_params)
print(outputs[0].outputs[0].text)

Training

Fine-tuned using supervised learning on domain-specific data.
Dataset is not included due to privacy constraints.

Limitations

  • May produce incorrect outputs
  • Sensitive to prompt quality
  • Domain bias may exist

Safety

Not intended for critical decision-making without human validation.

Evaluation

  • BLEU
  • ROUGE

Deployment

Optimized for vLLM and GPU-efficient inference.

Downloads last month
56
Safetensors
Model size
8B params
Tensor type
I32
·
F16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for cococoomo/Exaone3.5-7.8B_ReST_V0_Quantized

Quantized
(24)
this model