Kinetic-FC-LoRA

Kinetic-FC-LoRA is a rank-64 LoRA adapter for Qwen/Qwen3-4B-Instruct-2507, fine-tuned at Conscious Engines for function / tool calling against the Composio tool ecosystem.

Applied on top of the base model, this adapter produces Kinetic-4B, a 4B-parameter tool-calling model. On our 300-sample Composio eval it outscores Claude Haiku 4.5, GPT-OSS-120B, and its own base model on the same task, with the lowest p95 latency of the four:

Model                           Params  Accuracy  p95 latency
Qwen3-4B + Kinetic-FC-LoRA      4B      82.33%    1.61 s
Claude Haiku 4.5                n/a     80.00%    4.02 s
Qwen3-4B-Instruct-2507 (base)   4B      78.67%    1.84 s
GPT-OSS-120B                    120B    76.33%    7.99 s

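Full write-up: Kinetic-4B blog post (https://www.consciousengines.com/blog/kinetic-4b-a-4-billion-parameter-model-that-outperforms-claude-haiku-at-tool-calling).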

What it's for

  • Picking the correct tool from a menu of up to ~10 options drawn from a single Composio toolkit.
  • Producing syntactically valid arguments conforming to the tool's JSON schema.
  • Emitting tool calls in Qwen3's native <tool_call>{ "name": ..., "arguments": ... }</tool_call> format.
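As a rough illustration of consuming that output format, the sketch below pulls `<tool_call>` blocks out of a raw completion and decodes the JSON inside. The helper name `extract_tool_calls` is hypothetical and not part of this repository; it assumes each block holds a single JSON object with a `name` key.

```python
import json
import re

# One <tool_call>...</tool_call> block; DOTALL lets the JSON span several lines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return a list of {"name": ..., "arguments": ...} dicts found in text.

    Hypothetical helper: blocks that are not valid JSON objects are skipped
    rather than raising, so callers should treat an empty list as "no usable call".
    """
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            payload = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # malformed JSON inside the tags
        if isinstance(payload, dict) and "name" in payload:
            calls.append({
                "name": payload["name"],
                "arguments": payload.get("arguments", {}),
            })
    return calls


if __name__ == "__main__":
    completion = (
        "<tool_call>\n"
        '{"name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN", '
        '"arguments": {"campaign_id": "7015g000000XyZ9AA"}}\n'
        "</tool_call><|im_end|>"
    )
    print(extract_tool_calls(completion))
```

Skipping malformed blocks instead of raising is a design choice for this sketch; a stricter pipeline might prefer to surface the error and retry generation.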

It is not a general chat model; it is a narrow specialist. For freeform conversation, use the base Qwen3-4B-Instruct-2507 directly.

Adapter details

Base model Qwen/Qwen3-4B-Instruct-2507
Method LoRA (PEFT)
Rank r 64
lora_alpha 128
lora_dropout 0.05
Target modules q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trainable params 132 M / 4.15 B (3.18%)
Precision bf16
Epochs 2
Effective batch 16 (1 × 16 grad accum)
Learning rate 2e-4, cosine, 5% warmup
Max seq length 10 240
Training data 13 694 synthetic samples across the top-20 Composio toolkits, 10 tools per sample (1 ground-truth + 9 distractors from the same toolkit)
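The trainable-parameter figure above can be sanity-checked with a little arithmetic. A LoRA pair on a linear layer of shape (in, out) adds r × (in + out) weights. The layer dimensions below are assumptions about the Qwen3-4B architecture (they are not stated in this card and should be checked against the base model's config.json); the rank, alpha, target modules, and 4.15 B total come from the table.

```python
# Assumed Qwen3-4B dimensions (not stated in this card; verify in config.json).
HIDDEN = 2560         # hidden_size
INTERMEDIATE = 9728   # intermediate_size
N_LAYERS = 36         # num_hidden_layers
Q_DIM = 32 * 128      # num_attention_heads * head_dim
KV_DIM = 8 * 128      # num_key_value_heads * head_dim

# Values taken from the adapter details table.
RANK = 64
LORA_ALPHA = 128
TOTAL_PARAMS = 4.15e9  # base + adapter, as reported

# (in_features, out_features) of each targeted projection in one decoder layer.
TARGETS = {
    "q_proj": (HIDDEN, Q_DIM),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (Q_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, rank):
    """Weights added by one LoRA pair: A is (rank x in), B is (out x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGETS.values())
total = per_layer * N_LAYERS
scaling = LORA_ALPHA / RANK  # factor applied to the low-rank update

print(f"{total:,} trainable LoRA parameters")          # 132,120,576
print(f"{100 * total / TOTAL_PARAMS:.2f}% of 4.15 B")  # 3.18%
print(f"alpha / r scaling = {scaling}")                # 2.0
```

Under those assumed dimensions the count lands on roughly 132 M, which agrees with the reported 132 M / 4.15 B (3.18%).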

Inference with PyTorch + PEFT

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE  = "Qwen/Qwen3-4B-Instruct-2507"
ADAPT = "consciousengines/Kinetic-FC-LoRA"

device = "cuda" if torch.cuda.is_available() else ("mps" if torch.backends.mps.is_available() else "cpu")
dtype  = torch.bfloat16 if device != "cpu" else torch.float32

tokenizer = AutoTokenizer.from_pretrained(BASE)
base      = AutoModelForCausalLM.from_pretrained(BASE, dtype=dtype).to(device)
model     = PeftModel.from_pretrained(base, ADAPT).eval()

# Optional: merge LoRA into the base weights for a small inference speedup.
# model = model.merge_and_unload()

tools = [{
    "type": "function",
    "function": {
        "name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN",
        "description": "Adds a contact to a campaign by creating a CampaignMember record.",
        "parameters": {
            "type": "object",
            "properties": {
                "campaign_id": {"type": "string", "description": "Salesforce campaign ID."},
                "contact_id":  {"type": "string", "description": "Salesforce contact ID."},
                "status":      {"type": "string", "description": "Member status, e.g. 'Attended'."},
            },
            "required": ["campaign_id", "contact_id"],
        },
    },
}]

messages = [
    {"role": "system", "content": "You are a helpful assistant with access to tools."},
    {"role": "user",   "content": "Please enroll Contact ID 0035g00000ZZtopAA into Campaign 7015g000000XyZ9AA (mark them as Attended)."},
]

inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True,
    return_tensors="pt", return_dict=True,
).to(device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False, pad_token_id=tokenizer.eos_token_id)

print(tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=False))

Expected completion (format is Qwen3-native):

<tool_call>
{"name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN", "arguments": {"campaign_id": "7015g000000XyZ9AA", "contact_id": "0035g00000ZZtopAA", "status": "Attended"}}
</tool_call>
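Before executing a call like this, it can help to check it against the tool definition. The sketch below is a deliberately simplified check, not a full JSON Schema validator: it only looks at the tool name, required keys, unexpected keys, and top-level types. The function `check_arguments` and the `JSON_TYPES` table are hypothetical names introduced for illustration, and the call is assumed to have been parsed into a dict already.

```python
# Maps JSON Schema type names to Python types. Simplification: bool is a
# subclass of int in Python, so True would pass an "integer" check here.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_arguments(call, tools):
    """Return a list of problems; an empty list means these checks passed."""
    by_name = {t["function"]["name"]: t["function"] for t in tools}
    spec = by_name.get(call.get("name"))
    if spec is None:
        return [f"unknown tool: {call.get('name')!r}"]

    params = spec.get("parameters", {})
    props = params.get("properties", {})
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return ["arguments is not a JSON object"]

    problems = []
    for key in params.get("required", []):
        if key not in args:
            problems.append(f"missing required argument: {key}")
    for key, value in args.items():
        if key not in props:
            problems.append(f"unexpected argument: {key}")
            continue
        expected = JSON_TYPES.get(props[key].get("type"))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"wrong type for argument: {key}")
    return problems


if __name__ == "__main__":
    tools = [{
        "type": "function",
        "function": {
            "name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN",
            "parameters": {
                "type": "object",
                "properties": {
                    "campaign_id": {"type": "string"},
                    "contact_id": {"type": "string"},
                    "status": {"type": "string"},
                },
                "required": ["campaign_id", "contact_id"],
            },
        },
    }]
    call = {
        "name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN",
        "arguments": {"campaign_id": "7015g000000XyZ9AA", "status": "Attended"},
    }
    print(check_arguments(call, tools))  # reports the missing contact_id
```

For nested schemas, enums, or formats, a dedicated JSON Schema library would be a better fit than this hand-rolled check.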

Serving with vLLM

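The command below serves the adapter on top of the base model using vLLM's LoRA support. The adapter is rank 64, which exceeds vLLM's default maximum LoRA rank (16 at the time of writing), so --max-lora-rank 64 is needed. The merged model is also published separately if you'd rather serve a single artifact.

vllm serve Qwen/Qwen3-4B-Instruct-2507 \
  --enable-lora \
  --max-lora-rank 64 \
  --lora-modules kinetic-fc=consciousengines/Kinetic-FC-LoRA \
  --tool-call-parser hermes \
  --enable-auto-tool-choice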

Then hit /v1/chat/completions with model: "kinetic-fc" and an OpenAI-style tools array.
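A request along those lines might look like the following sketch, which uses only the Python standard library. The base URL `http://localhost:8000` is an assumption (it is vLLM's default port, but your deployment may differ), and the helper names `build_chat_request` and `send_chat_request` are hypothetical. `temperature` is set to 0 to mirror the greedy decoding used in the PEFT example.

```python
import json
import urllib.request


def build_chat_request(user_text, tools, model="kinetic-fc"):
    """Build an OpenAI-style chat completion payload with a tools array."""
    return {
        "model": model,  # the LoRA name registered via --lora-modules
        "messages": [
            {"role": "system", "content": "You are a helpful assistant with access to tools."},
            {"role": "user", "content": user_text},
        ],
        "tools": tools,
        "tool_choice": "auto",
        "temperature": 0,
    }


def send_chat_request(payload, base_url="http://localhost:8000"):
    """POST the payload to /v1/chat/completions (base_url is an assumption)."""
    request = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    tools = [{
        "type": "function",
        "function": {
            "name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN",
            "description": "Adds a contact to a campaign by creating a CampaignMember record.",
            "parameters": {
                "type": "object",
                "properties": {
                    "campaign_id": {"type": "string"},
                    "contact_id": {"type": "string"},
                },
                "required": ["campaign_id", "contact_id"],
            },
        },
    }]
    payload = build_chat_request(
        "Please enroll Contact ID 0035g00000ZZtopAA into Campaign 7015g000000XyZ9AA.",
        tools,
    )
    # Printing only, so the example runs without a server; call
    # send_chat_request(payload) once vLLM is up.
    print(json.dumps(payload, indent=2))
```

With the tool-call parser enabled, the parsed calls should appear under `choices[0].message.tool_calls` in the response rather than as raw `<tool_call>` text.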

Intended use & limitations

  • Designed for structured function / tool calls on Composio-style JSON schemas, presented 1–10 at a time.
  • Not designed for long-form chat, coding assistance, math, or retrieval-augmented question answering. The adapter was not trained on these distributions and will underperform the base model on them.
  • Like any small model, it can hallucinate argument values (e.g. IDs) when the user query is ambiguous or incomplete.
  • Evaluated only in English, and primarily on SaaS-API-flavoured schemas.
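One cheap guard against hallucinated argument values is to flag string arguments that never appear in the user's message. This is only a heuristic sketch, and `ungrounded_arguments` is a hypothetical helper: it will also flag values the model legitimately inferred or normalised (enum labels, reformatted dates), so it is better suited to routing calls for review than to rejecting them outright.

```python
def ungrounded_arguments(arguments, user_text):
    """Return the names of string arguments whose value is absent from user_text.

    Comparison is case-insensitive substring matching; non-string values
    (numbers, booleans, nested objects) are ignored by this sketch.
    """
    haystack = user_text.lower()
    return [
        name
        for name, value in arguments.items()
        if isinstance(value, str) and value.lower() not in haystack
    ]


if __name__ == "__main__":
    user_text = (
        "Please enroll Contact ID 0035g00000ZZtopAA into Campaign "
        "7015g000000XyZ9AA (mark them as Attended)."
    )
    grounded = {
        "campaign_id": "7015g000000XyZ9AA",
        "contact_id": "0035g00000ZZtopAA",
        "status": "Attended",
    }
    invented = dict(grounded, contact_id="0035g00000ABCDEFG")

    print(ungrounded_arguments(grounded, user_text))
    print(ungrounded_arguments(invented, user_text))
```

In the example, the first call passes cleanly because every value is quoted in the request, while the second is flagged on `contact_id`.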

Citation

@misc{kinetic4b2026,
  title  = {Kinetic-4B: A 4-Billion Parameter Model That Outperforms Claude Haiku at Tool Calling},
  author = {Pal, Ritam and Kundan, Kautuk},
  year   = {2026},
  url    = {https://www.consciousengines.com/blog/kinetic-4b-a-4-billion-parameter-model-that-outperforms-claude-haiku-at-tool-calling}
}

Acknowledgements

Built by Ritam Pal and Kautuk Kundan at Conscious Engines, as part of the LossFunk residency.
