Kinetic-FC-LoRA

Kinetic-FC-LoRA is a rank-64 LoRA adapter for Qwen/Qwen3-4B-Instruct-2507, fine-tuned at Conscious Engines for function / tool calling against the Composio tool ecosystem.

Applied on top of the base model, this adapter produces Kinetic-4B, a 4B-parameter tool-calling model. On our 300-sample Composio eval it scores higher than Claude Haiku 4.5, GPT-OSS-120B, and the unadapted base model, with the lowest p95 latency of the four:

Model	Params	Accuracy	p95 latency
Qwen3-4B + Kinetic-FC-LoRA	4B	82.33%	1.61 s
Claude Haiku 4.5	—	80.00%	4.02 s
Qwen3-4B-Instruct-2507 (base)	4B	78.67%	1.84 s
GPT-OSS-120B	120B	76.33%	7.99 s

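Full write-up: Kinetic-4B blog post (https://www.consciousengines.com/blog/kinetic-4b-a-4-billion-parameter-model-that-outperforms-claude-haiku-at-tool-calling).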
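The reported accuracies are consistent with whole-number counts out of 300. The sketch below recovers those counts and computes a 95% Wilson score interval for each row. The interval method is an illustrative choice, not something the evaluation reports, and `correct_count` and `wilson_interval` are hypothetical helper names.

```python
import math

N = 300  # eval size stated above

# Accuracy (%) per model, copied from the table.
reported = {
    "Qwen3-4B + Kinetic-FC-LoRA": 82.33,
    "Claude Haiku 4.5": 80.00,
    "Qwen3-4B-Instruct-2507 (base)": 78.67,
    "GPT-OSS-120B": 76.33,
}

def correct_count(accuracy_pct: float, n: int = N) -> int:
    """Nearest whole number of correct samples for a reported percentage."""
    return round(accuracy_pct / 100 * n)

def wilson_interval(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for k successes out of n trials (z=1.96 for 95%)."""
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

counts = {name: correct_count(acc) for name, acc in reported.items()}

for name, k in counts.items():
    low, high = wilson_interval(k, N)
    print(f"{name}: {k}/{N} correct, 95% CI {low:.1%} to {high:.1%}")
```

At 300 samples each interval spans several percentage points on either side of the point estimate, which is worth keeping in mind when reading small gaps between rows.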

What it's for

Picking the correct tool from a menu of up to ~10 options drawn from a single Composio toolkit.
Producing syntactically valid arguments conforming to the tool's JSON schema.
Emitting tool calls in Qwen3's native <tool_call>{ "name": ..., "arguments": ... }</tool_call> format.
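The second point, arguments that conform to the tool's JSON schema, can be checked after generation. This is a minimal standard-library sketch that covers only `required` and primitive `type` checks, not the full JSON Schema specification. `validate_arguments` is a hypothetical helper, and rejecting unknown argument names is a stricter rule than JSON Schema applies by default.

```python
# Map JSON Schema primitive type names to Python types.
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}

def validate_arguments(arguments: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments passed."""
    problems = []
    properties = schema.get("properties", {})

    for name in schema.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")

    for name, value in arguments.items():
        if name not in properties:
            # Assumption: treat names outside the schema as errors.
            problems.append(f"unknown argument: {name}")
            continue
        expected = properties[name].get("type")
        py_type = _JSON_TYPES.get(expected)
        if py_type is None:
            continue  # absent or unsupported type keyword: nothing to check
        # bool is a subclass of int in Python, so reject it for numeric types.
        is_bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if is_bool_as_number or not isinstance(value, py_type):
            problems.append(f"{name}: expected {expected}, got {type(value).__name__}")

    return problems

# Example schema in the same shape as a tool's "parameters" object.
schema = {
    "type": "object",
    "properties": {
        "campaign_id": {"type": "string"},
        "contact_id": {"type": "string"},
        "status": {"type": "string"},
    },
    "required": ["campaign_id", "contact_id"],
}

print(validate_arguments({"campaign_id": "7015g000000XyZ9AA"}, schema))
```

For production use, a full validator such as the `jsonschema` package is the more usual choice.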

It is not a general chat model — it's a narrow specialist. For freeform conversation, use the base Qwen3-4B-Instruct-2507 directly.

Adapter details


Setting	Value
Base model	`Qwen/Qwen3-4B-Instruct-2507`
Method	LoRA (PEFT)
Rank `r`	64
`lora_alpha`	128
`lora_dropout`	0.05
Target modules	`q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
Trainable params	132 M / 4.15 B (3.18%)
Precision	bf16
Epochs	2
Effective batch	16 (1 × 16 grad accum)
Learning rate	2e-4, cosine, 5% warmup
Max seq length	10 240
Training data	13 694 synthetic samples across the top-20 Composio toolkits, 10 tools per sample (1 ground-truth + 9 distractors from the same toolkit)
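The trainable-parameter figure can be reproduced from the rank and the target modules. The sketch below assumes the layer dimensions from the base model's published config (hidden size 2560, MLP size 9728, 36 layers, 32 query heads and 8 key/value heads of dimension 128); none of these are stated in this card. It also assumes the standard `lora_alpha / r` scaling rather than rank-stabilized scaling.

```python
# Assumed Qwen3-4B dimensions (taken from the base model's config, not this card).
HIDDEN = 2560
INTERMEDIATE = 9728
LAYERS = 36
Q_OUT = 32 * 128   # query heads x head_dim
KV_OUT = 8 * 128   # key/value heads x head_dim

R = 64
LORA_ALPHA = 128

# (in_features, out_features) for each targeted linear layer in one block.
TARGETS = {
    "q_proj": (HIDDEN, Q_OUT),
    "k_proj": (HIDDEN, KV_OUT),
    "v_proj": (HIDDEN, KV_OUT),
    "o_proj": (Q_OUT, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_params(in_features: int, out_features: int, r: int = R) -> int:
    """LoRA adds A (r x in) and B (out x r) to each adapted layer."""
    return r * (in_features + out_features)

per_block = sum(lora_params(i, o) for i, o in TARGETS.values())
total = per_block * LAYERS
scaling = LORA_ALPHA / R  # multiplier applied to the low-rank update

print(f"{per_block:,} per block, {total:,} total, scaling {scaling}")
# → 3,670,016 per block, 132,120,576 total, scaling 2.0
```

Under these assumptions the total rounds to the 132 M listed above.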

Inference with PyTorch + PEFT

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE  = "Qwen/Qwen3-4B-Instruct-2507"
ADAPT = "consciousengines/Kinetic-FC-LoRA"

device = "cuda" if torch.cuda.is_available() else ("mps" if torch.backends.mps.is_available() else "cpu")
dtype  = torch.bfloat16 if device != "cpu" else torch.float32

tokenizer = AutoTokenizer.from_pretrained(BASE)
base      = AutoModelForCausalLM.from_pretrained(BASE, dtype=dtype).to(device)
model     = PeftModel.from_pretrained(base, ADAPT).eval()

# Optional: merge LoRA into the base weights for a small inference speedup.
# model = model.merge_and_unload()

tools = [{
    "type": "function",
    "function": {
        "name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN",
        "description": "Adds a contact to a campaign by creating a CampaignMember record.",
        "parameters": {
            "type": "object",
            "properties": {
                "campaign_id": {"type": "string", "description": "Salesforce campaign ID."},
                "contact_id":  {"type": "string", "description": "Salesforce contact ID."},
                "status":      {"type": "string", "description": "Member status, e.g. 'Attended'."},
            },
            "required": ["campaign_id", "contact_id"],
        },
    },
}]

messages = [
    {"role": "system", "content": "You are a helpful assistant with access to tools."},
    {"role": "user",   "content": "Please enroll Contact ID 0035g00000ZZtopAA into Campaign 7015g000000XyZ9AA (mark them as Attended)."},
]

inputs = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True,
    return_tensors="pt", return_dict=True,
).to(device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=256, do_sample=False, pad_token_id=tokenizer.eos_token_id)

print(tokenizer.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=False))

Expected completion (format is Qwen3-native; because the decode call keeps special tokens, a trailing <|im_end|> will typically follow):

<tool_call>
{"name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN", "arguments": {"campaign_id": "7015g000000XyZ9AA", "contact_id": "0035g00000ZZtopAA", "status": "Attended"}}
</tool_call>
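To act on a completion like this, the JSON payload has to be pulled out of the `<tool_call>` tags. The following is a minimal sketch under the assumption that each block holds one JSON object with `name` and `arguments` keys. `parse_tool_calls` is a hypothetical helper, not part of Transformers or PEFT.

```python
import json
import re

# Matches each <tool_call>...</tool_call> block, including across newlines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)

def parse_tool_calls(completion: str) -> list[dict]:
    """Extract tool calls; blocks with malformed JSON or missing keys are skipped."""
    calls = []
    for payload in TOOL_CALL_RE.findall(completion):
        try:
            call = json.loads(payload)
        except json.JSONDecodeError:
            continue
        if (
            isinstance(call, dict)
            and isinstance(call.get("name"), str)
            and isinstance(call.get("arguments"), dict)
        ):
            calls.append(call)
    return calls

completion = """<tool_call>
{"name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN", "arguments": {"campaign_id": "7015g000000XyZ9AA", "contact_id": "0035g00000ZZtopAA", "status": "Attended"}}
</tool_call><|im_end|>"""

calls = parse_tool_calls(completion)
print(calls[0]["name"])
print(calls[0]["arguments"])
```

Skipping malformed blocks silently is a design choice for brevity; a stricter caller may prefer to raise so that bad generations are visible.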

Serving with vLLM

The merged model is also published separately if you'd rather serve a single artifact. To serve the adapter on top of the base model instead:

vllm serve Qwen/Qwen3-4B-Instruct-2507 \
  --enable-lora \
  --lora-modules kinetic-fc=consciousengines/Kinetic-FC-LoRA \
  --tool-call-parser hermes \
  --enable-auto-tool-choice

Then hit /v1/chat/completions with model: "kinetic-fc" and an OpenAI-style tools array.
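A request against that endpoint might look like the sketch below, which uses only the standard library. It assumes the server is reachable at `http://localhost:8000` (vLLM's default port) and that the response follows the OpenAI chat-completions shape. `build_request` and `call_server` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: vLLM is listening on its default port on this machine.
BASE_URL = "http://localhost:8000"

def build_request(user_message: str, tools: list[dict]) -> dict:
    """Build an OpenAI-style chat-completions payload for the LoRA module."""
    return {
        "model": "kinetic-fc",  # name given to --lora-modules
        "messages": [{"role": "user", "content": user_message}],
        "tools": tools,
        "tool_choice": "auto",
        "temperature": 0,  # greedy decoding, as in the PyTorch example
    }

def call_server(payload: dict, base_url: str = BASE_URL) -> list[dict]:
    """POST the payload and return the parsed tool calls (empty if none)."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"].get("tool_calls") or []

tools = [{
    "type": "function",
    "function": {
        "name": "SALESFORCE_ADD_CONTACT_TO_CAMPAIGN",
        "description": "Adds a contact to a campaign by creating a CampaignMember record.",
        "parameters": {
            "type": "object",
            "properties": {
                "campaign_id": {"type": "string"},
                "contact_id": {"type": "string"},
            },
            "required": ["campaign_id", "contact_id"],
        },
    },
}]

payload = build_request(
    "Please enroll Contact ID 0035g00000ZZtopAA into Campaign 7015g000000XyZ9AA.",
    tools,
)
print(json.dumps(payload, indent=2)[:200])

# With the server running:
# tool_calls = call_server(payload)
```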

Intended use & limitations

Designed for structured function / tool calls on Composio-style JSON schemas, presented 1–10 at a time.
Not designed for long-form chat, coding assistance, math, or retrieval-augmented question answering. The adapter was not trained on these distributions and will underperform the base model on them.
Like any small model, it can hallucinate argument values (e.g. IDs) when the user query is ambiguous or incomplete.
Evaluated only in English, and primarily on SaaS-API-flavoured schemas.
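One inexpensive guard against hallucinated argument values is to check that identifier-like strings appear verbatim in the user's message before a call is executed. This is a heuristic sketch, not something the adapter or Composio provides. `ungrounded_arguments` is a hypothetical helper, and the substring rule is an assumption that will also flag legitimately normalized values such as reformatted dates.

```python
def ungrounded_arguments(arguments: dict, user_message: str, keys=None) -> list[str]:
    """Names of string arguments whose value does not occur in the message.

    keys: optional collection restricting the check to certain argument
    names (for example, only ID fields).
    """
    haystack = user_message.casefold()
    flagged = []
    for name, value in arguments.items():
        if keys is not None and name not in keys:
            continue
        # Case-insensitive verbatim match; non-string values are not checked.
        if isinstance(value, str) and value.casefold() not in haystack:
            flagged.append(name)
    return flagged

message = (
    "Please enroll Contact ID 0035g00000ZZtopAA into Campaign "
    "7015g000000XyZ9AA (mark them as Attended)."
)

grounded = {
    "campaign_id": "7015g000000XyZ9AA",
    "contact_id": "0035g00000ZZtopAA",
    "status": "Attended",
}
invented = dict(grounded, contact_id="0035g00000AAAAAAA")

print(ungrounded_arguments(grounded, message))
print(ungrounded_arguments(invented, message))
```

A flagged argument is a prompt to ask the user for the missing value rather than proof of an error.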

Citation

@misc{kinetic4b2026,
  title  = {Kinetic-4B: A 4-Billion Parameter Model That Outperforms Claude Haiku at Tool Calling},
  author = {Pal, Ritam and Kundan, Kautuk},
  year   = {2026},
  url    = {https://www.consciousengines.com/blog/kinetic-4b-a-4-billion-parameter-model-that-outperforms-claude-haiku-at-tool-calling}
}

Acknowledgements

Built by Ritam Pal and Kautuk Kundan at Conscious Engines, as part of the LossFunk residency.
