TCH-Net: Multi-Branch IoT Botnet Detection on BRIDGE

Paper: BRIDGE and TCH-Net: Heterogeneous Benchmark and Multi-Branch Baseline for Cross-Domain IoT Botnet Detection
Authors: Ammar Bhilwarawala, Likhamba Rongmei, Harsh Sharma, Arya Jena, Kaushal Singh, Jayashree Piri, Raghunath Dey — KIIT University
Submitted to: Journal of Network and Computer Applications
Dataset: Ammar-ss/BRIDGE
Code: github.com/Ammar-ss/TCH-Net


What is this?

The IoT botnet detection field has a quiet problem. Almost every published system gets trained on one dataset, reports numbers in the high 90s, and calls it done. The trouble is those numbers don't travel. A model tuned to CICIDS-2017 will see completely different traffic statistics when you point it at Bot-IoT or N-BaIoT — different capture tools, different devices, different attack toolkits. The benchmark looked easy because it was a closed world.

TCH-Net is a multi-branch neural network built to handle this more honestly. It's trained and evaluated on BRIDGE, a unified benchmark that maps five structurally distinct public datasets into a shared 46-feature space. The goal was to build something that could survive being tested on genuinely heterogeneous data — and then to measure exactly how hard that actually is.

The architecture has three parallel branches. The Temporal branch (T) runs three paths simultaneously: a residual depthwise-separable convolutional BiGRU for local and medium-range patterns, a stride-downsampled BiGRU for coarser dynamics, and a full-resolution pre-LayerNorm Transformer covering all 32 timesteps for global context. Different botnet attack categories manifest at different temporal scales — DDoS flooding shows up in burst-level signatures, C&C beaconing in medium-scale periodic patterns, scan-then-exploit sequences in global ordering — and a single-resolution encoder has to trade one against the others. Three paths running in parallel resolve that.
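The three-path idea can be sketched in a few lines of PyTorch. This is an illustrative reconstruction, not the released implementation: layer sizes follow the Architecture section below, but the residual SE-conv blocks, learnable positional encoding, and CLS token are omitted, and the class name is ours.

```python
import torch
import torch.nn as nn

class ThreePathTemporalSketch(nn.Module):
    """Sketch of the T-branch's three parallel temporal paths (simplified)."""
    def __init__(self, nf=46, ws=32):
        super().__init__()
        # Path 1: conv front-end + BiGRU (local / medium-range patterns)
        self.conv = nn.Sequential(
            nn.Conv1d(nf, 128, kernel_size=3, padding=1), nn.GELU(),
            nn.MaxPool1d(4),                      # 32 timesteps -> 8
        )
        self.gru1 = nn.GRU(128, 128, num_layers=2, batch_first=True,
                           bidirectional=True)    # -> 8 x 256
        # Path 2: strided conv downsample + BiGRU (coarser dynamics)
        self.stride = nn.Conv1d(nf, 64, kernel_size=3, stride=2, padding=1)
        self.gru2 = nn.GRU(64, 64, batch_first=True, bidirectional=True)
        self.pool2 = nn.AdaptiveAvgPool1d(8)      # -> 8 x 128
        # Path 3: full-resolution pre-LN Transformer (global context)
        self.proj = nn.Linear(nf, 128)
        layer = nn.TransformerEncoderLayer(128, nhead=8, batch_first=True,
                                           norm_first=True)
        self.trans = nn.TransformerEncoder(layer, num_layers=2)
        self.pool3 = nn.AdaptiveAvgPool1d(8)      # -> 8 x 128

    def forward(self, x):                         # x: (B, 32, 46)
        p1, _ = self.gru1(self.conv(x.transpose(1, 2)).transpose(1, 2))
        p2, _ = self.gru2(self.stride(x.transpose(1, 2)).transpose(1, 2))
        p2 = self.pool2(p2.transpose(1, 2)).transpose(1, 2)
        p3 = self.pool3(self.trans(self.proj(x)).transpose(1, 2)).transpose(1, 2)
        merged = torch.cat([p1, p2, p3], dim=-1)  # (B, 8, 512)
        return merged.mean(dim=1)                 # (B, 512)
```

Each path ends at 8 aligned positions, so the 256 + 128 + 128 features concatenate cleanly to the 512-d merge the spec describes.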

The Statistical branch (H) mean-pools the window and runs it through an MLP. It captures distributional structure that doesn't depend on ordering at all, which is orthogonal to what the T-branch does. The Contextual branch (C) encodes the source dataset and device category as learned embeddings. On its own it's nearly random (AUC ≈ 0.50) — it doesn't predict attack labels independently. What it does is condition the fusion mechanism on where the input came from.
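Both branches are small enough to sketch directly. Sizes follow the Architecture section; the class name and the exact normalisation/dropout placement are our simplification.

```python
import torch
import torch.nn as nn

class StatContextSketch(nn.Module):
    """Sketch of the order-free H-branch (mean-pool + MLP) and the
    C-branch (dataset/device embeddings)."""
    def __init__(self, nf=46, n_ds=5, n_dev=6):
        super().__init__()
        # H-branch: distributional features, no temporal ordering
        self.h_mlp = nn.Sequential(
            nn.Linear(nf, 128), nn.BatchNorm1d(128), nn.GELU(), nn.Dropout(0.15),
            nn.Linear(128, 64),
        )
        # C-branch: 32-d embedding per context field, concatenated to 64d
        self.emb_ds = nn.Embedding(n_ds, 32)
        self.emb_dev = nn.Embedding(n_dev, 32)

    def forward(self, x, ctx):          # x: (B, 32, 46), ctx: (B, 2) long
        h = self.h_mlp(x.mean(dim=1))                     # (B, 64)
        c = torch.cat([self.emb_ds(ctx[:, 0]),
                       self.emb_dev(ctx[:, 1])], dim=-1)  # (B, 64)
        return h, c
```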

All three branches get fused through CB-GAF (Cross-Branch Gated Attention Fusion). Each branch queries the other two simultaneously via cross-attention, then a learned sigmoid vector gate — 128-dimensional, not a scalar — controls feature-wise how much cross-branch information gets absorbed. That vector gating is important in the heterogeneous setting. For a dataset like N-BaIoT where 85% of canonical features are zero-filled, the gate can learn to downweight the largely empty H-branch at specific dimensions rather than hard-coding that decision or just averaging in noise across the board.
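The gating step can be illustrated as follows. This is a simplified single-branch view with hypothetical module names, and it assumes the gate is computed from the concatenated self and cross features; the released CB-GAF module applies this per branch with its own projections.

```python
import torch
import torch.nn as nn

class VectorGatedFusionSketch(nn.Module):
    """Sketch of one CB-GAF fusion step: a branch attends over the other
    two at once, then a 128-d sigmoid gate mixes self vs. cross features
    dimension-wise (feature-wise, not a scalar)."""
    def __init__(self, d_f=128, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_f, heads, batch_first=True)
        self.gate = nn.Linear(2 * d_f, d_f)

    def forward(self, x_self, x_others):
        # x_self: (B, d_f); x_others: (B, 2, d_f) = the other two branches
        q = x_self.unsqueeze(1)                         # (B, 1, d_f)
        x_cross, _ = self.attn(q, x_others, x_others)   # query both at once
        x_cross = x_cross.squeeze(1)
        # g in (0,1)^128 controls, per dimension, how much cross info enters
        g = torch.sigmoid(self.gate(torch.cat([x_self, x_cross], dim=-1)))
        return g * x_self + (1 - g) * x_cross           # x_fused
```

Because `g` is a vector, the gate can suppress cross-branch input at exactly the dimensions where a branch is uninformative (e.g. zero-filled features) while still absorbing it elsewhere.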

The full model has 2,692,696 parameters, runs at 6.43 ms inference latency on a Tesla T4, and fits on NVIDIA Jetson hardware with room to spare.


Intended Use

In-Scope

  • IoT botnet detection research on network flow data
  • Cross-dataset generalisation benchmarking in heterogeneous IDS settings
  • Ablation or architecture comparison studies using the BRIDGE benchmark

Out-of-Scope

  • Production deployment without retraining: The LODO gap (0.2719 F1) indicates the model does not reliably generalise to unseen dataset distributions without adaptation. Do not deploy this in a live network without fine-tuning on in-distribution data.
  • Non-IoT or enterprise network traffic: BRIDGE covers IoT-specific datasets. Behaviour on corporate LAN/WAN traffic is not evaluated.
  • Real-time per-packet classification: TCH-Net operates on flow-level feature windows of 32 timesteps. It requires completed or windowed flows, not individual packets.
  • Unknown context at scale: The contextual branch requires dataset source and device category IDs. If these are unavailable, pass ctx = torch.zeros(B, 2, dtype=torch.long) — performance will degrade modestly but gracefully.

Limitations

  • LODO F1 = 0.5577. The model does not generalise well to unseen dataset distributions. This is the best LODO result across all evaluated architectures (+0.09–0.17 above baselines), but the gap is real and quantified.
  • N-BaIoT achieves high F1 (0.9854) largely because Mirai/BASHLITE signatures are statistically distinct in only 7 of 46 features. This is not a BRIDGE-wide pattern.
  • Edge-IIoTset is the hardest case (F1 = 0.6755) due to IIoT packet-level traffic structures differing from the flow-level distributions dominating training.
  • 85% zero-fill on N-BaIoT canonical features is a BRIDGE artefact — the canonical feature space was built around CICIDS-style flow features, which do not map cleanly to all source datasets.
  • Not validated on live traffic captures or real-world deployment scenarios.

Results (5 seeds: 42, 123, 456, 789, 2024)

Metric    TCH-Net            Best Baseline (Transformer-IDS)
F1        0.8296 ± 0.0028    0.7958 ± 0.0030
ROC-AUC   0.9380 ± 0.0025    0.9147 ± 0.0012
MCC       0.6972 ± 0.0056    0.6255 ± 0.0067
PR-AUC    0.8912 ± 0.0031    0.8699 ± 0.0041

TCH-Net outperforms all 12 baselines on all four metrics. All differences are statistically significant (p < 0.05, one-sided paired Wilcoxon signed-rank test).

Full Comparison Table

Model              F1                ROC-AUC   MCC      ΔF1
TCH-Net (Ours)     0.8296 ± 0.0028   0.9380    0.6972   —
Transformer-IDS    0.7958 ± 0.0030   0.9147    0.6255   +0.0338**
1D-CNN-IDS         0.7932 ± 0.0076   0.9076    0.6213   +0.0364*
CNN-LSTM           0.7919 ± 0.0137   0.9056    0.6208   +0.0377*
BiLSTM-IDS         0.7805 ± 0.0010   0.8975    0.5972   +0.0491**
BiGRU-IDS          0.7805 ± 0.0011   0.8962    0.5987   +0.0491**
DeepDefense        0.7627 ± 0.0011   0.8776    0.5638   +0.0669***
XGBoost            0.7265 ± 0.0014   0.8704    0.5542   +0.1031***
GraphSAGE-Approx   0.7097 ± 0.0004   0.8259    0.4465   +0.1199***
Kitsune-AE         0.7045 ± 0.0007   0.8200    0.4362   +0.1251***
MLP-IDS            0.7039 ± 0.0008   0.8152    0.4348   +0.1257***
IoT-DNN            0.7009 ± 0.0002   0.8146    0.4278   +0.1287***
Random Forest      0.4323 ± 0.0082   0.8005    0.3557   +0.3973***

Per-Dataset Performance

Dataset        Coverage   DetRate   False Alarm   F1
CICIDS-2017    93%        0.9433    0.0309        0.9505
CIC-IoT-2023   87%        0.8827    0.0257        0.9211
N-BaIoT        15%        0.9982    0.0206        0.9854
Edge-IIoTset   22%        0.6844    0.2589        0.6755

N-BaIoT achieves the highest F1 despite 85% of its features being zero-filled: Mirai and BASHLITE botnet traffic is statistically distinctive enough in just 7 features that the separation is stark. Edge-IIoTset is the hardest case, because IIoT packet-level traffic is structured differently from the flow-level distributions that dominate training.

Leave-One-Dataset-Out (LODO) Generalisation

The honest number. Train on four datasets, test on the fifth, repeated five times.

Held-Out       LODO F1          LODO AUC
CICIDS-2017    0.3128 ± 0.232   0.0509
CIC-IoT-2023   0.6013 ± 0.000   0.1440
Bot-IoT        0.5934 ± 0.011   0.5693
Edge-IIoTset   0.6791 ± 0.008   0.6841
N-BaIoT        0.6021 ± 0.000   0.8171
MEAN           0.5577           0.4531

Generalisation gap: random-split F1 (0.8296) − LODO mean (0.5577) = 0.2719.

This gap is not a TCH-Net problem. All five deep learning baselines scored between 0.39 and 0.47 LODO F1 under the same protocol. TCH-Net's 0.5577 is the highest LODO score across all evaluated architectures, +0.09 to +0.17 above baselines. The gap is a measurement of how hard the cross-dataset problem actually is. The BRIDGE LODO mean of 0.5577 is the first formally quantified community generalisation baseline in heterogeneous IoT intrusion detection.

Temporal Split Check

Split               F1        AUC       MCC
Random (5 seeds)    0.8296    0.9380    0.6972
Temporal (1 seed)   0.8203    0.9261    0.6831
Δ                   −0.0093   −0.0119   −0.0141

The small drop under temporal splitting confirms TCH-Net's performance is not driven by temporal leakage.


Architecture

Input: (B, 32, 46)  — batch × window × canonical features

Shared Feature Projection (residual):
  Linear(46→92) → LayerNorm → GELU → Dropout(δ/2)
  → Linear(92→46) → LayerNorm        X̃ = X + f_proj(X)

T-Branch — three parallel paths, merged to 512d:
  Path 1: ResConvSE×3 + MaxPool → BiGRU(128/dir, 2L)              → 8×256
  Path 2: StrideConv(s=2,64ch) → BiGRU(64/dir, 1L) → AvgPool(8)   → 8×128
  Path 3: Linear(46→128) + LearnablePE → TransEnc(Pre-LN,2L,8H)
           → strip CLS → AvgPool(8)                               → 8×128
  Merge:  concat → LayerNorm(512) → MHA(8 heads) → mean pool      → 512d

H-Branch:   mean(X̃, dim=time) → MLP(46→128→64, BN+GELU+Dropout)  → 64d

C-Branch:   Embed_ds(5,32)[c_ds] ‖ Embed_dev(6,32)[c_dev]         → 64d

CB-GAF (Cross-Branch Gated Attention Fusion):
  Project each branch to d_f=128
  Each branch queries both others simultaneously via cross-attention
  Per-branch vector gate g^i ∈ (0,1)^128  (feature-wise, not scalar)
  x_fused = g^i ⊙ x_self + (1−g^i) ⊙ x_cross
  concat(T_fused, C_fused, H_fused) → LayerNorm                   → 384d

Classifier (residual head):
  raw_proj(mean(X̃)) → 64d
  concat(384d fused, 64d raw) → z ∈ 448d
  Linear(448→256) → BN+GELU+Dropout
  Linear(256→128) + Wskip(448→128)  ← residual skip
  Linear(128→2) → softmax

Aux Decoder (training only, λ=0.05):
  MLP(384→64→46) — prevents information collapse in CB-GAF
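A sketch of how the auxiliary objective plugs into training. The decoder shape (384→64→46) and the λ=0.05 weight are from the spec above; the reconstruction target (here the time-averaged input window) and the MSE loss type are our assumptions, and the exact definition lives in the notebook.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Training-only auxiliary decoder: fused 384-d representation back to the
# 46-d canonical feature space, discouraging information collapse in CB-GAF.
aux_decoder = nn.Sequential(nn.Linear(384, 64), nn.GELU(), nn.Linear(64, 46))

def total_loss(cls_loss, fused, x, lam=0.05):
    """Combine the classification loss with the weighted reconstruction
    loss. Assumption: the target is the mean-pooled input window."""
    recon = aux_decoder(fused)                  # (B, 46)
    aux = F.mse_loss(recon, x.mean(dim=1))      # vs. time-averaged input
    return cls_loss + lam * aux
```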

Ablation Summary

Branch Ablation (2 seeds, CB-GAF replaced with concat)

Variant        F1       ΔF1
T+C+H (Full)   0.8296   —
T+H            0.7756   −0.0540
T+C            0.7752   −0.0544
T only         0.7753   −0.0543
H only         0.7054   −0.1242
C+H            0.7061   −0.1235
C only         0.6000   −0.2296

The C-branch alone gets near-random performance (AUC ≈ 0.50) — confirming its role is fusion conditioning, not independent prediction.

Novelty Component Ablation (2 seeds)

Variant                 F1       ΔF1
Full TCH-Net            0.8296   —
w/o CB-GAF              0.7759   −0.0537
w/o MSTE (Three-Path)   0.7760   −0.0536
w/o Aux Loss            0.7755   −0.0541
w/o All (v2 baseline)   0.7752   −0.0544

Removing any single novel component costs ~0.054 F1.


Computational Profile (NVIDIA Tesla T4)

Model             Params   Latency         Throughput   Memory    F1
TCH-Net           2.692M   6.43 ± 0.18ms   20.5k sps    10.27MB   0.8296
BiLSTM-IDS        0.609M   0.74 ± 0.02ms   34.2k sps    2.32MB    0.7805
Transformer-IDS   0.618M   1.22 ± 0.03ms   36.8k sps    2.36MB    0.7958
1D-CNN-IDS        0.068M   0.69 ± 0.03ms   406.9k sps   0.26MB    0.7932

The 6.43ms latency supports 20,000+ detections/second under batch processing. The 10.27MB footprint is deployable on NVIDIA Jetson hardware. For microcontroller-class endpoints (Cortex-M, ESP32), quantisation or knowledge distillation would be needed.


Installation

pip install torch scikit-learn numpy huggingface_hub

Tested with Python 3.9+. No specific version pinning required beyond a modern PyTorch (≥2.0 recommended for torch.compile compatibility).


Loading the Model

import torch
import pickle
import numpy as np
import torch.nn.functional as F
from huggingface_hub import hf_hub_download

# ── Download files ──────────────────────────────────────────────────────────
ckpt_path   = hf_hub_download("Ammar-ss/BRIDGE_and_TCH-Net", "tch_net_best.pth")
scaler_path = hf_hub_download("Ammar-ss/BRIDGE_and_TCH-Net", "scaler.pkl")

# ── Load scaler ──────────────────────────────────────────────────────────────
with open(scaler_path, "rb") as f:
    scaler = pickle.load(f)

# ── Load checkpoint ──────────────────────────────────────────────────────────
ckpt   = torch.load(ckpt_path, map_location="cpu")
config = ckpt["config"]

# ── Define TCHNet ─────────────────────────────────────────────────────────────
# Full class definition is in bridge-and-tch-net.ipynb and the GitHub repo.
# Paste or import TCHNet before instantiating:
#   from tch_net import TCHNet   (if using the GitHub repo)
#   OR copy the class from the notebook.

model = TCHNet(
    nf=config["n_features"],   # 46
    ws=config["window_size"],  # 32
    nc=config["n_classes"],    # 2
)
model.load_state_dict(ckpt["state_dict"])
model.eval()

# ── Preprocess ───────────────────────────────────────────────────────────────
# X_raw: np.ndarray of shape (N, 46) — raw canonical flow features
X_scaled = np.clip(scaler.transform(X_raw), -10, 10).astype(np.float32)

# ── Inference ────────────────────────────────────────────────────────────────
# x:   FloatTensor (B, 32, 46) — windowed, scaled flow features
# ctx: LongTensor  (B, 2)      — [dataset_source_id, device_category_id]
#
#   dataset_source_id:  0=CICIDS-2017  1=CIC-IoT-2023  2=Bot-IoT
#                       3=Edge-IIoTset  4=N-BaIoT
#   device_category_id: 0=sensor  1=camera  2=appliance  3=IIoT
#                       4=server  5=unknown
#
#   If context is unknown: ctx = torch.zeros(B, 2, dtype=torch.long)
#   The C-branch has no independent predictive power — unknown context
#   degrades gracefully, it does not break inference.

with torch.no_grad():
    logits, _ = model(x, ctx)
    probs = F.softmax(logits, dim=-1)
    preds = logits.argmax(dim=-1)  # 0 = benign, 1 = attack
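The snippet above scales flat (N, 46) feature rows, but the model consumes (B, 32, 46) windows. A minimal windowing helper, assuming rows are in temporal order within one stream and reusing the training stride S=4; the function name is ours, not part of the repo.

```python
import numpy as np

def make_windows(X_scaled, window=32, stride=4):
    """Slice a (N, 46) scaled feature matrix into overlapping
    (B, window, 46) windows, matching the training config W=32, S=4.
    Assumes rows are temporally ordered within one flow stream."""
    n = X_scaled.shape[0]
    if n < window:
        raise ValueError(f"need at least {window} rows, got {n}")
    starts = range(0, n - window + 1, stride)
    return np.stack([X_scaled[s:s + window] for s in starts])
```

Then feed `x = torch.from_numpy(make_windows(X_scaled))` into the inference block above.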

Files

File                                    Description
tch_net_best.pth                        Best checkpoint (highest F1 across all 5 seeds)
tch_net_seed_42.pth                     Per-seed checkpoint, seed 42
tch_net_seed_123.pth                    Per-seed checkpoint, seed 123
tch_net_seed_456.pth                    Per-seed checkpoint, seed 456
tch_net_seed_789.pth                    Per-seed checkpoint, seed 789
tch_net_seed_2024.pth                   Per-seed checkpoint, seed 2024
scaler.pkl                              RobustScaler (q5–q95) fitted on BRIDGE training split — required for inference
manifest.json                           Config, per-seed metrics, feature names
BRIDGE and TCH-Net (FULL PAPER).ipynb   Complete experimental notebook (all 12 baselines, branch ablation, novelty ablation, LODO, temporal split, adversarial robustness, HP sensitivity)
bridge-and-tch-net.ipynb                Clean training-only notebook (TCH-Net, 5 seeds, saves checkpoints)

Training Hyperparameters

Parameter               Value
Optimizer               AdamW
Learning rate           5×10⁻⁴
Weight decay            5×10⁻⁵
Scheduler               Cosine annealing, 2-epoch warmup
Loss                    Focal (γ=2.0, α-weighted, ε=0.05) + Aux (λ=0.05)
Batch size              512
Max epochs / patience   30 / 5
Sequence length         W=32, stride S=4
Dropout                 0.15
Input augmentation      Gaussian noise (σ=0.01, p=0.30, train only)
AMP                     fp16 on CUDA
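The loss row above can be made concrete. This is one common focal-loss formulation consistent with the listed settings (γ=2.0, optional per-class α weights, label smoothing ε=0.05); the exact variant used in training is defined in the notebook.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=None, eps=0.05):
    """Focal loss with label smoothing (a common formulation, not
    necessarily the exact one used in training).
    logits: (B, C), targets: (B,) long, alpha: optional (C,) weights."""
    logp = F.log_softmax(logits, dim=-1)
    n_cls = logits.size(-1)
    # label smoothing: 1 - eps on the true class, eps spread over the rest
    smooth = torch.full_like(logp, eps / (n_cls - 1))
    smooth.scatter_(1, targets.unsqueeze(1), 1.0 - eps)
    pt = logp.exp()
    w = (1.0 - pt) ** gamma                 # down-weight easy examples
    if alpha is not None:
        w = w * alpha.unsqueeze(0)          # per-class alpha weighting
    return -(w * smooth * logp).sum(dim=-1).mean()
```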

Citation

@article{bhilwarawala2026bridge,
  title   = {{BRIDGE} and {TCH-Net}: Heterogeneous Benchmark and Multi-Branch
             Baseline for Cross-Domain {IoT} Botnet Detection},
  author  = {Bhilwarawala, Ammar and Rongmei, Likhamba and Sharma, Harsh
             and Jena, Arya and Singh, Kaushal and Piri, Jayashree and Dey, Raghunath},
  journal = {arXiv preprint arXiv:2604.11324},
  year    = {2026}
}

Model Card Authors

Ammar Bhilwarawala, KIIT University.
For questions or issues, open a discussion on this repository.
