MAGNI — SparseWorld 4D Occupancy World Model

Part of the ANIMA Perception Suite by Robot Flow Labs.

Paper

SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries
Chenxu Dang, Haiyan Liu, Jason Bao, et al.
arXiv:2510.17482 (AAAI 2026)

Architecture

SparseWorld uses 1040 sparse dynamic queries (720 current + 320 future) with a 6-layer OPUS decoder to predict 4D occupancy from camera images. Key components:

  • ResNet-50 + FPN backbone (pretrained on nuImages)
  • OPUS Transformer Decoder with 6 layers, multi-scale deformable attention
  • Range-Adaptive Perception (RAP) for multi-scale voxel encoding
  • State-Conditioned Forecasting (SCF) for temporal prediction
  • 76M parameters total
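The query and decoder sizing above can be summarized in a small sketch. This is an illustrative configuration dictionary, not the actual SparseWorld config schema; the key names are hypothetical.

```python
# Hypothetical sketch of the query/decoder layout described above.
# Key names are illustrative, not the real SparseWorld config keys.
decoder_cfg = {
    "num_current_queries": 720,   # queries for the current frame
    "num_future_queries": 320,    # queries for forecast frames
    "num_decoder_layers": 6,      # OPUS transformer decoder depth
    "attention": "multi-scale deformable",
}

# Total sparse dynamic queries: 720 current + 320 future = 1040
total_queries = (decoder_cfg["num_current_queries"]
                 + decoder_cfg["num_future_queries"])
print(total_queries)
```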

Training Status

Phase     Epochs  Loss (start -> end)  Status
Pretrain  5       54.15 -> 18.53       DONE
Finetune  59      -                    Pending

Pretrain Details

  • Hardware: 7x NVIDIA L4 (23GB each)
  • Batch: samples_per_gpu=2, effective_batch=14
  • Optimizer: AdamW, lr=2e-4, weight_decay=1e-2
  • LR Schedule: Cosine annealing with 500-iter linear warmup
  • Dataset: nuScenes v1.0-trainval (17,704 samples)
  • Duration: ~4.8 hours
  • Loss: Focal Loss + SmoothL1 + Trajectory L2
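The LR schedule above (500-iteration linear warmup into cosine annealing from lr=2e-4) can be sketched in plain Python. This is a minimal illustration, assuming warmup ramps linearly from near zero to the base LR and the cosine decays to zero; the actual warmup ratio and floor LR used in training are not stated in this card.

```python
import math

BASE_LR = 2e-4       # AdamW base learning rate from the pretrain config
WARMUP_ITERS = 500   # linear warmup length in iterations

def lr_at(step, total_iters):
    """LR at a given iteration: linear warmup, then cosine decay to 0.

    Assumed shape, for illustration; the real scheduler may differ in
    warmup start ratio and minimum LR.
    """
    if step < WARMUP_ITERS:
        # Ramp from BASE_LR/WARMUP_ITERS up to BASE_LR
        return BASE_LR * (step + 1) / WARMUP_ITERS
    progress = (step - WARMUP_ITERS) / max(1, total_iters - WARMUP_ITERS)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))
```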

Available Checkpoints

File                                      Phase     Epoch  Description
pretrain/checkpoints/pretrain_epoch1.pth  Pretrain  1      After first pass through data
pretrain/checkpoints/pretrain_epoch2.pth  Pretrain  2      -
pretrain/checkpoints/pretrain_epoch3.pth  Pretrain  3      -
pretrain/checkpoints/pretrain_epoch4.pth  Pretrain  4      -
pretrain/checkpoints/pretrain_epoch5.pth  Pretrain  5      Best pretrain checkpoint

Each checkpoint is ~870 MB and contains model weights, optimizer state, and scheduler state for resume.

Usage

# Load pretrain checkpoint for finetuning
import torch
from mmcv import Config          # mmcv < 2.0 API; Config moved to mmengine in 2.x
from mmdet3d.models import build_model

cfg = Config.fromfile("configs/sparseworld/nuscenes-temporal/sparseworld-traj-finetune.py")
model = build_model(cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))

# Load pretrained weights; the checkpoint also carries optimizer and
# scheduler state, which can be restored separately to resume training.
ckpt = torch.load("pretrain/checkpoints/pretrain_epoch5.pth", map_location='cpu')
model.load_state_dict(ckpt['state_dict'])

Upcoming Exports

After finetuning completes:

  • SafeTensors format
  • ONNX export (opset 17)
  • TensorRT FP16/FP32 (for Jetson deployment)

Data

  • nuScenes v1.0-trainval: 17,704 training samples (camera-only, 6 views x 5 temporal frames)
  • Occ3D-nuScenes: Voxel occupancy ground truth (200x200x16, 0.4m resolution, 17 classes)
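The Occ3D-nuScenes voxelization above implies a 200 x 200 x 16 grid at 0.4 m resolution, i.e. an 80 m x 80 m x 6.4 m volume. A minimal sketch of the point-to-voxel mapping follows; the ego-centred x/y range of [-40, 40] m and the z origin are assumptions for illustration, not values stated in this card.

```python
# Sketch of the Occ3D-nuScenes voxel grid described above.
VOXEL_SIZE = 0.4            # metres per voxel
GRID = (200, 200, 16)       # voxels along x, y, z
X_MIN, Y_MIN, Z_MIN = -40.0, -40.0, -1.0  # assumed grid origin (metres)

def point_to_voxel(x, y, z):
    """Map a metric point to integer voxel indices (no bounds clamping)."""
    return (int((x - X_MIN) // VOXEL_SIZE),
            int((y - Y_MIN) // VOXEL_SIZE),
            int((z - Z_MIN) // VOXEL_SIZE))
```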

License

Apache 2.0 — Robot Flow Labs / AIFLOW LABS LIMITED

Citation

@inproceedings{dang2026sparseworld,
  title={SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries},
  author={Dang, Chenxu and Liu, Haiyan and Bao, Jason and others},
  booktitle={AAAI},
  year={2026}
}