MAGNI — SparseWorld 4D Occupancy World Model

Part of the ANIMA Perception Suite by Robot Flow Labs.

Paper

SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries
Chenxu Dang, Haiyan Liu, Jason Bao, et al.
arXiv:2510.17482 (AAAI 2026)

Architecture

SparseWorld uses 1040 sparse dynamic queries (720 current + 320 future) with a 6-layer OPUS decoder to predict 4D occupancy from camera images. Key components:

  • ResNet-50 + FPN backbone (pretrained on nuImages)
  • OPUS Transformer Decoder with 6 layers, multi-scale deformable attention
  • Range-Adaptive Perception (RAP) for multi-scale voxel encoding
  • State-Conditioned Forecasting (SCF) for temporal prediction
  • 76M parameters total
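The query and decoder sizing above can be summarized in a small sketch. This is an illustrative configuration dictionary, not the actual SparseWorld config schema; the key names are hypothetical.

```python
# Hypothetical sketch of the query/decoder layout described above.
# Key names are illustrative, not the real SparseWorld config keys.
decoder_cfg = {
    "num_current_queries": 720,   # queries for the current frame
    "num_future_queries": 320,    # queries for forecast frames
    "num_decoder_layers": 6,      # OPUS transformer decoder depth
    "attention": "multi-scale deformable",
}

# Total sparse dynamic queries: 720 current + 320 future = 1040
total_queries = (decoder_cfg["num_current_queries"]
                 + decoder_cfg["num_future_queries"])
print(total_queries)
```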

Training Status

Phase     Epochs  Loss (start -> end)  Status
Pretrain  5       54.15 -> 18.53       DONE
Finetune  59      -                    Pending

Pretrain Details

  • Hardware: 7x NVIDIA L4 (23GB each)
  • Batch: samples_per_gpu=2, effective_batch=14
  • Optimizer: AdamW, lr=2e-4, weight_decay=1e-2
  • LR Schedule: Cosine annealing with 500-iter linear warmup
  • Dataset: nuScenes v1.0-trainval (17,704 samples)
  • Duration: ~4.8 hours
  • Loss: Focal Loss + SmoothL1 + Trajectory L2
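The LR schedule above (500-iteration linear warmup into cosine annealing from lr=2e-4) can be sketched in plain Python. This is a minimal illustration, assuming warmup ramps linearly from near zero to the base LR and the cosine decays to zero; the actual warmup ratio and floor LR used in training are not stated in this card.

```python
import math

BASE_LR = 2e-4       # AdamW base learning rate from the pretrain config
WARMUP_ITERS = 500   # linear warmup length in iterations

def lr_at(step, total_iters):
    """LR at a given iteration: linear warmup, then cosine decay to 0.

    Assumed shape, for illustration; the real scheduler may differ in
    warmup start ratio and minimum LR.
    """
    if step < WARMUP_ITERS:
        # Ramp from BASE_LR/WARMUP_ITERS up to BASE_LR
        return BASE_LR * (step + 1) / WARMUP_ITERS
    progress = (step - WARMUP_ITERS) / max(1, total_iters - WARMUP_ITERS)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))
```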

Available Checkpoints

File                                      Phase     Epoch  Description
pretrain/checkpoints/pretrain_epoch1.pth  Pretrain  1      After first pass through data
pretrain/checkpoints/pretrain_epoch2.pth  Pretrain  2      -
pretrain/checkpoints/pretrain_epoch3.pth  Pretrain  3      -
pretrain/checkpoints/pretrain_epoch4.pth  Pretrain  4      -
pretrain/checkpoints/pretrain_epoch5.pth  Pretrain  5      Best pretrain checkpoint

Each checkpoint is ~870 MB and contains model weights, optimizer state, and scheduler state for resume.

Usage

# Load pretrain checkpoint for finetuning
import torch
from mmcv import Config          # mmcv < 2.0 API; Config moved to mmengine in 2.x
from mmdet3d.models import build_model

cfg = Config.fromfile("configs/sparseworld/nuscenes-temporal/sparseworld-traj-finetune.py")
model = build_model(cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))

# Load pretrained weights; the checkpoint also carries optimizer and
# scheduler state, which can be restored separately to resume training.
ckpt = torch.load("pretrain/checkpoints/pretrain_epoch5.pth", map_location='cpu')
model.load_state_dict(ckpt['state_dict'])

Upcoming Exports

After finetuning completes:

  • SafeTensors format
  • ONNX export (opset 17)
  • TensorRT FP16/FP32 (for Jetson deployment)

Data

  • nuScenes v1.0-trainval: 17,704 training samples (camera-only, 6 views x 5 temporal frames)
  • Occ3D-nuScenes: Voxel occupancy ground truth (200x200x16, 0.4m resolution, 17 classes)
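The Occ3D-nuScenes voxelization above implies a 200 x 200 x 16 grid at 0.4 m resolution, i.e. an 80 m x 80 m x 6.4 m volume. A minimal sketch of the point-to-voxel mapping follows; the ego-centred x/y range of [-40, 40] m and the z origin are assumptions for illustration, not values stated in this card.

```python
# Sketch of the Occ3D-nuScenes voxel grid described above.
VOXEL_SIZE = 0.4            # metres per voxel
GRID = (200, 200, 16)       # voxels along x, y, z
X_MIN, Y_MIN, Z_MIN = -40.0, -40.0, -1.0  # assumed grid origin (metres)

def point_to_voxel(x, y, z):
    """Map a metric point to integer voxel indices (no bounds clamping)."""
    return (int((x - X_MIN) // VOXEL_SIZE),
            int((y - Y_MIN) // VOXEL_SIZE),
            int((z - Z_MIN) // VOXEL_SIZE))
```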

License

Apache 2.0 — Robot Flow Labs / AIFLOW LABS LIMITED

Citation

@inproceedings{dang2026sparseworld,
  title={SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries},
  author={Dang, Chenxu and Liu, Haiyan and Bao, Jason and others},
  booktitle={AAAI},
  year={2026}
}