SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries
Part of the ANIMA Perception Suite by Robot Flow Labs.
Chenxu Dang, Haiyan Liu, Jason Bao, et al. arXiv:2510.17482 (AAAI 2026)
SparseWorld uses 1040 sparse dynamic queries (720 current + 320 future) with a 6-layer OPUS decoder to predict 4D occupancy from camera images. Training proceeds in two phases:
| Phase | Epochs | Loss (start -> end) | Status |
|---|---|---|---|
| Pretrain | 5 | 54.15 -> 18.53 | DONE |
| Finetune | 59 | - | Pending |
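The sparse query layout described above (720 current + 320 future queries) can be sketched as learnable embeddings. This is a minimal illustration, not the repo's actual API; the class name, embedding dimension (256), and concatenation order are all assumptions:

```python
import torch
import torch.nn as nn

# Illustrative query setup: 720 queries model the current scene, 320 model
# future states. Each query is a learnable embedding vector.
NUM_CURRENT, NUM_FUTURE = 720, 320
EMBED_DIM = 256  # assumed; check the config for the real value

class SparseQueries(nn.Module):
    def __init__(self):
        super().__init__()
        self.current = nn.Embedding(NUM_CURRENT, EMBED_DIM)
        self.future = nn.Embedding(NUM_FUTURE, EMBED_DIM)

    def forward(self):
        # Concatenate into the full 1040-query set fed to the decoder.
        return torch.cat([self.current.weight, self.future.weight], dim=0)

queries = SparseQueries()()
print(tuple(queries.shape))  # (1040, 256)
```

Because the queries are `nn.Embedding` weights, they are optimized jointly with the rest of the network during training.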
| File | Phase | Epoch | Description |
|---|---|---|---|
| pretrain/checkpoints/pretrain_epoch1.pth | Pretrain | 1 | After first pass through data |
| pretrain/checkpoints/pretrain_epoch2.pth | Pretrain | 2 | |
| pretrain/checkpoints/pretrain_epoch3.pth | Pretrain | 3 | |
| pretrain/checkpoints/pretrain_epoch4.pth | Pretrain | 4 | |
| pretrain/checkpoints/pretrain_epoch5.pth | Pretrain | 5 | Best pretrain checkpoint |
Each checkpoint is ~870 MB and contains model weights, optimizer state, and scheduler state for resume.
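Since each checkpoint bundles optimizer and scheduler state alongside the weights, resuming is a matter of restoring all three. The sketch below uses a stand-in model; the key names (`state_dict`, `optimizer`, `scheduler`, `epoch`) are assumptions based on common mmcv conventions, so inspect `ckpt.keys()` on the real file before relying on them:

```python
import os
import tempfile
import torch

# Stand-in model/optimizer/scheduler for the real SparseWorld setup.
model = torch.nn.Linear(4, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10)

# Write a checkpoint in the layout described above (assumed key names).
path = os.path.join(tempfile.mkdtemp(), "demo_ckpt.pth")
torch.save({
    "state_dict": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scheduler": scheduler.state_dict(),
    "epoch": 5,
}, path)

# Resume: restore all three states so training picks up where it left off.
ckpt = torch.load(path, map_location="cpu")
model.load_state_dict(ckpt["state_dict"])
optimizer.load_state_dict(ckpt["optimizer"])
scheduler.load_state_dict(ckpt["scheduler"])
start_epoch = ckpt["epoch"] + 1  # continue from the next epoch
print(start_epoch)  # 6
```

Restoring the optimizer and scheduler state matters: skipping them resets momentum buffers and the learning-rate schedule, which can destabilize a resumed run.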
```python
# Load pretrain checkpoint for finetuning
import torch
from mmcv import Config
from mmdet3d.models import build_model

cfg = Config.fromfile("configs/sparseworld/nuscenes-temporal/sparseworld-traj-finetune.py")
model = build_model(cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg'))

# Load pretrained weights
ckpt = torch.load("pretrain/checkpoints/pretrain_epoch5.pth", map_location='cpu')
model.load_state_dict(ckpt['state_dict'])
```
Finetune checkpoints will be added here after finetune training completes.
Apache 2.0 — Robot Flow Labs / AIFLOW LABS LIMITED
```bibtex
@inproceedings{dang2026sparseworld,
  title={SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries},
  author={Dang, Chenxu and Liu, Haiyan and Bao, Jason and others},
  booktitle={AAAI},
  year={2026}
}
```