RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning

¹Huazhong University of Science & Technology  ²Horizon Robotics

Intern of Horizon Robotics. Project Lead. Corresponding Author.

Abstract

Existing end-to-end autonomous driving (AD) algorithms typically follow the Imitation Learning (IL) paradigm, which faces challenges such as causal confusion and the open-loop gap. In this work, we establish a 3DGS-based closed-loop Reinforcement Learning (RL) training paradigm. Leveraging 3DGS techniques, we construct a photorealistic digital replica of the real physical world, enabling the AD policy to extensively explore the state space and learn to handle out-of-distribution scenarios through large-scale trial and error. To enhance safety, we design specialized rewards that guide the policy to respond effectively to safety-critical events and to understand real-world causal relationships. For better alignment with human driving behavior, IL is incorporated into RL training as a regularization term. We also introduce a closed-loop evaluation benchmark consisting of diverse, previously unseen 3DGS environments. Compared to IL-based methods, RAD achieves stronger performance on most closed-loop metrics, notably a 3x lower collision rate.
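The abstract describes an RL objective driven by safety-oriented rewards from closed-loop 3DGS rollouts, with IL kept in the loop as a regularization term. The sketch below shows one plausible form of such a hybrid objective; the policy interface, batch fields, and the il_weight coefficient are illustrative assumptions, not the released RAD implementation.

# Minimal sketch of an RL objective regularized by imitation learning.
# `policy.log_prob`, the batch fields, and `il_weight` are hypothetical
# placeholders, not the actual RAD interfaces.
def hybrid_loss(policy, rollout_batch, demo_batch, il_weight=0.1):
    # RL term: policy-gradient-style loss, weighting action log-probabilities
    # by advantages estimated from the safety-oriented rewards collected in
    # closed-loop 3DGS rollouts.
    rollout_logp = policy.log_prob(rollout_batch["obs"], rollout_batch["actions"])
    rl_loss = -(rollout_logp * rollout_batch["advantages"]).mean()

    # IL term: negative log-likelihood of human demonstration actions, acting
    # as a regularizer that keeps the policy aligned with human driving behavior.
    demo_logp = policy.log_prob(demo_batch["obs"], demo_batch["actions"])
    il_loss = -demo_logp.mean()

    return rl_loss + il_weight * il_loss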


Closed-loop Evaluation in 3DGS-based Environments

Framework

RAD adopts a three-stage training paradigm. In the perception pre-training stage, map and agent ground truths guide instance-level tokens to encode the corresponding scene information. In the planning pre-training stage, large-scale driving demonstrations initialize the action distribution. In the reinforced post-training stage, RL and IL synergistically fine-tune the AD policy.
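The outline below restates this three-stage pipeline as a schematic training loop. The stage-specific losses, data iterables, and rollout collector are passed in as placeholders; they are assumptions for illustration rather than RAD's actual code.

# Schematic outline of the three-stage paradigm, assuming a PyTorch-style policy.
from typing import Callable, Iterable

import torch

def train_rad(policy: torch.nn.Module,
              perception_batches: Iterable,   # batches with map/agent ground truth
              demo_batches: Iterable,         # large-scale human driving demonstrations
              collect_rollouts: Callable,     # closed-loop rollouts in 3DGS environments
              perception_loss: Callable,
              planning_loss: Callable,
              reinforced_loss: Callable,      # RL term plus IL regularization
              post_training_iters: int = 1000,
              lr: float = 1e-4):
    opt = torch.optim.AdamW(policy.parameters(), lr=lr)

    def update(loss: torch.Tensor):
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Stage 1: perception pre-training -- map and agent ground truths guide
    # instance-level tokens to encode the corresponding scene information.
    for batch in perception_batches:
        update(perception_loss(policy, batch))

    # Stage 2: planning pre-training -- demonstrations initialize the action
    # distribution via imitation.
    for batch in demo_batches:
        update(planning_loss(policy, batch))

    # Stage 3: reinforced post-training -- RL on closed-loop 3DGS rollouts,
    # with IL retained as a regularization term.
    for _ in range(post_training_iters):
        rollouts = collect_rollouts(policy)
        update(reinforced_loss(policy, rollouts))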

IL-only Policy vs RAD: Closed-loop Evaluation

Left: IL-only Policy | Right: RAD

Clip 1: Detour; Right-turn | IL-only Policy: Dynamic Collision | RAD: Success
Clip 2: Yield to Pedestrians | IL-only Policy: Dynamic Collision | RAD: Success
Clip 3: U-turn | IL-only Policy: Dynamic Collision | RAD: Success
Clip 4: Crawl in Dense Traffic | IL-only Policy: Position Deviation | RAD: Success
Clip 5: Unprotected Left-turn | IL-only Policy: Dynamic Collision | RAD: Success
Clip 6: Crawl in Dense Traffic | IL-only Policy: Dynamic Collision | RAD: Success
Clip 7: Unprotected Left-turn | IL-only Policy: Dynamic Collision | RAD: Success
Clip 8: Detour; Traverse Narrow Lane | IL-only Policy: Dynamic Collision | RAD: Success
Clip 9: Unprotected Left-turn | IL-only Policy: Dynamic Collision | RAD: Success
Clip 10: Stop-and-Start Following | IL-only Policy: Heading Deviation | RAD: Success
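Each clip above labels the IL-only failure with one of the benchmark's failure categories (Dynamic Collision, Position Deviation, Heading Deviation). As a rough illustration of how such closed-loop outcomes could be aggregated into rates, the sketch below tallies them over evaluation rollouts; the RolloutResult fields and thresholds are hypothetical and do not reproduce the paper's metric definitions.

# Hypothetical aggregation of closed-loop failure categories into rates.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class RolloutResult:
    collided: bool               # contact with a dynamic or static obstacle
    max_position_dev_m: float    # peak deviation from the reference route, in meters
    max_heading_dev_rad: float   # peak heading error w.r.t. the reference, in radians

def closed_loop_rates(results: List[RolloutResult],
                      pos_thresh_m: float = 2.0,
                      heading_thresh_rad: float = 0.5) -> Dict[str, float]:
    n = len(results)
    return {
        "collision_rate": sum(r.collided for r in results) / n,
        "position_deviation_rate": sum(r.max_position_dev_m > pos_thresh_m for r in results) / n,
        "heading_deviation_rate": sum(r.max_heading_dev_rad > heading_thresh_rad for r in results) / n,
    }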

BibTeX

@article{RAD,
  title={RAD: Training an End-to-End Driving Policy via Large-Scale 3DGS-based Reinforcement Learning},
  author={Gao, Hao and Chen, Shaoyu and Jiang, Bo and Liao, Bencheng and Shi, Yiang and Guo, Xiaoyang and Pu, Yuechuan and Yin, Haoran and Li, Xiangyu and Zhang, Xinbang and Zhang, Ying and Liu, Wenyu and Zhang, Qian and Wang, Xinggang},
  journal={arXiv preprint arXiv:2502.13144},
  year={2025}
}