A Lightweight Spatiotemporal Saliency Detection Framework for VR Panoramic Dynamic Scenes
Abstract
Saliency detection in virtual reality (VR) panoramic dynamic scenes faces two major challenges: geometric distortion caused by equirectangular projection (ERP) and the high computational cost of modeling long-term temporal dependencies. To address these issues, we propose TAD-Net, a lightweight spatiotemporal saliency detection framework that integrates cubemap projection (CMP), temporal attention, knowledge distillation, and adversarial training. CMP efficiently reduces panoramic distortion while enabling standard 2D convolutional processing. A dual-stream network extracts spatial appearance and temporal motion features, and a temporal attention module enhances dynamic saliency discrimination. To reconcile the accuracy–latency trade-off, a heavy teacher model transfers long-range temporal knowledge to a lightweight student model via distillation, while adversarial training improves boundary sharpness. Extensive experiments on Salient360-Dynamic and VR-EyeDynamic demonstrate that TAD-Net achieves state-of-the-art performance, improving AUC-Judd by up to 5.2% while maintaining real-time inference at 35.1 FPS on an RTX 3080 GPU. Cross-dataset evaluation confirms robust generalization under domain shifts. The results indicate that the proposed projection–perception–distillation pipeline effectively balances geometric correction, temporal reasoning, and real-time constraints in VR applications.
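To make the projection step concrete, the sketch below shows how an ERP frame can be resampled onto the six cube faces that CMP hands to a standard 2D CNN. This is a minimal NumPy illustration of the general ERP-to-cubemap mapping, not TAD-Net's actual implementation: the face orientation convention, the nearest-neighbour sampling, and the function name erp_to_cubemap are all assumptions made for illustration.

```python
import numpy as np

def erp_to_cubemap(erp, face_size=256):
    """Resample an equirectangular frame (H x W x C) onto six cube faces.

    Returns a dict mapping face name -> (face_size, face_size, C) array.
    Nearest-neighbour sampling keeps the sketch short; a real pipeline
    would typically use bilinear interpolation.
    """
    H, W = erp.shape[:2]
    # Pixel centres of one face, mapped to the [-1, 1] tangent plane.
    a = (np.arange(face_size) + 0.5) / face_size * 2.0 - 1.0
    u, v = np.meshgrid(a, a)  # u: left->right, v: top->bottom
    one = np.ones_like(u)

    # One common face-orientation convention (hypothetical here; the
    # abstract does not specify TAD-Net's cubemap layout).
    dirs = {
        "front":  ( one,    u,  -v),
        "back":   (-one,   -u,  -v),
        "right":  (  -u,  one,  -v),
        "left":   (   u, -one,  -v),
        "top":    (   v,    u,  one),
        "bottom": (  -v,    u, -one),
    }

    faces = {}
    for name, (x, y, z) in dirs.items():
        # Direction vector -> longitude/latitude on the sphere.
        lon = np.arctan2(y, x)                            # [-pi, pi]
        lat = np.arcsin(z / np.sqrt(x*x + y*y + z*z))     # [-pi/2, pi/2]
        # Longitude/latitude -> ERP pixel coordinates (wrap horizontally).
        px = ((lon / (2.0 * np.pi) + 0.5) * W).astype(int) % W
        py = np.clip(((0.5 - lat / np.pi) * H).astype(int), 0, H - 1)
        faces[name] = erp[py, px]
    return faces
```

Calling erp_to_cubemap(frame) on an H x W x 3 frame yields six distortion-reduced square views, each of which a standard 2D convolution can process without the polar stretching that ERP introduces.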
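The abstract does not state the training objective explicitly. One common way to combine a saliency (task) loss, a temperature-scaled distillation term, and an adversarial term, with the weights $\lambda_{\text{kd}}$, $\lambda_{\text{adv}}$ and the temperature $T$ all hypothetical, is:

$$
\mathcal{L}_{\text{student}} = \mathcal{L}_{\text{sal}}(S, G)
+ \lambda_{\text{kd}}\, T^{2}\, \mathrm{KL}\!\left(\sigma(z_t / T)\,\middle\|\,\sigma(z_s / T)\right)
+ \lambda_{\text{adv}}\, \mathcal{L}_{\text{adv}}(S),
$$

where $S$ is the student's predicted saliency map, $G$ the ground-truth fixation map, $z_s$ and $z_t$ the student and teacher logits, and $\sigma$ the softmax. This is a sketch of standard distillation practice, not the paper's confirmed formulation.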