A Lightweight Spatiotemporal Saliency Detection Framework for VR Panoramic Dynamic Scenes
Abstract
Saliency detection in virtual reality (VR) panoramic dynamic scenes faces two major challenges: geometric distortion caused by equirectangular projection (ERP) and the high computational cost of modeling long-term temporal dependencies. To address these issues, we propose TAD-Net, a lightweight spatiotemporal saliency detection framework that integrates cubemap projection (CMP), temporal attention, knowledge distillation, and adversarial training. CMP efficiently reduces panoramic distortion while enabling standard 2D convolutional processing. A dual-stream network extracts spatial appearance and temporal motion features, and a temporal attention module enhances dynamic saliency discrimination. To reconcile the accuracy–latency trade-off, a heavy teacher model transfers long-range temporal knowledge to a lightweight student model via distillation, while adversarial training improves boundary sharpness. Extensive experiments on Salient360-Dynamic and VR-EyeDynamic demonstrate that TAD-Net achieves state-of-the-art performance, improving AUC-Judd by up to 5.2% while maintaining real-time inference at 35.1 FPS on an RTX 3080 GPU. Cross-dataset evaluation confirms robust generalization under domain shifts. The results indicate that the proposed projection–perception–distillation pipeline effectively balances geometric correction, temporal reasoning, and real-time constraints in VR applications.
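To make the projection step concrete, the sketch below shows how an ERP frame can be resampled onto the six cube faces that CMP hands to a standard 2D CNN. This is a minimal NumPy illustration of the general ERP-to-cubemap mapping, not TAD-Net's actual implementation: the face orientation convention, the nearest-neighbour sampling, and the function name erp_to_cubemap are all assumptions made for illustration.

```python
import numpy as np

def erp_to_cubemap(erp, face_size=256):
    """Resample an equirectangular frame (H x W x C) onto six cube faces.

    Returns a dict mapping face name -> (face_size, face_size, C) array.
    Nearest-neighbour sampling keeps the sketch short; a real pipeline
    would typically use bilinear interpolation.
    """
    H, W = erp.shape[:2]
    # Pixel centres of one face, mapped to the [-1, 1] tangent plane.
    a = (np.arange(face_size) + 0.5) / face_size * 2.0 - 1.0
    u, v = np.meshgrid(a, a)  # u: left->right, v: top->bottom
    one = np.ones_like(u)

    # One common face-orientation convention (hypothetical here; the
    # abstract does not specify TAD-Net's cubemap layout).
    dirs = {
        "front":  ( one,    u,  -v),
        "back":   (-one,   -u,  -v),
        "right":  (  -u,  one,  -v),
        "left":   (   u, -one,  -v),
        "top":    (   v,    u,  one),
        "bottom": (  -v,    u, -one),
    }

    faces = {}
    for name, (x, y, z) in dirs.items():
        # Direction vector -> longitude/latitude on the sphere.
        lon = np.arctan2(y, x)                            # [-pi, pi]
        lat = np.arcsin(z / np.sqrt(x*x + y*y + z*z))     # [-pi/2, pi/2]
        # Longitude/latitude -> ERP pixel coordinates (wrap horizontally).
        px = ((lon / (2.0 * np.pi) + 0.5) * W).astype(int) % W
        py = np.clip(((0.5 - lat / np.pi) * H).astype(int), 0, H - 1)
        faces[name] = erp[py, px]
    return faces
```

Calling erp_to_cubemap(frame) on an H x W x 3 frame yields six distortion-reduced square views, each of which a standard 2D convolution can process without the polar stretching that ERP introduces.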
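The abstract does not state the training objective explicitly. One common way to combine a saliency (task) loss, a temperature-scaled distillation term, and an adversarial term, with the weights $\lambda_{\text{kd}}$, $\lambda_{\text{adv}}$ and the temperature $T$ all hypothetical, is:

$$
\mathcal{L}_{\text{student}} = \mathcal{L}_{\text{sal}}(S, G)
+ \lambda_{\text{kd}}\, T^{2}\, \mathrm{KL}\!\left(\sigma(z_t / T)\,\middle\|\,\sigma(z_s / T)\right)
+ \lambda_{\text{adv}}\, \mathcal{L}_{\text{adv}}(S),
$$

where $S$ is the student's predicted saliency map, $G$ the ground-truth fixation map, $z_s$ and $z_t$ the student and teacher logits, and $\sigma$ the softmax. This is a sketch of standard distillation practice, not the paper's confirmed formulation.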