2026 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]
The demand for immersive visual experiences in applications like virtual reality and telepresence has highlighted the limitations of traditional 2D imaging. Light field (LF) imaging addresses this by capturing a 4D representation of a scene, encoding both spatial (texture) and angular (viewpoint) information. This richness enables true parallax and depth perception, but the resulting data volumes are a major obstacle to efficient storage, transmission, and real-time processing. Conventional compression methods often treat LF data as a simple sequence of images and fail to exploit the underlying spatial-angular structure, which leads to sub-optimal performance.
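To make the data bottleneck concrete, the following minimal sketch computes the raw size of a 4D light field stored as a grid of views. The dimensions (9 x 9 views of 512 x 512 RGB pixels at 8 bits per sample) and the function names are illustrative assumptions, not figures or code from the thesis.

```python
# Hypothetical light field dimensions, chosen only for illustration.
U, V = 9, 9            # angular resolution: grid of viewpoints
H, W = 512, 512        # spatial resolution of each view, in pixels
CHANNELS = 3           # RGB
BYTES_PER_SAMPLE = 1   # 8-bit samples


def raw_size_bytes(u, v, h, w, channels=CHANNELS, bytes_per_sample=BYTES_PER_SAMPLE):
    """Uncompressed size of a light field stored as u * v views of h * w pixels."""
    return u * v * h * w * channels * bytes_per_sample


def flat_index(u, v, y, x, n_v=V, h=H, w=W):
    """Row-major pixel offset of spatial position (y, x) in view (u, v).

    The two outer indices are angular, the two inner ones spatial, which is
    one common way to lay out the 4D structure in memory.
    """
    return ((u * n_v + v) * h + y) * w + x


single_view = raw_size_bytes(1, 1, H, W)
light_field = raw_size_bytes(U, V, H, W)
print(f"single view: {single_view} bytes")
print(f"light field: {light_field} bytes ({light_field // single_view}x a single view)")
```

Under these assumptions the raw light field is 81 times the size of one 2D view, before any video or spherical extension is considered.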
This thesis addresses the challenge of efficient LF compression by developing a principled, learning-based framework centered on spatial-angular disentanglement. The core of the work is a series of Variational Autoencoder (VAE)-based architectures that explicitly separate spatial and angular features into distinct latent representations. This approach provides greater flexibility and efficiency by allowing each domain to be modeled according to its unique statistical properties. The foundational VAE model is progressively advanced through two key contributions: first, the integration of dual-hyperprior entropy models to learn tailored probability distributions for each latent stream, improving rate-distortion performance; and second, the introduction of an information-theoretic regularizer to ensure robust feature separation. Finally, a lightweight, modular compression pipeline is proposed to further compress these latent representations without requiring network retraining.
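As a rough illustration of what a rate-distortion objective with two separately modeled latent streams can look like, the expression below adapts the standard hyperprior formulation from learned image compression. All symbols are assumptions introduced here: $x$ is the input light field, $\hat{x}$ its reconstruction, $\hat{y}_s$ and $\hat{y}_a$ the quantized spatial and angular latents, $\hat{z}_s$ and $\hat{z}_a$ their hyper-latents, $D$ a distortion measure, and $\lambda$ the trade-off weight. This is a generic sketch, not necessarily the exact loss used in the thesis, and the information-theoretic regularizer is left out because its form is not stated in this abstract.

```latex
\mathcal{L}
  = \underbrace{\mathbb{E}\big[-\log_2 p(\hat{y}_s \mid \hat{z}_s) - \log_2 p(\hat{z}_s)\big]}_{R_s \text{ (spatial rate)}}
  + \underbrace{\mathbb{E}\big[-\log_2 p(\hat{y}_a \mid \hat{z}_a) - \log_2 p(\hat{z}_a)\big]}_{R_a \text{ (angular rate)}}
  + \lambda \, \mathbb{E}\big[D(x, \hat{x})\big]
```

Keeping $R_s$ and $R_a$ as separate terms is what allows each stream to have its own learned probability model.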
The proposed methods were rigorously evaluated on standard public LF datasets as well as a novel spherical LF dataset created as part of this research to support immersive telepresence scenarios. Objective evaluations demonstrate that the disentangled frameworks achieve superior rate-distortion performance, with significant Bjøntegaard Delta Peak Signal-to-Noise Ratio (BD-PSNR) gains over state-of-the-art learning-based and traditional codecs. Crucially, the methods also offer substantially faster encoding and decoding times, a critical requirement for real-time applications. To assess perceptual performance, a formal subjective quality study was conducted, which confirmed that the proposed methods deliver improved visual quality, particularly in preserving angular consistency and reducing artifacts that impair the immersive experience.
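For readers unfamiliar with the metric, BD-PSNR summarizes the average PSNR difference between two rate-distortion curves over their common bitrate range. The sketch below is a simplified pure-Python version that assumes exactly four rate points per codec (the classic setup), so that a cubic in log-rate passes through every point. The function names and sample numbers are illustrative assumptions, not the thesis's evaluation code or results.

```python
import math


def _lagrange(xs, ys, x):
    """Evaluate the cubic through four points (xs[i], ys[i]) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _integral(xs, ys, lo, hi):
    """Integrate the interpolating cubic over [lo, hi].

    Simpson's rule is exact for polynomials up to degree three, so a single
    application is enough here.
    """
    mid = 0.5 * (lo + hi)
    return (hi - lo) / 6.0 * (
        _lagrange(xs, ys, lo) + 4.0 * _lagrange(xs, ys, mid) + _lagrange(xs, ys, hi)
    )


def bd_psnr(rates_anchor, psnr_anchor, rates_test, psnr_test):
    """Average PSNR gain (dB) of the test codec over the anchor codec."""
    if len(rates_anchor) != 4 or len(rates_test) != 4:
        raise ValueError("this sketch assumes exactly four rate points per codec")
    log_a = [math.log10(r) for r in rates_anchor]
    log_t = [math.log10(r) for r in rates_test]
    # Only the overlapping log-rate interval is compared.
    lo = max(min(log_a), min(log_t))
    hi = min(max(log_a), max(log_t))
    if hi <= lo:
        raise ValueError("the two curves have no overlapping rate range")
    avg_anchor = _integral(log_a, psnr_anchor, lo, hi) / (hi - lo)
    avg_test = _integral(log_t, psnr_test, lo, hi) / (hi - lo)
    return avg_test - avg_anchor


# Made-up rate (bits per pixel) and PSNR (dB) points, for illustration only.
rates = [0.1, 0.2, 0.4, 0.8]
anchor = [30.0, 33.0, 36.0, 38.0]
test = [p + 1.0 for p in anchor]
print(bd_psnr(rates, anchor, rates, test))
```

With the made-up curves above, the test codec is uniformly 1 dB better at every rate, so the result is approximately 1.0 dB. With more than four rate points per codec, a least-squares cubic fit is normally used instead of interpolation.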
In conclusion, this thesis demonstrates that explicitly disentangling, modeling, and compressing the spatial and angular components of light fields is a highly effective strategy. The developed frameworks and tools advance the state of the art by providing practical and scalable solutions that balance compression efficiency, computational speed, and perceptual quality. This work makes a significant contribution toward the feasibility of using high-quality LF imaging in bandwidth-constrained immersive applications. This compilation thesis is based on six peer-reviewed scientific publications.
Place, publisher, year, edition, pages
Sundsvall: Mid Sweden University, 2026. p. 70
Series
Mid Sweden University doctoral thesis, ISSN 1652-893X ; 440
Series
École Doctorale: Centre Inria de l’Université de Rennes ; 601
National Category
Computer Vision and Learning Systems
Identifiers
urn:nbn:se:miun:diva-56370 (URN)
978-91-90017-39-5 (ISBN)
Public defence
2026-01-22, L111, Holmgatan 10, Sundsvall, 09:15 (English)
Opponent
Supervisors
Note
As part of a double degree with Université de Rennes.
2026-01-09; 2026-01-07; 2026-01-19. Bibliographically approved