Mid Sweden University

Efficient and Fast Light Field Compression via VAE-Based Spatial and Angular Disentanglement
Mid Sweden University, Faculty of Science, Technology and Media, Department of Computer and Electrical Engineering (2023-). INRIA, Rennes, France. (Realistic3D) ORCID iD: 0009-0008-5477-0920
Mid Sweden University, Faculty of Science, Technology and Media, Department of Computer and Electrical Engineering (2023-). (Realistic3D) ORCID iD: 0000-0001-7416-5615
INRIA, Rennes, France.
Mid Sweden University, Faculty of Science, Technology and Media, Department of Computer and Electrical Engineering (2023-). (Realistic3D) ORCID iD: 0000-0003-3751-6089
2025 (English) In: IEEE Access, E-ISSN 2169-3536, Vol. 13, p. 18594-18607. Article in journal (Refereed), Published
Abstract [en]

Light field (LF) imaging captures both spatial and angular information, which is essential for applications such as depth estimation, view synthesis, and post-capture refocusing. However, processing this data efficiently, particularly with respect to compression and encoding/decoding time, remains challenging. We propose a Variational Autoencoder (VAE)-based framework to disentangle the spatial and angular features of light field images, focusing on fast and efficient compression. Our method uses two separate sub-encoders, one for spatial and one for angular features, to allow for independent processing in the latent space, which facilitates more streamlined compression workflows. Evaluations on standard light field datasets demonstrate that our approach reduces encoding and decoding time significantly, with a slight trade-off in Rate-Distortion (RD) performance, making it suitable for real-time applications.
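
The idea of two sub-encoders that produce separate spatial and angular latents can be illustrated with a minimal sketch. It is not the architecture from the paper: the pooling "encoders", the fixed unit variance, the `lf[u][v][y][x]` indexing, and all function names are assumptions chosen only to show how a 4D light field might be split into two independently sampled latent streams, each with its own KL term.

```python
import math
import random


def encode_spatial(lf):
    """Toy spatial sub-encoder (assumption): pool over views, one latent mean per pixel."""
    n_u, n_v = len(lf), len(lf[0])
    height, width = len(lf[0][0]), len(lf[0][0][0])
    mu = [
        sum(lf[u][v][y][x] for u in range(n_u) for v in range(n_v)) / (n_u * n_v)
        for y in range(height)
        for x in range(width)
    ]
    return mu, [0.0] * len(mu)  # log-variance fixed at 0 (unit variance) for simplicity


def encode_angular(lf):
    """Toy angular sub-encoder (assumption): pool over pixels, one latent mean per view."""
    n_u, n_v = len(lf), len(lf[0])
    height, width = len(lf[0][0]), len(lf[0][0][0])
    mu = [
        sum(lf[u][v][y][x] for y in range(height) for x in range(width)) / (height * width)
        for u in range(n_u)
        for v in range(n_v)
    ]
    return mu, [0.0] * len(mu)


def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1), element-wise."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var))


# Hypothetical light field: 2x2 views, each 1x3 pixels, indexed lf[u][v][y][x].
lf = [[[[float(2 * u + v + x) for x in range(3)]] for v in range(2)] for u in range(2)]

rng = random.Random(0)
mu_s, log_var_s = encode_spatial(lf)
mu_a, log_var_a = encode_angular(lf)
z_spatial = reparameterize(mu_s, log_var_s, rng)   # one value per pixel position
z_angular = reparameterize(mu_a, log_var_a, rng)   # one value per view
kl_total = kl_to_standard_normal(mu_s, log_var_s) + kl_to_standard_normal(mu_a, log_var_a)
print(len(z_spatial), len(z_angular), kl_total)
```

Because the two latent lists are produced and sampled separately, each could in principle be quantized and entropy-coded on its own, which is the kind of independent processing the abstract refers to.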

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025. Vol. 13, p. 18594-18607
Keywords [en]
Light fields, Image coding, Decoding, Kernel, Image reconstruction, Feature extraction, Training, Imaging, Streaming media, Redundancy, Light field, compression, disentangling, variational auto-encoder
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:miun:diva-53824
DOI: 10.1109/ACCESS.2025.3532608
ISI: 001410367700049
Scopus ID: 2-s2.0-85216292711
OAI: oai:DiVA.org:miun-53824
DiVA, id: diva2:1937676
Available from: 2025-02-14 Created: 2025-02-14 Last updated: 2026-01-09
In thesis
1. VAE-Based Compression of Light Field Images Using Disentangled Latent Modeling and Perceptual Quality Assessment
2026 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The demand for immersive visual experiences in applications like virtual reality and telepresence has highlighted the limitations of traditional 2D imaging. Light field (LF) imaging addresses this by capturing a 4D representation of a scene, encoding both spatial (texture) and angular (viewpoint) information. This richness enables true parallax and depth perception but creates a significant data bottleneck, as the massive data volumes are a major obstacle to efficient storage, transmission, and real-time processing. Conventional compression methods often treat LF data as a simple sequence of images, failing to effectively exploit the underlying spatial-angular structure, which leads to sub-optimal performance.

This thesis addresses the challenge of efficient LF compression by developing a principled, learning-based framework centered on spatial-angular disentanglement. The core of the work is a series of Variational Autoencoder (VAE)-based architectures that explicitly separate spatial and angular features into distinct latent representations. This approach provides greater flexibility and efficiency by allowing each domain to be modeled according to its unique statistical properties. The foundational VAE model is progressively advanced through two key contributions: first, the integration of dual-hyperprior entropy models to learn tailored probability distributions for each latent stream, improving rate-distortion performance; and second, the introduction of an information-theoretic regularizer to ensure robust feature separation. Finally, a lightweight, modular compression pipeline is proposed to further compress these latent representations without requiring network retraining.
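
The benefit of learning a separate probability model for each latent stream can be illustrated with a small rate estimate. The sketch below is not the thesis's dual-hyperprior model: it uses the discretized Gaussian likelihood common in learned image compression, and the latent values and scale parameters are hand-picked assumptions. It also ignores the side-information bits that a real hyperprior would spend on transmitting the scales.

```python
import math


def gaussian_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rate_bits(latents, sigma, mu=0.0):
    """Estimated bits to code integer-quantized latents under a N(mu, sigma^2) model."""
    bits = 0.0
    for z in latents:
        q = round(z)  # uniform scalar quantization to the nearest integer
        # Probability mass of the quantization bin [q - 0.5, q + 0.5].
        p = gaussian_cdf((q + 0.5 - mu) / sigma) - gaussian_cdf((q - 0.5 - mu) / sigma)
        bits += -math.log2(max(p, 1e-12))  # clamp to avoid log(0)
    return bits


# Hypothetical latents: the spatial stream is spread out, the angular stream is narrow.
spatial_latents = [0.2, 3.1, -4.0, 4.9, -2.2]
angular_latents = [0.1, -0.3, 1.0, 0.2, -1.1]

# One shared scale for both streams versus a scale tailored to each stream.
shared = rate_bits(spatial_latents, 2.0) + rate_bits(angular_latents, 2.0)
tailored = rate_bits(spatial_latents, 3.0) + rate_bits(angular_latents, 0.7)
print(f"shared model: {shared:.2f} bits, tailored models: {tailored:.2f} bits")
```

With these assumed values the tailored models need fewer bits than the shared one, because each scale matches the spread of the stream it describes.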

The proposed methods were rigorously evaluated on standard public LF datasets as well as a novel spherical LF dataset created as part of this research to support immersive telepresence scenarios. Objective evaluations demonstrate that the disentangled frameworks achieve superior rate-distortion performance, with significant Bjøntegaard Delta-Peak Signal-to-Noise Ratio (BD-PSNR) gains over state-of-the-art learning-based and traditional codecs. Crucially, the methods also offer substantially faster encoding and decoding times, a critical requirement for real-time applications. To assess perceptual performance, a formal subjective quality study was conducted, which confirmed that the proposed methods deliver improved visual quality, particularly in preserving angular consistency and reducing artifacts that impair the immersive experience.
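
For readers unfamiliar with BD-PSNR, the metric is the average PSNR difference between two rate-distortion curves over their shared range of log-bitrate. The sketch below is a simplified variant, not the evaluation code behind the reported results: it assumes piecewise-linear interpolation in place of the cubic polynomial fit of the original Bjøntegaard method, and the rate-distortion points are made-up values.

```python
import math


def interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be increasing and contain x."""
    for i in range(len(xs) - 1):
        if xs[i] <= x <= xs[i + 1]:
            t = (x - xs[i]) / (xs[i + 1] - xs[i])
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("x lies outside the curve's range")


def mean_psnr(log_rates, psnrs, lo, hi, steps=1000):
    """Average PSNR over [lo, hi] in log-rate, using the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = min(lo + i * h, hi)  # guard against rounding past the upper bound
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * interp(x, log_rates, psnrs)
    return total / steps


def bd_psnr(anchor, test):
    """Average PSNR gain of `test` over `anchor`; inputs are (bitrate, psnr) pairs
    sorted by increasing bitrate."""
    log_a = [math.log10(r) for r, _ in anchor]
    psnr_a = [p for _, p in anchor]
    log_t = [math.log10(r) for r, _ in test]
    psnr_t = [p for _, p in test]
    lo = max(log_a[0], log_t[0])    # overlapping log-rate interval
    hi = min(log_a[-1], log_t[-1])
    if hi <= lo:
        raise ValueError("curves do not overlap in bitrate")
    return mean_psnr(log_t, psnr_t, lo, hi) - mean_psnr(log_a, psnr_a, lo, hi)


# Hypothetical RD points (bits per pixel, PSNR in dB); not taken from the thesis.
anchor = [(0.1, 30.0), (0.2, 33.0), (0.4, 36.0), (0.8, 38.5)]
shifted = [(rate, psnr + 1.0) for rate, psnr in anchor]
print(bd_psnr(anchor, shifted))
```

A positive value means the tested codec delivers higher PSNR than the anchor at the same bitrate on average; a curve that is uniformly 1 dB better, as in the example, gives a BD-PSNR of about 1 dB.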

In conclusion, this thesis demonstrates that explicitly disentangling, modeling, and compressing the spatial and angular components of light fields is a highly effective strategy. The developed frameworks and tools advance the state of the art by providing practical and scalable solutions that balance compression efficiency, computational speed, and perceptual quality. This work makes a significant contribution toward the feasibility of using high-quality LF imaging in bandwidth-constrained immersive applications. This compilation thesis is based on six peer-reviewed scientific publications.

Place, publisher, year, edition, pages
Sundsvall: Mid Sweden University, 2026. p. 70
Series
Mid Sweden University doctoral thesis, ISSN 1652-893X ; 440
Series
École Doctoral: Centre Inria de l’Université de Rennes ; 601
National Category
Computer Vision and Learning Systems
Identifiers
urn:nbn:se:miun:diva-56370 (URN)
978-91-90017-39-5 (ISBN)
Public defence
2026-01-22, L111, Holmgatan 10, Sundsvall, 09:15 (English)
Opponent
Supervisors
Note

As part of a double degree with Université de Rennes.

Available from: 2026-01-09 Created: 2026-01-07 Last updated: 2026-01-19 Bibliographically approved

Open Access in DiVA

fulltext (4792 kB), 138 downloads
File information
File name: FULLTEXT01.pdf
File size: 4792 kB
Checksum (SHA-512): 3e9c9c211b729653255bf401e454d28a30013297a5f6db5b499bda2f5575d9ba9a788ce65d01c3cbc6b59fc99ceb5fa59ffb42f10c0b62508379f7d5ec77e8ce
Type: fulltext
Mimetype: application/pdf

Authority records

Takhtardeshir, Soheib; Olsson, Roger; Sjöström, Mårten

Total: 138 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
Total: 891 hits