VAE-Based Compression of Light Field Images Using Disentangled Latent Modeling and Perceptual Quality Assessment
Mid Sweden University, Faculty of Science, Technology and Media, Department of Computer and Electrical Engineering (2023-). ORCID iD: 0009-0008-5477-0920
2026 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The demand for immersive visual experiences in applications like virtual reality and telepresence has highlighted the limitations of traditional 2D imaging. Light field (LF) imaging addresses this by capturing a 4D representation of a scene, encoding both spatial (texture) and angular (viewpoint) information. This richness enables true parallax and depth perception but creates a significant data bottleneck, as the massive data volumes are a major obstacle to efficient storage, transmission, and real-time processing. Conventional compression methods often treat LF data as a simple sequence of images, failing to effectively exploit the underlying spatial-angular structure, which leads to sub-optimal performance.

This thesis addresses the challenge of efficient LF compression by developing a principled, learning-based framework centered on spatial-angular disentanglement. The core of the work is a series of Variational Autoencoder (VAE)-based architectures that explicitly separate spatial and angular features into distinct latent representations. This approach provides greater flexibility and efficiency by allowing each domain to be modeled according to its unique statistical properties. The foundational VAE model is progressively advanced through two key contributions: first, the integration of dual-hyperprior entropy models to learn tailored probability distributions for each latent stream, improving rate-distortion performance; and second, the introduction of an information-theoretic regularizer to ensure robust feature separation. Finally, a lightweight, modular compression pipeline is proposed to further compress these latent representations without requiring network retraining.

The proposed methods were rigorously evaluated on standard public LF datasets as well as a novel spherical LF dataset created as part of this research to support immersive telepresence scenarios. Objective evaluations demonstrate that the disentangled frameworks achieve superior rate-distortion performance, with significant Bjøntegaard Delta Peak Signal-to-Noise Ratio (BD-PSNR) gains over state-of-the-art learning-based and traditional codecs. Crucially, the methods also offer substantially faster encoding and decoding times, a critical requirement for real-time applications. To assess perceptual performance, a formal subjective quality study was conducted, which confirmed that the proposed methods deliver improved visual quality, particularly in preserving angular consistency and reducing artifacts that impair the immersive experience.
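
For readers unfamiliar with the BD-PSNR figure mentioned above, the following is a minimal sketch of the usual Bjøntegaard calculation: each codec's PSNR is fitted as a cubic polynomial of log bitrate, and the two fits are averaged over the shared bitrate range. The function name `bd_psnr` and the rate-distortion points are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def bd_psnr(rate_ref, psnr_ref, rate_test, psnr_test):
    """Average PSNR gain (dB) of the test codec over the reference codec."""
    log_ref = np.log10(np.asarray(rate_ref, dtype=float))
    log_test = np.log10(np.asarray(rate_test, dtype=float))
    # Fit PSNR as a cubic polynomial of log10(bitrate) for each codec.
    poly_ref = np.polyfit(log_ref, psnr_ref, 3)
    poly_test = np.polyfit(log_test, psnr_test, 3)
    # Integrate both fits over the overlapping log-rate interval.
    lo = max(log_ref.min(), log_test.min())
    hi = min(log_ref.max(), log_test.max())
    int_ref = np.polyint(poly_ref)
    int_test = np.polyint(poly_test)
    area_ref = np.polyval(int_ref, hi) - np.polyval(int_ref, lo)
    area_test = np.polyval(int_test, hi) - np.polyval(int_test, lo)
    # The difference of the mean curve heights is the BD-PSNR.
    return (area_test - area_ref) / (hi - lo)

# Hypothetical RD points (bits per pixel, PSNR in dB), for illustration only.
anchor_bpp, anchor_psnr = [0.05, 0.10, 0.20, 0.40], [30.0, 33.0, 35.5, 37.5]
test_bpp, test_psnr = [0.05, 0.10, 0.20, 0.40], [31.5, 34.5, 37.0, 39.0]
gain = bd_psnr(anchor_bpp, anchor_psnr, test_bpp, test_psnr)
```

In this toy input the test curve sits exactly 1.5 dB above the anchor at every rate, so `gain` evaluates to about 1.5 dB.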

In conclusion, this thesis demonstrates that explicitly disentangling, modeling, and compressing the spatial and angular components of light fields is a highly effective strategy. The developed frameworks and tools advance the state of the art by providing practical and scalable solutions that balance compression efficiency, computational speed, and perceptual quality. This work makes a significant contribution toward the feasibility of using high-quality LF imaging in bandwidth-constrained immersive applications. This compilation thesis is based on the contributions of six peer-reviewed scientific publications.

Place, publisher, year, edition, pages
Sundsvall: Mid Sweden University, 2026, p. 70
Series
Mid Sweden University doctoral thesis, ISSN 1652-893X ; 440
Series
École Doctoral: Centre Inria de l’Université de Rennes ; 601
National subject category
Computer Vision and Learning Systems
Identifiers
URN: urn:nbn:se:miun:diva-56370; ISBN: 978-91-90017-39-5 (print); OAI: oai:DiVA.org:miun-56370; DiVA, id: diva2:2025552
Public defence
2026-01-22, L111, Holmgatan 10, Sundsvall, 09:15 (English)
Opponent
Supervisors
Note

As part of a double degree with Université de Rennes.

Available from: 2026-01-09 Created: 2026-01-07 Last updated: 2026-01-19 Bibliographically approved
List of papers
1. A Deep Learning based Light Field Image Compression as Pseudo Video Sequences with Additional in-loop Filtering
2024 (English). In: 3D Imaging and Applications 2024-Electronic Imaging, San Francisco Airport in Burlingame, California: Society for Imaging Science & Technology, 2024, Vol. 36, p. 1-6. Conference paper, Published paper (Refereed)
Abstract [en]

In recent years, several deep learning-based architectures have been proposed to compress Light Field (LF) images as pseudo video sequences. However, most of these techniques employ conventional compression-focused networks. In this paper, we introduce a version of a previously designed deep learning video compression network, adapted and optimized specifically for LF image compression. We enhance this network by incorporating an in-loop filtering block, along with additional adjustments and fine-tuning. By treating LF images as pseudo video sequences and deploying our adapted network, we manage to address challenges presented by the unique features of LF images, such as high resolution and large data sizes. Our method compresses these images competently, preserving their quality and unique characteristics. With the thorough fine-tuning and inclusion of the in-loop filtering network, our approach shows improved performance in terms of Peak Signal-to-Noise Ratio (PSNR) and Mean Structural Similarity Index Measure (MSSIM) when compared to other existing techniques. Our method provides a feasible path for LF image compression and may contribute to the emergence of new applications and advancements in this field.
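
Treating a light field as a pseudo video sequence means arranging its views into a single ordered list of frames. The sketch below assumes a serpentine (boustrophedon) scan, which is one common choice because consecutive frames are always neighbouring views; the abstract does not state which scan order the paper uses, and the function names are hypothetical.

```python
import numpy as np

def serpentine_order(rows, cols):
    """Return (u, v) view indices row by row, reversing every other row."""
    order = []
    for u in range(rows):
        vs = range(cols) if u % 2 == 0 else range(cols - 1, -1, -1)
        order.extend((u, v) for v in vs)
    return order

def to_pseudo_video(light_field):
    """Stack the views of a (U, V, H, W, C) light field into (U*V, H, W, C) frames."""
    num_rows, num_cols = light_field.shape[:2]
    frames = [light_field[u, v] for u, v in serpentine_order(num_rows, num_cols)]
    return np.stack(frames)

# Toy 3x3-view light field with 4x5 RGB views, for illustration only.
toy_lf = np.zeros((3, 3, 4, 5, 3), dtype=np.uint8)
video = to_pseudo_video(toy_lf)  # 9 frames that a video codec could consume
```

The resulting frame stack can then be passed to any video codec, learned or conventional, that exploits inter-frame redundancy.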

Place, publisher, year, edition, pages
San Francisco Airport in Burlingame, California: Society for Imaging Science & Technology, 2024
Keywords
Compression, Deep Learning, Light Field Coding, Pseudo Video Sequence
National subject category
Computer Engineering
Identifiers
urn:nbn:se:miun:diva-50480 (URN); 10.2352/EI.2024.36.18.3DIA-103 (DOI)
Conference
3D Imaging and Applications 2024
Projects
Plenoptima
Available from: 2024-02-07 Created: 2024-02-07 Last updated: 2026-01-09 Bibliographically approved
2. A Spherical Light Field Database for Immersive Telecommunication and Telepresence Applications
2024 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Immersive imaging technologies provide an enhanced user experience for visual applications and are getting ready for commonplace use by the industry and the general populace. In particular, light field is a promising technology that enables the capture and reproduction of real light rays from the scene, which can provide a backbone for immersive telecommunication and telepresence applications. Nevertheless, there are still many challenges in transmitting and reproducing light field data. This paper proposes a spherical light field dataset that can be used as a foundation for developing telepresence applications. The Spherical Light Field Database (SLFDB) consists of a light field of 60 views captured with an omnidirectional camera in 20 scenes. To show the usefulness of the proposed database, we provide two use cases: compression and viewpoint estimation. The initial results validate that the publicly available SLFDB will benefit the scientific community.

Place, publisher, year, edition, pages
IEEE, 2024
Keywords
light field, omnidirectional, compression, view synthesis, telepresence
National subject category
Signal Processing; Computer Engineering
Identifiers
urn:nbn:se:miun:diva-52159 (URN); 10.1109/QoMEX61742.2024.10598264 (DOI); 001289486600036 (); 2-s2.0-85201055576 (Scopus ID)
Conference
2024 16th International Conference on Quality of Multimedia Experience (QoMEX)
Projects
InfoViz; Plenoptima
Funder
KK-stiftelsen, 2019-0251; EU, Horizon 2020, 956770
Available from: 2024-08-19 Created: 2024-08-19 Last updated: 2026-01-09
3. Efficient and Fast Light Field Compression via VAE-Based Spatial and Angular Disentanglement
2025 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 13, p. 18594-18607. Article in journal (Refereed). Published
Abstract [en]

Light field (LF) imaging captures both spatial and angular information, which is essential for applications such as depth estimation, view synthesis, and post-capture refocusing. However, the efficient processing of this data, particularly in terms of compression and encoding/decoding time, presents challenges. We propose a Variational Autoencoder (VAE)-based framework to disentangle the spatial and angular features of light field images, focusing on fast and efficient compression. Our method uses two separate sub-encoders, one for spatial and one for angular features, to allow for independent processing in the latent space, which facilitates more streamlined compression workflows. Evaluations on standard light field datasets demonstrate that our approach reduces encoding and decoding time significantly, with a slight trade-off in Rate-Distortion (RD) performance, making it suitable for real-time applications.
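
The two-branch idea can be illustrated with a toy NumPy sketch. Everything below is an assumption made for illustration: the sub-encoders are single random linear layers rather than the paper's networks, the spatial branch is fed one row per sub-aperture view, and the angular branch is fed one row per pixel position (its angular samples). The point is only that each branch yields its own latent and its own KL term.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_encoder(in_dim, latent_dim):
    """Hypothetical sub-encoder: one linear layer producing (mu, log_var)."""
    w_mu = rng.normal(0.0, 0.05, (in_dim, latent_dim))
    w_lv = rng.normal(0.0, 0.05, (in_dim, latent_dim))
    return lambda x: (x @ w_mu, x @ w_lv)

def reparameterize(mu, log_var):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

def kl_divergence(mu, log_var):
    """KL(N(mu, sigma^2) || N(0, I)), summed over all latent entries."""
    return 0.5 * float(np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var))

U, V, H, W = 3, 3, 8, 8          # toy angular and spatial resolution
lf = rng.random((U, V, H, W))    # toy grayscale light field

# Assumed input split: one row per view for the spatial branch,
# one row per pixel position (its U*V angular samples) for the angular branch.
spatial_in = lf.reshape(U * V, H * W)
angular_in = lf.reshape(U * V, H * W).T

spatial_enc = make_encoder(H * W, 16)
angular_enc = make_encoder(U * V, 4)

mu_s, lv_s = spatial_enc(spatial_in)
mu_a, lv_a = angular_enc(angular_in)
z_spatial = reparameterize(mu_s, lv_s)   # shape (U*V, 16)
z_angular = reparameterize(mu_a, lv_a)   # shape (H*W, 4)

# Each latent stream has its own KL term and can be coded independently.
kl_total = kl_divergence(mu_s, lv_s) + kl_divergence(mu_a, lv_a)
```

Keeping the two latents separate is what allows each stream to be quantized and entropy-coded with settings suited to its own statistics.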

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
Light fields, Image coding, Decoding, Kernel, Image reconstruction, Feature extraction, Training, Imaging, Streaming media, Redundancy, Light field, compression, disentangling, variational auto-encoder
National subject category
Computer and Information Sciences
Identifiers
urn:nbn:se:miun:diva-53824 (URN); 10.1109/ACCESS.2025.3532608 (DOI); 001410367700049 (); 2-s2.0-85216292711 (Scopus ID)
Available from: 2025-02-14 Created: 2025-02-14 Last updated: 2026-01-09
4. Subjective Visual Quality Assessment of Compressed Light Field Images: Learning-based vs. Conventional Methods
2025 (English). In: 2025 IEEE International Workshop on Multimedia Signal Processing (MMSP), IEEE conference proceedings, 2025, p. 96-101. Conference paper, Published paper (Refereed)
Abstract [en]

Light field (LF) technology enables the accurate capture and reproduction of a 3D scene, which enhances the visual experience in various applications. The sheer number of captured views creates logistical problems in both storage and data transmission, which makes LF compression crucial. Even though many LF compression techniques have been proposed and evaluated in recent years, the leading-edge learning-based approaches have not been subject to the same level of scrutiny. This paper presents a subjective quality assessment study on four different LF compression methods, including two learning-based LF compression methods, which have not been studied before from a subjective quality point of view. For this purpose, subjective opinion scores were collected from viewers in two different universities for a cross-lab study. The results indicate that the learning-based compression methods behave differently in their rate-distortion curves, and that there is room for improvement for learning-based methods. A qualitative analysis also shows that their artifact structures are different from conventional ones. The results highlight the need for a perceptual objective quality metric that takes different types of artifacts into account. The obtained subjective quality database (MiX-LFQDB) is made public to support further research in this area: https://doi.org/10.5281/zenodo.16778670
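
A cross-lab subjective study of this kind is typically summarised by a mean opinion score (MOS) with a confidence interval per stimulus, plus a correlation between the two labs' MOS values. The sketch below shows those two computations under simple assumptions (normal-approximation 95% interval, Pearson correlation); the scores are made up for illustration and the paper's actual analysis may differ.

```python
import math
from statistics import mean, stdev

def mos_with_ci(scores):
    """Mean opinion score and 95% confidence half-width (normal approximation)."""
    half_width = 1.96 * stdev(scores) / math.sqrt(len(scores))
    return mean(scores), half_width

def pearson(x, y):
    """Pearson linear correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical raw scores (1-5 scale) for three stimuli in two labs.
lab_a = [[5, 4, 5, 4], [3, 3, 4, 2], [2, 1, 2, 2]]
lab_b = [[4, 5, 4, 4], [3, 2, 3, 3], [1, 2, 2, 1]]

mos_a = [mos_with_ci(s)[0] for s in lab_a]
mos_b = [mos_with_ci(s)[0] for s in lab_b]
cross_lab_correlation = pearson(mos_a, mos_b)
```

A high cross-lab correlation is the usual justification for pooling the two labs' scores into a single dataset.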

Place, publisher, year, edition, pages
IEEE conference proceedings, 2025
Keywords
light field compression, subjective quality assessment, cross-lab study, perceived quality, autoencoders
National subject category
Signal Processing; Computer Vision and Learning Systems
Identifiers
urn:nbn:se:miun:diva-55990 (URN); 10.1109/MMSP64401.2025.11324097 (DOI); 979-8-3315-9241-7 (ISBN)
Conference
2025 IEEE International Workshop on Multimedia Signal Processing (MMSP)
Projects
Plenoptima; IMMERSE; InfoViz
Funder
EU, Horizon 2020, 956770; KK-stiftelsen, 2019-0251; Interreg Aurora, 20366448
Available from: 2025-11-18 Created: 2025-11-18 Last updated: 2026-01-23 Bibliographically approved
5. DUALF-D: Disentangled dual-hyperprior approach for light field image compression
2026 (English). In: Signal processing. Image communication, ISSN 0923-5965, E-ISSN 1879-2677, Vol. 140, article id 117436. Article in journal (Refereed). Published
Abstract [en]

Light field (LF) imaging captures spatial and angular information, offering a 4D scene representation enabling enhanced visual understanding. However, high dimensionality and redundancy across spatial and angular domains present major challenges for compression, particularly where storage, transmission bandwidth, or processing latency are constrained. We present a novel Variational Autoencoder (VAE)-based framework that explicitly disentangles spatial and angular features using two parallel latent branches. Each branch is coupled with an independent hyperprior model, allowing more precise distribution estimation for entropy coding and finer rate-distortion control. This dual-hyperprior structure enables the network to adaptively compress spatial and angular information based on their unique statistical characteristics, improving coding efficiency. To further enhance latent feature specialization and promote disentanglement, we introduce a mutual information-based regularization term that minimizes redundancy between the two branches while preserving feature diversity. Unlike prior methods relying on covariance-based penalties prone to collapse, our information-theoretic regularizer provides more stable and interpretable latent separation. Experimental results on publicly available LF datasets demonstrate our method achieves strong compression performance, yielding an average BD-PSNR gain of 2.91 dB over HEVC and high compression ratios (e.g., 200:1). Additionally, our design enables fast inference, with a total end-to-end time over 19x faster than the JPEG Pleno standard, making it well-suited for real-time and bandwidth-sensitive applications. By jointly leveraging disentangled representation learning, dual-hyperprior modeling, and information-theoretic regularization, our approach offers a scalable, effective solution for practical light field image compression.
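
The benefit of giving each branch its own entropy model can be seen in a small rate estimate. The sketch below assumes the standard learned-compression model in which each quantized latent is coded under a Gaussian discretized to unit-width bins. In a real hyperprior the scales are predicted from side information; here they are fixed hypothetical constants, and the latent values are made up.

```python
import math

def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def bits_for_symbol(y, mu, sigma):
    """Bits to code integer y under a Gaussian discretized to unit-width bins."""
    p = gaussian_cdf(y + 0.5, mu, sigma) - gaussian_cdf(y - 0.5, mu, sigma)
    return -math.log2(max(p, 1e-12))

def branch_bits(latents, sigma):
    """Total bits for one latent branch with a zero-mean model of given scale."""
    return sum(bits_for_symbol(y, 0.0, sigma) for y in latents)

# Hypothetical quantized latents: wide-spread spatial, near-zero angular.
spatial_latents = [4, -3, 5, -2]
angular_latents = [0, 1, 0, 0]

# Dual model: each branch is coded with a scale matched to its statistics.
dual_bits = branch_bits(spatial_latents, 3.0) + branch_bits(angular_latents, 0.5)

# Single shared model: one compromise scale for both branches.
shared_bits = branch_bits(spatial_latents + angular_latents, 1.5)
```

With these toy values the branch-specific scales need fewer bits than the shared compromise scale, which is the effect the dual-hyperprior design aims for.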

Place, publisher, year, edition, pages
Elsevier BV, 2026
Keywords
Light Field Compression, Variational Autoencoder, Disentanglement, Dual Hyperprior, Spatial-Angular Representation
National subject category
Signal Processing; Computer Vision and Learning Systems
Identifiers
urn:nbn:se:miun:diva-55991 (URN); 10.1016/j.image.2025.117436 (DOI); 001620918400001 (); 2-s2.0-105021843764 (Scopus ID)
Funder
EU, Horizon 2020, 956770
Available from: 2025-11-18 Created: 2025-11-18 Last updated: 2026-01-09 Bibliographically approved
6. DUALF-C: Disentangled Light Field Compression with Entropy-Aware Bitstream Generation
2025 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Recent advancements in learned light field (LF) image compression highlight the advantages of modeling spatial and angular redundancies using deep generative models. Among these, Variational Autoencoders (VAEs) have shown strong potential in learning compact latent representations of LF data. However, many existing approaches rely on simple uniform quantization and basic entropy coding, which limits compression efficiency in practical applications. This paper introduces DUALF-C, a lightweight and retraining-free compression pipeline that augments pretrained VAE-based LF compression models using structured post-encoding transformations. The proposed framework integrates bitplane slicing, latent channel reordering, non-uniform quantization, and patch-based vector quantization to improve bitrate efficiency while preserving reconstruction quality. Experimental evaluations demonstrate that DUALF-C significantly reduces bits per pixel (BPP) without degrading image quality, making it a practical solution for bandwidth-constrained immersive imaging systems.
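
Two of the post-encoding steps named above, bitplane slicing and latent channel reordering, can be sketched in a few lines of NumPy. This is an illustrative assumption of how such steps might look, not the DUALF-C implementation: latents are taken to be already quantized to 8-bit integers, and channels are reordered by descending variance, a criterion the abstract does not specify.

```python
import numpy as np

def to_bitplanes(q, num_bits=8):
    """Split uint8 latents into binary planes, most significant bit first."""
    q = np.asarray(q, dtype=np.uint8)
    return [(q >> b) & 1 for b in range(num_bits - 1, -1, -1)]

def from_bitplanes(planes, num_bits=8):
    """Rebuild values from MSB-first planes; omitted low planes count as zero."""
    out = np.zeros(planes[0].shape, dtype=np.uint8)
    for i, plane in enumerate(planes):
        out |= plane.astype(np.uint8) << np.uint8(num_bits - 1 - i)
    return out

def reorder_channels(latent):
    """Sort the channels of a (C, H, W) latent by descending variance."""
    variances = latent.reshape(latent.shape[0], -1).var(axis=1)
    order = np.argsort(-variances)
    return latent[order], order

def restore_channels(reordered, order):
    """Undo reorder_channels using the stored permutation."""
    out = np.empty_like(reordered)
    out[order] = reordered
    return out

# Toy 8-bit latent with 4 channels, for illustration only.
rng = np.random.default_rng(0)
latent = rng.integers(0, 256, size=(4, 6, 6), dtype=np.uint8)
planes = to_bitplanes(latent)
coarse = from_bitplanes(planes[:6])   # drop the two least significant planes
```

Dropping low bitplanes trades a bounded reconstruction error for fewer bits, and none of these steps requires retraining the underlying model because they operate only on the already-encoded latents.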

Keywords
Light field compression, variational autoencoder, quantization, feature reordering, vector quantization
National subject category
Computer Vision and Learning Systems
Identifiers
urn:nbn:se:miun:diva-56382 (URN)
Conference
Visual Communications and Image Processing (VCIP 2025) Conference, Klagenfurt, Austria, December 1-4, 2025
Available from: 2026-01-08 Created: 2026-01-08 Last updated: 2026-01-09

Open Access in DiVA

fulltext (4429 kB), 50 downloads
File information
File name: FULLTEXT02.pdf; File size: 4429 kB; Checksum: SHA-512
2b16f2f8503a2995fb678bd57498a9262992cca737d6bfa266bc76a96d4760c9aef686340ea59b4add42f30c06660b431c43cf7ff0d53d971e102c43160f2d91
Type: fulltext; Mimetype: application/pdf

Person

Takhtardeshir, Soheib
