Mid Sweden University

High Efficiency Light Field Image Compression: Hierarchical Bit Allocation and Shearlet-based View Interpolation
Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology. (Realistic3D)
2021 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Over the years, the pursuit of capturing the precise visual information of a scene has resulted in various enhancements in digital camera technology, such as high dynamic range, extended depth of field, and high resolution. However, traditional digital cameras capture only the spatial information of the scene and cannot provide an immersive presentation of it. Light field (LF) capturing is a new-generation imaging technology that records both the spatial and the angular information of the scene. In recent years, LF imaging has become increasingly popular in industry and the research community, mainly for two reasons: (1) advances in optical and computational technology have facilitated the capture and processing of LF information, and (2) LF data enable various post-processing applications, such as refocusing at different depth planes, synthetic aperture, 3D scene reconstruction, and novel view generation. Generally, LF-capturing devices acquire large amounts of data, which poses a challenge for storage and transmission resources. Off-the-shelf image and video compression schemes, built on assumptions drawn from natural images and video, exploit spatial and temporal correlations. However, 4D LF data have different properties, so current compression methods need to be advanced to efficiently exploit the correlation present in LF data.

In this thesis, compression of LF data captured using a plenoptic camera and a multi-camera system (MCS) is considered. Perspective views of a scene captured from different positions are interpreted as frames of multiple pseudo-video sequences and given as input to the multi-view extension of high-efficiency video coding (MV-HEVC). A 2D prediction and hierarchical coding scheme is proposed in MV-HEVC to improve the compression efficiency of LF data. To further increase the compression efficiency of views captured using an MCS, an LF reconstruction scheme based on the shearlet transform is introduced into LF compression. A sparse set of views is coded using MV-HEVC and later used to predict the remaining views by applying the shearlet transform. The prediction error is also coded to further increase the compression efficiency. Publicly available LF datasets are used to benchmark the proposed compression schemes, and the anchor scheme specified in the JPEG Pleno common test conditions is used to evaluate their performance. Objective evaluations show that the proposed scheme outperforms state-of-the-art schemes in the compression of LF data captured using both a plenoptic camera and an MCS. Moreover, introducing the shearlet transform into LF compression further improves the compression efficiency at low bitrates, where the human visual system is particularly sensitive to compression artifacts.

The work presented in this thesis has been published in four peer-reviewed conference proceedings and two scientific journals. The proposed compression solutions significantly improve the rate-distortion efficiency for LF content, which reduces transmission and storage requirements. The MV-HEVC-based LF coding scheme is publicly available; it can help researchers test novel compression tools and serve as an anchor scheme for future research studies. The shearlet-transform-based LF compression scheme provides a comprehensive framework for testing LF reconstruction methods in the context of LF compression.
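The abstract describes interpreting the perspective views as frames of multiple pseudo-video sequences. A minimal NumPy sketch of that mapping, contrasted with flattening the whole grid into a single pseudo-video sequence, is shown below. The (M, N, H, W[, C]) array layout, the function names, and the serpentine scan used for the single sequence are assumptions made for illustration; this is not the thesis's published coding software.

```python
import numpy as np


def to_multiple_pvs(views: np.ndarray) -> list:
    """Split an (M, N, H, W[, C]) grid of views into M pseudo-video sequences.

    Row m of the camera grid becomes sequence (view/layer) m, and column n
    becomes the frame with picture order count (POC) n in that sequence.
    """
    return [views[m] for m in range(views.shape[0])]


def to_single_pvs(views: np.ndarray) -> np.ndarray:
    """Flatten the grid into one sequence of M*N frames.

    Assumed scan order: serpentine (left-to-right on even rows, right-to-left
    on odd rows), so consecutive frames always come from neighboring cameras.
    """
    rows, cols = views.shape[:2]
    frames = []
    for m in range(rows):
        order = range(cols) if m % 2 == 0 else range(cols - 1, -1, -1)
        frames.extend(views[m, n] for n in order)
    return np.stack(frames)


if __name__ == "__main__":
    grid = np.arange(3 * 4).reshape(3, 4, 1, 1)  # 3x4 views of 1x1 pixels
    print(len(to_multiple_pvs(grid)))            # 3
    print(to_single_pvs(grid).ravel().tolist())  # [0, 1, 2, 3, 7, 6, 5, 4, 8, 9, 10, 11]
```

With multiple sequences, MV-HEVC can predict both along a sequence (across POCs) and between sequences (inter-view), whereas a single sequence only exposes the one-dimensional neighbor order of the scan.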

Place, publisher, year, edition, pages
Sundsvall: Mid Sweden University, 2021, p. 46
Series
Mid Sweden University doctoral thesis, ISSN 1652-893X ; 341
National Category
Information Systems
Identifiers
URN: urn:nbn:se:miun:diva-41704
ISBN: 978-91-88947-81-9 (print)
OAI: oai:DiVA.org:miun-41704
DiVA, id: diva2:1538491
Public defence
2021-04-22, C312, Holmgatan 10, Sundsvall, 09:00 (English)
Available from: 2021-03-23 Created: 2021-03-19 Last updated: 2025-09-25. Bibliographically approved
List of papers
1. Interpreting Plenoptic Images as Multi-View Sequences for Improved Compression
2017 (English). In: ICIP 2017, IEEE, 2017, p. 4557-4561. Conference paper, Published paper (Refereed)
Abstract [en]

Over the last decade, advancements in optical devices have made it possible for novel image acquisition technologies to appear. In addition to the spatial information of the scene, angular information is acquired for each spatial point, which enables 3D scene reconstruction and various post-processing effects. The current generation of plenoptic cameras spatially multiplexes the angular information, which implies an increase in image resolution to retain the level of spatial information gathered by conventional cameras. In this work, the resulting plenoptic image is interpreted as a multi-view sequence that is efficiently compressed using the multi-view extension of high efficiency video coding (MV-HEVC). A novel two-dimensional weighted prediction and rate allocation scheme is proposed to adapt the HEVC compression structure to the properties of plenoptic images. The proposed coding approach is a response to the ICIP 2017 Grand Challenge: Light Field Image Coding. The proposed scheme outperforms all ICME contestants and improves on the JPEG anchor of ICME by an average PSNR gain of 7.5 dB and on the HEVC anchor of the ICIP 2017 Grand Challenge by an average PSNR gain of 2.4 dB.
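The gains above are reported as average PSNR differences. A minimal sketch of PSNR for 8-bit images, and of averaging it over the decoded views of a light field, follows. It assumes 8-bit samples (peak value 255) and a plain per-view mean; the Grand Challenge's exact evaluation procedure (color space, component weighting, which views are included) is not described here, so the helper names and the averaging step are assumptions.

```python
import numpy as np


def psnr(reference: np.ndarray, decoded: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    ref = np.asarray(reference, dtype=np.float64)
    dec = np.asarray(decoded, dtype=np.float64)
    mse = np.mean((ref - dec) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))


def mean_psnr(reference_views, decoded_views, peak: float = 255.0) -> float:
    """Average the per-view PSNR over all views (assumed averaging scheme)."""
    scores = [psnr(r, d, peak) for r, d in zip(reference_views, decoded_views)]
    return float(np.mean(scores))
```

For example, an error of exactly one gray level at every pixel gives MSE = 1 and PSNR = 10 log10(255²) ≈ 48.13 dB.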

Place, publisher, year, edition, pages
IEEE, 2017
Keywords
Light field, plenoptic, MV-HEVC
Identifiers
urn:nbn:se:miun:diva-31455 (URN); 10.1109/ICIP.2017.8297145 (DOI); 000428410704138 (ISI); 2-s2.0-85045337163 (Scopus ID); 978-1-5090-2175-8 (ISBN)
Conference
24th IEEE International Conference on Image Processing (ICIP), Beijing, China, 17-20 September 2017
Note

Accepted paper.

Available from: 2017-08-22 Created: 2017-08-22 Last updated: 2025-09-25. Bibliographically approved
2. Compression scheme for sparsely sampled light field data based on pseudo multi-view sequences
2018 (English). In: Optics, Photonics, and Digital Technologies for Imaging Applications V, Proceedings of SPIE - The International Society for Optical Engineering, SPIE - International Society for Optical Engineering, 2018, Vol. 10679, article id 106790M. Conference paper, Published paper (Refereed)
Abstract [en]

With the advent of light field acquisition technologies, the captured information of the scene is enriched by having both angular and spatial information. The captured information provides additional capabilities in the post-processing stage, e.g., refocusing, 3D scene reconstruction, and synthetic aperture. Light field capturing devices are classified into two categories. In the first category, a single plenoptic camera is used to capture a densely sampled light field; in the second category, multiple traditional cameras are used to capture a sparsely sampled light field. In both cases, the size of the captured data increases with the additional angular information. The recent call for proposals on light field data compression by JPEG, also called “JPEG Pleno”, reflects the need for a new and efficient light field compression solution. In this paper, we propose a compression solution for sparsely sampled light field data. In a multi-camera system, each view depicts the scene from a single perspective. We propose to interpret each single view as a frame of a pseudo video sequence. In this way, the complete M×N views of a multi-camera system are treated as M pseudo video sequences, where each pseudo video sequence contains N frames. The central pseudo video sequence is taken as the base view, and the first frame in all pseudo video sequences is taken as the base picture order count (POC). The frame contained in both the base view and the base POC is labeled as the base frame. The remaining frames are divided into three predictor levels. Frames placed in each successive level can take prediction from previously encoded frames. However, frames assigned to the last predictor level are not used for prediction of other frames. Moreover, the rate allocation for each frame is performed by taking into account its predictor level, its frame distance, and its view-wise decoding distance relative to the base frame.
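The prediction hierarchy and rate allocation described above can be sketched in Python. The abstract does not give the rule that assigns frames to the three predictor levels, the quantization offsets, or how the distances are weighted, so the parity-based level rule, the `LEVEL_QP_OFFSET` table, and the `distance_step` term below are hypothetical. Only the overall structure follows the text: one base frame (central sequence, first POC), three predictor levels, last level never used as a reference, and a per-frame quantization that depends on level and on distance from the base frame.

```python
# Hypothetical QP offsets per predictor level (0 = base frame); higher QP = fewer bits.
LEVEL_QP_OFFSET = {0: 0, 1: 2, 2: 4, 3: 6}


def predictor_level(view: int, poc: int, base_view: int, base_poc: int = 0) -> int:
    """Assign a frame to level 0 (base frame) or to one of three predictor levels.

    Hypothetical rule: even offsets from the base frame in both directions give
    level 1, one odd offset gives level 2, and two odd offsets give level 3.
    """
    dv, dp = abs(view - base_view), abs(poc - base_poc)
    if dv == 0 and dp == 0:
        return 0
    return 1 + (dv % 2) + (dp % 2)


def is_reference(level: int) -> bool:
    """Frames in the last predictor level are not used to predict other frames."""
    return level < 3


def frame_qp(view, poc, base_qp, base_view, base_poc=0, distance_step=4):
    """Base QP + level offset + one extra QP step per `distance_step` of distance (hypothetical)."""
    level = predictor_level(view, poc, base_view, base_poc)
    distance = abs(view - base_view) + abs(poc - base_poc)
    return base_qp + LEVEL_QP_OFFSET[level] + distance // distance_step


def qp_map(m_views: int, n_frames: int, base_qp: int) -> list:
    """QP for every frame of M pseudo-video sequences with N frames each.

    The base view is the central sequence and the base POC is the first frame.
    """
    base_view = m_views // 2
    return [[frame_qp(v, p, base_qp, base_view) for p in range(n_frames)]
            for v in range(m_views)]
```

For a 3×3 grid with a base QP of 30, `qp_map(3, 3, 30)` gives `[[34, 36, 34], [30, 34, 32], [34, 36, 34]]`: the base frame keeps QP 30, and frames further down the hierarchy are quantized more coarsely.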
The multi-view extension of high efficiency video coding (MV-HEVC) is used to compress the pseudo multi-view sequences. The MV-HEVC compression standard enables frames to take prediction in both directions (horizontal and vertical), and MV-HEVC parameters are used to implement the proposed 2D prediction and rate allocation scheme. A subset of four light field images from the Stanford dataset is compressed using the proposed compression scheme at four bitrates to cover low to high bitrate scenarios. The comparison is made with the state-of-the-art reference HEVC encoder and x265, a real-time HEVC encoder implementation. The 17×17 grid is converted into a single pseudo sequence of 289 frames by following the order explained in the JPEG Pleno call for proposals and given as input to both reference schemes. The rate-distortion analysis shows that the proposed compression scheme outperforms both reference schemes in all tested bitrate scenarios for all test images. The average BD-PSNR gain is 1.36 dB over HEVC and 2.15 dB over x265.
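BD-PSNR summarizes the average vertical gap between two rate-distortion curves. The sketch below follows the commonly used Bjøntegaard formulation: a cubic fit of PSNR against log10 of the bitrate for each codec, with the difference of the fits averaged over the overlapping rate range. The abstract does not say which BD implementation was used, so treat this as an illustrative assumption rather than the paper's evaluation code. A cubic fit needs at least four rate points per curve, which matches the four tested bitrates.

```python
import numpy as np


def bd_psnr(rate_anchor, psnr_anchor, rate_test, psnr_test) -> float:
    """Bjøntegaard delta PSNR in dB: average PSNR gain of `test` over `anchor`.

    Each curve is fitted with a cubic polynomial PSNR(log10(rate)); the
    difference of the two fits is averaged over the overlapping log-rate range.
    """
    lr_a = np.log10(np.asarray(rate_anchor, dtype=np.float64))
    lr_t = np.log10(np.asarray(rate_test, dtype=np.float64))
    fit_a = np.polyfit(lr_a, np.asarray(psnr_anchor, dtype=np.float64), 3)
    fit_t = np.polyfit(lr_t, np.asarray(psnr_test, dtype=np.float64), 3)

    # Integrate only over the log-rate interval covered by both curves.
    lo = max(lr_a.min(), lr_t.min())
    hi = min(lr_a.max(), lr_t.max())
    if hi <= lo:
        raise ValueError("rate ranges do not overlap")

    int_a, int_t = np.polyint(fit_a), np.polyint(fit_t)
    area_a = np.polyval(int_a, hi) - np.polyval(int_a, lo)
    area_t = np.polyval(int_t, hi) - np.polyval(int_t, lo)
    return float((area_t - area_a) / (hi - lo))
```

A positive value means the tested codec reaches a higher PSNR than the anchor at equal bitrate, averaged over the shared rate range.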

Place, publisher, year, edition, pages
SPIE - International Society for Optical Engineering, 2018
Series
Proceedings of SPIE, ISSN 0277-786X, E-ISSN 1996-756X
Keywords
Light field, MV-HEVC, Compression, Plenoptic, Multi-Camera
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:miun:diva-33352 (URN); 10.1117/12.2315597 (DOI); 000452663000017 (ISI); 2-s2.0-85052527607 (Scopus ID)
Conference
SPIE Photonics Europe 2018, Strasbourg, France, 22-26 April 2018
Available from: 2018-03-26 Created: 2018-03-26 Last updated: 2025-09-25. Bibliographically approved
3. Towards a generic compression solution for densely and sparsely sampled light field data
2018 (English). In: Proceedings of the 25th IEEE International Conference on Image Processing, 2018, p. 654-658, article id 8451051. Conference paper, Published paper (Refereed)
Abstract [en]

Light field (LF) acquisition technologies capture the spatial and angular information present in scenes. The angular information paves the way for various post-processing applications, such as scene reconstruction, refocusing, and synthetic aperture. The light field is usually captured by a single plenoptic camera or by multiple traditional cameras. The former captures a dense LF, while the latter captures a sparse LF. This paper presents a generic compression scheme that efficiently compresses both densely and sparsely sampled LFs. A plenoptic image is converted into sub-aperture images, and each sub-aperture image is interpreted as a frame of a multi-view sequence. Likewise, each view of the multi-camera system is treated as a frame of a multi-view sequence. The multi-view extension of high efficiency video coding (MV-HEVC) is used to encode the pseudo multi-view sequence. This paper proposes an adaptive prediction and rate allocation scheme that efficiently compresses LF data irrespective of the acquisition technology used.
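The conversion of a plenoptic (lenslet) image into sub-aperture images can be illustrated under a strong simplifying assumption: the image has already been demosaiced and rectified so that every micro-lens covers an aligned block of exactly u × v pixels. Real plenoptic captures (hexagonal micro-lens layouts, vignetting, calibration) need a proper decoding pipeline first, so the hypothetical function below only sketches the final pixel-reshuffling step.

```python
import numpy as np


def lenslet_to_subaperture(lenslet: np.ndarray, u: int, v: int) -> np.ndarray:
    """Rearrange an idealized lenslet image into a (u, v, S, T[, C]) stack of views.

    Assumes each micro-lens image is an aligned u x v pixel block, so the
    pixel at offset (i, j) under every micro-lens belongs to view (i, j).
    """
    s, t = lenslet.shape[0] // u, lenslet.shape[1] // v
    cropped = lenslet[: s * u, : t * v]  # drop incomplete micro-lens blocks
    blocks = cropped.reshape(s, u, t, v, *lenslet.shape[2:])
    # Move the two angular axes (offsets inside each block) to the front.
    return np.moveaxis(blocks, [1, 3], [0, 1])
```

Each of the u × v resulting views can then be treated as a frame of a multi-view sequence, for example with angular row i as the view index and angular column j as the POC, in the same way as the camera grid of a multi-camera system.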

Keywords
Light field, plenoptic, Multi-camera, MVHEVC
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:miun:diva-34283 (URN); 10.1109/ICIP.2018.8451051 (DOI); 000455181500132 (ISI); 2-s2.0-85062918597 (Scopus ID); 978-1-4799-7061-2 (ISBN)
Conference
25th IEEE International Conference on Image Processing (ICIP), Athens, Greece, 7-10 October 2018
Funder
EU, Horizon 2020
Available from: 2018-08-20 Created: 2018-08-20 Last updated: 2025-09-25
4. Computationally Efficient Light Field Image Compression Using a Multiview HEVC Framework
2019 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 7, p. 143002-143014, article id 8853251. Article in journal (Refereed), Published
Abstract [en]

The acquisition of the spatial and angular information of a scene using light field (LF) technologies supports a wide range of post-processing applications, such as scene reconstruction, refocusing, and virtual view synthesis. The additional angular information in LF data increases the overall size of the captured data while offering the same spatial resolution. The angular information, which is the main contributor to the data size, is highly correlated, and state-of-the-art video encoders exploit this correlation by treating the LF as a pseudo video sequence (PVS). However, interpreting the LF as a single PVS restricts the encoding scheme to exploiting only one-dimensional angular correlation in the LF data. In this paper, we present an LF compression framework that efficiently exploits the spatial and angular correlation using the multiview extension of high-efficiency video coding (MV-HEVC). The input LF views are converted into multiple PVSs and are organized hierarchically. The rate-allocation scheme takes into account the assigned organization of frames and distributes quality/bits among them accordingly. Subsequently, the reference picture selection scheme prioritizes the reference frames based on their assigned quality. The proposed compression scheme is evaluated by following the common test conditions set by JPEG Pleno. The proposed scheme performs 0.75 dB better than state-of-the-art compression schemes and 2.5 dB better than the x265-based JPEG Pleno anchor scheme. Moreover, an optimized motion search scheme is proposed in the framework that reduces the computational complexity of motion estimation (measured as the number of sum of absolute differences (SAD) computations) by up to 87% with a negligible loss in visual quality (approximately 0.05 dB).
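The complexity figure above is measured in SAD computations. The sketch below is a plain full-search block matcher that counts how many SAD evaluations it performs. It is not the optimized motion search proposed in the paper, which this abstract does not describe, and the block size and search ranges are arbitrary illustration values. It only shows why the SAD count grows with the square of the search range, which is the cost a faster search scheme tries to cut.

```python
import numpy as np


def sad(block_a: np.ndarray, block_b: np.ndarray) -> int:
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(block_a.astype(np.int64) - block_b.astype(np.int64)).sum())


def full_search(cur, ref, y, x, block=8, search=4):
    """Exhaustive block matching for the block at (y, x) of `cur` within `ref`.

    Returns (dy, dx, best_sad, n_sad), where n_sad is the number of SAD
    computations performed, at most (2 * search + 1) ** 2 per block.
    """
    target = cur[y:y + block, x:x + block]
    height, width = ref.shape[:2]
    best = None
    n_sad = 0
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            if ry < 0 or rx < 0 or ry + block > height or rx + block > width:
                continue  # candidate block would fall outside the reference frame
            cost = sad(target, ref[ry:ry + block, rx:rx + block])
            n_sad += 1
            if best is None or cost < best[2]:
                best = (dy, dx, cost)
    return best[0], best[1], best[2], n_sad
```

Shrinking the window from ±4 to ±2 reduces the count from 81 to 25 SAD evaluations per block (about 69%), at the risk of missing the best match when the true motion or disparity is larger than the window.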

Keywords
Compression, light field, MV-HEVC, plenoptic
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:miun:diva-37489 (URN); 10.1109/ACCESS.2019.2944765 (DOI); 000497156000230 (ISI); 2-s2.0-85077687836 (Scopus ID)
Available from: 2019-10-07 Created: 2019-10-07 Last updated: 2025-09-25. Bibliographically approved
5. Shearlet Transform Based Prediction Scheme for Light Field Compression
2018 (English). In: 2018 Data Compression Conference (DCC 2018) / [ed] Bilgin, A; Marcellin, MW; Serra-Sagrista, J; Storer, JA, IEEE, 2018, p. 396. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
IEEE, 2018
Series
IEEE Data Compression Conference, ISSN 1068-0314
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:miun:diva-39964 (URN); 10.1109/DCC.2018.00049 (DOI); 000540644700042 (ISI); 978-1-5386-4883-4 (ISBN)
Conference
Data Compression Conference (DCC), Snowbird, UT, 27-30 March 2018
Available from: 2020-09-28 Created: 2020-09-28 Last updated: 2025-09-25. Bibliographically approved
6. Shearlet Transform-Based Light Field Compression under Low Bitrates
2020 (English). In: IEEE Transactions on Image Processing, ISSN 1057-7149, E-ISSN 1941-0042, Vol. 29, p. 4269-4280, article id 8974608. Article in journal (Refereed), Published
Abstract [en]

Light field (LF) acquisition devices capture spatial and angular information of a scene. In contrast with traditional cameras, the additional angular information enables novel post-processing applications, such as 3D scene reconstruction, refocusing at different depth planes, and synthetic aperture. In this paper, we present a novel compression scheme for LF data captured using multiple traditional cameras. The input LF views were divided into two groups: key views and decimated views. The key views were compressed using the multi-view extension of high-efficiency video coding (MV-HEVC), and the decimated views were predicted using the shearlet-transform-based prediction (STBP) scheme. Additionally, the residual information of the predicted views was encoded and sent along with the coded stream of key views. The proposed scheme was evaluated on benchmark multi-camera LF datasets, demonstrating that incorporating the residual information into the compression scheme increased the overall peak signal-to-noise ratio (PSNR) by 2 dB. The proposed compression scheme performed significantly better at low bit rates than the anchor schemes, which have better compression efficiency in high bit-rate scenarios. The sensitivity of the human visual system to compression artifacts, specifically at low bit rates, favors the proposed compression scheme over the anchor schemes.
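The key-view/decimated-view split and the residual path can be outlined as follows. The abstract does not specify which views are key views or how STBP works internally, so the regular decimation controlled by a hypothetical `step` parameter is an assumption, and the predicted views are simply taken as an input array (the shearlet reconstruction itself is not implemented). The round trip shows how coding the prediction error lets the decoder correct what the prediction alone misses; in practice the residual is lossily coded, so the correction is only approximate.

```python
import numpy as np


def split_views(rows: int, cols: int, step: int = 2):
    """Hypothetical regular decimation: every `step`-th camera in both directions is a key view."""
    key, decimated = [], []
    for r in range(rows):
        for c in range(cols):
            if r % step == 0 and c % step == 0:
                key.append((r, c))        # coded with MV-HEVC
            else:
                decimated.append((r, c))  # predicted from decoded key views
    return key, decimated


def residual(original: np.ndarray, predicted: np.ndarray) -> np.ndarray:
    """Prediction error of a decimated view (int16 holds the -255..255 range)."""
    return original.astype(np.int16) - predicted.astype(np.int16)


def reconstruct(predicted: np.ndarray, decoded_residual: np.ndarray) -> np.ndarray:
    """Decoder side: add the decoded residual to the prediction and clip to 8 bits."""
    out = predicted.astype(np.int16) + decoded_residual
    return np.clip(out, 0, 255).astype(np.uint8)
```

With a lossless residual, `reconstruct(pred, residual(orig, pred))` returns the original view exactly; a lossy residual codec trades that accuracy for bits.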

Place, publisher, year, edition, pages
IEEE, 2020
Keywords
Light field (LF) coding, multi-view extension of high-efficiency video coding (MV-HEVC), multiple camera system (MCS) coding, shearlet
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:miun:diva-38493 (URN); 10.1109/TIP.2020.2969087 (DOI); 000619156700007 (ISI); 2-s2.0-85079506505 (Scopus ID)
Available from: 2020-02-24 Created: 2020-02-24 Last updated: 2025-09-25. Bibliographically approved

Open Access in DiVA

fulltext (5354 kB)
File information
File name: FULLTEXT01.pdf
File size: 5354 kB
Checksum (SHA-512): 16a53b8f28c36a847c2aa4b6209f67ded5a88425e51c1ed28fa84021751a3d230b707aca993229675e0fbb448f2170462d621d82a58313658427d29948439d08
Type: fulltext
Mimetype: application/pdf
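The SHA-512 checksum listed for the full text can be used to check that a downloaded copy of FULLTEXT01.pdf is intact. A small sketch using Python's standard `hashlib` module follows; the local file path is whatever name the download was saved under.

```python
import hashlib

# SHA-512 checksum published in this record for FULLTEXT01.pdf.
EXPECTED_SHA512 = (
    "16a53b8f28c36a847c2aa4b6209f67ded5a88425e51c1ed28fa84021751a3d23"
    "0b707aca993229675e0fbb448f2170462d621d82a58313658427d29948439d08"
)


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-512 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str, expected: str = EXPECTED_SHA512) -> bool:
    """True if the file's digest equals the published checksum (case-insensitive)."""
    return sha512_of_file(path) == expected.strip().lower()
```

For example, `matches_record("FULLTEXT01.pdf")` returns `True` only for a byte-identical copy of the archived file.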

Authority records

Ahmad, Waqas
