Mid Sweden University

1 - 14 of 14
  • 1.
    Ahmad, Waqas
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Computationally Efficient Light Field Image Compression using a Multiview HEVC Framework. 2019. Data set
    Download full text (zip)
    dataset
  • 2.
    Ahmad, Waqas
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    High Efficiency Light Field Image Compression: Hierarchical Bit Allocation and Shearlet-based View Interpolation. 2021. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Over the years, the pursuit of capturing the precise visual information of a scene has resulted in various enhancements in digital camera technology, such as high dynamic range, extended depth of field, and high resolution. However, traditional digital cameras only capture the spatial information of the scene and cannot provide an immersive presentation of it. Light field (LF) capturing is a new-generation imaging technology that records the spatial and angular information of the scene. In recent years, LF imaging has become increasingly popular among the industry and research community mainly for two reasons: (1) the advancements made in optical and computational technology have facilitated the process of capturing and processing LF information and (2) LF data have the potential to offer various post-processing applications, such as refocusing at different depth planes, synthetic aperture, 3D scene reconstruction, and novel view generation. Generally, LF-capturing devices acquire large amounts of data, which poses a challenge for storage and transmission resources. Off-the-shelf image and video compression schemes, built on assumptions drawn from natural images and video, tend to exploit spatial and temporal correlations. However, 4D LF data inherit different properties, and hence there is a need to advance the current compression methods to efficiently address the correlation present in LF data.

    In this thesis, compression of LF data captured using a plenoptic camera and multi-camera system (MCS) is considered. Perspective views of a scene captured from different positions are interpreted as a frame of multiple pseudo-video sequences and given as an input to a multi-view extension of high-efficiency video coding (MV-HEVC). A 2D prediction and hierarchical coding scheme is proposed in MV-HEVC to improve the compression efficiency of LF data. To further increase the compression efficiency of views captured using an MCS, an LF reconstruction scheme based on shearlet transform is introduced in LF compression. A sparse set of views is coded using MV-HEVC and later used to predict the remaining views by applying shearlet transform. The prediction error is also coded to further increase the compression efficiency. Publicly available LF datasets are used to benchmark the proposed compression schemes. The anchor scheme specified in the JPEG Pleno common test conditions is used to evaluate the performance of the proposed scheme. Objective evaluations show that the proposed scheme outperforms state-of-the-art schemes in the compression of LF data captured using a plenoptic camera and an MCS. Moreover, the introduction of shearlet transform in LF compression further improves the compression efficiency at low bitrates, at which the human vision system is sensitive to the perceived quality.

    The work presented in this thesis has been published in four peer-reviewed conference proceedings and two scientific journals. The proposed compression solutions outlined in this thesis significantly improve the rate-distortion efficiency for LF content, which reduces the transmission and storage resources. The MV-HEVC-based LF coding scheme is made publicly available, which can help researchers to test novel compression tools and it can serve as an anchor scheme for future research studies. The shearlet-transform-based LF compression scheme presents a comprehensive framework for testing LF reconstruction methods in the context of LF compression.

    Download full text (pdf)
    fulltext
  • 3.
    Ahmad, Waqas
    et al.
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Ghafoor, Mubeen
    COMSATS University Islamabad, Pakistan.
    Tariq, Syed Ali
    COMSATS University Islamabad, Pakistan.
    Hassan, Ali
    COMSATS University Islamabad, Pakistan.
    Sjöström, Mårten
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Olsson, Roger
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Computationally Efficient Light Field Image Compression Using a Multiview HEVC Framework. 2019. In: IEEE Access, E-ISSN 2169-3536, Vol. 7, p. 143002-143014, article id 8853251. Article in journal (Refereed)
    Abstract [en]

    The acquisition of the spatial and angular information of a scene using light field (LF) technologies supplements a wide range of post-processing applications, such as scene reconstruction, refocusing, virtual view synthesis, and so forth. The additional angular information possessed by LF data increases the size of the overall data captured while offering the same spatial resolution. The main contributor to the size of captured data (i.e., angular information) contains a high correlation that is exploited by state-of-the-art video encoders by treating the LF as a pseudo video sequence (PVS). The interpretation of LF as a single PVS restricts the encoding scheme to only utilize a single-dimensional angular correlation present in the LF data. In this paper, we present an LF compression framework that efficiently exploits the spatial and angular correlation using a multiview extension of high-efficiency video coding (MV-HEVC). The input LF views are converted into multiple PVSs and are organized hierarchically. The rate-allocation scheme takes into account the assigned organization of frames and distributes quality/bits among them accordingly. Subsequently, the reference picture selection scheme prioritizes the reference frames based on the assigned quality. The proposed compression scheme is evaluated by following the common test conditions set by JPEG Pleno. The proposed scheme performs 0.75 dB better compared to state-of-the-art compression schemes and 2.5 dB better compared to the x265-based JPEG Pleno anchor scheme. Moreover, an optimized motion search scheme is proposed in the framework that reduces the computational complexity (in terms of the sum of absolute difference [SAD] computations) of motion estimation by up to 87% with a negligible loss in visual quality (approximately 0.05 dB).

    Download full text (pdf)
    fulltext
  • 4.
    Ahmad, Waqas
    et al.
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Olsson, Roger
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Sjöström, Mårten
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Interpreting Plenoptic Images as Multi-View Sequences for Improved Compression. 2017. Data set
    Abstract [en]

    The paper is written in response to the ICIP 2017 Grand Challenge on plenoptic image compression. The input image format and compression rates set out by the competition are followed to estimate the results.

    Download full text (pdf)
    data set
  • 5.
    Ahmad, Waqas
    et al.
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Olsson, Roger
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Sjöström, Mårten
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Interpreting Plenoptic Images as Multi-View Sequences for Improved Compression. 2017. In: ICIP 2017, IEEE, 2017, p. 4557-4561. Conference paper (Refereed)
    Abstract [en]

    Over the last decade, advancements in optical devices have made it possible for novel image acquisition technologies to appear. Angular information for each spatial point is acquired in addition to the spatial information of the scene, which enables 3D scene reconstruction and various post-processing effects. The current generation of plenoptic cameras spatially multiplexes the angular information, which implies an increase in image resolution to retain the level of spatial information gathered by conventional cameras. In this work, the resulting plenoptic image is interpreted as a multi-view sequence that is efficiently compressed using the multi-view extension of high efficiency video coding (MV-HEVC). A novel two-dimensional weighted prediction and rate allocation scheme is proposed to adapt the HEVC compression structure to the plenoptic image properties. The proposed coding approach is a response to ICIP 2017 Grand Challenge: Light field Image Coding. The proposed scheme outperforms all ICME contestants, and improves on the JPEG-anchor of ICME with an average PSNR gain of 7.5 dB and the HEVC-anchor of ICIP 2017 Grand Challenge with an average PSNR gain of 2.4 dB.

    Download full text (pdf)
    fulltext
  • 6.
    Ahmad, Waqas
    et al.
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Olsson, Roger
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Sjöström, Mårten
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Towards a generic compression solution for densely and sparsely sampled light field data. 2018. In: Proceedings of 25TH IEEE International Conference On Image Processing, 2018, p. 654-658, article id 8451051. Conference paper (Refereed)
    Abstract [en]

    Light field (LF) acquisition technologies capture the spatial and angular information present in scenes. The angular information paves the way for various post-processing applications such as scene reconstruction, refocusing, and synthetic aperture. The light field is usually captured by a single plenoptic camera or by multiple traditional cameras. The former captures a dense LF, while the latter captures a sparse LF. This paper presents a generic compression scheme that efficiently compresses both densely and sparsely sampled LFs. A plenoptic image is converted into sub-aperture images, and each sub-aperture image is interpreted as a frame of a multi-view sequence. In comparison, each view of the multi-camera system is treated as a frame of a multi-view sequence. The multi-view extension of high efficiency video coding (MV-HEVC) is used to encode the pseudo multi-view sequence. This paper proposes an adaptive prediction and rate allocation scheme that efficiently compresses LF data irrespective of the acquisition technology used.

    Download full text (pdf)
    fulltext
  • 7.
    Ahmad, Waqas
    et al.
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Palmieri, Luca
    Christian-Albrechts-Universität, Kiel, Germany.
    Koch, Reinhard
    Christian-Albrechts-Universität, Kiel, Germany.
    Sjöström, Mårten
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Matching Light Field Datasets From Plenoptic Cameras 1.0 And 2.0. 2018. In: Proceedings of the 2018 3DTV Conference, 2018, article id 8478611. Conference paper (Refereed)
    Abstract [en]

    The capturing of angular and spatial information of the scene using a single camera is made possible by a new emerging technology referred to as the plenoptic camera. Both angular and spatial information enable various post-processing applications, e.g. refocusing, synthetic aperture, super-resolution, and 3D scene reconstruction. In the past, multiple traditional cameras were used to capture the angular and spatial information of the scene. However, recently with the advancement in optical technology, plenoptic cameras have been introduced to capture the scene information. In a plenoptic camera, a lenslet array is placed between the main lens and the image sensor that allows multiplexing of the spatial and angular information onto a single image, also referred to as a plenoptic image. The placement of the lenslet array relative to the main lens and the image sensor results in two different optical designs of a plenoptic camera, also referred to as plenoptic 1.0 and plenoptic 2.0. In this work, we present a novel dataset captured with plenoptic 1.0 (Lytro Illum) and plenoptic 2.0 (Raytrix R29) cameras for the same scenes under the same conditions. The dataset provides the benchmark contents for various research and development activities for plenoptic images.

    Download full text (pdf)
    fulltext
  • 8.
    Ahmad, Waqas
    et al.
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Palmieri, Luca
    University of Padova, Italy.
    Koch, Reinhard
    Christian-Albrechts-University of Kiel, Germany.
    Sjöström, Mårten
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    The Plenoptic Dataset. 2018. Data set
    Abstract [en]

    The dataset is captured using two different plenoptic cameras, namely Illum from Lytro (based on plenoptic 1.0 model) and R29 from Raytrix (based on plenoptic 2.0 model). The scenes selected for the dataset were captured under controlled conditions. The cameras were mounted onto a multi-camera rig that was mechanically controlled to move the cameras with millimeter precision. In this way, both cameras captured the scene from the same viewpoint.

  • 9.
    Ahmad, Waqas
    et al.
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Sjöström, Mårten
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Olsson, Roger
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Compression scheme for sparsely sampled light field data based on pseudo multi-view sequences. 2018. In: OPTICS, PHOTONICS, AND DIGITAL TECHNOLOGIES FOR IMAGING APPLICATIONS V Proceedings of SPIE - The International Society for Optical Engineering, SPIE - International Society for Optical Engineering, 2018, Vol. 10679, article id 106790M. Conference paper (Refereed)
    Abstract [en]

    With the advent of light field acquisition technologies, the captured information of the scene is enriched by having both angular and spatial information. The captured information provides additional capabilities in the post-processing stage, e.g. refocusing, 3D scene reconstruction, synthetic aperture, etc. Light field capturing devices are classified into two categories. In the first category, a single plenoptic camera is used to capture a densely sampled light field, and in the second category, multiple traditional cameras are used to capture a sparsely sampled light field. In both cases, the size of the captured data increases with the additional angular information. The recent call for proposals related to compression of light field data by JPEG, also called “JPEG Pleno”, reflects the need for a new and efficient light field compression solution. In this paper, we propose a compression solution for sparsely sampled light field data. In a multi-camera system, each view depicts the scene from a single perspective. We propose to interpret each single view as a frame of a pseudo video sequence. In this way, the complete MxN views of a multi-camera system are treated as M pseudo video sequences, where each pseudo video sequence contains N frames. The central pseudo video sequence is taken as the base view, and the first frame in all the pseudo video sequences is taken as the base Picture Order Count (POC). The frame contained in the base view and base POC is labeled as the base frame. The remaining frames are divided into three predictor levels. Frames placed in each successive level can take prediction from previously encoded frames. However, the frames assigned to the last prediction level are not used for prediction of other frames. Moreover, the rate allocation for each frame is performed by taking into account its predictor level, its frame distance, and its view-wise decoding distance relative to the base frame. The multi-view extension of high efficiency video coding (MV-HEVC) is used to compress the pseudo multi-view sequences. The MV-HEVC compression standard enables the frames to take prediction in both directions (horizontal and vertical), and MV-HEVC parameters are used to implement the proposed 2D prediction and rate allocation scheme. A subset of four light field images from the Stanford dataset is compressed using the proposed compression scheme at four bitrates in order to cover low to high bitrate scenarios. The comparison is made with the state-of-the-art reference encoder HEVC and its real-time implementation X265. The 17x17 grid is converted into a single pseudo sequence of 289 frames by following the order explained in the JPEG Pleno call for proposals and given as input to both reference schemes. The rate distortion analysis shows that the proposed compression scheme outperforms both reference schemes in all tested bitrate scenarios for all test images. The average BD-PSNR gain is 1.36 dB over HEVC and 2.15 dB over X265.
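
The view-to-frame mapping described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, each grid row is assumed to form one pseudo video sequence, and the "central" sequence is assumed to be index M // 2. The three predictor levels and the rate allocation are not modelled because the abstract does not specify them.

```python
def arrange_as_pvs(num_rows, num_cols):
    """Map an M x N view grid to M pseudo video sequences of N frames.

    Each frame is identified by (view_id, poc), where view_id is the
    pseudo video sequence index and poc is the frame position in it.
    Assumption: one grid row becomes one pseudo video sequence.
    """
    return [[(view_id, poc) for poc in range(num_cols)]
            for view_id in range(num_rows)]


def base_frame(num_rows):
    """Return (base view, base POC): central sequence, first frame.

    Assumption: "central" means integer index num_rows // 2.
    """
    return (num_rows // 2, 0)


# Example: the 17x17 grid mentioned in the abstract.
sequences = arrange_as_pvs(17, 17)
print(len(sequences), len(sequences[0]))  # 17 sequences of 17 frames
print(base_frame(17))                     # (8, 0)
```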

    Download full text (pdf)
    fulltext
  • 10.
    Ahmad, Waqas
    et al.
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Vagharshakyan, Suren
    Tampere University of Technology, Finland.
    Sjöström, Mårten
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Gotchev, Atanas
    Tampere University of Technology, Finland.
    Bregovic, Robert
    Tampere University of Technology, Finland.
    Olsson, Roger
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Shearlet Transform Based Prediction Scheme for Light Field Compression. 2018. Conference paper (Refereed)
    Abstract [en]

    Light field acquisition technologies capture angular and spatial information of the scene. The spatial and angular information enables various post-processing applications, e.g. 3D scene reconstruction, refocusing, synthetic aperture, etc., at the expense of an increased data size. In this paper, we present a novel prediction tool for compression of light field data acquired with a multiple camera system. The captured light field (LF) can be described using the two-plane parametrization as L(u, v, s, t), where (u, v) represents each view's image plane coordinates and (s, t) represents the coordinates of the capturing plane. In the proposed scheme, the captured LF is uniformly decimated by a factor d in both directions (in s and t coordinates), resulting in a sparse set of views also referred to as key views. The key views are converted into a pseudo video sequence and compressed using high efficiency video coding (HEVC). The shearlet transform based reconstruction approach, presented in [1], is used at the decoder side to predict the decimated views with the help of the key views. Four LF images (Truck, Bunny from Stanford dataset, Set2 and Set9 from High Density Camera Array dataset) are used in the experiments. Input LF views are converted into a pseudo video sequence and compressed with HEVC to serve as anchor. Rate distortion analysis shows an average PSNR gain of 0.98 dB over the anchor scheme. Moreover, at low bit-rates, the compression efficiency of the proposed scheme is higher compared to the anchor, while the performance of the anchor is better at high bit-rates. The different compression responses of the proposed and anchor schemes are a consequence of their utilization of input information. In the high bit-rate scenario, high quality residual information enables the anchor to achieve efficient compression. On the contrary, the shearlet transform relies on key views to predict the decimated views without incorporating residual information. Hence, it has an inherent reconstruction error. In the low bit-rate scenario, the bit budget of the proposed compression scheme allows the encoder to achieve high quality for the key views. The HEVC anchor scheme distributes the same bit budget among all the input LF views, which results in degradation of the overall visual quality. The sensitivity of the human vision system toward compression artifacts in low bit-rate cases favours the proposed compression scheme over the anchor scheme.
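
The uniform decimation step described above can be illustrated with a short Python sketch. It assumes views are indexed by integer (s, t) positions and that decimation keeps every d-th view starting from index 0; the function name and the starting offset are hypothetical, as the abstract does not state them. The shearlet-based reconstruction and the HEVC coding are not modelled.

```python
def split_views(num_s, num_t, d):
    """Split an LF view grid into key views and decimated views.

    A view (s, t) is kept as a key view when both indices are multiples
    of the decimation factor d (assumed offset 0); all other views are
    decimated and would be predicted at the decoder from the key views.
    """
    key_views, decimated_views = [], []
    for s in range(num_s):
        for t in range(num_t):
            if s % d == 0 and t % d == 0:
                key_views.append((s, t))
            else:
                decimated_views.append((s, t))
    return key_views, decimated_views


# Example: a 5x5 grid decimated by d = 2 keeps a 3x3 set of key views.
key, dropped = split_views(5, 5, 2)
print(len(key), len(dropped))
```

With d = 2 on a 5x5 grid, 9 of the 25 views are coded directly, so the bit budget is concentrated on fewer views, which is the effect the abstract credits for the low bit-rate advantage.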

    Download full text (pdf)
    fulltext
  • 11.
    Ahmad, Waqas
    et al.
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Vagharshakyan, Suren
    Tampere Univ Technol, Korkeakoulunkatu 10, Tampere 33720, Finland..
    Sjöström, Mårten
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Gotchev, Atanas
    Tampere Univ Technol, Korkeakoulunkatu 10, Tampere 33720, Finland..
    Bregovic, Robert
    Tampere Univ Technol, Korkeakoulunkatu 10, Tampere 33720, Finland..
    Olsson, Roger
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Shearlet Transform Based Prediction Scheme for Light Field Compression. 2018. In: 2018 DATA COMPRESSION CONFERENCE (DCC 2018) / [ed] Bilgin, A Marcellin, MW SerraSagrista, J Storer, JA, IEEE, 2018, p. 396-396. Conference paper (Refereed)
  • 12.
    Ahmad, Waqas
    et al.
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Vagharshakyan, Suren
    Tampere University, Tampere, Finland.
    Sjöström, Mårten
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Gotchev, Atanas
    Tampere University, Tampere, Finland.
    Bregovic, Robert
    Tampere University, Tampere, Finland.
    Olsson, Roger
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Shearlet Transform-Based Light Field Compression under Low Bitrates. 2020. In: IEEE Transactions on Image Processing, ISSN 1057-7149, E-ISSN 1941-0042, Vol. 29, p. 4269-4280, article id 8974608. Article in journal (Refereed)
    Abstract [en]

    Light field (LF) acquisition devices capture spatial and angular information of a scene. In contrast with traditional cameras, the additional angular information enables novel post-processing applications, such as 3D scene reconstruction, the ability to refocus at different depth planes, and synthetic aperture. In this paper, we present a novel compression scheme for LF data captured using multiple traditional cameras. The input LF views were divided into two groups: key views and decimated views. The key views were compressed using the multi-view extension of high-efficiency video coding (MV-HEVC) scheme, and decimated views were predicted using the shearlet-transform-based prediction (STBP) scheme. Additionally, the residual information of predicted views was also encoded and sent along with the coded stream of key views. The proposed scheme was evaluated over benchmark multi-camera-based LF datasets, demonstrating that incorporating the residual information into the compression scheme increased the overall peak signal-to-noise ratio (PSNR) by 2 dB. The proposed compression scheme performed significantly better at low bit rates compared to anchor schemes, which have a better level of compression efficiency in high bit-rate scenarios. The sensitivity of the human vision system towards compression artifacts, specifically at low bit rates, favors the proposed compression scheme over anchor schemes.

    Download full text (pdf)
    fulltext
  • 13.
    Ghafoor, Mubeen
    et al.
    COMSATS Inst Informat Technol, Islamabad, Pakistan.
    Tariq, Syed Ali
    COMSATS Inst Informat Technol, Islamabad, Pakistan.
    Abu Bakr, M.
    COMSATS Inst Informat Technol, Islamabad, Pakistan.
    Jibran, J.
    COMSATS Inst Informat Technol, Islamabad, Pakistan.
    Ahmad, Waqas
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    Zia, Tehseen
    COMSATS Inst Informat Technol, Islamabad, Pakistan.
    Perceptually Lossless Surgical Telementoring System Based on Non-Parametric Segmentation. 2019. In: Journal of Medical Imaging and Health Informatics, ISSN 2156-7018, E-ISSN 2156-7026, Vol. 9, no 3, p. 464-473. Article in journal (Refereed)
    Abstract [en]

    Bandwidth constraint is one of the significant concerns of surgical telementoring, especially in rural areas. High-Efficiency Video Coding (H.265/HEVC) based video compression techniques have shown promising results for telementoring applications. However, there is a tradeoff between the quality of video received by the remote surgeon and the bandwidth resources required for video transmission. In order to efficiently compress and transmit real-time surgical videos, a hybrid lossless-lossy approach is proposed where surgical incision region (location of surgery) is coded in high quality while the background (non-incision) region is coded in medium to low quality depending on the nature of the region. The surgical incision region is detected based on an efficient color and location-based non-parametric segmentation approach. This approach takes explicitly into account the physiological nature of the human visual system and efficiently encodes the video by providing good overall visual impact in the location of surgery. The results of the proposed approach are shown in terms of video quality metrics such as Bjontegaard delta bitrate (BD-BR), Bjontegaard delta peak signal-to-noise ratio (BD-PSNR), and structural similarity index measurement (SSIM). Experimental results showed that in comparison with default full-frame HEVC encoding, the proposed surgical incision region based encoding achieved an average BD-BR reduction of 77.5% at high-quality settings (QP in range of 0 to 20 in surgical incision region and an increasing QP in skin and background region). The average gain in BD-PSNR of the proposed algorithm was 6.99 dB in surgical incision region at high-quality setting, and the average SSIM index came out to be 0.9926 which is only 0.006% less than the default full-frame HEVC coding. Based on these results, the proposed encoding algorithm can be considered as an efficient and effective solution for surgical telementoring systems for limited bandwidth networks.

  • 14.
    Hassan, A.
    et al.
    COMSATS University, Islamabad, Pakistan.
    Ghafoor, M.
    COMSATS University, Islamabad, Pakistan.
    Tariq, S.A.
    COMSATS University, Islamabad, Pakistan.
    Zia, T.
    COMSATS University, Islamabad, Pakistan.
    Ahmad, Waqas
    Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
    High Efficiency Video Coding (HEVC)–Based Surgical Telementoring System Using Shallow Convolutional Neural Network. 2019. In: Journal of digital imaging, ISSN 0897-1889, E-ISSN 1618-727X, Vol. 32, no 6, p. 1027-1043. Article in journal (Refereed)
    Abstract [en]

    Surgical telementoring systems have gained lots of interest, especially in remote locations. However, bandwidth constraint has been the primary bottleneck for efficient telementoring systems. This study aims to establish an efficient surgical telementoring system, where the qualified surgeon (mentor) provides real-time guidance and technical assistance for surgical procedures to the on-spot physician (surgeon). High Efficiency Video Coding (HEVC/H.265)–based video compression has shown promising results for telementoring applications. However, there is a trade-off between the bandwidth resources required for video transmission and the quality of video received by the remote surgeon. In order to efficiently compress and transmit real-time surgical videos, a hybrid lossless-lossy approach is proposed where the surgical incision region is coded in high quality whereas the background region is coded in low quality based on distance from the surgical incision region. For surgical incision region extraction, state-of-the-art deep learning (DL) architectures for semantic segmentation can be used. However, the computational complexity of these architectures is high, resulting in large training and inference times. For telementoring systems, encoding time is crucial; therefore, very deep architectures are not suitable for surgical incision extraction. In this study, we propose a shallow convolutional neural network (S-CNN)–based segmentation approach that consists of an encoder network only for surgical region extraction. The segmentation performance of S-CNN is compared with one of the state-of-the-art image segmentation networks (SegNet), and results demonstrate the effectiveness of the proposed network. The proposed telementoring system is efficient and explicitly considers the physiological nature of the human visual system to encode the video by providing good overall visual impact in the location of surgery. The results of the proposed S-CNN-based segmentation demonstrated a pixel accuracy of 97% and a mean intersection over union accuracy of 79%. Similarly, HEVC experimental results showed that the proposed surgical region–based encoding scheme achieved an average bitrate reduction of 88.8% at high-quality settings in comparison with default full-frame HEVC encoding. The average gain in encoding performance (signal-to-noise) of the proposed algorithm is 11.5 dB in the surgical region. The bitrate saving and visual quality of the proposed optimal bit allocation scheme are compared with the mean shift segmentation–based coding scheme for fair comparison. The results show that the proposed scheme maintains high visual quality in the surgical incision region along with achieving good bitrate saving. Based on comparison and results, the proposed encoding algorithm can be considered as an efficient and effective solution for surgical telementoring systems for low-bandwidth networks.
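
For readers unfamiliar with the two segmentation metrics quoted above, the following Python sketch shows how pixel accuracy and mean intersection over union are commonly computed for label masks. It is a generic illustration, not the authors' evaluation code: the function names are hypothetical, masks are assumed to be equally sized nested lists of integer class labels, and the default class set (0 for background, 1 for surgical region) is an assumption.

```python
def pixel_accuracy(pred, truth):
    """Fraction of pixels whose predicted label equals the true label."""
    total = correct = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += 1
            correct += int(p == t)
    return correct / total


def mean_iou(pred, truth, classes=(0, 1)):
    """Average of per-class intersection over union.

    Classes absent from both masks are skipped so they do not
    contribute an undefined 0/0 term.
    """
    ious = []
    for c in classes:
        inter = union = 0
        for pred_row, truth_row in zip(pred, truth):
            for p, t in zip(pred_row, truth_row):
                inter += int(p == c and t == c)
                union += int(p == c or t == c)
        if union:
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Toy 2x2 example: one background pixel is mislabeled as surgical region.
pred = [[1, 1], [0, 0]]
truth = [[1, 0], [0, 0]]
print(pixel_accuracy(pred, truth))  # 0.75
print(mean_iou(pred, truth))
```

The toy example also shows why the two numbers can differ noticeably, as they do in the abstract: a single wrong pixel costs 25% accuracy here but halves the IoU of the smaller class.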

    Download full text (pdf)
    fulltext