Mid Sweden University

Sjöström, Mårten
Publications (10 of 146)
Takhtardeshir, S., Olsson, R., Guillemot, C. & Sjöström, M. (2024). A Deep Learning based Light Field Image Compression as Pseudo Video Sequences with Additional in-loop Filtering. In: 3D Imaging and Applications 2024-Electronic Imaging: . Paper presented at 3D Imaging and Applications 2024 (pp. 1-6). San Francisco Airport in Burlingame, California: Society for Imaging Science & Technology
A Deep Learning based Light Field Image Compression as Pseudo Video Sequences with Additional in-loop Filtering
2024 (English)In: 3D Imaging and Applications 2024-Electronic Imaging, San Francisco Airport in Burlingame, California: Society for Imaging Science & Technology , 2024, p. 1-6Conference paper, Published paper (Refereed)
Abstract [en]

In recent years, several deep learning-based architectures have been proposed to compress Light Field (LF) images as pseudo video sequences. However, most of these techniques employ conventional compression-focused networks. In this paper, we introduce a version of a previously designed deep learning video compression network, adapted and optimized specifically for LF image compression. We enhance this network by incorporating an in-loop filtering block, along with additional adjustments and fine-tuning. By treating LF images as pseudo video sequences and deploying our adapted network, we address challenges posed by the unique features of LF images, such as high resolution and large data sizes. Our method compresses these images effectively, preserving their quality and unique characteristics. With thorough fine-tuning and the inclusion of the in-loop filtering network, our approach improves Peak Signal-to-Noise Ratio (PSNR) and Mean Structural Similarity Index Measure (MSSIM) compared with other existing techniques. Our method provides a feasible path for LF image compression and may contribute to the emergence of new applications and advancements in this field.
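As an illustrative note on the metrics reported above (not taken from the paper): PSNR is computed from the mean squared error between a reference and a test image. A minimal sketch for 8-bit pixel values, using a made-up 4-pixel example:

```python
import math

def psnr(ref, test, max_val=255.0):
    """Peak Signal-to-Noise Ratio between two equally sized pixel sequences."""
    if len(ref) != len(test) or not ref:
        raise ValueError("inputs must be non-empty and equally sized")
    mse = sum((r - t) ** 2 for r, t in zip(ref, test)) / len(ref)
    if mse == 0:
        return math.inf  # identical signals
    return 10.0 * math.log10(max_val ** 2 / mse)

# Toy example: a uniform error of 5 grey levels gives MSE = 25
print(round(psnr([100, 120, 140, 160], [105, 125, 145, 165]), 2))  # 34.15
```

MSSIM works differently (windowed statistics of luminance, contrast, and structure); library implementations such as scikit-image's `structural_similarity` are typically used in practice.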

Place, publisher, year, edition, pages
San Francisco Airport in Burlingame, California: Society for Imaging Science & Technology, 2024
Keywords
Compression, Deep Learning, Light Field Coding, Pseudo Video Sequence
National Category
Computer Engineering
Identifiers
urn:nbn:se:miun:diva-50480 (URN)10.2352/EI.2024.36.18.3DIA-103 (DOI)
Conference
3D Imaging and Applications 2024
Projects
Plenoptima
Available from: 2024-02-07 Created: 2024-02-07 Last updated: 2024-02-26. Bibliographically approved
Jiang, M., Nnonyelu, C. J., Lundgren, J., Thungström, G. & Sjöström, M. (2023). A Coherent Wideband Acoustic Source Localization Using a Uniform Circular Array. Sensors, 23(11), Article ID 5061.
A Coherent Wideband Acoustic Source Localization Using a Uniform Circular Array
2023 (English)In: Sensors, E-ISSN 1424-8220, Vol. 23, no 11, article id 5061Article in journal (Refereed) Published
Abstract [en]

In modern applications such as robotics, autonomous vehicles, and speaker localization, the computational power available for sound source localization can be limited as other functionalities grow more complex. In such application fields, there is a need to maintain high localization accuracy for several sound sources while reducing computational complexity. The array manifold interpolation (AMI) method applied with the Multiple Signal Classification (MUSIC) algorithm enables sound source localization of multiple sources with high accuracy, but its computational complexity has so far been relatively high. This paper presents a modified AMI for the uniform circular array (UCA) that offers reduced computational complexity compared to the original AMI. The complexity reduction is based on a proposed UCA-specific focusing matrix that eliminates the calculation of the Bessel function. Simulations compare the proposed method with the existing methods iMUSIC, the Weighted Squared Test of Orthogonality of Projected Subspaces (WS-TOPS), and the original AMI. Experimental results under different scenarios show that the proposed algorithm outperforms the original AMI in estimation accuracy while reducing computation time by up to 30%. The proposed method also makes it possible to implement wideband array processing on low-end microprocessors.
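For background on the MUSIC principle the paper builds on: MUSIC locates sources by scanning steering vectors against the noise subspace of the array covariance matrix. The sketch below is a generic narrowband MUSIC example for a uniform linear array, not the paper's wideband UCA pipeline with array manifold interpolation; the simulated scenario (8 sensors, one source at 20 degrees) is hypothetical:

```python
import numpy as np

def music_spectrum(snapshots, n_sources, d=0.5, angles=np.linspace(-90, 90, 361)):
    """Narrowband MUSIC pseudospectrum for a uniform linear array.

    snapshots: (n_sensors, n_snapshots) complex array; d: spacing in wavelengths.
    """
    m = snapshots.shape[0]
    R = snapshots @ snapshots.conj().T / snapshots.shape[1]  # sample covariance
    eigval, eigvec = np.linalg.eigh(R)                       # ascending eigenvalues
    En = eigvec[:, : m - n_sources]                          # noise subspace
    p = []
    for theta in np.deg2rad(angles):
        a = np.exp(-2j * np.pi * d * np.arange(m) * np.sin(theta))  # steering vector
        p.append(1.0 / np.linalg.norm(En.conj().T @ a) ** 2)
    return angles, np.asarray(p)

# Simulate one source at +20 degrees on an 8-sensor half-wavelength ULA
rng = np.random.default_rng(0)
m, n_snap, theta0 = 8, 200, np.deg2rad(20.0)
a0 = np.exp(-2j * np.pi * 0.5 * np.arange(m) * np.sin(theta0))
s = rng.standard_normal(n_snap) + 1j * rng.standard_normal(n_snap)
x = np.outer(a0, s) + 0.05 * (rng.standard_normal((m, n_snap))
                              + 1j * rng.standard_normal((m, n_snap)))
angles, p = music_spectrum(x, n_sources=1)
print(angles[np.argmax(p)])  # peak near +20 degrees
```

The paper's contribution sits upstream of this scan: wideband processing requires focusing matrices that align covariance matrices across frequency bins, and the proposed UCA-specific focusing matrix avoids the Bessel-function evaluations of the original AMI.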

Place, publisher, year, edition, pages
MDPI, 2023
Keywords
array manifold interpolation, direction of arrival estimation, wideband sources
National Category
Signal Processing
Identifiers
urn:nbn:se:miun:diva-48473 (URN)10.3390/s23115061 (DOI)001005309700001 ()37299788 (PubMedID)2-s2.0-85161608613 (Scopus ID)
Available from: 2023-06-12 Created: 2023-06-12 Last updated: 2023-06-30. Bibliographically approved
Rafiei, S., Singhal, C., Brunnström, K. & Sjöström, M. (2023). Human Interaction in Industrial Tele-Operated Driving: Laboratory Investigation. In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX): . Paper presented at 2023 15th International Conference on Quality of Multimedia Experience, QoMEX 2023 (pp. 91-94). IEEE conference proceedings
Human Interaction in Industrial Tele-Operated Driving: Laboratory Investigation
2023 (English)In: 2023 15th International Conference on Quality of Multimedia Experience (QoMEX), IEEE conference proceedings, 2023, p. 91-94Conference paper, Published paper (Refereed)
Abstract [en]

Tele-operated driving enables industrial operators to control heavy machinery remotely, allowing them to work in safer and better workplaces. However, several challenges arise when presenting visual information from on-site scenes to operators at a distant, remote site. This paper discusses the impact of video quality (spatial resolution), field of view, and latency on users' depth perception, experience, and performance in a lab-based tele-operated application. We performed user experience evaluation experiments to study these impacts. Overall, user experience and comfort decrease, and users' performance error increases, as glass-to-glass latency grows. User comfort likewise decreases, and user performance error increases, with reduced video quality (spatial resolution).

Place, publisher, year, edition, pages
IEEE conference proceedings, 2023
Keywords
depth estimation, field of view, Industrial Tele-operation, latency, User and Quality of experience, video quality
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:miun:diva-49146 (URN)10.1109/QoMEX58391.2023.10178441 (DOI)001037196100017 ()2-s2.0-85167344580 (Scopus ID)9798350311730 (ISBN)
Conference
2023 15th International Conference on Quality of Multimedia Experience, QoMEX 2023
Available from: 2023-08-22 Created: 2023-08-22 Last updated: 2023-09-18. Bibliographically approved
Gond, M., Zerman, E., Knorr, S. & Sjöström, M. (2023). LFSphereNet: Real Time Spherical Light Field Reconstruction from a Single Omnidirectional Image. In: Proceedings of the 20th ACM SIGGRAPH European Conference on Visual Media Production: . Paper presented at European Conference on Visual Media Production (CVMP) (pp. 1-10). New York, NY, United States: Association for Computing Machinery (ACM)
LFSphereNet: Real Time Spherical Light Field Reconstruction from a Single Omnidirectional Image
2023 (English)In: Proceedings of the 20th ACM SIGGRAPH European Conference on Visual Media Production, New York, NY, United States: Association for Computing Machinery (ACM), 2023, p. 1-10Conference paper, Published paper (Refereed)
Abstract [en]

Recent developments in immersive imaging technologies have enabled improved telepresence applications. Omnidirectional (360-degree) content, now commercially mature, provides full vision around the camera with three degrees of freedom (3DoF). With real-time immersive telepresence applications in mind, this paper investigates how a single omnidirectional image (ODI) can be used to extend 3DoF to 6DoF. To achieve this, we propose a fully learning-based method for spherical light field reconstruction from a single omnidirectional image. The proposed LFSphereNet utilizes two different networks: the first network learns to reconstruct the light field in cubemap projection (CMP) format given the six cube faces of an omnidirectional image and the corresponding cube face positions as input. The cubemap format implies a linear re-projection, which is more appropriate for a neural network. The second network refines the reconstructed cubemaps in equirectangular projection (ERP) format by removing cubemap border artifacts. With an appropriate cost function, the network implicitly learns the geometric features for both translation and zooming. Furthermore, it runs with very low inference time, which enables real-time applications. We demonstrate that LFSphereNet outperforms state-of-the-art approaches in terms of quality and speed when tested on different synthetic and real-world scenes. The proposed method represents a significant step towards achieving real-time immersive remote telepresence experiences.
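The method converts between equirectangular (ERP) and cubemap (CMP) projections. As a hedged illustration of the geometry involved (conventions vary between implementations, and this is not the authors' code), the sketch below maps an ERP pixel to a unit viewing direction, the first step of any ERP re-projection:

```python
import math

def erp_to_direction(u, v, width, height):
    """Map an equirectangular pixel (u, v) to a unit 3D viewing direction.

    One common convention (others exist): u=0 is longitude -pi (left edge),
    v=0 is latitude +pi/2 (top edge). Returns (x, y, z) with y pointing up.
    """
    lon = (u / width) * 2.0 * math.pi - math.pi    # longitude in [-pi, pi)
    lat = math.pi / 2.0 - (v / height) * math.pi   # latitude in [+pi/2, -pi/2]
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)

# The centre of a 2:1 ERP image looks straight ahead (+z)
print(erp_to_direction(1024, 512, 2048, 1024))  # (0.0, 0.0, 1.0)
```

Building a cube face then amounts to intersecting each such direction with the face plane; the inverse mapping resamples the six faces back into ERP.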

Place, publisher, year, edition, pages
New York, NY, United States: Association for Computing Machinery (ACM), 2023
Keywords
Computer Graphics, Neural Networks, Light Field, Omnidirectional Images, 360 Image, Spherical Light Field
National Category
Computer Engineering
Identifiers
urn:nbn:se:miun:diva-50066 (URN)10.1145/3626495.3626500 (DOI)001122588100003 ()2-s2.0-85180123650 (Scopus ID)979-8-4007-0426-0 (ISBN)
Conference
European Conference on Visual Media Production (CVMP)
Projects
Plenoptima
Available from: 2023-12-08 Created: 2023-12-08 Last updated: 2024-01-19. Bibliographically approved
Willingham, S., Sjöström, M. & Guillemot, C. (2023). Prior for Multi-Task Inverse Problems in Image Reconstruction Using Deep Equilibrium Models. In: 2023 31st European Signal Processing Conference (EUSIPCO): . Paper presented at European Signal Processing Conference. IEEE conference proceedings
Prior for Multi-Task Inverse Problems in Image Reconstruction Using Deep Equilibrium Models
2023 (English)In: 2023 31st European Signal Processing Conference (EUSIPCO), IEEE conference proceedings, 2023Conference paper, Published paper (Refereed)
Abstract [en]

Inverse problems in imaging consider the reconstruction of clean images from degraded observations, as in deblurring or inpainting. These inverse problems are generally ill-posed, so solving them requires regularization, for which multiple approaches exist: plug-and-play (PnP) methods are designed to solve any inverse problem generically by replacing a regularizing proximal operator with a denoiser. Unrolled methods perform a fixed number of iterations and train a network end-to-end for a specific degradation, but necessitate re-training for each new degradation. Deep equilibrium models (DEQs), on the other hand, iterate an unrolled method until convergence and thereby enable end-to-end training on the reconstruction error with simplified back-propagation. We have investigated to what extent a solution for several inverse problems can be found by employing a multi-task DEQ (MTDEQ). This MTDEQ is used to train a prior on the actual estimation error, in contrast to the theoretical noise model used for PnP methods. The advantage is that the resulting prior is trained for a range of degradations beyond pure Gaussian denoising. The investigation also demonstrates that different search methods can be used in training (forward-backward) and in testing (alternating direction method of multipliers).
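A DEQ's forward pass iterates a layer until its output stops changing, then treats that fixed point as the network output. A minimal scalar sketch of this idea, using a hypothetical contraction f(z) = tanh(w*z + x) in place of the paper's learned network:

```python
import math

def deq_forward(x, w=0.5, tol=1e-10, max_iter=1000):
    """Iterate the 'layer' f(z) = tanh(w*z + x) to its fixed point z* = f(z*).

    For |w| < 1 this map is a contraction, so plain Picard iteration converges.
    A real DEQ does the same with a learned layer and differentiates through
    the fixed point via the implicit function theorem instead of unrolling.
    """
    z = 0.0
    for _ in range(max_iter):
        z_next = math.tanh(w * z + x)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")

z_star = deq_forward(0.3)
print(abs(z_star - math.tanh(0.5 * z_star + 0.3)) < 1e-9)  # True: it is a fixed point
```

The memory advantage over unrolled methods comes from this structure: training needs only the equilibrium point, not the intermediate iterates.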

Place, publisher, year, edition, pages
IEEE conference proceedings, 2023
Keywords
Inverse Problems, Computer Vision, Image Reconstruction, Deep Equilibrium Models
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:miun:diva-48905 (URN)10.23919/EUSIPCO58844.2023.10289950 (DOI)2-s2.0-85178353498 (Scopus ID)978-9-4645-9360-0 (ISBN)
Conference
European Signal Processing Conference
Projects
PLENOPTIMA
Available from: 2023-11-30 Created: 2023-11-30 Last updated: 2023-12-13. Bibliographically approved
Li, Y.-H., Sjöström, M., Knorr, S. & Sikora, T. (2023). Segmentation-based Initialization for Steered Mixture of Experts. In: 2023 IEEE International Conference on Visual Communications and Image Processing (VCIP): . Paper presented at 2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023. IEEE conference proceedings
Segmentation-based Initialization for Steered Mixture of Experts
2023 (English)In: 2023 IEEE International Conference on Visual Communications and Image Processing (VCIP), IEEE conference proceedings, 2023Conference paper, Published paper (Refereed)
Abstract [en]

The Steered-Mixture-of-Experts (SMoE) model is an edge-aware kernel representation that has successfully been explored for the compression of images, video, and higher-dimensional data such as light fields. The present work aims to leverage the potential for enhanced compression gains through efficient kernel reduction. We propose a fast segmentation-based strategy that identifies a sufficient number of kernels for representing an image and provides an initial kernel parametrization. The strategy implies both a reduced memory footprint and reduced computational complexity for the subsequent parameter optimization, resulting in an overall faster processing time. Fewer kernels, combined with the inherent sparsity of SMoE, further enhance the overall compression performance. Empirical evaluations demonstrate a gain of 0.3-1.0 dB in PSNR for a constant number of kernels, and 23% fewer kernels and 25% less processing time at constant PSNR. The results highlight the feasibility and practicality of the approach, positioning it as a valuable solution for various image-related applications, including image compression.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2023
Keywords
compression, gating network, segmentation, Computer vision, Image segmentation, Compression of images, Edge aware, High dimensional data, Kernel representation, Light fields, Mixture of experts, Mixture-of-experts model, Image compression
National Category
Computer Engineering
Identifiers
urn:nbn:se:miun:diva-50594 (URN)10.1109/VCIP59821.2023.10402643 (DOI)2-s2.0-85184853593 (Scopus ID)9798350359855 (ISBN)
Conference
2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023
Available from: 2024-02-20 Created: 2024-02-20 Last updated: 2024-02-20. Bibliographically approved
Gao, S., Qu, G., Sjöström, M. & Liu, Y. (2022). A TV regularisation sparse light field reconstruction model based on guided-filtering. Signal processing. Image communication, 109, Article ID 116852.
A TV regularisation sparse light field reconstruction model based on guided-filtering
2022 (English)In: Signal processing. Image communication, ISSN 0923-5965, E-ISSN 1879-2677, Vol. 109, article id 116852Article in journal (Refereed) Published
Abstract [en]

Obtaining and representing the 4D light field is important for a number of computer vision applications. Due to the high dimensionality, acquiring the light field directly is costly. One way to overcome this deficiency is to reconstruct the light field from a limited number of measurements. Existing approaches involve either a depth estimation process or require a large number of measurements to obtain high-quality reconstructed results. In this paper, we propose a total variation (TV) regularisation sparse model with the alternating direction method of multipliers (ADMM) based on guided filtering, which addresses this depth-dependence problem with only a few measurements. As one of the sparse optimisation methods, TV regularisation based on ADMM is well suited to solve ill-posed problems such as this. Moreover, guided filtering has good edge-preserving smoothing properties, which can be incorporated into the light field reconstruction process. Therefore, high precision light field reconstruction is established with our model. Specifically, the updated image in the iteration step contains the guidance image, and an initialiser for the least squares method using a QR factorisation (LSQR) algorithm is involved in one of the subproblems. The model outperforms other methods in both visual assessments and objective metrics – in simulation experiments from synthetic data and photographic data using produced focal stacks from light field contents – and it works well in experiments using captured focal stacks. We also show a further application for arbitrary refocusing by using the reconstructed light field.
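The core of the model above is total-variation regularisation solved with ADMM. The sketch below is a generic 1D TV denoising ADMM (soft-thresholding on finite differences), without the paper's guided-filtering and LSQR components; the piecewise-constant test signal is made up for illustration:

```python
import numpy as np

def tv_denoise_admm(b, lam=1.0, rho=1.0, n_iter=200):
    """1D total-variation denoising: min_x 0.5*||x - b||^2 + lam*||Dx||_1 via ADMM."""
    n = len(b)
    D = np.diff(np.eye(n), axis=0)        # (n-1, n) finite-difference operator
    A = np.eye(n) + rho * D.T @ D         # fixed system matrix for the x-update
    z = np.zeros(n - 1)                   # auxiliary variable z ~ Dx
    u = np.zeros(n - 1)                   # scaled dual variable
    x = b.copy()
    for _ in range(n_iter):
        x = np.linalg.solve(A, b + rho * D.T @ (z - u))          # quadratic step
        Dx = D @ x
        v = Dx + u
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)  # soft threshold
        u = u + Dx - z                                           # dual ascent
    return x

# Noisy piecewise-constant signal: TV denoising flattens the two segments
rng = np.random.default_rng(1)
clean = np.concatenate([np.zeros(20), np.full(20, 3.0)])
noisy = clean + 0.3 * rng.standard_normal(40)
rec = tv_denoise_admm(noisy, lam=1.0)
print(np.abs(rec - clean).mean() < np.abs(noisy - clean).mean())  # True: error reduced
```

The L1 penalty on differences is what preserves the edge while smoothing within each segment; the paper additionally steers this smoothing with a guidance image inside the iteration.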

Keywords
Light field reconstruction, Total variation regularisation, guided filter
National Category
Signal Processing Media and Communication Technology
Identifiers
urn:nbn:se:miun:diva-45834 (URN)10.1016/j.image.2022.116852 (DOI)000860683700002 ()2-s2.0-85137158712 (Scopus ID)
Available from: 2022-08-17 Created: 2022-08-17 Last updated: 2022-10-06. Bibliographically approved
Rafiei, S., Dima, E., Sjöström, M. & Brunnström, K. (2022). Augmented Remote Operating System for Scaling in smart mining applications: Quality of Experience aspects. In: Damon Chandler, Mark McCourt, Jeffrey Mulligan (Ed.), Proceedings of Human Vision and Electronic Imaging 2022.: . Paper presented at Human Vision and Electronic Imaging 2022. [DIGITAL], January 17-26, 2022.. , Article ID HVEI-166.
Augmented Remote Operating System for Scaling in smart mining applications: Quality of Experience aspects
2022 (English)In: Proceedings of Human Vision and Electronic Imaging 2022. / [ed] Damon Chandler, Mark McCourt, Jeffrey Mulligan, 2022, article id HVEI-166Conference paper, Published paper (Refereed)
Abstract [en]

Remote operation and Augmented Telepresence are fields of interest for novel industrial applications in, e.g., construction and mining. In this study, we report on an ongoing investigation of the Quality of Experience aspects of an Augmented Telepresence system for remote operation. The system can achieve view augmentation with selective content removal and Novel Perspective view generation. Two formal subjective studies were performed, with test participants scoring their experience while using the system with different levels of view augmentation. The participants also gave free-form feedback on the system and their experiences. The first experiment focused on the effects of in-view augmentations and interface distributions on wall pattern perception. The second focused on the effects of augmentations on depth and 3D environment understanding. Feedback from experiment 1 showed that most participants preferred the original camera views and the Disocclusion Augmentation view over the Novel Perspective views. Moreover, the Disocclusion Augmentation seemed beneficial when shown in combination with other views. When the views were isolated in experiment 2, the impact of the Disocclusion Augmentation view was found to be lower than that of the Novel Perspective views.

Keywords
Quality of Experience, Augmented Telepresence, Remote operation, Mining, Disocclusion, Novel perspective views.
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:miun:diva-44357 (URN)10.2352/EI.2022.34.11.HVEI-166 (DOI)2-s2.0-85132420611 (Scopus ID)
Conference
Human Vision and Electronic Imaging 2022. [DIGITAL], January 17-26, 2022.
Funder
Swedish Foundation for Strategic Research, FID18-0030
Available from: 2022-02-21 Created: 2022-02-21 Last updated: 2022-08-01. Bibliographically approved
Karbalaie, A., Abtahi, F. & Sjöström, M. (2022). Event detection in surveillance videos: a review. Multimedia tools and applications, 81(24), 35463-35501
Event detection in surveillance videos: a review
2022 (English)In: Multimedia tools and applications, ISSN 1380-7501, E-ISSN 1573-7721, Vol. 81, no 24, p. 35463-35501Article in journal (Refereed) Published
Abstract [en]

Since 2008, a variety of systems have been designed to detect events in security camera footage, and more than a hundred journal articles and conference papers have been published in this field. However, no survey has focused on event recognition in surveillance systems, which motivated us to provide a comprehensive review of the different event detection systems that have been developed. We start our discussion with the pioneering methods that used the TRECVid-SED dataset and then move to methods developed using the VIRAT dataset in the TRECVid evaluation. To better understand the designed systems, we describe the components of each method and the modifications of existing methods separately. We outline the significant challenges related to untrimmed security video action detection, and present suitable metrics for assessing the performance of the proposed models. Our study indicates that, for the TRECVid-SED dataset, the majority of researchers classified events into two groups on the basis of the number of participants and the duration of the event, and used one or more models to identify all the events depending on the group. For the VIRAT dataset, object detection models were used throughout to localize activities in the first stage; in all but one study, a 3D convolutional neural network (3D-CNN) was used to extract spatio-temporal features or classify the different activities. From the review carried out, we conclude that an automatic surveillance event detection system requires accurate and fast object detection in the first stage to localize activities, and a classification model to draw conclusions from the extracted features.

Keywords
Event detection, Surveillance videos system, Action and activity recognition
National Category
Signal Processing Computer Sciences
Identifiers
urn:nbn:se:miun:diva-45384 (URN)10.1007/s11042-021-11864-2 (DOI)000815440600001 ()2-s2.0-85136167981 (Scopus ID)
Funder
Mid Sweden University
Available from: 2022-06-27 Created: 2022-06-27 Last updated: 2022-11-02. Bibliographically approved
Hassan, A., Sjöström, M., Zhang, T. & Egiazarian, K. (2022). Light-Weight EPINET Architecture for Fast Light Field Disparity Estimation. In: Light-Weight EPINET Architecture for Fast Light Field Disparity Estimation: 26-28 Sept. 2022, Shanghai, China. Paper presented at IEEE 24th International Workshop on Multimedia Signal Processing (MMSP) 2022 (pp. 1-5). Shanghai, China: IEEE Signal Processing Society
Light-Weight EPINET Architecture for Fast Light Field Disparity Estimation
2022 (English)In: Light-Weight EPINET Architecture for Fast Light Field Disparity Estimation: 26-28 Sept. 2022, Shanghai, China, Shanghai, China: IEEE Signal Processing Society, 2022, p. 1-5Conference paper, Published paper (Refereed)
Abstract [en]

Recent deep learning-based light field disparity estimation algorithms require millions of parameters, which demand high computational cost and limit model deployment. In this paper, an investigation is carried out to analyze the effect of depthwise separable convolutions and ghost modules on the state-of-the-art EPINET architecture for disparity estimation. Based on this investigation, four convolutional blocks are proposed that make the EPINET architecture a fast and light-weight network for disparity estimation. The experimental results show that the proposed convolutional blocks reduce the computational cost of the EPINET architecture by up to a factor of 3.89, while achieving comparable disparity maps on the HCI benchmark dataset.
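The parameter savings behind such results can be illustrated with a simple count: a depthwise separable convolution replaces one dense k x k filter bank with a per-channel k x k stage plus a 1x1 pointwise stage. The arithmetic below is generic and does not reproduce the paper's 3.89x figure, which also involves ghost modules and architecture-specific choices:

```python
def conv_params(k, c_in, c_out, bias=False):
    """Weights in a standard k x k convolution layer."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def dsc_params(k, c_in, c_out, bias=False):
    """Weights in a depthwise separable convolution: one k x k filter per
    input channel, followed by a 1x1 pointwise convolution."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise + ((c_in + c_out) if bias else 0)

# Example: 3x3 convolution, 64 -> 64 channels
std = conv_params(3, 64, 64)   # 36864 weights
dsc = dsc_params(3, 64, 64)    # 576 + 4096 = 4672 weights
print(std, dsc, round(std / dsc, 2))  # 36864 4672 7.89
```

The generic ratio is roughly k*k*c_out / (k*k + c_out), so the savings grow with the channel count; actual speedups depend on how well the two smaller convolutions map to hardware.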

Place, publisher, year, edition, pages
Shanghai, China: IEEE Signal Processing Society, 2022
Keywords
Light Field, Deep Learning, Disparity Estimation, Compression, Depthwise Separable Convolution
National Category
Computer Sciences
Identifiers
urn:nbn:se:miun:diva-46514 (URN)10.1109/MMSP55362.2022.9949378 (DOI)000893205800116 ()2-s2.0-85143584260 (Scopus ID)978-1-6654-7189-3 (ISBN)
Conference
IEEE 24th International Workshop on Multimedia Signal Processing (MMSP) 2022
Projects
PLENOPTIMA
Available from: 2022-11-25 Created: 2022-11-25 Last updated: 2023-05-17. Bibliographically approved