Mid Sweden University

Publications (10 of 168)
Takhtardeshir, S., Olsson, R., Guillemot, C. & Sjöström, M. (2026). DUALF-D: Disentangled dual-hyperprior approach for light field image compression. Signal processing. Image communication, 140, Article ID 117436.
DUALF-D: Disentangled dual-hyperprior approach for light field image compression
2026 (English) In: Signal processing. Image communication, ISSN 0923-5965, E-ISSN 1879-2677, Vol. 140, article id 117436. Article in journal (Refereed), Published
Abstract [en]

Light field (LF) imaging captures spatial and angular information, offering a 4D scene representation enabling enhanced visual understanding. However, high dimensionality and redundancy across spatial and angular domains present major challenges for compression, particularly where storage, transmission bandwidth, or processing latency are constrained. We present a novel Variational Autoencoder (VAE)-based framework that explicitly disentangles spatial and angular features using two parallel latent branches. Each branch is coupled with an independent hyperprior model, allowing more precise distribution estimation for entropy coding and finer rate-distortion control. This dual-hyperprior structure enables the network to adaptively compress spatial and angular information based on their unique statistical characteristics, improving coding efficiency. To further enhance latent feature specialization and promote disentanglement, we introduce a mutual information-based regularization term that minimizes redundancy between the two branches while preserving feature diversity. Unlike prior methods relying on covariance-based penalties prone to collapse, our information-theoretic regularizer provides more stable and interpretable latent separation. Experimental results on publicly available LF datasets demonstrate our method achieves strong compression performance, yielding an average BD-PSNR gain of 2.91 dB over HEVC and high compression ratios (e.g., 200:1). Additionally, our design enables fast inference, with a total end-to-end time over 19x faster than the JPEG Pleno standard, making it well-suited for real-time and bandwidth-sensitive applications. By jointly leveraging disentangled representation learning, dual-hyperprior modeling, and information-theoretic regularization, our approach offers a scalable, effective solution for practical light field image compression.
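To illustrate the dual-branch, dual-hyperprior idea described in the abstract, the PyTorch sketch below pairs each latent branch with its own hyperprior that predicts per-element scales for entropy coding. The module names, layer sizes, and the omission of the mutual-information regularizer are assumptions for illustration only, not the DUALF-D implementation.

```python
# Minimal sketch of two parallel latent branches, each with an independent hyperprior.
# Assumption: layer widths and the exp-scale parameterization are illustrative placeholders.
import torch
import torch.nn as nn

class HyperpriorBranch(nn.Module):
    """One latent branch: analysis transform plus a hyperprior for its latent scales."""
    def __init__(self, in_ch=3, latent_ch=128, hyper_ch=64):
        super().__init__()
        self.analysis = nn.Sequential(
            nn.Conv2d(in_ch, latent_ch, 5, stride=2, padding=2), nn.GELU(),
            nn.Conv2d(latent_ch, latent_ch, 5, stride=2, padding=2),
        )
        self.hyper_analysis = nn.Sequential(
            nn.Conv2d(latent_ch, hyper_ch, 3, stride=2, padding=1), nn.GELU(),
            nn.Conv2d(hyper_ch, hyper_ch, 3, stride=2, padding=1),
        )
        self.hyper_synthesis = nn.Sequential(
            nn.ConvTranspose2d(hyper_ch, latent_ch, 4, stride=2, padding=1), nn.GELU(),
            nn.ConvTranspose2d(latent_ch, latent_ch, 4, stride=2, padding=1),
        )

    def forward(self, x):
        y = self.analysis(x)                          # branch latent
        z = self.hyper_analysis(y)                    # side information
        scale = torch.exp(self.hyper_synthesis(z))    # per-element std for entropy coding
        return y, z, scale

class DualBranchCodec(nn.Module):
    """Two parallel branches (e.g., spatial and angular views) with independent hyperpriors."""
    def __init__(self):
        super().__init__()
        self.spatial = HyperpriorBranch()
        self.angular = HyperpriorBranch()

    def forward(self, spatial_view, angular_view):
        return self.spatial(spatial_view), self.angular(angular_view)
```

Because each branch estimates its own latent distribution, rate-distortion trade-offs can in principle be tuned separately for spatial and angular content, which is the motivation the abstract gives for the dual-hyperprior design.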

Place, publisher, year, edition, pages
Elsevier BV, 2026
Keywords
Light Field Compression, Variational Autoencoder, Disentanglement, Dual Hyperprior, Spatial-Angular Representation
National Category
Signal Processing; Computer Vision and Learning Systems
Identifiers
urn:nbn:se:miun:diva-55991 (URN); 10.1016/j.image.2025.117436 (DOI); 001620918400001 (); 2-s2.0-105021843764 (Scopus ID)
Funder
EU, Horizon 2020, 956770
Available from: 2025-11-18 Created: 2025-11-18 Last updated: 2025-12-08. Bibliographically approved
Li, Y.-H., Sikora, T., Knorr, S. & Sjöström, M. (2025). 3D SMoE Splatting for Edge-aware Realtime Radiance Field Rendering. In: Proceedings of the SIGGRAPH Asia 2025 Conference Papers: . Paper presented at SIGGRAPH Asia 2025, Hong Kong Convention and Exhibition Centre, Hong Kong, Hong Kong, December 15 - 18, 2025. ACM Digital Library, Article ID 137.
3D SMoE Splatting for Edge-aware Realtime Radiance Field Rendering
2025 (English) In: Proceedings of the SIGGRAPH Asia 2025 Conference Papers, ACM Digital Library, 2025, article id 137. Conference paper, Published paper (Refereed)
Abstract [en]

Steered Mixtures-of-Experts (SMoE) is an existing regression framework that has previously been applied to modeling and compression of 2D images and higher-dimensional imagery, including compression of light fields and light-field video. SMoE models are sparse, edge-aware representations that allow rendering of imagery with few Gaussians at excellent quality. In this paper, a novel, edge-aware "3D SMoE Splatting" (3DSMoES) framework for 3D rendering is introduced, adapted to fit into the existing "3D Gaussian Splatting" (3DGS) CUDA optimization pipeline. Here, SMoE regression serves as a "plug-and-play" solution that replaces the established 3DGS regression as a novel workhorse. 3DSMoES achieves significant visual quality gains with drastically fewer Gaussian kernels compared to 3DGS. We observe up to approximately 4 dB improvement in PSNR on individual scenes with kernel reductions of 20 to 50 percent. The sparse models are significantly faster to train and improve rendering speeds by up to 30 to 50 percent.
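For readers unfamiliar with SMoE regression, the core idea is a soft gating over steered Gaussian kernels whose normalized responses weight per-kernel experts. The NumPy sketch below reconstructs a single-channel image from a handful of kernels with constant experts; the kernel count, parameters, and the constant-expert choice are toy assumptions for illustration, not the 3DSMoES pipeline.

```python
# Minimal SMoE-style reconstruction sketch (constant experts, 2D image domain).
# Assumption: all parameter values below are toy placeholders.
import numpy as np

def smoe_reconstruct(height, width, centers, precisions, expert_values):
    """centers: (K, 2) kernel positions, precisions: (K, 2, 2) inverse covariances (steering),
    expert_values: (K,) constant expert outputs. Returns an (H, W) image."""
    ys, xs = np.mgrid[0:height, 0:width]
    coords = np.stack([xs.ravel(), ys.ravel()], axis=-1).astype(np.float64)   # (N, 2)

    # Unnormalized Gaussian kernel responses for each kernel.
    diffs = coords[None, :, :] - centers[:, None, :]                          # (K, N, 2)
    mahal = np.einsum('knd,kde,kne->kn', diffs, precisions, diffs)            # (K, N)
    responses = np.exp(-0.5 * mahal)

    # Soft gating: normalize responses across kernels (partition of unity).
    gates = responses / (responses.sum(axis=0, keepdims=True) + 1e-12)

    # Mixture output: gate-weighted sum of expert predictions.
    image = (gates * expert_values[:, None]).sum(axis=0)
    return image.reshape(height, width)

# Toy usage: two steered kernels splitting a 64x64 image along a soft edge.
centers = np.array([[16.0, 32.0], [48.0, 32.0]])
precisions = np.array([np.eye(2) * 0.01, np.eye(2) * 0.01])
values = np.array([0.2, 0.9])
img = smoe_reconstruct(64, 64, centers, precisions, values)
```

The edge-awareness comes from the steering (anisotropic precision matrices), which lets a single kernel align with an image or scene edge instead of requiring many isotropic Gaussians.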

Place, publisher, year, edition, pages
ACM Digital Library, 2025
National Category
Computer Engineering
Identifiers
urn:nbn:se:miun:diva-55993 (URN); 10.1145/3757377.3763899 (DOI); 979-8-4007-2137-3 (ISBN)
Conference
SIGGRAPH Asia 2025, Hong Kong Convention and Exhibition Centre, Hong Kong, Hong Kong, December 15 - 18, 2025
Available from: 2025-11-18 Created: 2025-11-18 Last updated: 2025-12-12. Bibliographically approved
Gond, M., Zerman, E., Shamshirgarha, M., Knorr, S. & Sjöström, M. (2025). A Visual Quality of Experience Toolkit for Realistic Immersive Telepresence Applications. In: 2025 17th International Conference on Quality of Multimedia Experience (QoMEX): . Paper presented at 2025 17th International Conference on Quality of Multimedia Experience (QoMEX), Madrid, Spain, Sept 30 - Oct 3, 2025. IEEE conference proceedings
A Visual Quality of Experience Toolkit for Realistic Immersive Telepresence Applications
2025 (English) In: 2025 17th International Conference on Quality of Multimedia Experience (QoMEX), IEEE conference proceedings, 2025. Conference paper, Published paper (Refereed)
Abstract [en]

Immersive imaging applications have gained considerable traction in the last decade with advances in capture, processing, compression, transmission, and display technologies. Recent works on spherical light fields and new view synthesis bring new challenges with respect to fast rendering of new views and a smooth quality of experience (QoE). Real-time rendering capabilities enable realistic immersive telepresence applications, which extend traditional telepresence systems with improved visual realism and the possible addition of depth cues. Recent efforts on spherical light fields and new view synthesis approaches show that a combined light field and spherical data visualization will benefit the scientific community. In this paper, we provide a WebGL- and WebXR-based visual QoE toolkit which can be used both on traditional displays and on extended reality headsets, presenting various visual modalities. To validate the usefulness of the proposed toolkit, we conducted a pilot test on our publicly available spherical light field database. This toolkit can be used for easier and faster quality assessment and can support the scientific community in subjective visual QoE studies focusing on telecommunication, telepresence, and augmented telepresence applications.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2025
Keywords
Light Field, View Synthesis, Renderer, QoE, XR Headset, Immersive Imaging
National Category
Computer Vision and Learning Systems; Computer Engineering
Identifiers
urn:nbn:se:miun:diva-55301 (URN); 10.1109/QoMEX65720.2025.11219996 (DOI); 979-8-3315-5435-4 (ISBN)
Conference
2025 17th International Conference on Quality of Multimedia Experience (QoMEX), Madrid, Spain, Sept 30 - Oct 3, 2025
Funder
Interreg Aurora, 20366448; EU, Horizon 2020, 956770; Knowledge Foundation, 2019-0251
Note

Accepted version of a paper to be published in forthcoming IEEE conference proceedings.

Available from: 2025-08-18 Created: 2025-08-18 Last updated: 2025-12-11. Bibliographically approved
Li, Y.-H., Knorr, S., Sjöström, M. & Sikora, T. (2025). Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression. IEEE transactions on multimedia
Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression
2025 (English) In: IEEE transactions on multimedia, ISSN 1520-9210, E-ISSN 1941-0077. Article in journal (Refereed), Epub ahead of print
Abstract [en]

Kernel image regression methods have demonstrated excellent efficiency in various image processing tasks, including image and light-field compression, Gaussian Splatting, denoising and super-resolution. The estimation of parameters for these methods commonly employs gradient descent iterative optimization, which poses a significant computational burden for many applications. In this paper, we introduce a novel adaptive segmentation-based initialization method targeted for optimizing Steered-Mixture-of-Experts (SMoE) gating networks and Radial-Basis-Function (RBF) networks with steering kernels. The novel initialization method allocates kernels into pre-calculated image segments. The optimal number of kernels, kernel positions, and steering parameters are derived per segment in an iterative optimization and kernel sparsification procedure. The kernel information from local segments is then transferred into a global initialization, ready for use in iterative optimization of SMoE, RBF, and related kernel image regression methods. Results demonstrate significant improvements in both objective and subjective quality compared to regular grid, K-Means, deep-learning-based, and previous segmentation-based initialization methods. The proposed initialization method reduces kernel usage by 70% compared to other initialization methods while maintaining the same reconstruction quality. Furthermore, by generating initial parameters closer to optimized results, convergence time is reduced, achieving overall runtime savings of up to 50% compared to prior methods. Additionally, the method supports parallel computation, with initialization time halved when using four GPUs compared to one.
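The sketch below illustrates the general flavor of a segmentation-driven kernel initialization: one steering kernel is placed per superpixel, with its position and steering taken from the segment's pixel statistics. The SLIC segmenter, segment count, and covariance-based steering are assumptions for illustration; the paper additionally optimizes and sparsifies kernels per segment before building the global initialization.

```python
# Sketch: initialize steering kernels from image segments (one kernel per segment).
# Assumption: SLIC superpixels and covariance-based steering are illustrative choices,
# not the paper's exact per-segment optimization and sparsification procedure.
import numpy as np
from skimage.segmentation import slic
from skimage.data import astronaut

image = astronaut() / 255.0                                    # (H, W, 3) test image
labels = slic(image, n_segments=200, compactness=10.0, start_label=0)

kernels = []
for seg_id in np.unique(labels):
    ys, xs = np.nonzero(labels == seg_id)
    if ys.size < 4:                                            # skip degenerate segments
        continue
    coords = np.stack([xs, ys], axis=-1).astype(np.float64)
    center = coords.mean(axis=0)                               # kernel position
    cov = np.cov(coords.T) + 1e-3 * np.eye(2)                  # steering from segment shape
    mean_color = image[ys, xs].mean(axis=0)                    # simple constant expert
    kernels.append({"center": center, "covariance": cov, "expert": mean_color})

print(f"Initialized {len(kernels)} kernels from {labels.max() + 1} segments")
```

Starting gradient descent from such segment-aligned parameters, rather than a regular grid, is the mechanism by which the paper reports faster convergence and fewer kernels for the same quality.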

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
Image kernel regression, mixture of experts, gating network, radial basis function network, optimization, initialization, segmentation, compression, denoising, super-resolution
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:miun:diva-54545 (URN); 10.1109/TMM.2025.3618576 (DOI); 2-s2.0-105018383703 (Scopus ID)
Available from: 2025-06-02 Created: 2025-06-02 Last updated: 2025-10-21
Zerman, E., Olsson, R. & Sjöström, M. (2025). Challenges in Experiencing Realistic Immersive Telepresence.
Challenges in Experiencing Realistic Immersive Telepresence
2025 (English). Other (Other academic)
Abstract [en]

Immersive imaging technologies offer a transformative way to experience and interact with remote environments, i.e., telepresence. By leveraging advancements in light field imaging, omnidirectional cameras, and head-mounted displays, these systems enable realistic, real-time visual experiences that can revolutionize how we interact with the remote scene in fields such as healthcare, education, remote collaboration, and entertainment. However, the field faces significant technical and experiential challenges, including efficient data capture and compression, real-time rendering, and quality of experience (QoE) assessment. Expanding on the findings of the authors' recent publication and situating them within a broader theoretical framework, this article provides an integrated overview of immersive telepresence technologies, focusing on their technological foundations, applications, and the challenges that must be addressed to advance this field.

Series
ACM SIGMM Records, ISSN 1947-4598; 17:1
National Category
Computer Engineering; Signal Processing; Human Computer Interaction
Identifiers
urn:nbn:se:miun:diva-54546 (URN)
Note

The SIGMM Records are the SIG Multimedia’s quarterly newsletter.

Available from: 2025-06-02 Created: 2025-06-02 Last updated: 2025-09-25. Bibliographically approved
Hassan, A., Zhang, T., Egiazarian, K. & Sjöström, M. (2025). CR-DARTS: Channel Redistribution-based Differentiable Architecture Search. IEEE Access, 13, 201166-201182
CR-DARTS: Channel Redistribution-based Differentiable Architecture Search
2025 (English) In: IEEE Access, E-ISSN 2169-3536, Vol. 13, p. 201166-201182. Article in journal (Refereed), Published
Abstract [en]

Differentiable Architecture Search (DARTS) has shown promising results in automating the design of deep learning models. However, its search process is computationally expensive because it evaluates all candidate operations simultaneously, often leading to an over-parameterized and inefficient search network. To reduce the computational cost, DARTS employs a smaller search network than the final evaluation network, which introduces an architecture optimization gap that limits real-world performance. To overcome this limitation, we introduce CR-DARTS, a multi-stage search framework designed to bridge the architecture optimization gap through an adaptive channel redistribution strategy. CR-DARTS reduces the computational complexity of the search network by compressing the shared input features among candidate operations and restoring the network dimensions via channel-wise feature concatenation. In addition, it progressively eliminates underperforming operations and redistributes the number of channels for more relevant feature extraction, thereby narrowing the gap between the search and evaluation networks. We validated CR-DARTS on two diverse computer vision tasks to assess its generalizability. Experimental results show that the proposed search framework reduces the memory requirement of the DARTS algorithm by up to 4.3×, while addressing the architecture optimization gap. Moreover, in the evaluation phase, the discovered architecture achieves up to 25.3% reductions in computational complexity and 50.6% faster inference time compared to state-of-the-art methods, while maintaining comparable accuracy. It also produces a competitive fire segmentation network that outperforms the state-of-the-art methods while maintaining similar computational efficiency. These results demonstrate that CR-DARTS is a practical solution for neural architecture search. Source code will be made publicly available at https://github.com/Realistic3D-MIUN/CR-DARTS.
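To make the channel-compression idea in the abstract concrete, the hedged PyTorch sketch below routes only a slice of the input channels through the candidate operations and restores the original width by concatenating the untouched channels back, in the spirit of partial-channel mixed operations. The reduction ratio, candidate set, and softmax weighting are illustrative assumptions, not the CR-DARTS code (which additionally prunes operations and redistributes channels over multiple search stages).

```python
# Sketch: mixed operation over a compressed channel slice, restored by concatenation.
# Assumption: candidate ops, reduction ratio, and weighting are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CompressedMixedOp(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.k = channels // reduction                      # channels routed through candidates
        self.candidates = nn.ModuleList([
            nn.Conv2d(self.k, self.k, 3, padding=1),
            nn.Conv2d(self.k, self.k, 5, padding=2),
            nn.AvgPool2d(3, stride=1, padding=1),
            nn.Identity(),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.candidates)))   # architecture weights

    def forward(self, x):
        routed, skipped = x[:, :self.k], x[:, self.k:]      # compress: only a slice is searched
        weights = F.softmax(self.alpha, dim=0)
        mixed = sum(w * op(routed) for w, op in zip(weights, self.candidates))
        return torch.cat([mixed, skipped], dim=1)           # restore width by concatenation

x = torch.randn(2, 32, 16, 16)
print(CompressedMixedOp(32)(x).shape)                       # torch.Size([2, 32, 16, 16])
```

Because the candidate operations only see channels // reduction feature maps, the search network's memory footprint shrinks roughly by that factor, which is the kind of saving the abstract quantifies.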

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
Differentiable Architecture Search, Fire Segmentation, Image Classification, Model Optimization, Neural Architecture Search
National Category
Computer Engineering; Artificial Intelligence
Identifiers
urn:nbn:se:miun:diva-56051 (URN); 10.1109/access.2025.3637375 (DOI); 2-s2.0-105023045948 (Scopus ID)
Projects
PLENOPTIMA; IMMERSE
Funder
Mid Sweden University; EU, Horizon 2020, 956770; Interreg Aurora, 20366448; Swedish National Infrastructure for Computing (SNIC), 2022-06725
Available from: 2025-11-27 Created: 2025-11-27 Last updated: 2025-12-09. Bibliographically approved
Takhtardeshir, S., Olsson, R., Guillemot, C. & Sjöström, M. (2025). Efficient and Fast Light Field Compression via VAE-Based Spatial and Angular Disentanglement. IEEE Access, 13, 18594-18607
Efficient and Fast Light Field Compression via VAE-Based Spatial and Angular Disentanglement
2025 (English) In: IEEE Access, E-ISSN 2169-3536, Vol. 13, p. 18594-18607. Article in journal (Refereed), Published
Abstract [en]

Light field (LF) imaging captures both spatial and angular information, which is essential for applications such as depth estimation, view synthesis, and post-capture refocusing. However, the efficient processing of this data, particularly in terms of compression and encoding/decoding time, presents challenges. We propose a Variational Autoencoder (VAE)-based framework to disentangle the spatial and angular features of light field images, focusing on fast and efficient compression. Our method uses two separate sub-encoders, one for spatial and one for angular features, to allow for independent processing in the latent space, which facilitates more streamlined compression workflows. Evaluations on standard light field datasets demonstrate that our approach reduces encoding and decoding time significantly, with a slight trade-off in Rate-Distortion (RD) performance, making it suitable for real-time applications.
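To clarify what "spatial" and "angular" inputs look like for a 4D light field, the short sketch below rearranges a light field tensor between a sub-aperture-image layout (spatial views) and a macro-pixel, lenslet-style layout (angular views). The toy 7x7-view tensor and the two layout conventions are assumptions for illustration, not necessarily the exact inputs used in the paper.

```python
# Sketch: spatial (sub-aperture) vs. angular (macro-pixel) arrangements of a 4D light field.
# Assumption: a toy 7x7-view light field; layouts shown are common conventions.
import torch

U, V, H, W, C = 7, 7, 64, 64, 3                     # angular (U, V) and spatial (H, W) dims
lf = torch.rand(U, V, H, W, C)                      # 4D light field (plus color)

# Spatial view: one sub-aperture image per (u, v), tiled into a (U*H) x (V*W) mosaic.
spatial_mosaic = lf.permute(0, 2, 1, 3, 4).reshape(U * H, V * W, C)

# Angular view: one U x V macro-pixel per spatial location (lenslet-style layout).
angular_mosaic = lf.permute(2, 0, 3, 1, 4).reshape(H * U, W * V, C)

print(spatial_mosaic.shape, angular_mosaic.shape)   # both (448, 448, 3), different orderings
```

Feeding these two complementary arrangements to separate sub-encoders is one straightforward way to let each branch specialize in spatial or angular structure, consistent with the disentanglement goal stated above.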

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
Light fields, Image coding, Decoding, Kernel, Image reconstruction, Feature extraction, Training, Imaging, Streaming media, Redundancy, Light field, compression, disentangling, variational auto-encoder
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:miun:diva-53824 (URN); 10.1109/ACCESS.2025.3532608 (DOI); 001410367700049 (); 2-s2.0-85216292711 (Scopus ID)
Available from: 2025-02-14 Created: 2025-02-14 Last updated: 2025-09-25
Hassan, A., Zhang, T., Egiazarian, K. & Sjöström, M. (2025). EPINET-Lite: Rethinking Mixed Convolutions for Efficient Light Field Disparity Estimation Network. In: 2025 IEEE 27th International Workshop on Multimedia Signal Processing (MMSP): . Paper presented at 2025 IEEE 27th International Workshop on Multimedia Signal Processing (MMSP). IEEE conference proceedings
EPINET-Lite: Rethinking Mixed Convolutions for Efficient Light Field Disparity Estimation Network
2025 (English) In: 2025 IEEE 27th International Workshop on Multimedia Signal Processing (MMSP), IEEE conference proceedings, 2025. Conference paper, Published paper (Refereed)
Abstract [en]

Convolutional neural networks are widely used for light field disparity estimation. However, many state-of-the-art deep learning models are computationally expensive due to their reliance on standard convolutions with varying kernel sizes. In this paper, we analyze the effect of various advanced convolution operations with different kernel sizes for feature extraction in a state-of-the-art light field disparity estimation network. Based on this investigation, we propose an optimized mixed convolution layer to extract relevant features using multiple kernel sizes in parallel, while maintaining significantly lower computational cost. Experimental results demonstrate that our approach reduces model complexity by up to 4.2× while also improving disparity estimation accuracy. These findings make the proposed convolutional operation more practical for light field applications, where efficient spatial and angular feature extraction is essential for improved model performance.
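As a hedged illustration of a mixed-kernel convolution block, the sketch below runs depthwise convolutions with several kernel sizes in parallel over channel groups and concatenates the results, in the spirit of MixConv-style layers. The kernel sizes, equal channel split, and pointwise projection are assumptions for illustration, not the specific layer proposed in the paper.

```python
# Sketch: parallel depthwise convolutions with mixed kernel sizes over channel groups.
# Assumption: kernel sizes (3/5/7), equal channel split, and the 1x1 projection are placeholders.
import torch
import torch.nn as nn

class MixedKernelConv(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        assert channels % len(kernel_sizes) == 0
        group = channels // len(kernel_sizes)
        self.branches = nn.ModuleList([
            nn.Conv2d(group, group, k, padding=k // 2, groups=group)   # depthwise per group
            for k in kernel_sizes
        ])
        self.pointwise = nn.Conv2d(channels, channels, 1)              # mix information across groups

    def forward(self, x):
        chunks = torch.chunk(x, len(self.branches), dim=1)
        out = torch.cat([branch(c) for branch, c in zip(self.branches, chunks)], dim=1)
        return self.pointwise(out)

x = torch.randn(1, 48, 32, 32)
print(MixedKernelConv(48)(x).shape)    # torch.Size([1, 48, 32, 32])
```

Replacing a bank of full standard convolutions with depthwise, grouped branches of different receptive fields is how this kind of layer keeps multi-scale feature extraction while cutting parameter count and FLOPs.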

Place, publisher, year, edition, pages
IEEE conference proceedings, 2025
Keywords
Convolutional Neural Network, Deep Learning, Disparity Estimation, Light Field, Optimization
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:miun:diva-56050 (URN)
Conference
2025 IEEE 27th International Workshop on Multimedia Signal Processing (MMSP)
Available from: 2025-11-27 Created: 2025-11-27 Last updated: 2025-11-27
Weinkauf, T., Romero, M., Kerren, A., Larsson, E., Latino, F., Liliequist, E., . . . Gelfgren, S. (2025). InfraVis - The Swedish Research Infrastructure for Visualization Support. In: C. Gillmann, M. Krone, G. Reina, T. Wischgoll (Ed.), VisGap - The Gap between Visualization Research and Visualization Software: . Paper presented at VisGap, Luxembourg, June 2, 2025. The Eurographics Association
InfraVis - The Swedish Research Infrastructure for Visualization Support
2025 (English) In: VisGap - The Gap between Visualization Research and Visualization Software / [ed] C. Gillmann, M. Krone, G. Reina, T. Wischgoll, The Eurographics Association, 2025. Conference paper, Published paper (Refereed)
Abstract [en]

Essentially all academic research of today relies on analysis of data from a wide range of sources. Several underpinning, and rapidly developing, technologies support the analysis of this data. Visualization serves as an interface to this ecosystem of tools and methods and integrates them into environments supporting scientific workflows, effectively sharing cognitive load between computers and humans. There is, however, a gap between the state of the art in visual data analysis and current widespread academic practice. Support for the introduction of new, improved, and tailored visual data analysis environments thus has the potential to address challenges involving large and complex data, creating competitive advantages for researchers. To fill the gap and capitalize on this opportunity, the InfraVis initiative has been created in Sweden with the mission to operate an infrastructure consisting of visualization experts, software solutions, and access to high-end visualization laboratories. Users of InfraVis are offered assistance through a national helpdesk with rapid response times as well as more in-depth projects addressing specific data and software challenges. InfraVis provides software solutions based on development within connected research groups, curation of international software and best practice, and user training in the form of courses, seminars, and online documentation. To build an infrastructure with national coverage, we have pooled together nine visualization environments in Sweden interconnected in a nodal structure. The nodes are hosted in proximity to research environments in visualization, which enables direct access to the research front as well as to state-of-the-art facilities. The governance structure of InfraVis is based on the leading researchers in visualization in Sweden as well as an international advisory board.

Place, publisher, year, edition, pages
The Eurographics Association, 2025
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:miun:diva-54642 (URN); 10.2312/visgap.20251157 (DOI); 978-3-03868-289-9 (ISBN)
Conference
VisGap, Luxembourg, June 2, 2025
Available from: 2025-06-16 Created: 2025-06-16 Last updated: 2025-09-25. Bibliographically approved
Willingham, S., Sjöström, M. & Guillemot, C. (2025). Maximum A Posteriori Training of Diffusion Models for Image Restoration. In: 2025 33rd European Signal Processing Conference (EUSIPCO): . Paper presented at 33rd European Signal Processing Conference, Isola Delle Femmine, Palermo, Italy, September 8-12, 2025 (pp. 1882-1886). IEEE conference proceedings
Maximum A Posteriori Training of Diffusion Models for Image Restoration
2025 (English) In: 2025 33rd European Signal Processing Conference (EUSIPCO), IEEE conference proceedings, 2025, p. 1882-1886. Conference paper, Published paper (Refereed)
Abstract [en]

Inverse problems involve reconstructing clean images from degraded observations. Maximum a Posteriori (MAP) estimation reconstructs the most probable source image from noisy measurements. When combined with Plug-and-Play (PnP) priors defined by an image denoising algorithm, MAP estimation yields high-quality reconstructions. In contrast, Diffusion Models (DMs) address inverse problems by sampling from the posterior distribution using score functions trained on images perturbed by Gaussian noise. Prior work reformulated diffusion sampling as Deep Equilibrium (DEQ) models but did not fine-tune DMs for inverse problems. This work introduces MaximUm a PostEriori Training (MUPET), a framework that leverages PnP gradient descent to enable DEQ fine-tuning of DMs on inverse problems. By refining a generative prior at the fixed-point of MAP estimation, MUPET enhances image restoration via posterior sampling while maintaining quality when sampling from the prior.
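To unpack the Plug-and-Play gradient-descent step mentioned in the abstract, the sketch below alternates a data-fidelity gradient step with a denoiser that stands in for the learned prior, run for a fixed number of iterations toward an approximate MAP fixed point. The simple Gaussian-blur denoiser, inpainting operator, step size, and stopping rule are toy placeholders; MUPET instead uses a fine-tuned diffusion prior and deep-equilibrium training.

```python
# Sketch: Plug-and-Play gradient descent toward a MAP fixed point x* = D(x* - t * A^T(A x* - y)).
# Assumption: the Gaussian-blur denoiser and masking operator are toy stand-ins for a learned
# denoiser/diffusion prior and a general degradation operator.
import torch
import torch.nn.functional as F

def toy_denoiser(x):
    """Placeholder prior: a fixed 5x5 Gaussian blur per channel (stands in for a learned denoiser)."""
    g = torch.tensor([1., 4., 6., 4., 1.])
    kernel = torch.outer(g, g)
    kernel = (kernel / kernel.sum()).view(1, 1, 5, 5).repeat(x.shape[1], 1, 1, 1)
    return F.conv2d(x, kernel, padding=2, groups=x.shape[1])

def pnp_map_estimate(y, mask, steps=100, step_size=1.0):
    """Inpainting-style inverse problem: y = mask * x + noise."""
    x = y.clone()
    for _ in range(steps):
        grad = mask * (mask * x - y)              # gradient of the quadratic data-fidelity term
        x = toy_denoiser(x - step_size * grad)    # denoiser acts as the plug-and-play prior
    return x

x_true = torch.rand(1, 3, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.5).float()
y = mask * x_true + 0.01 * torch.randn_like(x_true)
x_hat = pnp_map_estimate(y, mask)
print(float(((x_hat - x_true) ** 2).mean()))      # reconstruction MSE of the toy example
```

The point of the fixed-point view is that the whole iteration can be differentiated implicitly at its equilibrium, which is what makes deep-equilibrium fine-tuning of the prior tractable.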

Place, publisher, year, edition, pages
IEEE conference proceedings, 2025
Keywords
Inverse Problems, Diffusion Models, Posterior Sampling, Generative Methods, Deep Equilibrium Models
National Category
Computer Vision and Learning Systems
Identifiers
urn:nbn:se:miun:diva-56005 (URN); 10.23919/EUSIPCO63237.2025.11226569 (DOI); 978-9-4645-9362-4 (ISBN)
Conference
33rd European Signal Processing Conference, Isola Delle Femmine, Palermo, Italy, September 8-12, 2025
Available from: 2025-11-24 Created: 2025-11-24 Last updated: 2025-12-08. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-3751-6089
