Mid Sweden University

Steered Mixture-of-Experts for Compact and Edge-aware Representation: From 2D Image Regression to 3D Radiance Fields
Mid Sweden University, Faculty of Science, Technology and Media, Department of Computer and Electrical Engineering (2023-). (Realistic3D)
ORCID iD: 0009-0003-0878-0179
2026 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

As visual computing advances across domains such as image editing, autonomous driving, and digital twins, the need for high-fidelity yet computationally efficient representations has become increasingly critical. Traditional 2D models are constrained by fixed grids, limiting their adaptability and compactness, while emerging 3D techniques often deliver realism at the cost of excessive training time, memory usage, and energy consumption. This thesis tackles a central challenge across both 2D and 3D domains: how to construct scalable, high-quality visual representations without succumbing to inefficiency.

We examine Steered Mixture-of-Experts (SMoE)—a modular, kernel-based architecture that promises localized modeling and interpretability. Yet despite its expressive power, SMoE has historically suffered from impractical training regimes, bloated parameter counts, and poor support for high-dimensional data. This work pursues a cohesive answer to three research questions, aimed at making SMoE fast, compact, and capable of handling 3D visual content.
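To make the model class concrete: a 2D SMoE regressor predicts each pixel as a softmax-gated (convex) combination of expert values, with the gates driven by steered Gaussian kernels. The following is a minimal NumPy sketch with constant experts — an illustration of the general formulation, not the thesis's actual implementation:

```python
import numpy as np

def smoe_reconstruct(coords, mus, inv_covs, experts):
    """Evaluate a 2D SMoE model at pixel coordinates.

    coords:   (P, 2) pixel positions
    mus:      (K, 2) kernel centers
    inv_covs: (K, 2, 2) inverse steering covariances
    experts:  (K,) constant expert values (grayscale)
    """
    d = coords[:, None, :] - mus[None, :, :]             # (P, K, 2)
    # Mahalanobis distance per kernel: d^T Sigma^{-1} d
    maha = np.einsum('pki,kij,pkj->pk', d, inv_covs, d)  # (P, K)
    logits = -0.5 * maha
    # softmax gating: each pixel is a convex combination of experts
    logits -= logits.max(axis=1, keepdims=True)
    gates = np.exp(logits)
    gates /= gates.sum(axis=1, keepdims=True)
    return gates @ experts                               # (P,)

# toy example: two kernels splitting a 4x4 image into left/right halves
ys, xs = np.mgrid[0:4, 0:4]
coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
mus = np.array([[0.5, 1.5], [3.0, 1.5]])
inv_covs = np.stack([np.eye(2), np.eye(2)])
experts = np.array([0.0, 1.0])
img = smoe_reconstruct(coords, mus, inv_covs, experts).reshape(4, 4)
```

With the two kernels centered on opposite halves of the toy image, the gates produce a soft vertical edge whose sharpness is controlled by the steering covariances — the "edge-aware" behavior the abstract refers to.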

First, we confront the long-ignored problem of initialization. Through a segmentation-based method that aligns expert kernels with semantic image regions, we drastically reduce redundancy and training duration, producing models that are both compact and structurally aligned with the data. Second, we tackle the inefficiency of gradient-based optimization by introducing a rasterized training scheme, adapted from Gaussian splatting techniques in 3D rendering. By partitioning images into blocks and activating only relevant kernels during each optimization step, we reduce the computational footprint by an order of magnitude without sacrificing accuracy. Third, we generalize SMoE to 3D by reparameterizing its spatial kernels and integrating splatting-based differentiable rendering. This extension maintains the compactness of SMoE while supporting high-quality scene reconstruction, even under sparse supervision.

Experimental results confirm that our methods outperform baseline SMoE implementations in both speed and reconstruction quality across 2D and 3D tasks, and they further surpass existing 3DGS and related Gaussian-based approaches. Moreover, our approach enables previously infeasible applications—real-time training, compact deployment, and scalable modeling of complex scenes.

This thesis transforms SMoE from a theoretically elegant yet impractical construct into a viable backbone for efficient, high-fidelity visual data representation. By grounding mixture models in perceptual structure and exploiting block-level sparsity, we chart a broader design principle for structure-aware, rasterization-friendly learning systems.
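The block-level sparsity mentioned above can be illustrated with a simple culling test: for each image tile, only kernels whose approximate 3-sigma extent overlaps the tile need to be evaluated. This is a hypothetical NumPy sketch using a conservative axis-aligned bound from the covariance diagonal; the actual rasterized CUDA pipeline differs in detail:

```python
import numpy as np

def active_kernels_per_block(mus, covs, block, radius_sigma=3.0):
    """Return indices of kernels whose ~radius_sigma extent overlaps a tile.

    mus:   (K, 2) kernel centers
    covs:  (K, 2, 2) steering covariances
    block: (x0, y0, x1, y1) pixel bounds of the tile
    """
    x0, y0, x1, y1 = block
    # conservative axis-aligned extent from the covariance diagonal
    ext = radius_sigma * np.sqrt(np.maximum(covs[:, [0, 1], [0, 1]], 1e-12))
    lo, hi = mus - ext, mus + ext                     # (K, 2) bounding boxes
    keep = (hi[:, 0] >= x0) & (lo[:, 0] <= x1) & \
           (hi[:, 1] >= y0) & (lo[:, 1] <= y1)
    return np.nonzero(keep)[0]

# kernel 0 sits near the origin, kernel 1 far away
mus = np.array([[2.0, 2.0], [30.0, 30.0]])
covs = np.stack([np.eye(2), np.eye(2)])
print(active_kernels_per_block(mus, covs, (0, 0, 8, 8)))  # -> [0]
```

Because each optimization step then touches only the kernels active in the current block, the per-step cost scales with local kernel density rather than the global kernel count — the source of the order-of-magnitude reduction claimed above.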

Place, publisher, year, edition, pages
Berlin: Technische Universität Berlin, 2026, p. 121
Series
Mid Sweden University doctoral thesis, ISSN 1652-893X ; 443
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:miun:diva-56611
OAI: oai:DiVA.org:miun-56611
DiVA, id: diva2:2038102
Public defence
2026-02-05, Berlin, 00:00
Supervisors
Note

The thesis is part of a double PhD degree with Technische Universität Berlin and Mid Sweden University, published at TU Berlin.

At the time of the doctoral defence, the following papers were unpublished: papers 3 and 5 (manuscripts).

Available from: 2026-02-13 Created: 2026-02-12 Last updated: 2026-02-13. Bibliographically approved
List of papers
1. Segmentation-based Initialization for Steered Mixture of Experts
2023 (English). In: 2023 IEEE International Conference on Visual Communications and Image Processing (VCIP), IEEE conference proceedings, 2023. Conference paper, Published paper (Refereed)
Abstract [en]

The Steered-Mixture-of-Experts (SMoE) model is an edge-aware kernel representation that has been successfully explored for the compression of images, video, and higher-dimensional data such as light fields. The present work leverages the potential for enhanced compression gains through efficient kernel reduction. We propose a fast segmentation-based strategy that identifies a sufficient number of kernels for representing an image and provides an initial kernel parametrization. The strategy yields both a reduced memory footprint and reduced computational complexity for the subsequent parameter optimization, resulting in an overall faster processing time. Fewer kernels, combined with the inherent sparsity of SMoE models, further enhance the overall compression performance. Empirical evaluations demonstrate a gain of 0.3-1.0 dB in PSNR for a constant number of kernels, and 23% fewer kernels and 25% less time for constant PSNR. The results highlight the feasibility and practicality of the approach, positioning it as a valuable solution for various image-related applications, including image compression.
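The initialization idea can be illustrated schematically: given a precomputed segment label map, each segment contributes a kernel whose center and steering covariance are the mean and covariance of the segment's pixel coordinates. This is a hypothetical one-kernel-per-segment NumPy sketch; the paper's actual strategy also determines how many kernels each segment needs:

```python
import numpy as np

def init_kernels_from_segments(labels):
    """Derive one steering kernel per segment of a label map.

    labels: (H, W) integer segment map. Returns kernel centers (K, 2)
    and covariances (K, 2, 2) fitted to each segment's pixel spread.
    """
    ys, xs = np.indices(labels.shape)
    coords = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    flat = labels.ravel()
    mus, covs = [], []
    for seg in np.unique(flat):
        pts = coords[flat == seg]
        mus.append(pts.mean(axis=0))
        # small ridge keeps tiny segments invertible
        covs.append(np.cov(pts.T) + 1e-3 * np.eye(2) if len(pts) > 1
                    else np.eye(2))
    return np.array(mus), np.array(covs)

# toy segmentation: left half = segment 0, right half = segment 1
labels = np.zeros((8, 8), dtype=int)
labels[:, 4:] = 1
mus, covs = init_kernels_from_segments(labels)
```

Because the initial kernels already follow segment boundaries, the subsequent gradient-based optimization starts much closer to a good solution than a regular-grid or K-Means initialization would.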

Place, publisher, year, edition, pages
IEEE conference proceedings, 2023
Keywords
compression, gating network, segmentation, Computer vision, Image segmentation, Compression of images, Edge aware, High dimensional data, Kernel representation, Light fields, Mixture of experts, Mixture-of-experts model, Image compression
National Category
Computer Engineering
Identifiers
urn:nbn:se:miun:diva-50594 (URN)
10.1109/VCIP59821.2023.10402643 (DOI)
2-s2.0-85184853593 (Scopus ID)
9798350359855 (ISBN)
Conference
2023 IEEE International Conference on Visual Communications and Image Processing, VCIP 2023
Available from: 2024-02-20 Created: 2024-02-20 Last updated: 2026-04-02. Bibliographically approved
2. Adaptive Segmentation-Based Initialization for Steered Mixture of Experts Image Regression
2025 (English). In: IEEE Transactions on Multimedia, ISSN 1520-9210, E-ISSN 1941-0077, Vol. 27, p. 9802-9817. Article in journal (Refereed), Published
Abstract [en]

Kernel image regression methods have demonstrated excellent efficiency in various image processing tasks, including image and light-field compression, Gaussian splatting, denoising, and super-resolution. Parameter estimation for these methods commonly employs gradient-descent iterative optimization, which poses a significant computational burden for many applications. In this paper, we introduce a novel adaptive segmentation-based initialization method targeted at optimizing Steered-Mixture-of-Experts (SMoE) gating networks and Radial-Basis-Function (RBF) networks with steering kernels. The method allocates kernels to pre-calculated image segments. The optimal number of kernels, kernel positions, and steering parameters are derived per segment in an iterative optimization and kernel sparsification procedure. The kernel information from local segments is then transferred into a global initialization, ready for use in iterative optimization of SMoE, RBF, and related kernel image regression methods. Results demonstrate significant improvements in both objective and subjective quality compared to regular-grid, K-Means, deep-learning-based, and previous segmentation-based initialization methods. The proposed initialization reduces kernel usage by 70% compared to other initialization methods while maintaining the same reconstruction quality. Furthermore, by generating initial parameters closer to the optimized results, convergence time is reduced, achieving overall runtime savings of up to 50% compared to prior methods. Additionally, the method supports parallel computation, with initialization time halved when using four GPUs instead of one.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
Image kernel regression, mixture of experts, gating network, radial basis function network, optimization, initialization, segmentation, compression, denoising, super-resolution
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:miun:diva-54545 (URN)
10.1109/TMM.2025.3618576 (DOI)
001641495700020 ()
2-s2.0-105018383703 (Scopus ID)
Available from: 2025-06-02 Created: 2025-06-02 Last updated: 2026-02-12
3. (Record not available in DiVA; per the note above, paper 3 was an unpublished manuscript at the time of the defence.)
4. 3D SMoE Splatting for Edge-aware Realtime Radiance Field Rendering
2025 (English). In: Proceedings of the SIGGRAPH Asia 2025 Conference Papers, ACM Digital Library, 2025, article id 137. Conference paper, Published paper (Refereed)
Abstract [en]

Steered Mixture-of-Experts (SMoE) is an existing regression framework that has previously been applied to the modeling and compression of 2D images and higher-dimensional imagery, including compression of light fields and light-field video. SMoE models are sparse, edge-aware representations that allow rendering of imagery with few Gaussians at excellent quality. In this paper, a novel, edge-aware "3D SMoE Splatting" (3DSMoES) framework for 3D rendering is introduced, adapted to fit into the existing "3D Gaussian Splatting" (3DGS) CUDA optimization pipeline. Here, SMoE regression serves as a "plug-and-play" solution that replaces the established 3DGS regression as a novel workhorse. 3DSMoES achieves significant visual quality gains with drastically fewer Gaussian kernels compared to 3DGS. We observe up to approximately 4 dB improvement in PSNR on individual scenes with kernel reductions between 20 and 50 percent. The sparse models are significantly faster to train and allow 30-50 percent faster rendering.
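One step the 3DGS pipeline provides, and which any splatting-based method reuses, is projecting each 3D kernel's covariance to screen space via the EWA linearization Sigma2D = J W Sigma3D W^T J^T. Below is a minimal sketch of that shared step, assuming for simplicity a kernel centered on the optical axis; the gating itself — the part where 3DSMoES replaces 3DGS regression — is not shown:

```python
import numpy as np

def project_covariance(cov3d, view_R, focal, depth):
    """Project a 3D kernel covariance to 2D screen space.

    EWA-splatting step used by 3DGS-style rasterizers:
    Sigma2D = J W Sigma3D W^T J^T, with W the camera rotation and J
    the Jacobian of perspective projection linearized at depth z.
    """
    # Jacobian of (x, y, z) -> (f*x/z, f*y/z); off-axis terms vanish
    # because we linearize at a point on the optical axis (x = y = 0)
    J = np.array([[focal / depth, 0.0, 0.0],
                  [0.0, focal / depth, 0.0]])
    return J @ view_R @ cov3d @ view_R.T @ J.T  # (2, 2)

# an isotropic kernel shrinks quadratically with distance from the camera
near = project_covariance(np.eye(3), np.eye(3), focal=2.0, depth=1.0)
far = project_covariance(np.eye(3), np.eye(3), focal=2.0, depth=2.0)
```

The projected 2D covariances are exactly the steering matrices a 2D SMoE gate consumes, which is what makes the "plug-and-play" replacement of the per-pixel regression possible.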

Place, publisher, year, edition, pages
ACM Digital Library, 2025
National Category
Computer Engineering
Identifiers
urn:nbn:se:miun:diva-55993 (URN)
10.1145/3757377.3763899 (DOI)
2-s2.0-105032510950 (Scopus ID)
979-8-4007-2137-3 (ISBN)
Conference
SIGGRAPH Asia 2025, Hong Kong Convention and Exhibition Centre, Hong Kong, December 15-18, 2025
Available from: 2025-11-18 Created: 2025-11-18 Last updated: 2026-04-02. Bibliographically approved

Open Access in DiVA

fulltext (57612 kB), 85 downloads
File information
File name: FULLTEXT01.pdf
File size: 57612 kB
Checksum (SHA-512): 3db990ee19326ee6b68bc55a6a6e2f81500497ddf6289cd4d91310ceeffdfb1c17d90f76aff227df753c3f52fb2c66ca68b2e4ef8232eb5f70a5f7422e716d0f
Type: fulltext, Mimetype: application/pdf

Authority records

Li, Yi-Hsin

Search in DiVA

By author/editor
Li, Yi-Hsin
By organisation
Department of Computer and Electrical Engineering (2023-)
Electrical Engineering, Electronic Engineering, Information Engineering

Search outside of DiVA

Google, Google Scholar
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are now no longer available.

Total: 6491 hits