Efficient Inference of parallel partitioned hybrid-Vision Transformers
Mid Sweden University, Faculty of Science, Technology and Media, Department of Computer and Electrical Engineering (2023-). TU Wien, Vienna, Austria. ORCID iD: 0009-0000-8343-9649
Mid Sweden University, Faculty of Science, Technology and Media, Department of Computer and Electrical Engineering (2023-).
Mid Sweden University, Faculty of Science, Technology and Media, Department of Computer and Electrical Engineering (2023-). TU Wien, Vienna, Austria. ORCID iD: 0000-0003-2251-0004
Mid Sweden University, Faculty of Science, Technology and Media, Department of Computer and Electrical Engineering (2023-). ORCID iD: 0000-0001-8607-4083
2025 (English). In: 2025 Cyber-Physical Systems and Internet-of-Things Week, CPS-IoT Week Workshops, Association for Computing Machinery (ACM), 2025, article id 5. Conference paper, Published paper (Refereed)
Abstract [en]

Recent advancements have explored parallel partitioning of Transformer- and Convolutional Neural Network (CNN)-based models across networks of edge devices to accelerate deep neural network (DNN) inference. However, partitioning strategies for hybrid Vision Transformers, models that integrate convolutional and attention layers, remain underdeveloped, particularly in scenarios with low communication data rates. This work introduces a novel partitioning scheme tailored for hybrid Vision Transformers, addressing communication latency through efficient compressed communication and model size reduction. The proposed approach incorporates a trainable quantization and JPEG compression pipeline to minimize overhead. We evaluate our scheme on two state-of-the-art architectures, EdgeViT and CoAtNet. For a communication data rate of 10 MB/s and partitioning across 12 devices, we achieve up to a 1.74x speed-up and a 5.34x model size reduction for EdgeViT-XXS. Similarly, on a customized CoAtNet-0, our method achieves a 1.40x speed-up and a 2.66x reduction in model size, demonstrating the efficacy of the approach in real-world scenarios.
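As a rough, illustrative sketch of the compressed-communication step described in the abstract, the Python snippet below quantizes a single-channel activation map to 8 bits and JPEG-encodes it before transmission, then reverses both steps at the receiver. The min-max quantization, the feature-map shape, and the JPEG quality setting are assumptions chosen for illustration only; in particular, the paper's quantization stage is trainable, which this fixed scheme does not reproduce.

```python
# Minimal sketch, assuming min-max 8-bit quantization and a fixed JPEG
# quality; the paper's actual pipeline trains its quantization parameters.
import io

import numpy as np
from PIL import Image


def compress_activation(act: np.ndarray, quality: int = 75) -> tuple[bytes, float, float]:
    """Quantize a 2-D activation map to uint8 and JPEG-encode it."""
    lo, hi = float(act.min()), float(act.max())
    scale = (hi - lo) or 1.0
    q = np.round((act - lo) / scale * 255.0).astype(np.uint8)  # 8-bit quantization
    buf = io.BytesIO()
    Image.fromarray(q, mode="L").save(buf, format="JPEG", quality=quality)
    return buf.getvalue(), lo, scale  # payload plus the range needed to dequantize


def decompress_activation(payload: bytes, lo: float, scale: float) -> np.ndarray:
    """JPEG-decode and map the values back to the original range."""
    q = np.asarray(Image.open(io.BytesIO(payload)), dtype=np.float32)
    return q / 255.0 * scale + lo


if __name__ == "__main__":
    act = np.random.randn(56, 56).astype(np.float32)  # one channel of a feature map
    payload, lo, scale = compress_activation(act)
    rec = decompress_activation(payload, lo, scale)
    ratio = act.nbytes / len(payload)
    err = np.abs(act - rec).mean()
    print(f"compression ratio: {ratio:.1f}x, mean abs error: {err:.4f}")
```

Lowering the JPEG quality raises the compression ratio at the cost of reconstruction error, which is the kind of latency-accuracy trade-off such a pipeline must balance at low data rates.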

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025. Article id 5
Keywords [en]
Partitioning, Parallelization, IoT, ViT, hybrid Transformers
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:miun:diva-54746
DOI: 10.1145/3722567.3727849
ISI: 001498351300005
ISBN: 9798400716119 (print)
OAI: oai:DiVA.org:miun-54746
DiVA, id: diva2:1975888
Conference
4th International Workshop on Real-time and Intelligent Edge Computing (RAGE), May 6-9, 2025, Irvine, CA
Available from: 2025-06-24. Created: 2025-06-24. Last updated: 2025-09-25. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text

Authority records

Berg, Oscar Artur Bernd; Saqib, Eiraj; O'Nils, Mattias; Krug, Silvia; Shallari, Irida; Sánchez Leal, Isaac

Search in DiVA

By author/editor
Berg, Oscar Artur Bernd; Saqib, Eiraj; Jantsch, Axel; O'Nils, Mattias; Krug, Silvia; Shallari, Irida; Sánchez Leal, Isaac
By organisation
Department of Computer and Electrical Engineering (2023-)
Computer Sciences
