Quantization Analysis of Transformer Models for Efficient Natural Language Processing: Exploring the Impact of Model Compression on Performance and Efficiency
2024 (English) Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [sv]
Denna studie undersöker effekten av kvantiseringstekniker på transformer-modeller inom Natural Language Processing (NLP), med särskilt fokus på DistilGPT-2 och GPT-Neo-modeller. Syftet är att minska de här modellernas beräkningskrav för att underlätta deras implementering på enheter med begränsat minne och bearbetningskraft. Forskningen använder en kvantitativ experimentell design, som utvärderar effekterna av 4-bitars, 8-bitars och standardkvantisering på modellens prestanda, exekveringstid och minnesavtryck. Resultaten visar att 8-bitars kvantisering erbjuder den bästa balansen, genom att bibehålla hög kvalitet på textgenereringen samtidigt som modellstorleken och beräkningskraven minskas avsevärt. Fynden ger en praktisk lösning för att implementera transformer-modeller i resursbegränsade miljöer och bidrar till mer effektiva och tillgängliga NLP-applikationer.
Abstract [en]
This study investigates the impact of quantization techniques on transformer models in Natural Language Processing (NLP), specifically focusing on DistilGPT-2 and GPT-Neo models. The objective is to reduce the computational requirements of these models to ease their implementation on devices with limited memory and processing power. The research employs a quantitative experimental design, evaluating the effects of 4-bit, 8-bit, and default quantization on model performance, execution time, and memory footprint. Results indicate that 8-bit quantization offers the best balance, maintaining high text generation quality while significantly reducing model size and computational requirements. The findings provide a practical solution for deploying transformer models in resource-constrained environments, contributing to more efficient and accessible NLP applications.
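For illustration, the kind of comparison described in the abstract can be set up with Hugging Face transformers together with bitsandbytes; the record does not state which tooling was used, so the snippet below is only a minimal sketch under that assumption, with "distilgpt2" and "EleutherAI/gpt-neo-125m" taken as the assumed public Hub identifiers for the two models.

    # Sketch (assumed tooling): load a model at default precision, 8-bit, and
    # 4-bit, then compare memory footprint and generation latency.
    import time
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

    MODEL_NAME = "distilgpt2"  # assumed Hub id; "EleutherAI/gpt-neo-125m" is analogous

    def load_model(bits=None):
        """Load the model at default precision, or quantized to 8 or 4 bits."""
        if bits == 8:
            config = BitsAndBytesConfig(load_in_8bit=True)
        elif bits == 4:
            config = BitsAndBytesConfig(load_in_4bit=True)
        else:
            config = None  # default precision, no quantization
        return AutoModelForCausalLM.from_pretrained(
            MODEL_NAME, quantization_config=config, device_map="auto"
        )

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    prompt = "Quantization reduces model size by"

    for bits in (None, 8, 4):
        model = load_model(bits)
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        start = time.perf_counter()
        output = model.generate(**inputs, max_new_tokens=50)
        elapsed = time.perf_counter() - start
        label = f"{bits}-bit" if bits else "default"
        print(f"{label}: {model.get_memory_footprint() / 1e6:.1f} MB, {elapsed:.2f} s")
        print(tokenizer.decode(output[0], skip_special_tokens=True))

Repeating the loop over several prompts and averaging the timings would mirror the execution-time and memory-footprint comparison reported in the thesis; generation quality would still need a separate evaluation.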
Place, publisher, year, edition, pages
2024, p. 75
Keywords [en]
Quantization, Transformer Models, NLP, DistilGPT-2, GPT-Neo, Model Optimization
Keywords [sv]
Kvantisering, Transformer-modeller, NLP, DistilGPT-2, GPT-Neo, Modelloptimering
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:miun:diva-51801
Local ID: DT-V24-G3-030
OAI: oai:DiVA.org:miun-51801
DiVA, id: diva2:1879022
Subject / course
Computer Engineering DT1
Educational program
Master of Science in Engineering - Computer Engineering TDTEA, 300 higher education credits
Supervisors
Examiners
Available from: 2024-06-27 Created: 2024-06-27 Last updated: 2024-06-27 Bibliographically approved