An extended BIRCH-based clustering algorithm for large time-series datasets
Mid Sweden University, Faculty of Science, Technology and Media, Department of Information and Communication systems.
2017 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Temporal data analysis and mining have attracted substantial interest due to the proliferation and ubiquity of time series in many fields. Time series clustering is one of the most popular mining methods, but many time series clustering algorithms detect clusters in a batch fashion, which uses a lot of memory and thus limits their scalability to large time series. The BIRCH algorithm, which clusters data objects incrementally in a single scan, has been shown to scale well to large datasets. However, the Euclidean distance metric employed in BIRCH has been shown to be inaccurate for time series and degrades clustering accuracy. To overcome this drawback, this work proposes an extended BIRCH algorithm for large time series. The BIRCH clustering algorithm is extended by changing the cluster feature vector to the proposed modified cluster feature, replacing the original Euclidean distance measure with dynamic time warping (DTW), and employing the DTW barycenter averaging (DBA) method for centroid computation, which is more suitable for time-series clustering than other averaging methods. To demonstrate the effectiveness of the proposed algorithm, we conducted an extensive evaluation of our algorithm against BIRCH, k-means and their variants with combinations of competitive distance measures. Experimental results show that the extended BIRCH algorithm improves accuracy significantly compared to the BIRCH algorithm and its variants, and achieves accuracy competitive with k-means and its variant, k-DBA. However, unlike k-means and k-DBA, the extended BIRCH algorithm retains the ability to handle continuously incoming data objects incrementally, which is the key to clustering large time-series datasets. Finally, the extended BIRCH-based algorithm is applied to a subsequence time-series clustering task on a simulated multivariate time-series dataset with the help of a sliding window.
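The abstract's main argument, that Euclidean distance can misjudge time series which DTW treats as similar, can be illustrated with a minimal Python sketch. This is not the thesis implementation: the function names `dtw_distance`, `euclidean_distance` and `sliding_windows`, the absolute-difference local cost, and the toy series are all assumptions made for illustration, and the sketch covers neither the modified cluster feature nor DBA.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic time warping, O(len(a) * len(b)).

    Assumption: local cost is the absolute difference between samples.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a[i-1] repeats the match with b[j-1]
                cost[i][j - 1],      # b[j-1] repeats the match with a[i-1]
                cost[i - 1][j - 1],  # both series advance
            )
    return cost[n][m]


def euclidean_distance(a, b):
    """Point-by-point Euclidean distance for equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sliding_windows(series, width, step=1):
    """Cut a long series into fixed-width subsequences (hypothetical helper)."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


# Two toy series with the same bump, shifted by one time step.
x = [0, 0, 1, 2, 1, 0, 0]
y = [0, 1, 2, 1, 0, 0, 0]

print(dtw_distance(x, y))        # DTW warps the time axis to absorb the shift
print(euclidean_distance(x, y))  # Euclidean penalises the shift point by point
print(sliding_windows([1, 2, 3, 4], width=2))
```

For these toy series DTW reports no difference while Euclidean distance does, which is the kind of mismatch that motivates swapping the distance measure inside an incremental clusterer.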

Place, publisher, year, edition, pages
2017. 57 p.
Keyword [en]
Time series, Data stream, Clustering, BIRCH, DTW, DBA.
National Category
Computer Engineering
Identifiers
URN: urn:nbn:se:miun:diva-29858
Local ID: DT-V16-A2-003
OAI: oai:DiVA.org:miun-29858
DiVA: diva2:1064522
Subject / course
Computer Engineering DT1
Educational program
International Master's Programme in Computer Engineering TDAAA 120 higher education credits
Supervisors
Examiners
Available from: 2017-01-12 Created: 2017-01-12 Last updated: 2017-01-12 Bibliographically approved

Open Access in DiVA

fulltext (1511 kB), 207 downloads
File information
File name: FULLTEXT01.pdf
File size: 1511 kB
Checksum (SHA-512): 410b65cee7fc194f824fc66806e7f308c5507759f87f796094bf3bf297c093f02841a32273f060eaa7845d319ca7649e5e4c92ff08ef3511af0f61788c59eedd
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Lei, Jiahuan
By organisation
Department of Information and Communication systems
Computer Engineering

Total: 207 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 682 hits