Interesting Association Rules Mining Based on Improved Rarity Algorithm
Mid Sweden University, Faculty of Science, Technology and Media, Department of Information Systems and Technology.
2018 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

With the rapid development of science and technology, society has entered the era of big data. Human activity produces large amounts of data every second, and this data contains a great deal of information. How to select the useful information from such complex data is therefore a significant problem. Association rule mining, a technique for mining patterns or associations between itemsets, arose in response; it aims to find important associations in data in order to obtain useful knowledge. Today, most scholars at home and abroad focus on frequent pattern mining. However, rare pattern mining undeniably also plays an important role in many areas, such as the medical, financial, and scientific fields. Compared with frequent pattern mining, studying rare pattern mining is more valuable, because it tends to find unknown, unexpected, and more interesting rules. The study of rare pattern mining is somewhat difficult, however, because the data available for verifying rules is scarce.

In frequent pattern mining, there are two general algorithms for discovering frequent itemsets: Apriori, the earliest algorithm, proposed by R. Agrawal in 1994, and FP-Tree, an improved algorithm that reduces the time complexity. In rare pattern mining there are likewise two algorithms, Arima and Rarity, which are similar to Apriori and FP-Tree, but they still have problems: Arima is time-consuming because it repeatedly scans the large database, and Rarity is space-consuming because it builds the full-combination tree.

Therefore, based on the Rarity algorithm, this report presents an improved method that efficiently discovers interesting association rules among rare itemsets and aims to balance time and space. It is a top-down strategy that uses a graph structure to represent all combinations of existing items, defines a pattern matrix to record itemsets, and incorporates a hash table to accelerate the calculation. Compared with Arima, this method decreases both the time cost and the space cost, and it reduces the wasted space that is the main problem of Rarity. However, its search time for mining rare itemsets is longer than that of Rarity, and we verified the feasibility of the algorithm only on small, abstract databases. In the future, on the one hand, we will continue improving the method, exploring how to decrease the search time and adjusting the hash function to optimize space utilization. On the other hand, we will apply the method to large real-world databases, such as a clinical database of diabetic patients, to mine association rules in diabetic complications.
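To make the notion of a rare itemset concrete, here is a minimal Python sketch. It is not the method of the thesis and is neither Arima nor Rarity: it is a naive illustration that enumerates every subset of each transaction from the largest size down and counts occurrences in a hash table. The function name `rare_itemsets`, the `max_count` threshold, and the sample transactions are all assumptions made up for this example.

```python
from collections import Counter
from itertools import combinations


def rare_itemsets(transactions, max_count):
    """Return itemsets seen in at least one but fewer than max_count transactions.

    Naive illustration only: the cost is exponential in transaction length.
    """
    support = Counter()  # hash table: itemset -> number of transactions containing it
    for transaction in transactions:
        items = sorted(set(transaction))
        # Enumerate subsets top-down, from the full transaction to single items.
        for size in range(len(items), 0, -1):
            for combo in combinations(items, size):
                support[frozenset(combo)] += 1
    return {itemset: count for itemset, count in support.items() if count < max_count}


# Hypothetical toy database of four transactions.
transactions = [
    ["a", "b", "c"],
    ["a", "b"],
    ["a", "c"],
    ["a", "d"],
]
rare = rare_itemsets(transactions, max_count=2)
for itemset, count in sorted(rare.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

In this toy database the items `a`, `b`, and `c` and the pairs `{a, b}` and `{a, c}` each occur at least twice, so only itemsets that occur exactly once are reported as rare. Storing every subset in this way is what makes a full-combination approach space-consuming, which is the trade-off the abstract describes.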

Place, publisher, year, edition, pages
2018. p. 51
Keywords [en]
Association rules mining, Rare pattern mining
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:miun:diva-35320
Local ID: DT-V18-A2-006
OAI: oai:DiVA.org:miun-35320
DiVA, id: diva2:1273437
Subject / course
Computer Engineering DT1
Available from: 2018-12-21
Created: 2018-12-21
Last updated: 2018-12-21
Bibliographically approved

Open Access in DiVA

fulltext (1403 kB), 23 downloads
File information
File name: FULLTEXT01.pdf
File size: 1403 kB
Checksum (SHA-512): 4412d405774b205f9e33be4918ab6d7b31911e39fbbc13d39dda0dad9088ad5dc903d6e00fbbffda57c323fd1771815bb40cf6da81fce2345402b03aeee209f0
Type: fulltext
Mimetype: application/pdf

Author
Xiang, Lan