Interesting Association Rules Mining Based on Improved Rarity Algorithm
Mid Sweden University (Mittuniversitetet), Faculty of Science, Technology and Media, Department of Information Systems and Technology.
2018 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

With the rapid development of science and technology, society has entered the big data era. Human activity produces large amounts of data every second, and that data contains much information. How to select the useful information from such complicated data is therefore a significant issue. Association rules mining, a technique for mining patterns or associations between itemsets, arose to address it: it aims to find important associations in data and so obtain useful knowledge. Nowadays, most scholars at home and abroad focus on frequent pattern mining. However, rare pattern mining undeniably also plays an important role in many areas, such as the medical, financial, and scientific fields. Compared with frequent pattern mining, rare pattern mining is more valuable, because it tends to find unknown, unexpected, and more interesting rules. It is also somewhat more difficult, because the data available for verifying rules is scarce.

In frequent pattern mining, there are two general algorithms for discovering frequent itemsets: Apriori, the earliest algorithm, proposed by R. Agrawal in 1994, and FP-Tree, an improved algorithm that reduces the time complexity. In rare pattern mining there are likewise two algorithms, Arima and Rarity, which are similar to Apriori and FP-Tree. Both still have problems: Arima is time-consuming because it repeatedly scans the large database, and Rarity is space-consuming because it builds the full-combination tree.

Therefore, based on the Rarity algorithm, this report presents an improved method to efficiently discover interesting association rules among rare itemsets, aiming for a balance between time and space. It is a top-down strategy that uses a graph structure to represent all combinations of existing items, defines a pattern matrix to record itemsets, and adds a hash table to accelerate the calculation. Compared with Arima, the method decreases both the time cost and the space cost, and it reduces the wasted space that is the problem of Rarity. However, its search time for mining rare itemsets is longer than that of Rarity, and we verified the feasibility of the algorithm only on abstract and small databases.

In the future, on the one hand, we will continue improving our method, exploring how to decrease the search time and adjusting the hash function to optimize space utilization. On the other hand, we will apply our method to actual large databases, such as a clinical database of diabetic patients, to mine association rules in diabetic complications.
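
To make the notion of a rare itemset concrete, the following is a minimal brute-force sketch in Python. It is not the method of the thesis: it does not use the graph structure or pattern matrix mentioned in the abstract, and only borrows the idea of a hash table (here a `Counter`) for support counting. The function name `rare_itemsets` and the parameters `max_support` and `max_size` are hypothetical, chosen for illustration, and "rare" is assumed here to mean "occurs at least once but in fewer than `max_support` transactions".

```python
from collections import Counter
from itertools import combinations


def rare_itemsets(transactions, max_support, max_size=2):
    """Return itemsets seen at least once but in fewer than max_support transactions.

    Illustrative brute force: one pass over the data, enumerating every
    combination of up to max_size items per transaction.
    """
    counts = Counter()  # hash table: itemset (sorted tuple) -> support count
    for transaction in transactions:
        items = sorted(set(transaction))  # drop duplicates, fix a canonical order
        for size in range(1, max_size + 1):
            for itemset in combinations(items, size):
                counts[itemset] += 1
    # Keep only itemsets whose support is below the (assumed) rarity threshold.
    return {itemset: n for itemset, n in counts.items() if n < max_support}


# Toy data, invented for this example.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["a", "d"]]
print(rare_itemsets(transactions, max_support=2))
```

Enumerating every combination grows exponentially with `max_size`, which is the kind of time and space cost the abstract attributes to existing algorithms; the sketch is only meant to show what is being counted.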

Place, publisher, year, edition, pages
2018. p. 51
Keywords [en]
Association rules mining, Rare pattern mining
HSV category
Identifiers
URN: urn:nbn:se:miun:diva-35320
Local ID: DT-V18-A2-006
OAI: oai:DiVA.org:miun-35320
DiVA, id: diva2:1273437
Subject / course
Computer Engineering DT1
Supervisor
Examiner
Available from: 2018-12-21 Created: 2018-12-21 Last updated: 2018-12-21 Bibliographically approved

Open Access in DiVA

fulltext (1403 kB), 25 downloads
File information
File: FULLTEXT01.pdf, File size: 1403 kB, Checksum SHA-512
4412d405774b205f9e33be4918ab6d7b31911e39fbbc13d39dda0dad9088ad5dc903d6e00fbbffda57c323fd1771815bb40cf6da81fce2345402b03aeee209f0
Type: fulltext, Mimetype: application/pdf

Author
Xiang, Lan