Treść książki

Przejdź do opcji czytnikaPrzejdź do nawigacjiPrzejdź do informacjiPrzejdź do stopki
23
searchinapureformmaybeusefulforsearchspacepruningwhenminingonly
maximalfrequentitemsets.
Top-downsearchwasfirstdiscussedbyZakiforhislattice-theoretical
approachtofrequentitemsetdiscovery[216,221].WhileEclat,whichisthebest
knownoflattice-basedalgorithms,usesbottom-upsearch,similarlyasitsmain
competitorwithinthisfamilyofalgorithmscalledClique,top-downsearchhas
alsobeenconsidered,initiallyasasupplement[221]andlaterasanalternative
[216]tobottom-upsearch.TheMaxEclatandMaxCliquealgorithms[221]use
hybrid(top-down/bottom-up)search,whereasTopDown[216],asitsname
suggests,performstop-downsearchesofitemsetsublattices.Obviously,the
latterthreealgorithms,duetothenatureoftop-downsearch,aresuitableonly
forminingmaximalfrequentitemsets.
2.4.3.DatabaseLayout
Conceptually,adatabaseusedforfrequentitemsetminingcanbeinterpreted
asatwo-dimensionalmatrixwithrowsrepresentingtransactionsandcolumns
representingpurchaseditems(usingtheterminologyfrommarket-basket
analysis,theareafromwhichtheproblemoriginated).Suchamatrixcanbe
implementedintherelationalorobject-relationaldatabaseinseveralways.Two
majorgeneraldataformatsconsideredintheoryandpracticeoffrequentitemset
miningarehorizontalandverticaldataformats.Inthehorizontalformatthe
databaseisorganizedasasetofrecords,whereeachrecordrepresentsa
transactionandcarriesinformationaboutthetransactionidentifier(tid)andthe
setofitemscontainedinthetransaction.Intheverticalformatthedatabaseis
alsoasetofrecordsbutarecordinthiscasecontainsanitemandtransaction
identifiersoftransactionscontainingtheitem.
In[174]aclassificationdistinguishingtwovariantsofhorizontalandvertical
formatswasproposed,resultinginthefollowingfourpossibledatabaselayouts
forfrequentitemsetmining:
Horizontalitem-list(HIL):eachrecordcorrespondstoatransactionand
containsitsitemsstoredasanorderedlist
Horizontalitem-vector(HIV):eachrecordcorrespondstoatransaction
andcontainsabit-vectorwith1’sforitemspresentinthetransactionand
0’sfortheremainingones
Verticaltid-list(VTL):eachrecordcorrespondstoanitemandcontains
identifiersoftransactionscontainingitstoredasanorderedlist(called
tid-list)
Verticaltid-vector(VTV):eachrecordcorrespondstoanitemand
containsabit-vectorwith1’sfortransactionscontainingtheitemand0’s
fortheremainingones.
Alltheaforementioneddatabaselayoutsassumesomeorderingofitemsand
transactionidentifiersrespectivelytobeusedconsistentlyforlistsorbitmaps.