Treść książki

Przejdź do opcji czytnikaPrzejdź do nawigacjiPrzejdź do informacjiPrzejdź do stopki
22
Theobservationthatallsupersetsofaninfrequentitemsetmustbeinfrequent
toowasalreadyemployedinthedesignofthefirstfrequentitemsetmining
algorithm(knownasAIS[4]).AISperformsabottom-upsearchoftheitemset
latticebystartingwithcountingthesupportsofsingleitemsfromthedatabase
andthenextendingthefrequentitemsetsfoundsofarwithitemsoccurring
togetherwiththemindatabasetransactions,formingpotentiallyfrequent
itemsets,calledcandidates.Theprocedurerequiresmultiplepassesoverthe
databasebutguaranteesthatnoitemsetthatwasasupersetofanitemsetknown
tobeinfrequentisbeingcounted.Moreover,onlyitemsetsactuallyoccurringin
thedatabasecanbecomecandidates.
Apriori[8]usesanimprovedcandidategenerationprocedurethatfullytakes
advantageofthesupportanti-monotonicity.Notonlyitgeneratescandidatesby
extendingitemsetsfoundtobefrequent,butalsorequiresthatallthesubsetsof
acandidatehavetobefrequent.Aprioriisaclassicexampleofbreadth-first
bottom-upfrequentitemsetsearchasitoperateslevel-wiseandgenerates
candidatesofsizek+1onlyafterfrequentitemsetsofsizekareknown.
TreeProjection[2]isanalgorithmthatphysicallyusesalexicographictreeto
representtheitemsetsearchspacetoconductthebottom-upsearchofthelogical
itemsetlatticebutapartfromthebreadth-firsttraversal,italsoconsiderspure
depth-firsttraversalandthecombinationofboththetraversalstrategies.
Adepth-firsttraversaloftheitemsetsearchspaceisalsoappliedbythe
FP-growth[83]algorithm,whichabandonedthegenerate-and-teststrategyof
discoveringlargerfrequentitemsetsbasedonalreadydiscoveredsmallerones,
typicalforearlyfrequentitemsetminingalgorithms.Instead,FP-growthextends
frequentitemsetsbyfindingfrequentitemsco-occurringwiththem.Sucha
methodologyofgrowingsmallerpatternsintolargeroneswithoutcandidate
generationhasbeenknownaspattern-growth[82].
Ithasbeenobservedthatalgorithmsperformingbreadth-firstsearchhave
problemswithdensedatasets(i.e.,datasetscontainingnumerouslongpatterns).
Ontheotherhand,depth-firstmethods,whichtypicallymaintainsome
sophisticatedcompressedrepresentationofthemineddatainmainmemory,may
notbesuitableforsparsedatasetsthatdonotcompresswell.Analgorithm
OpportuneProject[124]wasproposedasauniversalmethodforsparseand
densedatasets.OpportuneProjectcombinesdepth-firstapproachandbreadth-
firstapproachestoitemsetsearchspacetraversal,andopportunisticallychooses
betweenarray-basedandtree-basedstructurestorepresentprojectedtransaction
subsets.
Top-downsearchoftheitemsetlatticeisfarlesscommonthanbottom-up
search.Theproblemwithtop-downsearchisthatfindingthatanitemsetis
infrequentdoesnotsayanythingaboutitssubsets.Moreover,determiningthat
anitemsetisfrequent,thoughguaranteesthatitssubsetshavetobefrequenttoo,
doesnotprovidetheexactsupportvaluesofthesesubsets.Hence,top-down