Treść książki

Przejdź do opcji czytnikaPrzejdź do nawigacjiPrzejdź do informacjiPrzejdź do stopki
17
thatsupportY.Thefrequency(alsocalledrelativesupport)ofanitemsetYinD
isthesupportofYinDdividedbythetotalnumberoftransactionsinD.
Definition2.4
AnitemsetiscalledfrequentinDifitssupportisnolessthanagiven
minimumsupportthreshold.(Alternatively,ifaminimumfrequencythresholdis
provided,anitemsetisfrequentifitsfrequencyisnolessthanagivenminimum
frequencythreshold.)
Problem2.1
GivenadatabaseDandaminimumsupportthresholdminsup(oraminimum
frequencythresholdminfreq),theproblemoffrequentitemsetminingconsistsin
discoveringallfrequentitemsetsinDtogetherwiththeirsupports(or
frequencies).
Intheabovedefinitionswedistinguishbetweensupportandfrequencyofan
itemset,usingtheterminologyfrom[65].Ingeneral,itemsetfrequenciesare
moreinformativeforend-usersthanitemsetsupports,andsimilarlyfrequency
thresholdsaremoreconvenientforthemthansupportthresholds.Ontheother
hand,miningalgorithmsareoftenformulatedfortheminimumsupport
threshold,whichcanbedirectlycomparedtothenumbersofitemsets'
occurrencesinthedatabase.Obviously,minsup=fminfreq*|D|1,soconversion
betweenthetwothresholdsispossible,providedthatthetotalnumberof
transactionsinthedatabaseisknown.Therefore,theconversioncanbedone
afterthefirstscanofthedatabaseperformedbyaminingalgorithm.
Settingaminimumsupportorfrequencythresholdisthedominantmethodof
restrictingthesetofmineditemsetstoonlythemostfrequentones,andthekey
intheformulationofthefrequentitemsetminingproblem.However,settingan
appropriatefrequencythresholdisoftennontrivialandusersmayendupwith
toomanyortoofewdiscoveredfrequentitemsets.Moreover,ifthefrequency
thresholdisnotrestrictiveenoughforagivendataset,theminingprocessmay
takealongtimetofinish.Atemptingalternativecouldbelookingforauser-
specifiednumberofmostfrequentitemsets.Unfortunately,typicallytheitemsets
smallinsizearemorefrequentthatlargerones,whilethelattercarrymore
interestinginformation.Aninterestingproposaltocopewiththeaforementioned
dilemmaisminingtop-nfrequentitemsetsofeachsize[60]withoutthe
frequencyconstraint.However,suchanapproachhasnotbecomeamainstream
researcharea.
Example2.1
LetusconsideradatabaseD={(100,{a,b,c}),(200,{a,c,d}),(300,{a,b,c,d}),
(400,{a,c,d}),(500,{a,b,d})}overasetofitemsI={a,b,c,d}andtheminimum
frequencythresholdminfreq=50%.Theminimumsupportthresholdminsup