Treść książki

Przejdź do opcji czytnikaPrzejdź do nawigacjiPrzejdź do informacjiPrzejdź do stopki
Abstract
Thisdissertationisdevotedtofrequentitemsetminingregardedasadvanced
databasequeryingwhereusersspecifythesourcedataset,theminimum
frequencythreshold,andoptionallypatternconstraintsnarrowingtheresults,and
itisuptothedataminingsystemtoexecutetheminingtaskasefficientlyas
possible.Buildinguponexistingsolutionsoptimizingtheexecutionofindividual
queriesorsequencesofqueries,webringfrequentitemsetqueryoptimizationto
anotherlevelandconsidertheproblemofefficientprocessingofsetsoffrequent
itemsetqueries,analogoustomulti-queryoptimizationindatabasesystems.Our
solutionstargetmainlybatchprocessingmodebutcanbeappliedtomulti-user
interactiveenvironmentsaswell.
Inthisdissertationweformulatetheproblemofprocessingsetsoffrequent
itemsetqueriesinthecontextofasimple,generalmodeloffrequentitemset
queriesindependentofparticularlanguagesandinterfaces,andprovideseveral
solutionsaddressingtheproblem.Themajorityofthedevelopedtechniquesare
definedintermsofadatasharingmodelbasedontheconceptofelementarydata
selectionpredicateswhichrepresentpartsofthedatasetsharedamongthe
queries.Thedevelopedmethodsofprocessingsetsoffrequentitemsetqueries
canbebroadlyclassifiedintotwocategories:methodsindependentofa
particularfrequentitemsetminingalgorithm,andtheonesdesignedwitha
specificalgorithminmind.Theexplicitlyaddressedfrequentitemsetmining
algorithmsare:Apriori,FP-growth,andPartition,whichweclaimbelongtothe
mostinfluentialones,andinadditionareimportantfromthepointofviewof
possiblepracticalapplications.Alltheproposedtechniquesareinitially
formulatedandexperimentallyverifiedundertheassumptionthatdatapartitions
correspondingtoelementarydataselectionpredicatescanbeselectively
retrievedfromthedatabase.Afterwards,theoreticalandexperimentalanalysisof
theinfluenceofavailableaccesspathstodataontheproposedtechniquesis
conducted.
Animportantcontributionofthedissertationisrelatedtotheidentified
optimizationproblemoccurringinoneofthetechniquesfortheApriori
algorithm.Theproblemconcernshandlinglargebatchesofqueriesbydividing
thesetofqueriesintosubsetsexecutedindependently.Fortheproblem
formulatedasaparticularcaseofhypergraphpartitioning,itsNP-hardnessis
provedandseveralheuristicsolutionsareprovided.