Processing sets of frequent itemset queries

Marek Wojciechowski

ISBN/ISSN:

978-83-7775-265-4

Publisher:

Wydawnictwo Politechniki Poznańskiej

Year of publication:

2013

Number of pages:

241

Bibliographic citation:

Wojciechowski, Marek. Processing sets of frequent itemset queries. Poznań: Wydawnictwo Politechniki Poznańskiej, 2013, 241 pp. ISBN 978-83-7775-265-4

RefWorks citation:

RT Book, Whole

SR Electronic(1)

A1 Wojciechowski, M.

T1 Processing sets of frequent itemset queries

PB Wydawnictwo Politechniki Poznańskiej

PP Poznań

YR 2013

SN 978-83-7775-265-4

BibTeX citation:

@Book{135370,
  author = "Wojciechowski, Marek",
  title = "Processing sets of frequent itemset queries",
  publisher = "Wydawnictwo Politechniki Poznańskiej",
  year = "2013",
  address = "Poznań",
  isbn = "978-83-7775-265-4"
}

EndNote citation:

TY - BOOK

AU - Wojciechowski, Marek

ED -

TI - Processing sets of frequent itemset queries

PB - Wydawnictwo Politechniki Poznańskiej

CY - Poznań

PY - 2013

SN - 978-83-7775-265-4

ER -

Online source citation (APA style):

Wojciechowski, Marek. Processing sets of frequent itemset queries [online database] Poznań: Wydawnictwo Politechniki Poznańskiej, 2013 [accessed: 25 08 2024]. Available in Ibuk Libra: https://libra.ibuk.pl/reader/processing-sets-of-frequent-itemset-queries-marek-wojciechowski-135370.

Keywords: frequent itemsets, data mining queries

Abstract
1. Introduction
1.1. Data Mining from a Database Perspective
1.2. Aim and Scope of the Dissertation
2. Frequent Itemset Mining
2.1. Overview, Genesis, Applications, and Importance of the Problem
2.2. Formulation of the Frequent Itemset Mining Problem
2.3. Computational Complexity of the Problem
2.4. Overview of Approaches to Frequent Itemset Mining
2.4.1. Introduction
2.4.2. Search Space Traversal Strategies
2.4.3. Database Layout
2.4.4. Using Memory to Store Mined Data
2.4.5. Itemset Support Counting
2.5. Representative Frequent Itemset Mining Algorithms
2.5.1. Introduction
2.5.2. Apriori
2.5.3. FP-growth
2.5.4. Partition
2.6. Research Trends in Frequent Itemset Mining
2.6.1. Introduction
2.6.2. Taking Advantage of DBMS Functionality in Frequent Itemset Mining
2.6.3. Sampling for Frequent Itemset Mining
2.6.4. Concise Representations of Frequent Itemsets
2.6.5. Parallel and Distributed Frequent Itemset Mining
2.6.6. Frequent Itemset Mining over Data Streams
2.6.7. Privacy Preserving Frequent Itemset Mining
3. Data Mining as Advanced Querying
3.1. Motivation
3.2. Prototype Data Mining Query Languages
3.3. Data Mining Standards
3.4. Data Mining Queries in Contemporary Database Management Systems
3.5. Data Mining Queries: Summary of the Current State of the Art and Implications
4. Frequent Itemset Query Processing
4.1. Constraint-based Frequent Itemset Mining
4.2. Reusing Results of Frequent Itemset Queries
4.3. Reusing Results vs. Pushing Constraints into the Mining Process
5. Processing Batches of Frequent Itemset Queries
5.1. Motivation
5.2. General Model of Frequent Itemset Queries
5.3. Batches of Frequent Itemset Queries and Problem Formulation
5.4. Model of Query Data Sharing
5.5. Related Work
6. Methods Independent of the Mining Algorithm
6.1. Sequential Processing with Result Caching and Reusing
6.2. Result Filtering and Incremental Mining
6.3. Query Scheduling
6.4. Query Scheduling with Intermediate Queries
6.5. Mine Merge
6.6. Experimental Results
6.7. Summary and Discussion
7. Methods for the Apriori Algorithm
7.1. Common Counting
7.2. Common Counting with Query Partitioning
7.2.1. Motivation
7.2.2. Key Issues
7.2.3. Query Partitioning as a Case of Hypergraph Partitioning
7.2.4. Computational Complexity of the Problem
7.2.5. Algorithm CCRecursive
7.2.6. Algorithm CCFull
7.2.7. Algorithm CCCoarsening
7.2.8. Algorithm CCAgglomerative
7.2.9. Algorithm CCAgglomerativeNoise
7.2.10. Algorithm CCGreedy
7.2.11. Algorithm CCSemiGreedy
7.3. Common Candidate Tree
7.4. Experimental Results
7.4.1. Query Partitioning for Common Counting
7.4.2. Common Counting vs. Common Candidate Tree
7.5. Summary and Discussion
8. Methods for the FP-growth Algorithm
8.1. Common Building
8.2. Common FP-tree
8.3. Experimental Results
8.4. Summary and Discussion
9. Methods for the Partition Algorithm
9.1. Integration of Dataset Scans for Partition
9.2. Partition Mine Merge Improved
9.3. Experimental Results
9.4. Summary and Discussion
10. Data Access Methods in Processing Sets of Frequent Itemset Queries
10.1. Comparison of Proposed Techniques in Terms of Data Access Schemes
10.2. Data Organization and Access Methods in Contemporary DBMSs
10.3. Techniques of Processing Sets of Frequent Itemset Queries with Full Table Scans
10.4. Theoretical Cost Analysis
10.5. Experimental Results
10.6. Summary and Discussion
11. Conclusions and Future Work
Bibliography
Streszczenie

Abstract

This dissertation is devoted to frequent itemset mining regarded as advanced database querying where users specify the source dataset, the minimum frequency threshold, and optionally pattern constraints narrowing the results, and it is up to the data mining system to execute the mining task as efficiently as possible. Building upon existing solutions optimizing the execution of individual queries or sequences of queries, we bring frequent itemset query optimization to another level and consider the problem of efficient processing of sets of frequent itemset queries, analogous to multi-query optimization in database systems. Our solutions target mainly batch processing mode but can be applied to multi-user interactive environments as well.

In this dissertation we formulate the problem of processing sets of frequent itemset queries in the context of a simple, general model of frequent itemset queries independent of particular languages and interfaces, and provide several solutions addressing the problem. The majority of the developed techniques are defined in terms of a data sharing model based on the concept of elementary data selection predicates which represent parts of the dataset shared among the queries. The developed methods of processing sets of frequent itemset queries can be broadly classified into two categories: methods independent of a particular frequent itemset mining algorithm, and the ones designed with a specific algorithm in mind. The explicitly addressed frequent itemset mining algorithms are: Apriori, FP-growth, and Partition, which we claim belong to the most influential ones, and in addition are important from the point of view of possible practical applications. All the proposed techniques are initially formulated and experimentally verified under the assumption that data partitions corresponding to elementary data selection predicates can be selectively retrieved from the database. Afterwards, theoretical and experimental analysis of the influence of available access paths to data on the proposed techniques is conducted.

An important contribution of the dissertation is related to the identified optimization problem occurring in one of the techniques for the Apriori algorithm. The problem concerns handling large batches of queries by dividing the set of queries into subsets executed independently. For the problem formulated as a particular case of hypergraph partitioning, its NP-hardness is proved and several heuristic solutions are provided.
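
As a rough illustration of the query model outlined in the abstract, the Python sketch below treats a frequent itemset query as a source-data selection, a minimum frequency threshold, and an optional pattern constraint, and then splits the data selected by two overlapping queries into disjoint parts in the spirit of elementary data selection predicates. Everything here is an assumption made for illustration and is not taken from the dissertation: the names `Query`, `select`, `mine`, and `elementary_partitions`, the use of half-open ranges over transaction ids as selection predicates, the toy dataset, and the naive level-wise miner that applies the pattern constraint only as a final filter.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Transaction = Tuple[int, Itemset]  # (transaction id, items)


@dataclass(frozen=True)
class Query:
    """Hypothetical frequent itemset query (illustrative model only)."""
    name: str
    id_range: Tuple[int, int]  # assumed selection predicate: lo <= tid < hi
    min_support: float         # minimum relative frequency in the selected data
    constraint: Callable[[Itemset], bool] = lambda itemset: True


def select(db: List[Transaction], q: Query) -> List[Itemset]:
    """Return the item sets of the transactions selected by the query."""
    lo, hi = q.id_range
    return [items for tid, items in db if lo <= tid < hi]


def mine(db: List[Transaction], q: Query) -> Dict[Itemset, int]:
    """Naive level-wise mining; the constraint is applied as a post-filter."""
    data = select(db, q)
    if not data:
        return {}
    min_count = q.min_support * len(data)
    result: Dict[Itemset, int] = {}
    candidates = {frozenset([item]) for t in data for item in t}
    k = 1
    while candidates:
        # Count the support of every candidate in the selected data.
        counts = {c: sum(1 for t in data if c <= t) for c in candidates}
        frequent = {c: n for c, n in counts.items() if n >= min_count}
        result.update(frequent)
        k += 1
        # Join frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in frequent for b in frequent
                      if len(a | b) == k}
    return {s: n for s, n in result.items() if q.constraint(s)}


def elementary_partitions(queries: List[Query]):
    """Split the selected id ranges into disjoint parts and list their users."""
    points = sorted({p for q in queries for p in q.id_range})
    parts = []
    for lo, hi in zip(points, points[1:]):
        users = [q.name for q in queries
                 if q.id_range[0] <= lo and hi <= q.id_range[1]]
        if users:
            parts.append(((lo, hi), users))
    return parts


# Toy dataset and two overlapping queries (all values are made up).
DB: List[Transaction] = [
    (0, frozenset({"a", "b"})),
    (1, frozenset({"a", "b", "c"})),
    (2, frozenset({"a", "c"})),
    (3, frozenset({"b"})),
    (4, frozenset({"a", "b"})),
    (5, frozenset({"b", "c"})),
    (6, frozenset({"a", "b", "c"})),
    (7, frozenset({"c"})),
]
q1 = Query("q1", (0, 6), 0.5)
q2 = Query("q2", (4, 8), 0.5, constraint=lambda s: "c" in s)

if __name__ == "__main__":
    print(mine(DB, q1))
    print(mine(DB, q2))
    print(elementary_partitions([q1, q2]))
```

In this toy setup the two queries overlap on transactions 4 and 5, so `elementary_partitions` reports one part used by both queries and one part private to each. A multi-query method could, under these assumptions, read the shared part once and reuse its counts for both queries instead of mining each query in isolation.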
