1. Introduction
1.1. Data Mining from a Database Perspective
Widespread use of information systems, coupled with the availability of relatively inexpensive large-capacity storage devices and architectures, has resulted in huge amounts of data being collected and maintained by companies, institutions, and organizations. Additionally, advances in automatic data acquisition have dramatically increased the rate at which data are collected in today’s world. Barcode readers in the retail industry, various sensors used in science and technology, and the logging functionality of web and application servers are examples of relevant solutions. Entities in possession of large volumes of data collected during their operation faced the problem of efficiently extracting useful knowledge from the gathered data. Database management systems (DBMSs), in particular those based on the relational model developed since the 1970s, provided solutions to the most fundamental problems related to data management, including reliable data storage, fast access to data, consistency and integrity of the data, concurrency control, recovery after failure, and query optimization. Unfortunately, early DBMSs provided little support for advanced data analyses such as the discovery of trends or patterns hidden in the data.
In the late 1980s, two prominent research trends emerged as potential solutions to the problem of turning huge amounts of collected data into useful information: data warehousing and data mining. Data warehousing technology makes it possible to integrate data from different sources, including databases, spreadsheets, text files, and legacy systems, into a form suitable for complex and comprehensive analyses aimed at decision support. The data stored in a data warehouse are typically pre-aggregated and organized according to a multidimensional model (implemented natively or using an appropriate relational schema). The processing model characteristic of data warehouses is OLAP (online analytical processing), which enables users to analyze multidimensional data interactively from multiple perspectives and at different levels of aggregation. Common OLAP operations, defined in terms of a multidimensional data representation called the OLAP cube, include slice and dice, roll-up, drill-down, and pivot.
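To make these OLAP operations more concrete, here is a minimal Python sketch over a toy three-dimensional cube. The dimension names, the sales figures, and the helper functions `roll_up`, `slice_`, `dice`, and `pivot` are hypothetical illustrations, not taken from the text. A real OLAP server works on pre-aggregated storage and usually rolls up along concept hierarchies (for example month, quarter, year); this sketch simplifies roll-up to summing whole dimensions away, so drill-down here just means re-aggregating the base facts while keeping more dimensions.

```python
from collections import defaultdict

# Hypothetical base facts: (region, product, quarter) -> sales amount.
DIMS = ("region", "product", "quarter")
FACTS = {
    ("North", "Laptop", "Q1"): 120,
    ("North", "Laptop", "Q2"): 150,
    ("North", "Phone", "Q1"): 200,
    ("South", "Laptop", "Q1"): 80,
    ("South", "Phone", "Q1"): 90,
    ("South", "Phone", "Q2"): 110,
}


def roll_up(dims, cells, keep):
    """Sum the measure over every dimension not listed in `keep` (coarser view)."""
    idx = [dims.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return tuple(keep), dict(out)


def slice_(dims, cells, dim, value):
    """Fix one dimension to a single value and drop it from the result."""
    i = dims.index(dim)
    new_dims = dims[:i] + dims[i + 1:]
    new_cells = {k[:i] + k[i + 1:]: v for k, v in cells.items() if k[i] == value}
    return new_dims, new_cells


def dice(dims, cells, selection):
    """Keep only cells whose coordinates fall inside the given value sets."""
    def selected(key):
        return all(key[dims.index(d)] in allowed for d, allowed in selection.items())
    return dims, {k: v for k, v in cells.items() if selected(k)}


def pivot(dims, cells, rows, cols):
    """Lay out a two-dimensional cube as a {row: {column: value}} table."""
    assert set(dims) == {rows, cols}, "roll up or slice to two dimensions first"
    r, c = dims.index(rows), dims.index(cols)
    table = defaultdict(dict)
    for key, value in cells.items():
        table[key[r]][key[c]] = value
    return dict(table)


if __name__ == "__main__":
    # Roll-up: total sales per (region, product), summing over quarters.
    dims2, by_rp = roll_up(DIMS, FACTS, ["region", "product"])
    print(by_rp)
    # Slice: only the first quarter.
    print(slice_(DIMS, FACTS, "quarter", "Q1"))
    # Dice: a sub-cube restricted to the South region in Q2.
    print(dice(DIMS, FACTS, {"region": {"South"}, "quarter": {"Q2"}}))
    # Pivot: swap the axes so products become rows and regions columns.
    print(pivot(dims2, by_rp, "product", "region"))
```

Representing the cube as a plain dictionary keyed by coordinate tuples keeps each operation a few lines long; it is meant only to show what each operation does to the data, not how a warehouse would execute it efficiently.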
Data warehousing and OLAP can be regarded as the first step on the road from systems capable only of data collection and management to ones supporting data analysis and understanding. While OLAP is undoubtedly successful at business reporting and trend analysis, it certainly does not exhaust the possibilities of data analysis. One problem is that there are types of patterns and relationships among data that OLAP is unable to discover. Another problem is that OLAP relies heavily on the expert’s familiarity with the data, and its support for the discovery of unexpected patterns is often questioned. Both of the