Piotr Gawrysiak
4. TAMING DATA ABUNDANCE
The above observation is important when we consider current efforts to
digitize information in the humanities. One might risk the statement that
these efforts are oriented mostly towards creating digital archives of almost
all traditional documents, motivated by the goal of preserving knowledge
assets and providing unlimited access to them from the web. These efforts are
undoubtedly successful; examples include the rapid growth of both national
and institutional archives, such as Europeana or national libraries, e.g. in
Poland. A byproduct of this trend, however, is the creation of so-called
digital data lakes that, unlike traditional printed sources, are easily
amenable to computerized analysis. Performing such analysis is certainly not
always a simple task, but even very crude tools, when applied to large
quantities of data, might yield interesting and useful results that are
impossible to obtain in the humanities by any other, traditional means. A
good example is the Google N-Gram Viewer¹, a simple tool that allows querying
the database of the Google Books project for occurrences of words and word
sequences. Using this tool, one might analyze the popularity of a given
concept in literature over a period spanning several centuries, or at least
in the books digitized within the Google Books project. Figure 1 presents the
distribution of occurrences of the terms "XIX century", "XX century" and
"XXI century". It clearly shows an unusual number of references to the XXI
century that started appearing even before the year 2000, a clear indication
that there is something special about the current times, as perceived by
literary authors.
Fig. 1. A simple example of Google N-Gram Viewer usage
¹ http://books.google.com/ngrams
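To make the idea of a "crude tool" concrete, the kind of query described above (how often a word sequence occurs in texts from each year) can be sketched in a few lines of Python. This is only an illustrative sketch, not the way the Google N-Gram Viewer is implemented: the function names, the (year, text) corpus format and the toy sentences are all invented for the example, and tokenization is reduced to lowercasing and splitting on whitespace.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all consecutive n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_frequency_by_year(corpus, phrase):
    """Relative frequency of `phrase` among same-length n-grams, per year.

    `corpus` is an iterable of (year, text) pairs. This format is a
    hypothetical simplification chosen for the example.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    hits = Counter()    # occurrences of the phrase in each year
    totals = Counter()  # number of n-grams seen in each year
    for year, text in corpus:
        grams = ngrams(text.lower().split(), n)
        totals[year] += len(grams)
        hits[year] += sum(1 for gram in grams if gram == target)
    # Skip years with no n-grams to avoid dividing by zero.
    return {
        year: hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year]
    }


# Invented toy data, standing in for a digitized book collection.
toy_corpus = [
    (1995, "the xxi century will bring new machines"),
    (1995, "life in the xx century was hard"),
    (2005, "the xxi century began with the xxi century hopes"),
]

result = phrase_frequency_by_year(toy_corpus, "XXI century")
for year, share in result.items():
    print(year, round(share, 4))
```

Dividing by the number of n-grams per year, rather than reporting raw counts, keeps years with more digitized text from dominating the curve. A real collection would also need proper tokenization and handling of punctuation, which this sketch leaves out.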