Chapter 1 State of the art
being in their existing raw states of plain text, whereas annotated corpora are supplemented with various types of linguistic information and are a very useful tool for the large-scale analysis of different aspects of language. Some of the most common types of corpus annotation are textual mark-up, part-of-speech (POS) tagging, syntactic annotation (parsing), semantic annotation, prosodic annotation, pragmatic annotation, discourse annotation, phonetic annotation and stylistic annotation (Leech 2004). Corpus linguistics is a relatively young field of study, the methodologies applied in text annotation vary, and one cannot speak of any uniform and universal way of annotating texts for electronic analysis. Nevertheless, Leech (2004) acknowledges that more recently there has been a far-reaching trend to standardise the representation of all phenomena of a corpus, including annotations, by means of a standard mark-up language,¹ usually one of the series of related languages SGML, HTML and XML. One of the advantages of using these languages for encoding features in a text is that they allow the interchange of documents, including corpora, between one user or research site and another. In this sense, Leech comments, SGML/HTML/XML have developed into a world-wide standard which can be applied to any language, both spoken and written, as well as to languages of different historical periods. Finally, Nesselhauf (2011) distinguishes the following kinds of corpora: general/reference corpora, which aim at representing a language or a language variety as a whole and contain both spoken and written language (e.g. the British National Corpus or the Bank of English); historical corpora (vs. corpora of present-day language), which aim at representing an earlier stage or earlier stages of a language (e.g. the Helsinki Corpus or ARCHER); regional corpora, which aim at representing one regional variety of a language (e.g. the Wellington Corpus of Written New Zealand English); learner corpora (vs. native speaker corpora), which aim at representing the language as produced by learners of this language (e.g. the International Corpus of Learner English); multilingual corpora (vs. one-language corpora), which aim at representing several (at least two) different languages, often with the same text types to enable contrastive analysis (e.g. the PROIEL Corpus, a parallel corpus of New Testament texts in different languages, such as Greek, Latin, Gothic, Old Church Slavonic and Classical Armenian); and spoken corpora (vs. written corpora), which aim at representing spoken language (e.g. the London-Lund Corpus of Spoken English).
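To illustrate what encoding annotation in a standard mark-up language can look like in practice, the following is a minimal Python sketch. It assumes a hypothetical XML scheme: the elements `s` (sentence) and `w` (word token), the attribute `pos`, and the tag set are all illustrative and are not taken from the BNC, TEI or any other specific corpus. The sketch uses only the standard-library `xml.etree.ElementTree` module to read the same annotated text back in two ways: as word/tag pairs, and as the underlying plain text with the mark-up stripped.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical XML mark-up (illustrative only): <s> marks a sentence,
# <w> marks a word token, and the "pos" attribute holds a POS tag
# from an invented, simplified tag set.
ANNOTATED = """<text id="sample">
  <s n="1">
    <w pos="DET">The</w>
    <w pos="NOUN">corpus</w>
    <w pos="VERB">contains</w>
    <w pos="ADJ">annotated</w>
    <w pos="NOUN">texts</w>
  </s>
</text>"""


def tagged_tokens(xml_source):
    """Return (word, tag) pairs for every <w> element, in document order."""
    root = ET.fromstring(xml_source)
    return [(w.text, w.get("pos")) for w in root.iter("w")]


def plain_text(xml_source):
    """Strip the annotation and rebuild the raw text of each sentence."""
    root = ET.fromstring(xml_source)
    return [" ".join(w.text for w in s.iter("w")) for s in root.iter("s")]


def tag_counts(pairs):
    """Count how often each POS tag occurs in the (word, tag) pairs."""
    return Counter(tag for _, tag in pairs)


if __name__ == "__main__":
    pairs = tagged_tokens(ANNOTATED)
    print(pairs)
    print(plain_text(ANNOTATED))
    print(tag_counts(pairs))
```

Because the annotation is stored as ordinary XML rather than in a tool-specific format, any XML-aware program can recover both the annotation layer and the raw text, which is the kind of interchange between users and research sites that the paragraph above attributes to SGML/HTML/XML.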