Corpus data provide frequency of occurrence of linguistic items.
Corpus data give essential information for a number of applied areas, like
language teaching and language technology (machine translation, speech
synthesis, etc.).
These advantages of corpora over manual investigation, and many others not
mentioned here, are the reasons why corpora are constantly being developed
and why there is a growing interest in corpus linguistics, which has
resulted in the construction of multiple diachronic (historical) and
non-diachronic corpora for the analysis of various languages of the world.
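
The first advantage listed above, frequency of occurrence, can be
illustrated with a minimal sketch. It assumes a toy "corpus" of two invented
sentences and a deliberately naive tokenizer; the function name
word_frequencies and the sample sentences are hypothetical and serve only as
illustration, not as a description of how any particular corpus tool works.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count how often each word form occurs across a collection of texts."""
    counts = Counter()
    for text in texts:
        # Naive tokenization (an assumption for this sketch): lowercase the
        # text and keep runs of ASCII letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(tokens)
    return counts


# Invented example sentences standing in for corpus texts.
toy_corpus = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
]

freqs = word_frequencies(toy_corpus)

# List word forms from most to least frequent.
for word, count in freqs.most_common():
    print(word, count)
```

A real corpus would need language-aware tokenization and usually
lemmatization before such counts become linguistically meaningful; the
sketch only shows why machine-readable text makes frequency information
cheap to obtain compared with manual investigation.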
1.2. Definition of corpus
In the past, as Lindquist (2009) points out, the word corpus (Lat. ‘body’)
was used to describe the total works written by an individual author or a
certain mass of texts, for example “The Shakespeare corpus.” These were the
so-called pre-electronic corpora. Nowadays, the term corpus is almost always
associated with the electronic corpus, which is a collection of texts stored
on some kind of digital medium to be used by linguists for the purpose of
retrieving linguistic items for research, or by lexicographers in making
dictionaries. According to Renouf (1987), the term corpus refers to a
collection of written or spoken texts which is stored and processed on
computer for the purposes of linguistic research. Sinclair (1991: 171)
states that “a corpus is a collection of naturally-occurring language texts,
chosen to characterize a state or variety of a language. In modern
computational linguistics, a corpus typically contains many millions of
words: this is because it is recognized that the creativity of natural
language leads to such immense variety of expression that it is difficult to
isolate the recurrent patterns that are the clues to the lexical structure
of the language.”
Sinclair distinguishes two types of corpora, namely the sample corpus and
the monitor corpus. The former is a finite collection of texts, often chosen
with great care and studied carefully. Once a sample corpus has been
established, it cannot be added to or changed in any way. The latter is
continually growing and re-uses language text which has been prepared in
machine-readable form for other purposes, such as for typesetters of
newspapers, magazines and books, and for word-processors, and spoken
language prepared basically for legal and bureaucratic reasons. McEnery and
Wilson (2001: 32) also distinguish two kinds of corpora, namely unannotated
and annotated.² Unannotated corpora are characterised by
² Curzan and Palmer (2006) use the terms unprincipled (or non-systematic)
vs. principled corpora to mean unannotated and annotated corpora
respectively.