1.3. Corpus composition, annotation, size and representativeness
Sinclair (2005) discusses a set of guidelines that should be followed in the composition of a corpus and in the compilation of language samples. Below are the ten principles that he considers fundamental:
1. The contents of a corpus should be selected without regard for the language they contain, but according to their communicative function in the community in which they arise.
2. Corpus compilers should strive to make their corpus as representative as possible of the language from which it is chosen.
3. Only those components of corpora which have been designed to be independently contrastive should be contrasted.
4. Criteria for determining the structure of a corpus should be small in number, clearly separate from one another, and efficient as a group in delineating a corpus that is representative of the language or variety under examination.
5. Any information about a text other than the alphanumeric string of its words and punctuation should be stored separately from the plain text and merged when required in applications.
6. Samples of language for a corpus should wherever possible consist of entire documents or transcriptions of complete speech events, or should get as close to this target as possible. This means that samples will differ substantially in size.
7. The design and composition of a corpus should be documented fully with information about the contents and arguments in justification of the decisions taken.
8. A corpus compiler should retain, as target notions, representativeness and balance. While these are not precisely definable and attainable goals, they must be used to guide the design of a corpus and the selection of its components.
9. Any control of subject matter in a corpus should be imposed by the use of external, and not internal, criteria.
10. A corpus should aim for homogeneity in its components while maintaining adequate coverage, and rogue texts should be avoided.
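The principle that any information other than the words and punctuation of a text should be stored separately from the plain text, and merged only when an application needs it, corresponds to what is often called stand-off annotation. The Python sketch below is only an illustration under stated assumptions: the `Annotation` class, the `merge` function and the tag labels are hypothetical names invented here, annotations are assumed to be non-overlapping character-offset spans, and the inline `word_TAG` output format is simply one convenient way of presenting the merged view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One stand-off annotation (hypothetical structure for this sketch)."""
    start: int  # character offset where the annotated span begins
    end: int    # character offset just past the end of the span
    label: str  # illustrative tag, e.g. a part-of-speech label


def merge(text: str, annotations: list[Annotation]) -> str:
    """Build an inline 'word_TAG' view on demand; `text` itself is never modified.

    Assumes spans do not overlap; overlapping spans raise ValueError.
    """
    pieces = []
    cursor = 0
    for ann in sorted(annotations, key=lambda a: a.start):
        if ann.start < cursor:
            raise ValueError("overlapping annotations are not supported in this sketch")
        pieces.append(text[cursor:ann.start])                    # untagged material between spans
        pieces.append(f"{text[ann.start:ann.end]}_{ann.label}")  # tagged span
        cursor = ann.end
    pieces.append(text[cursor:])  # any trailing untagged material
    return "".join(pieces)


if __name__ == "__main__":
    # The plain text and the annotation layer are kept as separate objects.
    plain_text = "Corpora store plain text."
    pos_layer = [
        Annotation(0, 7, "NOUN"),
        Annotation(8, 13, "VERB"),
        Annotation(14, 19, "ADJ"),
        Annotation(20, 24, "NOUN"),
    ]
    print(merge(plain_text, pos_layer))  # Corpora_NOUN store_VERB plain_ADJ text_NOUN.
```

Because the annotation layer never touches the plain text, several independent layers (for example, part-of-speech tags and text-level metadata) could be kept side by side and combined only for the application at hand.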
As far as annotation is concerned, McEnery et al. (2006: 33) state that “corpus annotation can be achieved fully automatically, by a semi-automatic interaction between human being and the machine, or entirely manually by human analysts.” They also point out (McEnery et al. 2006: 33) that the annotation of a corpus may take many forms and can be undertaken at different levels: