2.3. Statistical analysis
et al. (2008). It is worth mentioning here that Gini impurity measures how
often a randomly chosen element from a set would be incorrectly labelled if
it were randomly labelled according to the distribution of labels in the
subset. The advantage of this method is that it is insensitive to abnormal
observations and to certain deficiencies in databases, especially large ones.
It should be emphasised that the Gini algorithm isolates the samples of the
largest class from the rest of the data, and that it favours features with
strongly differentiated values.
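
The definition above can be made concrete with a short sketch. For class proportions p_k in a node, the Gini impurity is 1 minus the sum of p_k squared: it is 0 for a pure node and grows as the classes become more evenly mixed. The function names below are hypothetical and chosen for illustration only; the text does not state which software computed the index, so this is an assumption-laden sketch rather than the procedure actually used.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a collection of class labels: 1 - sum(p_k ** 2).

    This is the probability that a randomly drawn element is mislabelled
    when labels are assigned at random according to the class distribution.
    """
    n = len(labels)
    if n == 0:
        return 0.0  # convention: an empty node is treated as pure
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of a binary split (illustrative helper)."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return (len(left) * gini_impurity(left)
            + len(right) * gini_impurity(right)) / n
```

A split-selection rule based on the Gini index prefers the candidate split with the lowest weighted impurity. For example, separating `["a", "a"]` from `["b", "b"]` gives a weighted impurity of 0, whereas the unsplit node `["a", "a", "b", "b"]` has an impurity of 0.5.
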
Analysis settings were as follows: selecting splits (impurity measure): the
Gini index; a priori probabilities: computed from the learning sample; equal
misclassification costs; stopping rule: prune on misclassification error;
stopping parameters: minimum n of cases n = 400, maximum n of nodes
n = 1000; v-fold cross-validation: v = 10. The importance of environmental
factors is determined by the validity coefficient.
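
To illustrate what v-fold cross-validation with v = 10 involves, the following sketch partitions the case indices into ten disjoint folds, each of which serves once as the test sample while the remaining nine form the learning sample. The function names, the random shuffling, and the `seed` parameter are assumptions made for this illustration; the text does not describe how the folds were drawn.

```python
import random


def v_fold_indices(n_cases, v=10, seed=0):
    """Partition range(n_cases) into v disjoint folds of nearly equal size.

    The shuffle and the seed are illustrative assumptions, not settings
    taken from the analysis described in the text.
    """
    if not 1 < v <= n_cases:
        raise ValueError("v must satisfy 1 < v <= n_cases")
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)
    # Striding by v yields folds whose sizes differ by at most one case.
    return [indices[i::v] for i in range(v)]


def learning_test_splits(n_cases, v=10, seed=0):
    """Yield (learning, test) index lists, one pair per fold."""
    folds = v_fold_indices(n_cases, v, seed)
    for i, test in enumerate(folds):
        learning = [idx for j, fold in enumerate(folds) if j != i
                    for idx in fold]
        yield learning, test
```

Each case appears in exactly one test fold, so the misclassification error averaged over the ten folds uses every case once for testing.
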