2.3. Statistical analysis
et al. (2008). It is worth mentioning here that Gini impurity measures how
often a randomly chosen element from a set would be incorrectly labelled if
it were randomly labelled according to the distribution of labels in the subset.
The advantage of this method lies in the fact that it is insensitive to abnormal
observations and certain deficiencies in databases, especially the large ones. It
should be emphasised that the Gini algorithm isolates samples for the largest
class from the rest of the data, and that it influences features with strongly
differentiated values.
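
The definition of Gini impurity given above can be illustrated with a minimal Python sketch. The function name `gini_impurity` and the toy labels are assumptions made for illustration only; they are not taken from the study or from the software used in it.

```python
from collections import Counter


def gini_impurity(labels):
    """Probability that a randomly chosen element is mislabelled when it is
    labelled at random according to the label distribution of the set."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention chosen here: an empty set is treated as pure
    counts = Counter(labels)
    # 1 minus the sum of squared class proportions
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical toy labels: two equally frequent classes, then a single class.
mixed = gini_impurity(["a", "a", "b", "b"])
pure = gini_impurity(["a", "a", "a", "a"])
```

A set with one class has impurity 0, and the value grows as the classes become more evenly mixed.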
Analysis settings were as follows: selecting splits (impurity measure): the
Gini index; the a priori probabilities: the tree is computed from the learning
sample; equal misclassification costs; stopping rule: prune on misclassification
error; stopping parameters: minimum n of cases n = 400; maximum n
of nodes n = 1000; v-fold cross-validation v = 10. The importance of
environmental factors is determined by the validity coefficient.
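
Two of the settings listed above, split selection by the Gini index and v-fold cross-validation, can be sketched in plain Python as follows. This is a simplified illustration, not the procedure of the statistical package used in the study. The names `gini`, `best_split` and `v_fold_indices` are hypothetical. Reading "minimum n of cases" as "do not split a node holding fewer cases than this" is an assumption, and so is the deterministic interleaving of folds, since real implementations usually assign cases to folds at random.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels, min_cases=400):
    """Find the threshold on one numeric feature that minimises the weighted
    Gini impurity of the two child nodes.

    Returns (threshold, weighted_impurity), or None when the node holds fewer
    than min_cases cases (assumed stopping rule) or no split lowers impurity.
    """
    n = len(values)
    if n < min_cases:
        return None
    parent = gini(labels)
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    best = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < parent and (best is None or weighted < best[1]):
            best = (threshold, weighted)
    return best


def v_fold_indices(n_cases, v=10):
    """Partition case indices 0..n_cases-1 into v folds by interleaving."""
    return [list(range(k, n_cases, v)) for k in range(v)]


# Hypothetical toy data; min_cases is lowered so that the tiny node is split.
split = best_split([1, 2, 3, 4], ["a", "a", "b", "b"], min_cases=2)
folds = v_fold_indices(25, v=10)
```

With the default `min_cases=400`, the same toy node would not be split at all, which is how a minimum-cases stopping parameter limits tree growth.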