Application of Classification and Regression Trees:
Robert E. Davis*
Describing avalanche activity, the dependent variable, by a meaningful metric that is physically justified and statistically unique is one of the fundamental problems of statistical or deterministic studies. Many attempts throughout the world have partitioned avalanche response into a variety of genetic and morphologic classifications. Approaches using different dependent variables run the risk of confounding comparison of forecasting methods. Definitions of avalanche activity or response range from individual path observations and descriptions (Judson and King, 1985), to hazard levels based on frequency of events (Elder and Armstrong, 1986), to the binary outcome of avalanche-day versus non-avalanche-day (Bois et al., 1974), to the sum of avalanche sizes on a given day (McClung and Tweedy, 1993).
Correct identification and quantification of the independent variables leading to avalanche release potentially present a more difficult problem. Unfortunately, data availability often represents the most severe constraint, and scientists are forced to make do with data that have already been collected. While these data are necessary because long-term databases are critical to all nondeterministic forecasting techniques, they have usually been collected for another purpose (weather forecasting for cities, agriculture, etc.). Collection sites are often located in valley bottoms, urban areas, and at low elevations, which makes it difficult to extrapolate to conditions in avalanche starting zones.
Perla (1970) revisited Atwater's (1954) ten contributory factors for avalanche hazard evaluation and found precipitation and wind direction to be the most important parameters. Fohn et al. (1977) compared conventional forecasting techniques with four statistical methods ranging from principal components analysis (PCA) and discriminant analysis of local and regional data to cluster analysis of local data. They found that all the methods produced about the same results at 70 to 80 percent accuracy, with some slightly better than others. Each method had distinct advantages and disadvantages.
The nearest-neighbor method has been applied in a number of climates with a variety of input variables. Buser (1989) gave results from a nearest-neighbor forecasting program introduced by Buser (1983) and used operationally in Switzerland by the ski patrol in the Parsenn area. The program identifies the ten days in the record with the most similar conditions to the day in question. Similarity is based on the proximity of weighted meteorological and snowpack variables in data space. The program also creates "elaborated" variables, for example the time trend in a particular meteorological measurement.
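The nearest-neighbor idea can be illustrated with a minimal sketch. This is not Buser's program: the variable names, weights, and sample days are hypothetical, and a weighted Euclidean distance is assumed as the measure of proximity in data space.

```python
import math

def nearest_days(record, query, weights, k=10):
    """Return the k days in `record` most similar to `query`.

    record  : dict mapping a day label to a tuple of variable values
    query   : tuple of variable values for the day in question
    weights : tuple of non-negative weights, one per variable
    Similarity is a weighted Euclidean distance (an assumption of this sketch).
    """
    def distance(values):
        return math.sqrt(sum(w * (v - q) ** 2
                             for w, v, q in zip(weights, values, query)))
    ranked = sorted(record, key=lambda day: distance(record[day]))
    return ranked[:k]

# Hypothetical record: (new snow in cm, air temperature in deg C, wind speed in m/s)
record = {
    "day_a": (30.0, -5.0, 12.0),
    "day_b": (2.0, -1.0, 3.0),
    "day_c": (28.0, -6.0, 10.0),
    "day_d": (0.0, 2.0, 1.0),
}
similar = nearest_days(record, query=(29.0, -5.0, 12.0),
                       weights=(1.0, 0.5, 0.5), k=2)
```

A forecaster would then inspect the avalanche activity recorded on the returned days. In practice the variables would need to be scaled before weighting so that no single unit dominates the distance.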
Buser et al. (1985) reviewed a broad range of avalanche forecasting methods for short and long time scales and over local and regional spatial scales. Input data collected by conventional field methods and by instruments designed and built for specialized tasks, such as FMCW radar, were discussed for different applications. Forecasting methods from conventional induction to complex statistical models were reviewed.
Although not directly addressing forecasting, Jaccard (1990) used fuzzy factorial analysis to identify important interactions of avalanches related to snowpack, meteorology, terrain and vegetation parameters based on expert opinion. Slope angle and aspect, overall weather conditions and precipitation were found to be the most important factors related to avalanches. Avalanche hazard forecasting has been addressed from a number of different angles and approaches from nearly all of the affected regions of the world. Tables I and II summarize some of the key research on the subject. The lists in Tables I and II are not exhaustive and represent only a portion of the research published in the English language.
There are two types of simple binary decision trees: regression and classification. Regression trees are appropriate where the dependent variable is a ratio scale data type. In other words, if the dependent variable can assume any value over the range of observations, and if the differences are quantitative and consistent, then we want a model that can predict these values and one that is not constrained to particular members. An example is the number of avalanches per day. A regression model will predict somewhere between zero and a reasonable maximum number of avalanches for a given day based on the independent variables.
A classification tree is appropriate where the dependent variable itself belongs to the nominal (named) or ordinal (ordered) data types. Nominal data include such variables as slope aspect: east, west, etc. Ordinal data exhibit relative, rather than quantitative, differences: for example, magnitude 1 through 5 avalanche events. Avalanche magnitudes, like earthquake magnitudes, are expressed on a log scale. The difference is that earthquake magnitudes are objectively measured, while avalanche magnitudes are estimated by an observer. Thus a magnitude 4 event is larger than a magnitude 2 event, but not necessarily 10^2 times as large. A regression tree would not make sense in this case because it would predict unsuitable results such as a magnitude 2.76 or 4.89 event.
The type of model chosen, regression or classification, depends in part on the dependent variable type. You cannot apply a regression tree model to classification data. However, you can apply a classification tree model to ratio scale data by generalizing the data into classes. Days with any avalanche activity could be called "avalanche days" and days without activity called "non-avalanche days" (as has been done in many previous studies). Then a classification tree model could be used on the number of events observed, where the observations have been re-expressed as nominal data: avalanche versus non-avalanche days.
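The re-expression of ratio scale data as nominal classes can be sketched as follows. The function name and the daily counts are hypothetical and serve only to illustrate the generalization step.

```python
def to_avalanche_day(release_counts):
    """Re-express daily release counts (ratio scale) as nominal classes."""
    return ["avalanche day" if n > 0 else "non-avalanche day"
            for n in release_counts]

# Hypothetical daily counts of observed releases
daily_releases = [0, 3, 0, 12, 1]
classes = to_avalanche_day(daily_releases)
```

The step discards information: a day with one release and a day with twelve fall into the same class, which is the cost of using a classification tree on count data.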
Advantages of tree-based regression and classification models over alternative methods (such as those listed in Table I) include:
* Gaussian assumptions are not violated by the distribution of one or more independent variables (tree-based methods are nonparametric or "distribution free"). Trees are valid even using mixed data sets containing multiple distributions. It is not necessary that data be normally distributed or that non-normal data be transformed before analysis.
* Model results are less dependent on missing values in the independent variables (the methodology finds "surrogate" values for each decision node). Many statistical models cannot use data sets where one or more attributes for a given observation are missing. Binary trees can use the existing data to statistically predict what the missing elements should be, or use only the elements that do exist.
* Tree-based models allow complex interactions between the independent variables, which must be specified a priori in standard linear models. For example, snow accumulation may increase up to a critical elevation, then decrease with increasing elevation above that critical point. Standard linear models can only take advantage of that fact if a mathematical expression for the relationship is formulated and expressed before model implementation.
* Interpretations of complex interactions are clear and often more easily understood than other model constructions. A tree is far more easily interpreted by most people than mathematical expressions or nonlinear equations.
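The elevation example in the list above can be illustrated with a minimal one-variable regression tree. This sketch assumes splits are chosen to minimize the summed squared error of the two descendent nodes; the data and function names are hypothetical and the code is not the CART or S-PLUS implementation.

```python
def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_tree(points, depth):
    """Fit a one-variable regression tree by recursive binary splitting.

    points : list of (x, y) pairs
    Returns a leaf mean (float) or a tuple (threshold, left, right).
    """
    ys = [y for _, y in points]
    xs = sorted({x for x, _ in points})
    if depth == 0 or len(xs) < 2:
        return sum(ys) / len(ys)
    best = None
    for lo, hi in zip(xs, xs[1:]):
        c = (lo + hi) / 2.0  # candidate threshold between adjacent values
        left = [p for p in points if p[0] < c]
        right = [p for p in points if p[0] >= c]
        cost = sse([y for _, y in left]) + sse([y for _, y in right])
        if best is None or cost < best[0]:
            best = (cost, c, left, right)
    _, c, left, right = best
    return (c, fit_tree(left, depth - 1), fit_tree(right, depth - 1))

def predict(tree, x):
    """Follow the 'is x < c?' questions down to a terminal node."""
    while isinstance(tree, tuple):
        c, left, right = tree
        tree = left if x < c else right
    return tree

# Hypothetical data: accumulation (cm) rises to about 3000 m, then falls.
data = [(2000, 50), (2400, 90), (2800, 130), (3000, 150),
        (3200, 125), (3400, 95), (3600, 60)]
tree = fit_tree(data, depth=2)
```

With two levels of splitting, the tree predicts a higher accumulation near 3000 m than at either 2000 m or 3600 m, without the rise-then-fall relationship having been specified beforehand.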
Binary decision trees or predictive tree classifiers of the type used in this study take a vector of measurements $x = (x_m, m = 1, 2, \ldots)$ of variables from the measurement space $X$ of a result $y$ and calculate the probabilities $(P_1, P_2)$ that $y$ is in each of the possible classes. The tree is constructed by repeated partitioning of subsets of $X$ into two descendent subsets or nodes, where $X$ itself is the root node and the partitions end in a set of terminal nodes. The terminal nodes are assigned a value based on the probabilities that they belong to a given class $y$. The partition or split at each node is made on the values in $y$ conditionally on values in the sample vector $x$, based on a single variable in $x$. For ordinal or ratio scale data, splitting decisions are posed in the form: is $x_m < c$? where $c$ is within the domain of $x_m$. For categorical variables, the decisions may be expressed as: is $x_m \in S$? where $S$ includes all possible combinations of subsets of the categories defined in $x_m$.
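The two forms of splitting question can be sketched as a search over candidate splits at a single node. The Gini impurity is assumed here as the measure of split quality, since this passage does not state which criterion was used; all function names and data are hypothetical.

```python
from itertools import combinations

def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def split_cost(left, right):
    """Size-weighted impurity of the two descendent nodes."""
    n = len(left) + len(right)
    return (len(left) * gini(left) + len(right) * gini(right)) / n

def best_numeric_split(xs, ys):
    """Best question of the form 'is x < c?'. Returns (cost, c)."""
    values = sorted(set(xs))
    best = None
    for lo, hi in zip(values, values[1:]):
        c = (lo + hi) / 2.0
        left = [y for x, y in zip(xs, ys) if x < c]
        right = [y for x, y in zip(xs, ys) if x >= c]
        cost = split_cost(left, right)
        if best is None or cost < best[0]:
            best = (cost, c)
    return best

def best_categorical_split(xs, ys):
    """Best question of the form 'is x in S?'. Returns (cost, S)."""
    cats = sorted(set(xs))
    best = None
    for size in range(1, len(cats)):
        for subset in combinations(cats, size):
            left = [y for x, y in zip(xs, ys) if x in subset]
            right = [y for x, y in zip(xs, ys) if x not in subset]
            cost = split_cost(left, right)
            if best is None or cost < best[0]:
                best = (cost, set(subset))
    return best

def class_probabilities(labels):
    """Class proportions at a terminal node."""
    n = len(labels)
    return {c: labels.count(c) / n for c in sorted(set(labels))}

# Hypothetical cases at one node
new_snow = [2, 4, 6, 12, 15, 20]             # inches
aspect = ["N", "N", "E", "S", "S", "W"]      # starting zone aspect
outcome = ["none", "none", "none", "release", "release", "release"]

cost, threshold = best_numeric_split(new_snow, outcome)
cat_cost, subset = best_categorical_split(aspect, outcome)
probs = class_probabilities(["none", "release", "release", "release"])
```

The categorical search enumerates every proper subset of categories, which grows quickly with the number of categories; it is written this way only to mirror the definition of $S$ in the text.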
In the present study these decisions take the form: is new snow depth < 10 inches, or is the snow surface temperature < -4.0 degrees C? The categorical analog would be similar to: does the azimuth of the starting zone of path $x_m$ belong to the subset north? A portion of the finished binary classification tree may look like the following:
if (SSTi < 65 degrees C) and (MAXWSi < 21.5 mph)
1) method for determining the best split at each node,
We have used the tree-based model implementations in both CART (Breiman et al., 1984) and the S-PLUS mathematical language, which follows closely the development in Breiman et al. (1984). Both software packages have unique advantages and the user should explore both implementations. Details of the S-PLUS software are explained in Chambers and Hastie (1992). Two applications of tree-based models in the natural sciences can be found in Michaelsen et al. (1987 and in press). The output of the software packages includes a ranking of the independent variables in order of importance as primary decision makers, or as surrogates for other independent variables, as well as the decision tree. This is the focus of our discussion.
Table III. Input data used in CART analysis from daily data record at Mammoth Mountain.
Control activities and avalanche observations were recorded at Mammoth Mountain in a format consistent with the standard U.S. Forest Service avalanche control and occurrence chart. This protocol consists of codes for the date, time, path, patroller identification, control type, control number, control surface, avalanche class type (hard slab, soft slab, etc.), avalanche trigger mechanism, avalanche size, and so forth (Perla and Martinelli, 1978). It should be noted that the avalanche size class is somewhat subjective when comparing the data from different areas, but consistent within this study area.
Avalanche observations were aggregated into three response variables: the total number of avalanche releases on a given day, the sum of the sizes, and the maximum size class. Our premise for specifying these avalanche activity characteristics was that the number of releases may provide an indication of how widespread the avalanche hazard is (i.e. spatial dispersion), the sum of the sizes may indicate the overall intensity of the activity, and the maximum size may provide an index of the local intensity of the hazard. Therefore, a regression tree method was used to evaluate the data with the total number of releases and the sum of the sizes as the response variables, and a classification tree method was used to evaluate the data with the maximum size class on a given day as the response variable.
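The aggregation into daily response variables can be sketched as below. The record layout of (date, size class) pairs and the field names are assumptions made for illustration; the actual occurrence records carry many more codes.

```python
from collections import defaultdict

def aggregate_daily(observations):
    """Aggregate individual releases into daily response variables.

    observations : iterable of (date, size_class) pairs, one per release
    Returns a dict keyed by date with the number of releases, the sum of
    the sizes, and the maximum size class for that day.
    """
    sizes_by_day = defaultdict(list)
    for date, size in observations:
        sizes_by_day[date].append(size)
    return {
        date: {
            "n_releases": len(sizes),
            "sum_sizes": sum(sizes),
            "max_size": max(sizes),
        }
        for date, sizes in sizes_by_day.items()
    }

# Hypothetical observations
observations = [
    ("1990-01-05", 2), ("1990-01-05", 3), ("1990-01-05", 1),
    ("1990-01-09", 4),
]
daily = aggregate_daily(observations)
```

Days without any recorded release do not appear in the result and would need to be added with zeros from the daily weather record before model construction.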
The overall probability of a case falling into the correct terminal node for the regression trees depended on the avalanche activity variable: for the total number of releases (range 0-41) it was 0.68, and for the sum of the sizes (range 0-69) it was 0.71. The overall probability of correct classification for the classification tree (maximum size class with a range 0-5) was 0.90. The classification matrices showed some details of how various values of the result were predicted. In Table V the entire classification matrix is shown for the outcome of the maximum size class.
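A classification matrix and the overall probability of correct classification can be computed as in the following sketch. The data are hypothetical and do not reproduce Table V; rows are taken as predicted classes and columns as observed classes, an orientation assumed from the description of the top row of Table V in the discussion.

```python
def classification_matrix(observed, predicted, classes):
    """Count cases by (predicted class, observed class).

    Rows are predicted classes and columns are observed classes
    (an assumed orientation).
    """
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for obs, pred in zip(observed, predicted):
        matrix[index[pred]][index[obs]] += 1
    return matrix

def overall_correct(matrix):
    """Fraction of cases on the diagonal of the matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

# Hypothetical maximum size classes (0 means no avalanche)
classes = [0, 1, 2, 3, 4, 5]
observed = [0, 0, 0, 1, 2, 2, 0, 3]
predicted = [0, 0, 0, 1, 2, 0, 0, 3]
matrix = classification_matrix(observed, predicted, classes)
accuracy = overall_correct(matrix)
```

Off-diagonal entries in the first row count days where no avalanche was predicted but a release was observed.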
Table V. Classification matrix: classification tree constructed using maximum size for avalanche response variable.
Decision tree analysis may not be able to accurately predict details of avalanche activity in terms of numbers or size of releases with only inputs of observations from the current day. This is clearly the experience of the Swiss, reported in many classic works. Much more effort is needed to condition the data sets and specify elaborated variables (e.g. Buser, 1989). Other factors also may come into play because we are dealing primarily with artificially released avalanches.
* Cases in these data involve avalanche paths that are repeatedly shot during an avalanche cycle. Therefore the probability of deep slab release is likely to decrease over time.
* There were situations where conditions were ripe for release, but control operations were delayed until the weather improved. This may explain the cases where the prediction was for no avalanche, but releases were observed (top row in Table V).
In order to test this technique effectively and objectively, we need to study other data sets from areas with longer records, which will allow model construction and validation either through unique elements or cross-validation. We would also like to test the method in different snow climates to assess model performance and objectively confirm the existence of different snow climates and avalanche response. Both studies are in progress at this time. However, it will be difficult to compare avalanche records where releases from one area are natural or skier triggered, and releases in another area are explosively triggered.
Atwater, M.M., 1954, "Snow avalanches," Scientific American, vol. 190(1), pp. 26-31.
Bois, P., C. Obled and W. Good, 1974, "Multivariate data analysis as a tool for day-by-day avalanche forecast", Proceedings of the Snow Mechanics Symposium, IAHS Pub. No. 114, Grindelwald, pp. 391-403.
Bovis, M.J., 1977, "Statistical forecasting of snow avalanches, San Juan Mountains, southern Colorado, U.S.A.," Journal of Glaciology, Vol.18(78), pp. 87-99.
Breiman, L., J. H. Friedman, R. A. Olshen, and C. J. Stone, 1984, Classification and Regression Trees, Wadsworth & Brooks/Cole Advanced Books & Software, Pacific Grove, CA.
Buser, O., 1983, "Avalanche forecast with the method of nearest neighbors: An interactive approach," Cold Regions Science and Technology, Vol. 8(2), pp. 155-163.
Buser, O., P. Foehn, W. Good, H. Gubler and B. Salm, 1985, "Different methods for the assessment of avalanche danger", Cold Regions Science and Technology, vol. 10, no. 3, pp. 199-218.
Buser, O., 1989, "Two years experience of operational avalanche forecasting using the nearest neighbors method," Annals of Glaciology, Vol. 13, pp. 31-34.
Chambers, J. and T. Hastie (eds.), 1992, Statistical Models in S, Wadsworth and Brooks, Pacific Grove, CA.
Davis, R.E., K. Elder and E. Bouzglou, 1992, "Applications of classification tree methodology to avalanche data management and forecasting," Proceedings of the ISSW '92, pp. 126-133.
Elder, K. and R. Armstrong, 1986, "A quantitative approach for verifying avalanche hazard ratings", Avalanche Formation, Movement and Effects, IAHS pub. No. 162, B. Salm and H. Gubler eds., International Association of Hydrological Sciences, Wallingford, UK, pp. 593-601.
Fohn, P., W. Good, P. Bois, and C. Obled, 1977, "Evaluation and comparison of statistical and conventional methods of forecasting avalanche hazard," Journal of Glaciology, Vol. 19(81), pp. 375-387.
Jaccard, C., 1990, "Fuzzy factorial analysis of snow avalanches," Natural Hazards, Vol. 3, pp. 329-340.
Judson, A., and B. J. Erickson, 1973, "Predicting avalanche intensity from weather data: a statistical analysis," Research Paper RM-112, U. S. Forest Service, Fort Collins, CO.
Judson, A., 1983, "On the potential use of index paths for avalanche assessment," Journal of Glaciology, Vol. 29(101), pp. 178-184.
Judson, A., and R. King, 1985, "An index of regional snowpack stability based on natural slab avalanches," Journal of Glaciology, vol. 31, pp. 67-73.
Kennedy, J. L., 1984, "Avalanche litigation: Technology and liability," Proceedings of the ISSW '84, pp. 99-101.
LaChapelle, E., 1965, "Avalanche forecasting - A modern synthesis," International Symposium on Scientific Aspects of Snow and Ice Avalanches, IAHS Pub. No. 69, pp. 350-356.
McClung, D. M., and J. Tweedy, 1993, "Characteristics of avalanching: Kootenay Pass, British Columbia," Journal of Glaciology, vol. 39, no. 132, pp. 316-322.
Michaelsen, J., F. Davis, M. Borchert, 1987, "A non-parametric method for analyzing hierarchical relationships in ecological data", Coenoses, vol. 2, no. 1, pp. 39-48.
Michaelsen, J., D. Schimel, M. Friedl, F. Davis and R. Dubayah, in press, "Regression tree analysis of satellite and terrain data to guide vegetation sampling and surveys", Journal of Vegetation Science.
Obled, C. and W. Good, 1980, "Recent developments of avalanche forecasting by discriminant analysis techniques: A methodological review and some applications to the Parsenn area (Davos, Switzerland)," Journal of Glaciology, Vol. 25(92), pp. 313-346.
Penniman, D., 1986, "The Alpine Meadows avalanche trial: Conflicting viewpoints of the expert witnesses," Avalanche Formation, Movement and Effects, IAHS pub. No. 162, B. Salm and H. Gubler eds., International Association of Hydrological Sciences, Wallingford, UK, pp. 665-677.
Perla, R., 1970, "On contributory factors in avalanche hazard evaluation", Canadian Geotechnical Journal, vol. 7, no. 4, pp. 414-419.
Perla, R. and M. Martinelli, 1978, "Avalanche Handbook," Agriculture Handbook 489, rev. ed., USDA Forest Service, Washington, D.C.
Sethi, I., 1990, "Entropy nets: from decision trees to neural networks," Proceedings of the IEEE, Vol. 78(10), pp. 1605-1613.