Abstract
This paper consists of two main parts. In Part I, the problem of selecting variables for regression is addressed in the context of a model for which a well-defined best subset exists. Under this model, various techniques for finding regression subsets are tested to see how well they perform. Also investigated are means of determining whether one has regressed on the correct number of variables. In Part II, "Classification by Thresholding," it is shown that if one wishes to classify an observation x into one of m populations by the rule of maximum likelihood, it is not always necessary to evaluate all probability density functions to determine which is maximized at x. The reason is that there exist numbers y_ij such that f_i(x) > y_ij implies f_i(x) > f_j(x), where f_i and f_j are the density functions for the ith and jth populations. A method of computing the y_ij is given which is very efficient for the multivariate normal case. In addition, bounds for the probability of misclassification and the expected number of densities evaluated are derived. Finally, an example of classification by thresholding is given.
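The thresholding idea can be illustrated with a minimal sketch. This is not the dissertation's method of computing the y_ij; it uses the simple (and always valid) choice y_ij = max_x f_j(x), since if f_i(x) exceeds the peak of f_j it certainly exceeds f_j(x). The populations and parameters below are hypothetical, chosen only for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Univariate normal density."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical populations (mean, std dev) -- not from the dissertation.
params = [(0.0, 1.0), (3.0, 1.0), (6.0, 2.0)]

# Simple valid thresholds: y_ij = max_x f_j(x) = 1 / (sigma_j * sqrt(2*pi)),
# independent of i here, so we store one peak per population j.
peaks = [1.0 / (s * math.sqrt(2 * math.pi)) for (_, s) in params]

def classify(x):
    """Maximum-likelihood classification with early exit via thresholding.

    Returns (index of chosen population, number of densities evaluated).
    """
    best_i, best_val = None, -1.0
    n_evaluated = 0
    for i, (mu, sigma) in enumerate(params):
        f_i = normal_pdf(x, mu, sigma)
        n_evaluated += 1
        if f_i > best_val:
            best_i, best_val = i, f_i
        # Early exit: if the best density so far exceeds every remaining
        # population's peak, no later density can beat it.
        if all(best_val > peaks[j] for j in range(i + 1, len(params))):
            break
    return best_i, n_evaluated
```

For example, classify(0.0) settles on population 0 after evaluating only two of the three densities, because the running maximum already exceeds the peak of the third density. Sharper thresholds (such as those derived in Part II) would allow even earlier exits.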
Feiveson, Alan Harold (1973). Selecting variables for regression and classification by thresholding. Texas A&M University. Texas A&M University Libraries. Available electronically from
https://hdl.handle.net/1969.1/DISSERTATIONS-441431.