Pavlenko, Tatjana
Publications (10 of 22)
Pavlenko, T. & Fridén, H. (2006). Scoring Feature Subsets for Separation power in Supervised Bayes Classification. Advances in Intelligent and Soft Computing, 37, 383-391
Scoring Feature Subsets for Separation power in Supervised Bayes Classification
2006 (English). In: Advances in Intelligent and Soft Computing, ISSN 1867-5662, E-ISSN 1867-5670, Vol. 37, p. 383-391. Article in journal (Refereed). Published.
Abstract [en]

We present a method for evaluating the discriminative power of compact feature combinations (blocks) using a distance-based scoring measure, yielding an algorithm for selecting feature blocks that contribute significantly to the outcome variation. To estimate classification performance with subset selection in a high-dimensional framework, we jointly evaluate both stages of the process: selection of significantly relevant blocks and classification. The classification power and performance properties of the classifier with the proposed subset selection technique have been studied on several simulation models, confirming the benefit of this approach.

Place, publisher, year, edition, pages
Berlin: Springer, 2006
Keywords
multivariate statistics, classification
National Category
Mathematics
Identifiers
urn:nbn:se:miun:diva-3867 (URN); 10.1007/3-540-34777-1_45 (DOI); 2-s2.0-58149242746 (Scopus ID); 4162 (Local ID); 978-3-540-34776-7 (ISBN); 4162 (Archive number); 4162 (OAI)
Available from: 2008-09-30. Created: 2008-09-30. Last updated: 2017-12-12. Bibliographically approved.
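The block-scoring idea summarized in the abstract above can be sketched in code. This is an illustrative reconstruction, not the authors' algorithm: the separation score here is taken to be the squared Mahalanobis distance between the two class means over a block, and the fixed threshold rule, function names, and data layout are all assumptions.

```python
import numpy as np

def block_score(X0, X1, block):
    """Separation score for one feature block: squared Mahalanobis
    distance between the class means, using the pooled covariance."""
    A, B = X0[:, block], X1[:, block]
    diff = A.mean(axis=0) - B.mean(axis=0)
    pooled = ((len(A) - 1) * np.cov(A, rowvar=False) +
              (len(B) - 1) * np.cov(B, rowvar=False)) / (len(A) + len(B) - 2)
    return float(diff @ np.linalg.solve(np.atleast_2d(pooled), diff))

def select_blocks(X0, X1, blocks, threshold):
    """Keep only the blocks whose separation score exceeds the threshold."""
    return [b for b in blocks if block_score(X0, X1, b) > threshold]
```

On simulated two-class data where only one block carries a mean shift, `select_blocks` retains that block and discards the pure-noise block.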
Pavlenko, T., Hall, M., von Rosen, D. & Andrushchenko, Z. (2004). Towards the optimal feature selection in high-dimensional Bayesian network classifiers. In: LopezDiaz, M; Gil, MA; Grzegorzewski, P; Hryniewicz, O; Lawry, J (Ed.), Soft Methodology and Random Information Systems. Paper presented at the 2nd International Conference on Soft Methods in Probability and Statistics (SMPS 2004), Sep 02-04, 2004 (pp. 613-620). Springer-Verlag Berlin
Towards the optimal feature selection in high-dimensional Bayesian network classifiers
2004 (English). In: Soft Methodology and Random Information Systems / [ed] LopezDiaz, M; Gil, MA; Grzegorzewski, P; Hryniewicz, O; Lawry, J, Springer-Verlag Berlin, 2004, p. 613-620. Conference paper, Published paper (Refereed).
Abstract [en]

We focus on Bayesian network (BN) classifiers and formalize feature selection from the perspective of improving classification accuracy. To explore the effect of high dimensionality, we apply growing dimension asymptotics. We modify the weighted BN by introducing inclusion-exclusion factors, which eliminate the features whose separation score does not exceed a given threshold. We establish the asymptotically optimal threshold and demonstrate that the proposed selection technique improves classification accuracy.

Place, publisher, year, edition, pages
Springer-Verlag Berlin, 2004
Series
ADVANCES IN SOFT COMPUTING, ISSN 1615-3871
National Category
Mathematics
Identifiers
urn:nbn:se:miun:diva-13492 (URN); 000224212800076 (); 3-540-22264-2 (ISBN)
Conference
2nd International Conference on Soft Methods in Probability and Statistics (SMPS 2004), Sep 02-04, 2004
Available from: 2011-04-08. Created: 2011-04-08. Last updated: 2011-04-08. Bibliographically approved.
Pavlenko, T., Hall, M. & von Rosen, D. (2004). Towards the optimal feature selection in high-dimensional Bayesian network classifiers. Umeå: SLU, Centre of Biostochastics
Towards the optimal feature selection in high-dimensional Bayesian network classifiers
2004 (English). Report (Other academic).
Abstract [en]

Incorporating subset selection into a classification method often carries a number of advantages, especially when operating in the domain of high-dimensional features. In this paper, we focus on Bayesian network (BN) classifiers and formalize feature selection from the perspective of improving classification accuracy. To explore the effect of high dimensionality, we apply growing dimension asymptotics, meaning that the number of training examples is relatively small compared to the number of feature nodes. In order to ascertain which set of features is indeed relevant for a classification task, we introduce a distance-based scoring measure reflecting how well the set separates different classes. This score is then employed for feature selection, using the weighted form of the BN classifier. The idea is to view the weights as inclusion-exclusion factors, which eliminate the sets of features whose separation score does not exceed a given threshold. We establish the asymptotically optimal threshold and demonstrate that the proposed selection technique improves classification accuracy under different a priori assumptions concerning the separation strength.

Place, publisher, year, edition, pages
Umeå: SLU, Centre of Biostochastics, 2004. p. 14
Series
Research report / Centre of Biostochastics, ISSN 1651-8543 ; 2004:1
Keywords
Bayesian network, augmenting, separation strength, growing dimension
National Category
Mathematics
Identifiers
urn:nbn:se:miun:diva-5653 (URN); 1677 (Local ID); 1677 (Archive number); 1677 (OAI)
Available from: 2008-09-30. Created: 2008-09-30. Last updated: 2011-04-12. Bibliographically approved.
Pavlenko, T. (2003). Feature informativeness in high-dimensional discriminant analysis. Communications in Statistics: Theory and Methods, 32(2), 459-474
Feature informativeness in high-dimensional discriminant analysis
2003 (English). In: Communications in Statistics: Theory and Methods, ISSN 0361-0926, Vol. 32, no 2, p. 459-474. Article in journal (Refereed). Published.
Abstract [en]

A concept of feature informativeness is introduced as a way of measuring the discriminating power of a set of features. A question of interest is how this property of features affects discrimination performance. The effect is assessed by means of a weighted discriminant function, which distributes weights among features according to their informativeness. The asymptotic normality of the weighted discriminant function is proven, and limiting expressions for the errors are obtained in the growing dimension asymptotic framework, i.e., when the number of features is proportional to the sample size. This makes it possible to establish the type of weighting that is optimal in the sense of minimum error probability.

Keywords
limiting error probability
National Category
Mathematics
Identifiers
urn:nbn:se:miun:diva-2268 (URN); 10.1081/STA-120018195 (DOI); 000181233900010 (); 2-s2.0-0037292810 (Scopus ID); 1494 (Local ID); 1494 (Archive number); 1494 (OAI)
Available from: 2008-09-30. Created: 2008-09-30. Last updated: 2016-10-21. Bibliographically approved.
Pavlenko, T. (2003). On feature selection, curse-of-dimensionality and error probability in discriminant analysis. Journal of Statistical Planning and Inference, 115(2), 565-584
On feature selection, curse-of-dimensionality and error probability in discriminant analysis
2003 (English). In: Journal of Statistical Planning and Inference, ISSN 0378-3758, Vol. 115, no 2, p. 565-584. Article in journal (Refereed). Published.
Abstract [en]

Discrimination performance, measured by the limiting error probability, is considered from the point of view of feature discriminating power. For assessing the latter, a concept of feature informativeness is introduced. A threshold feature selection technique is considered: selection is incorporated into the discriminant function by means of an inclusion-exclusion factor which eliminates the sets of features whose informativeness does not exceed a given threshold. An issue is how this selection procedure affects the error rate when sample-based estimates are used in the discriminant function. This effect is evaluated in a growing dimension asymptotic framework. In particular, the increase of the moments of the discriminant function induced by the curse of dimensionality is shown, together with the effect of the threshold-based feature selection. The asymptotic normality of the discriminant function is established, which makes it possible to express the overall error probability in closed form and view it as a function of the given selection threshold.

Keywords
Growing dimension asymptotics
National Category
Mathematics
Identifiers
urn:nbn:se:miun:diva-2267 (URN); 10.1016/S0378-3758(02)00166-0 (DOI); 000183378000015 (); 2-s2.0-0037902234 (Scopus ID); 1493 (Local ID); 1493 (Archive number); 1493 (OAI)
Available from: 2008-09-30. Created: 2008-09-30. Last updated: 2016-10-24. Bibliographically approved.
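The inclusion-exclusion device described in the abstract above, a factor that zeroes out weakly informative features in the discriminant function, can be sketched as follows. The per-feature informativeness measure (squared standardized mean difference), the diagonal plug-in discriminant, and all names are illustrative assumptions; the paper works with a general informativeness concept and an asymptotic analysis, not this finite-sample recipe.

```python
import numpy as np

def informativeness(X0, X1):
    """Illustrative per-feature informativeness: squared standardized
    difference of the two class means."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    s2 = (X0.var(axis=0, ddof=1) + X1.var(axis=0, ddof=1)) / 2
    return (m0 - m1) ** 2 / s2

def discriminant(x, X0, X1, c):
    """Additive plug-in discriminant with an inclusion-exclusion factor:
    features whose informativeness does not exceed c contribute nothing."""
    keep = (informativeness(X0, X1) > c).astype(float)
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    s2 = (X0.var(axis=0, ddof=1) + X1.var(axis=0, ddof=1)) / 2
    terms = (x - (m0 + m1) / 2) * (m0 - m1) / s2  # per-feature LDA terms
    return float(np.sum(keep * terms))            # > 0 favours class 0
```

With one strongly separating feature and one noise feature, only the former passes a threshold of 1, and the sign of the discriminant follows the informative coordinate.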
Pavlenko, T. & von Rosen, D. (2003). On the optimal weighting of high-dimensional Bayesian networks. Umeå: Swedish Univ. of Agricultural Sciences
On the optimal weighting of high-dimensional Bayesian networks
2003 (English). Report (Other academic).
Place, publisher, year, edition, pages
Umeå: Swedish Univ. of Agricultural Sciences, 2003
Series
Research report / Centre of Biostochastics, ISSN 1651-8543 ; 2003:6
Keywords
Bayesian network, separation score
National Category
Mathematics
Identifiers
urn:nbn:se:miun:diva-5626 (URN); 1495 (Local ID); 1495 (Archive number); 1495 (OAI)
Available from: 2008-09-30. Created: 2008-09-30. Last updated: 2011-04-12. Bibliographically approved.
Pavlenko, T. (2002). Augmented naive BN classifier in a high-dimensional framework. In: European Meeting of Statisticians, Prague 2002.
Augmented naive BN classifier in a high-dimensional framework
2002 (English). In: European Meeting of Statisticians, Prague 2002, 2002. Conference paper, Published paper (Refereed).
Keywords
Augmenting, Naive BN
National Category
Mathematics
Identifiers
urn:nbn:se:miun:diva-2398 (URN); 1585 (Local ID); 1585 (Archive number); 1585 (OAI)
Available from: 2008-09-30. Created: 2008-09-30. Last updated: 2011-04-12. Bibliographically approved.
Pavlenko, T. & von Rosen, D. (2002). Bayesian Network Classifiers in a High Dimensional Framework. In: UAI'02: Proceedings of the 18th Conference in Uncertainty in Artificial Intelligence, University of Alberta, Edmonton, Alberta, Canada, August 1-4, 2002 (pp. 397-404). Morgan Kaufmann
Bayesian Network Classifiers in a High Dimensional Framework
2002 (English). In: UAI'02: Proceedings of the 18th Conference in Uncertainty in Artificial Intelligence, University of Alberta, Edmonton, Alberta, Canada, August 1-4, 2002, Morgan Kaufmann, 2002, p. 397-404. Conference paper, Published paper (Refereed).
Abstract [en]

We present a growing dimension asymptotic formalism. The perspective in this paper is classification theory, and we show that the formalism can accommodate probabilistic network classifiers, including the naive Bayes model and its augmented version. When represented as a Bayesian network, these classifiers have an important advantage: the corresponding discriminant function turns out to be a special case of a generalized additive model, which makes it possible to obtain closed-form expressions for the asymptotic misclassification probabilities used here as a measure of classification accuracy. Moreover, we propose a new quantity for assessing the discriminative power of a set of features, which is then used to elaborate the augmented naive Bayes classifier. The result is a weighted form of the augmented naive Bayes classifier that distributes weights among the sets of features according to their discriminative power. We derive the asymptotic distribution of the sample-based discriminative power and show that it is seriously overestimated in the high-dimensional case. We then apply this result to find the type of weighting that is optimal in the sense of minimum misclassification probability.

Place, publisher, year, edition, pages
Morgan Kaufmann, 2002
Keywords
BN classifier
National Category
Other Mechanical Engineering
Identifiers
urn:nbn:se:miun:diva-2394 (URN); 1579 (Local ID); 1-55860-897-4 (ISBN); 1579 (Archive number); 1579 (OAI)
Available from: 2008-09-30. Created: 2008-12-16. Last updated: 2011-04-19. Bibliographically approved.
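The weighting idea in the abstract above can be reduced to a toy sketch: per-feature Gaussian log-likelihood-ratio terms combined additively, each scaled by a weight reflecting the feature's estimated discriminative power. The normalized squared-mean-difference weights used here are an assumption for illustration, not the optimal weighting the paper derives.

```python
import numpy as np

def weighted_nb_discriminant(x, X0, X1):
    """Gaussian naive Bayes log-likelihood ratio with per-feature weights
    proportional to an estimated discriminative-power score."""
    m0, v0 = X0.mean(axis=0), X0.var(axis=0, ddof=1)
    m1, v1 = X1.mean(axis=0), X1.var(axis=0, ddof=1)
    # per-feature log p0(x_j) - log p1(x_j) under independent Gaussians
    llr = (-0.5 * np.log(v0) - (x - m0) ** 2 / (2 * v0)
           + 0.5 * np.log(v1) + (x - m1) ** 2 / (2 * v1))
    score = (m0 - m1) ** 2 / ((v0 + v1) / 2)  # crude discriminative power
    w = score / score.sum()                   # normalized weights
    return float(np.sum(w * llr))             # > 0 favours class 0
```

Because the weights concentrate on the feature with the largest estimated mean separation, noise features contribute little to the decision.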
Pavlenko, T. (2001). Asymptotic error rates in the discriminant analysis using feature selection. In: 23rd European Meeting of Statisticians: Contributed Papers II (pp. 307-308). Lisboa: Instituto Nacional de Estatística
Asymptotic error rates in the discriminant analysis using feature selection
2001 (English). In: 23rd European Meeting of Statisticians: Contributed Papers II, Lisboa: Instituto Nacional de Estatística, 2001, p. 307-308. Conference paper, Published paper (Refereed).
Place, publisher, year, edition, pages
Lisboa: Instituto Nacional de Estatística, 2001
Series
Revstat / Instituto Nacional de Estatística : statistical journal, ISSN 0873-4275 ; 2
Keywords
Feature selection
National Category
Mathematics
Identifiers
urn:nbn:se:miun:diva-5642 (URN); 1586 (Local ID); 1586 (Archive number); 1586 (OAI)
Available from: 2008-09-30. Created: 2008-09-30. Last updated: 2011-04-12. Bibliographically approved.
Pavlenko, T. & von Rosen, D. (2001). Effect of dimensionality on discrimination. Statistics, 35(3), 191-213
Effect of dimensionality on discrimination
2001 (English). In: Statistics, ISSN 0233-1888, Vol. 35, no 3, p. 191-213. Article in journal (Refereed). Published.
Abstract [en]

Discrimination problems in a high-dimensional setting are considered. New results concern the role of dimensionality in the performance of the discrimination procedure. Assuming that the data have a block structure, two different asymptotic approaches are presented, characterized by different types of relations between the dimensionality and the size of the training samples. Asymptotic expressions for the error probabilities are obtained and a consistent approximation of the discriminant function is proposed. Throughout the paper the importance of the dimensionality in the asymptotic analysis is stressed.

Keywords
misclassification error
National Category
Mathematics
Identifiers
urn:nbn:se:miun:diva-2351 (URN); 10.1080/02331880108802731 (DOI); 000170524900001 (); 2-s2.0-0038779053 (Scopus ID); 1577 (Local ID); 1577 (Archive number); 1577 (OAI)
Available from: 2008-09-30. Created: 2008-09-30. Last updated: 2016-10-27. Bibliographically approved.
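The dimensionality effect analysed asymptotically in the abstract above can be seen in a small simulation, offered purely as an illustration (the setup, sample sizes, and identity-covariance plug-in rule are assumptions, not the paper's models): with the training-sample size held fixed, adding uninformative features inflates the test error of a plug-in linear discriminant.

```python
import numpy as np

def plugin_lda_error(p, n=50, delta=1.0, trials=200, seed=0):
    """Average test error of a plug-in (identity-covariance) linear
    discriminant when only feature 0 separates the classes."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(trials):
        X0 = rng.normal(0.0, 1.0, (n, p)); X0[:, 0] += delta
        X1 = rng.normal(0.0, 1.0, (n, p))
        m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
        T = rng.normal(0.0, 1.0, (100, p)); T[:, 0] += delta  # class-0 test points
        scores = (T - (m0 + m1) / 2) @ (m0 - m1)
        errs.append(np.mean(scores < 0))  # misclassified as class 1
    return float(np.mean(errs))
```

In this setup `plugin_lda_error(2)` comes out noticeably smaller than `plugin_lda_error(100)`: the extra noise dimensions degrade the sample-based rule even though the underlying class separation is unchanged.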