Measuring Group Separability in Geometrical Space for Evaluation of Pattern Recognition and Dimension Reduction Algorithms

Research output: Contribution to journal › Research article › Contributed › peer-review

Abstract

Evaluating group separability is fundamental to pattern recognition. A plethora of dimension reduction (DR) algorithms has been developed to reveal the emergence of geometrical patterns in a low-dimensional space, where high-dimensional sample similarities are approximated by geometrical distances. However, statistical measures to evaluate the group separability attained by DR representations are missing. Traditional cluster validity indices (CVIs) might be applied in this context, but they present multiple limitations because they are not specifically tailored for DR. Here, we introduce a new rationale called projection separability (PS), which provides a methodology expressly designed to assess the group separability of data samples in a DR geometrical space. Using this rationale, we implemented a new class of indices named projection separability indices (PSIs) based on four statistical measures: Mann-Whitney U-test p-value, Area Under the ROC Curve, Area Under the Precision-Recall Curve, and Matthews Correlation Coefficient. The PSIs were compared to six representative cluster validity indices and one geometrical separability index using seven nonlinear datasets and six different DR algorithms. The results provide evidence that the statistics-based measures designed according to the PS rationale are more accurate than the other indices and can be adopted not only for evaluating and comparing the group separability of DR results but also for fine-tuning DR algorithms' hyperparameters. Finally, we introduce a second methodological innovation termed trustworthiness, a statistical evaluation that accounts for separability uncertainty by associating with each index value a p-value that expresses its significance relative to a null model.
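The abstract names the four statistics behind the PSIs but not how the samples of two groups are reduced to the scores those statistics are computed on. The sketch below assumes one plausible reading for illustration only: both groups are projected onto the line joining their centroids, and each measure is computed on the resulting one-dimensional scores. The centroid axis, the top-k cut used for MCC, the normal approximation (without tie correction) for the U-test, and the label-permutation null model used for trustworthiness are all assumptions, and every function name here is hypothetical rather than taken from the paper's code.

```python
import math
import random


def project_onto_centroid_line(group_a, group_b):
    """Assumed PS step: project samples onto the axis from centroid A to centroid B."""
    dim = len(group_a[0])
    ca = [sum(p[d] for p in group_a) / len(group_a) for d in range(dim)]
    cb = [sum(p[d] for p in group_b) / len(group_b) for d in range(dim)]
    axis = [cb[d] - ca[d] for d in range(dim)]
    norm = math.sqrt(sum(v * v for v in axis)) or 1.0  # coincident centroids -> all scores 0
    axis = [v / norm for v in axis]

    def score(p):
        return sum((p[d] - ca[d]) * axis[d] for d in range(dim))

    return [score(p) for p in group_a], [score(p) for p in group_b]


def average_ranks(values):
    """1-based ranks; tied values share the average rank of their block."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney(neg, pos):
    """U statistic of `pos` and a two-sided p-value from the normal approximation."""
    n0, n1 = len(neg), len(pos)
    ranks = average_ranks(neg + pos)
    u1 = sum(ranks[n0:]) - n1 * (n1 + 1) / 2
    z = (u1 - n0 * n1 / 2) / math.sqrt(n0 * n1 * (n0 + n1 + 1) / 12)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def auc(neg, pos):
    """ROC AUC equals U / (n0 * n1): P(positive score > negative score), ties count half."""
    return mann_whitney(neg, pos)[0] / (len(neg) * len(pos))


def _ranked_labels(neg, pos):
    # Label 1 marks group B (treated as positive); highest scores first.
    return sorted([(s, 0) for s in neg] + [(s, 1) for s in pos], key=lambda t: -t[0])


def average_precision(neg, pos):
    """Area under the precision-recall curve, estimated as average precision."""
    tp, ap = 0, 0.0
    for rank, (_, label) in enumerate(_ranked_labels(neg, pos), start=1):
        if label:
            tp += 1
            ap += tp / rank
    return ap / len(pos)


def mcc_top_k(neg, pos):
    """MCC when the len(pos) highest scores are predicted positive (assumed cut)."""
    k = len(pos)
    tp = sum(label for _, label in _ranked_labels(neg, pos)[:k])
    fp, fn = k - tp, len(pos) - tp
    tn = len(neg) - fp
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def trustworthiness(group_a, group_b, index, n_perm=200, seed=0):
    """Assumed null model: shuffle group labels, redo the projection, recompute the index."""
    rng = random.Random(seed)
    observed = index(*project_onto_centroid_line(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:len(group_a)], pooled[len(group_a):]
        if index(*project_onto_centroid_line(a, b)) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    A = [(0, 0), (1, 0), (0, 1), (1, 1)]
    B = [(3, 3), (4, 3), (3, 4), (4, 4)]
    neg, pos = project_onto_centroid_line(A, B)
    print("AUC:", auc(neg, pos), "AUPR:", average_precision(neg, pos))
    print("MCC:", mcc_top_k(neg, pos), "U-test p:", mann_whitney(neg, pos)[1])
    print("trustworthiness (AUC):", trustworthiness(A, B, auc))
```

Recomputing the projection inside every permutation matters in this sketch: because the centroid axis is fitted to the labels, even random groupings tend to score above chance, so the null distribution has to include that fitting step for the resulting p-value to be meaningful.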

Details

Original language: English
Article number: 9716930
Pages (from-to): 22441-22471
Number of pages: 31
Journal: IEEE Access
Volume: 10
Issue number: 10
Publication status: Published - 1 Jan 2022
Peer-reviewed: Yes

External IDs

Scopus: 85125333434
dblp: journals/access/AcevedoDKCSC22
Mendeley: 2c99ac38-a157-3e56-96bf-e6ef195f1608
ORCID: /0000-0003-2848-6949/work/141543409

Keywords

  • Indexes, Dimensionality reduction, Clustering algorithms, Biomedical measurement, Pattern recognition, Shape, Principal component analysis, data embedding, cluster validity indices, group separability, dimension reduction