Measuring Group Separability in Geometrical Space for Evaluation of Pattern Recognition and Dimension Reduction Algorithms
Research output: Contribution to journal › Research article › Contributed › peer-review
Abstract
Evaluating group separability is fundamental to pattern recognition. A plethora of dimension reduction (DR) algorithms has been developed to reveal the emergence of geometrical patterns in a low-dimensional space, where high-dimensional sample similarities are approximated by geometrical distances. However, statistical measures to evaluate the group separability attained by DR representations are missing. Traditional cluster validity indices (CVIs) might be applied in this context, but they present multiple limitations because they are not specifically tailored for DR. Here, we introduce a new rationale called projection separability (PS), which provides a methodology expressly designed to assess the group separability of data samples in a DR geometrical space. Using this rationale, we implemented a new class of indices named projection separability indices (PSIs) based on four statistical measures: Mann-Whitney U-test p-value, Area Under the ROC-Curve, Area Under the Precision-Recall Curve, and Matthews Correlation Coefficient. The PSIs were compared to six representative cluster validity indices and one geometrical separability index using seven nonlinear datasets and six different DR algorithms. The results provide evidence that the implemented statistical-based measures designed on the basis of the PS rationale are more accurate than the other indices and can be adopted not only for evaluating and comparing group separability of DR results but also for fine-tuning DR algorithms’ hyperparameters. Finally, we introduce a second methodological innovation termed trustworthiness, a statistical evaluation that accounts for separability uncertainty and associates to the measure of each index a p-value that expresses the significance level in comparison to a null model.
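The abstract names the statistics behind the PSIs but does not spell out how samples are reduced to scores. The sketch below is one plausible reading, not the authors' implementation. It assumes that the two groups are compared by projecting every sample onto the line joining the group centroids, that the AUC-ROC is then computed on those one-dimensional positions, and that the null model for a trustworthiness-style p-value is obtained by shuffling the group labels. All function names are hypothetical, and only the Python standard library is used.

```python
import random


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def project_on_centroid_line(points, labels):
    """Scalar position of each point along the line joining the two group centroids.

    Assumption: this projection is our reading of the PS rationale.
    """
    groups = sorted(set(labels))
    assert len(groups) == 2, "this sketch handles exactly two groups"
    c0 = centroid([p for p, g in zip(points, labels) if g == groups[0]])
    c1 = centroid([p for p, g in zip(points, labels) if g == groups[1]])
    direction = [b - a for a, b in zip(c0, c1)]
    scores = [sum(x * d for x, d in zip(p, direction)) for p in points]
    return scores, groups


def auc_from_scores(scores, labels, positive):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair (the Mann-Whitney U formulation).
    """
    pos = [s for s, g in zip(scores, labels) if g == positive]
    neg = [s for s, g in zip(scores, labels) if g != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def psi_auc(points, labels):
    """AUC-based separability of two groups after centroid-line projection."""
    scores, groups = project_on_centroid_line(points, labels)
    auc = auc_from_scores(scores, labels, groups[1])
    # Separability does not depend on which group is called "positive".
    return max(auc, 1.0 - auc)


def permutation_p_value(points, labels, n_perm=500, seed=0):
    """Empirical p-value of the observed index against label-shuffled data.

    Assumption: label permutation stands in for the paper's null model.
    """
    rng = random.Random(seed)
    observed = psi_auc(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if psi_auc(points, shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy 2-D embedding with two clearly separated groups.
    embedding = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
    groups = ["a"] * 4 + ["b"] * 4
    index, p_value = permutation_p_value(embedding, groups)
    print(f"AUC-based index: {index:.2f}, permutation p-value: {p_value:.3f}")
```

The same projected scores could feed the other three statistics listed in the abstract (Mann-Whitney p-value, AUPR, MCC). Comparing the index across embeddings produced with different hyperparameter settings is how such a measure would support the tuning use case the abstract mentions.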
Details
Original language | English |
---|---|
Article number | 9716930 |
Pages (from-to) | 22441-22471 |
Number of pages | 31 |
Journal | IEEE Access |
Volume | 10 |
Issue number | 10 |
Publication status | Published - 1 Jan 2022 |
Peer-reviewed | Yes |
External IDs
Scopus | 85125333434 |
---|---|
dblp | journals/access/AcevedoDKCSC22 |
Mendeley | 2c99ac38-a157-3e56-96bf-e6ef195f1608 |
ORCID | /0000-0003-2848-6949/work/141543409 |
Keywords
- Indexes, Dimensionality reduction, Clustering algorithms, Biomedical measurement, Pattern recognition, Shape, Principal component analysis, data embedding, cluster validity indices, group separability, dimension reduction