Comparative evaluation of persistence diagram vectorisation methods in classification tasks
 
1
Department of Computer Science, Lublin University of Technology, Nadbystrzycka 36B, 20-618 Lublin, Poland
 
 
Corresponding author
Dominika Sulowska   

Department of Computer Science, Lublin University of Technology, Nadbystrzycka 36B, 20-618 Lublin, Poland
 
 
 
ABSTRACT
Topological Data Analysis (TDA) enables the analysis of the geometric structure of data using tools from algebraic topology. A central technique in TDA is persistent homology, whose results are represented by persistence diagrams (PDs) describing the lifespan of topological features. Since PDs lack a natural vector-space representation, their direct use in machine learning (ML) classifiers is challenging. Therefore, several vectorisation methods have been proposed, including Persistence Image (PI), Persistence Landscapes (PL), Betti Curves (BC), and Persistence Silhouettes (PS). This study presents a comparative analysis of these vectorisation methods in classification tasks involving both synthetic and real-world datasets, using three classifiers: Logistic Regression (LR), XGBoost (XGB), and Multilayer Perceptron (MLP). Hyperparameter tuning and cross-validation were applied, and model performance was evaluated using accuracy, precision, recall, and F1-score. The results show that PI and PL consistently achieve the highest classification performance across different data types and classifiers. For synthetic datasets, these methods reached scores above 0.98, while for the ECG dataset, they outperformed alternative approaches by up to 30%. In contrast, all methods exhibited limited effectiveness on the MNIST dataset due to high geometric complexity and noise in pixel-based point cloud representations. For the ModelNet10 dataset, PI clearly outperformed other techniques, achieving scores of approximately 0.75. Overall, the results indicate that PI provides a robust and versatile topological representation for classification tasks, while PL stands out for its stability and interpretability in complex data analysis.
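As an illustration of what vectorising a persistence diagram means in practice, the following minimal sketch computes a Betti curve, the simplest of the four methods named above: for each value on a fixed grid it counts the features whose lifespan covers that value. The function name `betti_curve`, the toy diagram, and the grid values are assumptions made for this example only; they are not taken from the study, which may have relied on a dedicated TDA library and different parameters.

```python
import numpy as np


def betti_curve(diagram, grid):
    """Count, for each grid value t, the features with birth <= t < death.

    `diagram` is a sequence of (birth, death) pairs for one homology
    dimension; `grid` is a 1-D sequence of filtration values.
    """
    diagram = np.asarray(diagram, dtype=float).reshape(-1, 2)
    births = diagram[:, 0][:, None]   # shape (n_features, 1)
    deaths = diagram[:, 1][:, None]   # shape (n_features, 1)
    grid = np.asarray(grid, dtype=float)[None, :]  # shape (1, n_grid)
    alive = (births <= grid) & (grid < deaths)     # broadcast comparison
    return alive.sum(axis=0)          # fixed-length integer vector


# Toy diagram with made-up values (hypothetical, for illustration only).
toy_diagram = [(0.0, 1.0), (0.2, 0.5), (0.6, 0.9)]
grid = np.array([0.1, 0.3, 0.7, 0.95])

vector = betti_curve(toy_diagram, grid)
print(vector)  # [1 2 2 1]
```

The resulting vector has the same length for every diagram, which is what allows it to be passed to a standard classifier such as logistic regression.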