Density-based clustering validation

Each graph adds more noise to the initial data, which consist of two well-defined semicircles. As the noise increases, so does the overlap between the two groups, and the value of the DBCV index progressively decreases. Image released under MIT license.[1]

Density-Based Clustering Validation (DBCV) is a metric designed to assess the quality of clustering solutions, especially those produced by density-based clustering algorithms such as DBSCAN, Mean shift, and OPTICS. It is well suited to evaluating concave and nested clusters, where traditional metrics such as the Silhouette coefficient, Davies–Bouldin index, or Calinski–Harabasz index often fail to provide meaningful evaluations.

Unlike traditional validation measures, which often assume compact and well-separated clusters, the DBCV index evaluates how well clusters are defined in terms of local density variations and structural coherence.
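The limitation of compactness-based measures on nested clusters can be illustrated with a minimal sketch. It does not compute DBCV; it only evaluates the Silhouette coefficient, in plain Python, on two concentric rings. The ring radii, the point counts, and the "half-plane" split used for comparison are illustrative assumptions, not taken from the cited sources.

```python
import math

def ring(radius, count):
    """Return `count` evenly spaced points on a circle centred at the origin."""
    return [(radius * math.cos(2 * math.pi * k / count),
             radius * math.sin(2 * math.pi * k / count)) for k in range(count)]

def silhouette_values(points, labels):
    """Per-point silhouette value s = (b - a) / max(a, b)."""
    clusters = sorted(set(labels))
    values = []
    for i, p in enumerate(points):
        # Accumulate distances from p to every other point, grouped by label.
        sums = {c: 0.0 for c in clusters}
        counts = {c: 0 for c in clusters}
        for j, q in enumerate(points):
            if i != j:
                sums[labels[j]] += math.dist(p, q)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            values.append(0.0)  # convention for single-point clusters
            continue
        a = sums[own] / counts[own]  # mean distance within own cluster
        b = min(sums[c] / counts[c] for c in clusters if c != own)  # nearest other cluster
        values.append((b - a) / max(a, b))
    return values

def mean(values):
    return sum(values) / len(values)

# Assumed toy data: a small inner ring nested inside a larger outer ring.
inner = ring(1.0, 20)
outer = ring(5.0, 100)
points = inner + outer

# Labelling 1: the nested structure itself (inner ring = 0, outer ring = 1).
true_labels = [0] * len(inner) + [1] * len(outer)
# Labelling 2: a structurally wrong split that cuts both rings along the y-axis.
half_plane_labels = [0 if x < 0 else 1 for x, _ in points]

true_values = silhouette_values(points, true_labels)
half_plane_values = silhouette_values(points, half_plane_labels)

true_score = mean(true_values)
half_plane_score = mean(half_plane_values)
outer_ring_score = mean(true_values[len(inner):])

print("silhouette, nested labelling:    ", round(true_score, 3))
print("silhouette, half-plane labelling:", round(half_plane_score, 3))
print("outer ring only, nested labelling:", round(outer_ring_score, 3))
```

In this setup the outer ring receives a negative mean silhouette value under the correct nested labelling, because its points are on average farther from each other than from the inner ring, and the half-plane split scores higher overall even though it ignores the ring structure. This is the kind of case for which a density-based index is intended.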

The metric was introduced in 2014 by Davoud Moulavi and colleagues.[2] It uses density-connectivity principles to quantify clustering structure, which makes it especially effective at evaluating arbitrarily shaped clusters, including concave ones, where traditional metrics may be less reliable.

The DBCV index has been employed in bioinformatics,[3] ecology,[4] techno-economic analysis,[5] and health informatics,[6] as well as in numerous other fields.[7][8]

  1. ^ FelSiq/DBCV: Fast Density-Based Clustering Validation (DBCV) Python package, GitHub, https://github.com/FelSiq/DBCV
  2. ^ Moulavi, Davoud; Jaskowiak, Pablo A.; Campello, Ricardo J. G. B.; Zimek, Arthur; Sander, Jörg (2014), "Density-Based Clustering Validation", Proceedings of the 2014 SIAM International Conference on Data Mining (PDF), SIAM, pp. 839–847, doi:10.1137/1.9781611973440.96, ISBN 978-1-61197-344-0
  3. ^ Di Giovanni, Daniele (2023), "Using machine learning to explore shared genetic pathways and possible endophenotypes in autism spectrum disorder", Genes, 14 (2): 313, doi:10.3390/genes14020313, PMC 9956345, PMID 36833240
  4. ^ Poutaraud, Joachim (2024), "Meta-Embedded Clustering (MEC): A new method for improving clustering quality in unlabeled bird sound datasets", Ecological Informatics, 82, Elsevier: 102687, doi:10.1016/j.ecoinf.2024.102687
  5. ^ Shim, Jaehyun (2022), "Techno-economic analysis of micro-grid system design through climate region clustering", Energy Conversion and Management, 274, Elsevier: 116411, Bibcode:2022ECM...27416411S, doi:10.1016/j.enconman.2022.116411
  6. ^ Martínez, Rubén Yáñez (2023), "Spanish Corpora of tweets about COVID-19 vaccination for automatic stance detection", Information Processing & Management, 60 (3), Elsevier: 103294, doi:10.1016/j.ipm.2023.103294
  7. ^ Beer, Anna (2025), "DISCO: Internal Evaluation of Density-Based Clustering", arXiv:2503.00127 [cs.LG]
  8. ^ Veigel, Nadja (2025), "Content analysis of multi-annual time series of flood-related Twitter (X) data", Natural Hazards and Earth System Sciences, 25 (2), Copernicus Publications, Göttingen, Germany: 879–891, Bibcode:2025NHESS..25..879V, doi:10.5194/nhess-25-879-2025
