This article is within the scope of WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
I will be giving this article a review for possible GA status. As higher mathematics is not one of my strong suits (my last "high math" was trigonometry and analytic geometry ages ago...), this might take me a while, but I promise I will finish. Shearonink (talk) 19:48, 1 March 2017 (UTC)
I think the article stays as focused as possible while still making the subject as clear as an article in a non-technical encyclopedia can. Shearonink (talk) 07:08, 7 March 2017 (UTC)
Nicely done. As I said below, I think that, for future improvements, explaining the use of nearest-neighbor chain algorithms in real-world terms (as in, what do they do?) will help demystify the subject for Wikipedia's general readership. Shearonink (talk) 07:08, 7 March 2017 (UTC)
I am reading this through over and over and sort of/maybe/almost understand the subject. I do have a question, though: in layman's terms, is there an explanation of what this algorithm is used for? I mean, I understand it is used for clustering, but what is the purpose of "clustering"? Shearonink (talk) 04:22, 3 March 2017 (UTC)
@David Eppstein: Was wondering about the above question. Thanks, Shearonink (talk) 05:48, 5 March 2017 (UTC)
Yes, thanks for the suggestion. The short answer is that clustering is fundamental for understanding all kinds of data, e.g. trying to understand which different diseases cause similar collections of symptoms, trying to group customers by their interests, etc. Hierarchical clustering is good either when the grouping of data that you want to construct is multi-level or tree-like (like Wikipedia categories) or when you don't know how many groups to make (so you make groupings at all levels of refinement and then figure out which level is the right one later). A common use for some of the clustering algorithms described here is to reconstruct evolutionary trees by using genetic distance. But all this should really be in the article (in the background section), not here; I plan on adding it when I can take the time to look for appropriate sources to use for it. David Eppstein (talk) 07:58, 5 March 2017 (UTC)
Just trying to understand the subject a bit more, so thanks. And you are looking to add this type of content in the future? OK, good; that was probably going to be a "recommendation for future improvements" from me. The article really looks to be in good shape. I will be doing a few more read-throughs to see if there's anything I missed, but I should be able to finish up in the next day or so. Shearonink (talk) 16:33, 5 March 2017 (UTC)