Co-citation Proximity Analysis

Figure: The Co-citation Proximity Analysis (CPA) approach to document similarity computation. In the full text of the citing document, documents B and C are cited closer to each other than either is to document A. Hence, according to CPA, documents B and C are more strongly related than documents A and B or A and C.

Co-citation Proximity Analysis (CPA) is a document similarity measure that uses citation analysis to assess semantic similarity between documents, both at the level of whole documents and at the level of individual sections.[1][2] The measure builds on the co-citation analysis approach, but differs in that it exploits the information implied by the placement of citations within the full text of documents.

Co-citation Proximity Analysis was conceived by B. Gipp in 2006,[3] and a description of the measure was published by Gipp and Beel in 2009.[1] The measure rests on the assumption that, within a document’s full text, documents cited in close proximity to each other tend to be more strongly related than documents cited farther apart. The figure illustrates the concept: CPA treats documents B and C as more strongly related than documents B and A, because the citations to B and C occur within the same sentence, whereas the citations to B and A are separated by several paragraphs.
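The proximity assumption can be sketched in a few lines of Python. This is a minimal illustration, not the published method: the weight values, the position encoding as (section, paragraph, sentence) index tuples, and the names `WEIGHTS`, `proximity_level` and `cpa_scores` are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical weights, chosen only for illustration: the narrower the
# text unit shared by two citations, the larger the weight.
WEIGHTS = {"sentence": 1.0, "paragraph": 0.5, "section": 0.25, "document": 0.125}


def proximity_level(pos_a, pos_b):
    """Return the narrowest text unit shared by two citation positions.

    Each position is a (section, paragraph, sentence) tuple of indices.
    Paragraph indices restart in each section, and sentence indices
    restart in each paragraph.
    """
    if pos_a == pos_b:
        return "sentence"
    if pos_a[:2] == pos_b[:2]:
        return "paragraph"
    if pos_a[0] == pos_b[0]:
        return "section"
    return "document"


def cpa_scores(citations):
    """Sum proximity weights over all co-cited pairs in one citing document.

    `citations` is a list of (cited_doc_id, position) pairs.
    """
    scores = defaultdict(float)
    for (doc_a, pos_a), (doc_b, pos_b) in combinations(citations, 2):
        if doc_a == doc_b:
            continue  # a document is not co-cited with itself
        pair = tuple(sorted((doc_a, doc_b)))
        scores[pair] += WEIGHTS[proximity_level(pos_a, pos_b)]
    return dict(scores)


# The situation in the figure: B and C share a sentence, A is cited
# in a different section of the citing document.
citations = [
    ("A", (0, 0, 0)),
    ("B", (2, 1, 3)),
    ("C", (2, 1, 3)),
]
scores = cpa_scores(citations)
```

With these assumed weights, the pair (B, C) receives a higher score than (A, B) or (A, C). A plain co-citation count would treat all three pairs alike, since each pair is co-cited exactly once in the citing document.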

The advantage of the CPA approach over other citation and co-citation analysis approaches is improved precision. Other widely used citation analysis approaches, such as Bibliographic Coupling, Co-Citation or the Amsler measure, do not take into account the location or proximity of citations within documents. The CPA approach allows a more granular automatic classification of documents and can be used to identify not only related documents, but also the specific sections within texts that are most closely related.

  1. Bela Gipp and Joeran Beel, 2009. "Citation Proximity Analysis (CPA) – A new approach for identifying related work based on Co-Citation Analysis". In Birger Larsen and Jacqueline Leta, editors, Proceedings of the 12th International Conference on Scientometrics and Informetrics (ISSI’09), volume 2, pages 571–575, Rio de Janeiro (Brazil), July 2009.
  2. Bela Gipp and Joeran Beel, 2011. "Method and system for detecting a similarity of documents". Patent Application 2011/0264672 A1, Oct 27, 2011.
  3. Bela Gipp, 2006. "Doctoral Proposal: (Co-)Citation Proximity Analysis – A Measure to Identify Related Work".
