Duplicate Records in WorldCat for 20th-Century American, British, and Canadian Books: A Comparison of Duplication Rates and Causes
What: Talk
When: 1:00 PM, Wednesday 17 Apr 2024 (30 minutes)
In-person Session
This study compared bibliographic record duplication rates among books published in the United States, the United Kingdom, and Canada and identified the causes of duplicate records in OCLC WorldCat. A secondary aim was to illustrate those causes with examples, taking the opportunity to review earlier cataloging standards and identify common pitfalls, and to rank the causes by impact in order to inform cataloging practices and OCLC’s duplicate management strategy, which is currently changing with the incorporation of machine-learning techniques.

Although the estimated duplication rate by country of publication was expected to be higher for the U.K. and Canada than for the U.S., owing to the later adoption of OCLC as a bibliographic utility in those countries, that expectation proved wrong: the duplication rates for books published in New York and London were similar, while the rate for books published in Montreal was much lower. In hindsight, the Canadian city chosen should probably have been Ottawa, to capture duplicate records for Canadian federal government documents, or Toronto, as the English-language publishing capital of the country.

To promote deduplication, cataloging should be done with authority work, and OCLC records should be corrected with each new or updated authority record. As authority records are created or upgraded to Resource Description and Access (RDA) and edited under the latest guidelines, the addition of cross-references should lead to more access points being authorized and to easier merging of bibliographic records. OCLC’s Duplicate Detection and Resolution (DDR) software has performed well in merging records, concentrating more library holdings on the highest-quality records. Copy cataloging older books without ISBNs or LCCNs, however, requires careful keyword searching and sorting results so that records with the highest number of holdings appear first. Jeffrey Beall wrote in 2010 that the rules about when to input a new record are vague, ignored, or misunderstood. While this may still be true, catalogers aiming to better understand these rules can benefit from expert training by joining OCLC’s Member Merge Project.

Records for editions and reproductions, along with brief records completed only in local catalogs and later batchloaded to OCLC, were a major cause of record duplication. Because of deriving practices, such records often represent a mix of manifestations that cannot always be untangled. Libraries should review the retention of some categories of 20th-century books whose records tend to cause duplication, including printings, reproductions, fine arts publications such as auction and exhibition catalogs, and conference publications. OCLC should encourage the deletion of difficult-to-merge records without holdings whenever those records clearly do not represent unique content.
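The talk does not document DDR’s internals, but the basic idea of duplicate detection can be sketched in a few lines of Python. The sketch below is a hypothetical simplification, not OCLC’s actual algorithm: it groups simplified records by a normalized title/author/publisher/year key and, within each group of candidates, treats the record with the most holdings as the merge target, mirroring the advice above to sort search results by holdings. All field names (title, author, publisher, year, holdings, oclc) and the sample data are illustrative.

import re
from collections import defaultdict

def norm(s: str) -> str:
    # Lowercase and collapse punctuation so "Moby-Dick" == "Moby Dick."
    return " ".join(re.sub(r"[^a-z0-9]+", " ", s.lower()).split())

def match_key(rec: dict) -> tuple:
    # Crude match key; real matching would weigh many more fields.
    return (norm(rec["title"]), norm(rec["author"]),
            norm(rec["publisher"]), rec["year"])

records = [  # hypothetical, simplified WorldCat records
    {"oclc": "100", "title": "Moby-Dick", "author": "Melville, Herman",
     "publisher": "Harper", "year": 1851, "holdings": 950},
    {"oclc": "200", "title": "Moby Dick.", "author": "Melville, Herman",
     "publisher": "Harper", "year": 1851, "holdings": 12},
]

groups = defaultdict(list)
for rec in records:
    groups[match_key(rec)].append(rec)

for recs in groups.values():
    if len(recs) > 1:  # duplicate candidates found
        recs.sort(key=lambda r: r["holdings"], reverse=True)
        keep, *rest = recs
        print("keep", keep["oclc"], "merge", [r["oclc"] for r in rest])

Under this toy key, records differing only in punctuation or transcription collapse together, which is exactly the kind of near-duplicate the study found; records differing in printing or reproduction statements would need the more careful, field-by-field judgment the abstract describes.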