Using Citation Metadata to Investigate the Implications of Automatic Indexing Algorithms on Information Retrieval
What:
Talk
When:
3:30 PM, Tuesday 16 Apr 2024
(30 minutes)
Where:
Theme:
Virtual Session
As of April 2022, the National Library of Medicine (NLM) converted to automatic indexing for MEDLINE citations through the integration of the Medical Text Indexer (MTI). MTI has notably decreased the time it takes a MEDLINE citation to receive MeSH indexing. However, further work is needed to address well-documented issues around the indexing of genes and chemical compounds and their impact on information retrieval. To investigate these issues, this research pursues the following research questions:
RQ1. Is there a relationship between the indexing method or journal impact factor (JIF) and how well MeSH terms align with keywords and chemical symbols?
RQ2. Is there a relationship between the indexing method or JIF and the term usage frequencies among MeSH, keywords, and chemical symbols?
These RQs are being addressed by analyzing a sample of indexed MEDLINE citations. A random sample of 648 citations published between January 2021 and December 2023 was selected, and the relevant fields were extracted with NLM’s efetch and xtract tools. Journal impact factor data were downloaded from Clarivate. In R, an n-gram analysis addresses RQ1 and the relative frequency of each term addresses RQ2. As this is an ongoing research project, more information will be provided on the results and interpretation of the n-gram analysis for RQ1 and the relative frequency analysis for RQ2. In addition to a discussion of how these results affect information retrieval, the results will be contextualized within the current state of librarianship and the role that artificial intelligence is taking in the field.
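To make the two analyses concrete, here is a minimal sketch of an n-gram alignment score and a relative term frequency. It is written in Python for illustration only; the study itself uses R, and the abstract does not specify its exact procedure. The function names, the whitespace tokenization, the choice of Jaccard overlap as the alignment measure, and the sample terms are all assumptions, not the author's method.

```python
from collections import Counter


def ngrams(text, n):
    """Return the word n-grams of a lowercased, whitespace-tokenized string."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_overlap(mesh_terms, keywords, n=1):
    """Jaccard overlap between the n-gram sets of two term lists.

    Assumption: Jaccard overlap stands in for "how well MeSH terms align
    with keywords"; the study may use a different alignment measure.
    """
    mesh_grams = {g for term in mesh_terms for g in ngrams(term, n)}
    keyword_grams = {g for term in keywords for g in ngrams(term, n)}
    union = mesh_grams | keyword_grams
    if not union:
        return 0.0
    return len(mesh_grams & keyword_grams) / len(union)


def relative_frequencies(terms):
    """Map each case-folded term to its share of all term occurrences."""
    counts = Counter(term.lower() for term in terms)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


# Hypothetical terms for a single citation (not taken from the study sample).
mesh = ["Breast Neoplasms", "Tumor Suppressor Protein p53"]
author_keywords = ["breast cancer", "p53"]

alignment = ngram_overlap(mesh, author_keywords, n=1)
frequencies = relative_frequencies(["p53", "P53", "BRCA1", "p53"])
```

Scores of this kind, computed per citation, could then be compared across indexing method or JIF groups, which is the comparison RQ1 and RQ2 describe.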