
An interactive AI-based approach to semantic annotations for the SpokenWeb archive

What:
Posters
When:
2:25 PM, Wednesday 27 Apr 2022 EDT (55 minutes)
Breaks:
Break   03:20 PM to 03:35 PM (15 minutes)
Where:
  Virtual session
How:

Francisco Berrizbeitia, Developer, Concordia University Library

View poster | text-only version (.docx)

Adding semantic annotations to archival metadata makes it possible to generate an alternative representation of the dataset in the form of a graph. This is useful for several reasons: it enables the discovery of new relationships between objects, improves findability, and allows for more sophisticated queries using the SPARQL query language. In this presentation we will explain the rationale behind a web-based tool that helps users with this task through a semi-automatic approach, one that ensures high-quality annotations while leveraging natural language understanding techniques to speed up the process. First, we will present the proposed automated method and the results of the validation experiment that led us to conclude that a supervised approach was the best course of action, as opposed to a fully automated solution. Then, we will demonstrate the resulting application: an open-source, web-based tool that can be used either as a stand-alone tool or integrated with Swallow, a metadata management system initially developed under the SpokenWeb partnership.

The automated tagging process can be summarized as follows: 1) The text is tagged using DBpedia Spotlight, a pretrained general-purpose NER tool that has shown good results in the past, which generates a list of dbpedia.org entities. 2) Each dbpedia.org URL is accessed to retrieve the equivalent Wikidata object through the sameAs predicate.

To test the effectiveness of the proposed method, we compared the results of the automated approach to manually generated annotations (our gold standard). The chosen collection was the Sir George Williams Poetry Series, consisting of 54 unique entries in Swallow documenting twice as many recorded events, with some entries having 30 or more Wikidata annotations. The automated approach achieved 80% precision on the detected entities, with a recall of 36%, when compared to the manual process.
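The two tagging steps and the comparison against a gold standard can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the function names are hypothetical, the Spotlight endpoint URL and the `/data/<name>.json` layout of DBpedia resource descriptions are assumptions about the public services, and the default confidence threshold of 0.5 is chosen only for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed public endpoint; a self-hosted Spotlight instance would use another URL.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
WIKIDATA_PREFIX = "http://www.wikidata.org/entity/"


def fetch_json(url, params=None):
    """GET a URL, asking for JSON, and decode the response body."""
    if params:
        url = url + "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as response:
        return json.load(response)


def extract_dbpedia_uris(spotlight_response):
    """Step 1 (parsing): unique DBpedia URIs, in order of first appearance."""
    uris = []
    for resource in spotlight_response.get("Resources", []):
        if resource["@URI"] not in uris:
            uris.append(resource["@URI"])
    return uris


def extract_wikidata_uri(dbpedia_uri, description):
    """Step 2 (parsing): first owl:sameAs target that is a Wikidata entity."""
    for target in description.get(dbpedia_uri, {}).get(OWL_SAME_AS, []):
        if target["value"].startswith(WIKIDATA_PREFIX):
            return target["value"]
    return None


def annotate(text, confidence=0.5):
    """Run both steps over the network; maps DBpedia URI -> Wikidata URI or None."""
    spotlight = fetch_json(SPOTLIGHT_URL, {"text": text, "confidence": confidence})
    mapping = {}
    for uri in extract_dbpedia_uris(spotlight):
        # Assumption: DBpedia serves a JSON description at /data/<name>.json.
        data_url = uri.replace("/resource/", "/data/") + ".json"
        mapping[uri] = extract_wikidata_uri(uri, fetch_json(data_url))
    return mapping


def precision_recall(predicted, gold):
    """Compare predicted entity IDs against manually assigned (gold) ones."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

Keeping the parsing functions separate from the network calls means they can be exercised on saved responses, which is convenient when comparing automated output against manual annotations record by record.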
We concluded that a tool with this performance could not fully replace manual tagging. However, pairing it with an interactive user interface that lets users rapidly correct the mistakes made by the predictive model, and easily search for and add entities manually, could drastically reduce the time this task requires. With this in mind, we developed a web application that can be integrated with Swallow or used independently. The application uses a Python back end that handles the interactions with DBpedia Spotlight and Wikidata.org and exposes the different methods as web services using Flask. The front end is an easy-to-use, JavaScript-based user interface. We hope that tools like the one we are proposing will encourage catalogue administrators to include semantic annotations in their records and connect more collections to the linked data cloud.
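As a rough idea of what exposing a tagging method as a Flask web service might look like, here is a minimal sketch. The `/annotate` route, the JSON payload shape, and the `tag_text` helper are all hypothetical; the tool's real endpoints are not described in this abstract.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def tag_text(text):
    """Hypothetical placeholder for the DBpedia Spotlight + Wikidata lookup."""
    # A real back end would call the external services here; this stub
    # returns no annotations so the sketch runs without network access.
    return []


@app.route("/annotate", methods=["POST"])
def annotate_endpoint():
    """Accept {"text": "..."} and return suggested annotations as JSON."""
    payload = request.get_json(silent=True)
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str) or not text.strip():
        return jsonify({"error": "missing 'text'"}), 400
    return jsonify({"text": text, "annotations": tag_text(text)})
```

A JavaScript front end could POST the record's text to such an endpoint, show the suggested entities, and let the user accept, reject, or add to them before saving.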

 

Twitter hashtag: #CULibraryForum  
