An interactive AI-based approach to semantic annotations for the SpokenWeb archive

What:

Posters

Part of:

Poster presentations in Zoom breakout rooms

When:

2:25 PM, Wednesday 27 Apr 2022 EDT (55 minutes)

Breaks:

Break 03:20 PM to 03:35 PM (15 minutes)

Where:

Virtual session



Francisco Berrizbeitia, Developer, Concordia University Library


Adding semantic annotations to archival metadata makes it possible to generate an alternative representation of the dataset in the form of a graph. This is useful for several reasons: it enables the discovery of new relationships between objects, improves findability, and allows for more sophisticated queries using the SPARQL query language. In this presentation we will explain the rationale behind a web-based tool that helps users with this task through a semi-automatic approach, one that ensures high-quality annotations while leveraging natural language understanding techniques to speed up the process. First, we will present the proposed automated method and the results of the validation experiment that led us to conclude that a supervised approach was the best course of action, as opposed to a fully automated solution. Then, we will demonstrate the resulting application: an open-source, web-based tool that can be used either as a stand-alone tool or integrated with Swallow, a metadata management system initially developed under the SpokenWeb partnership.

The automated tagging process can be summarized as follows: 1) The text is tagged using DBpedia Spotlight, a pretrained general-purpose NER tool that has shown good results in the past, generating a list of dbpedia.org entities. 2) Each dbpedia.org URL is accessed to get the equivalent Wikidata object using the sameAs predicate.

To test the effectiveness of the proposed method, we compared the results of the automated approach to manually generated annotations (our gold standard). The chosen collection was the Sir George Williams Poetry Series, consisting of 54 unique entries in Swallow documenting twice as many recorded events, with entries sometimes having as many as 30 or more Wikidata annotations. The automated approach achieved 80% precision on the detected entities, with a recall of 36% when compared to the manual process.
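For readers less familiar with these two metrics, the sketch below shows one common way to compute precision and recall when automated annotations are compared against a gold standard as sets of entity identifiers. It is an illustration only: the function name `precision_recall` and the toy identifiers are assumptions, and the abstract does not say whether the reported figures were computed per record or pooled over the whole collection.

```python
def precision_recall(predicted, gold):
    """Compare two collections of entity identifiers.

    precision = correct predictions / all predictions
    recall    = correct predictions / all gold annotations
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy identifiers invented for illustration; not data from the experiment.
gold = {"Q1", "Q2", "Q3", "Q4", "Q5"}
predicted = {"Q1", "Q2", "Q9"}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High precision with low recall, as reported here, means that most of the entities the tool proposes are right, but many entities a human annotator would add are never proposed.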
We concluded that a tool with this performance could not fully replace manual tagging. However, paired with an interactive user interface that makes it possible to rapidly correct the mistakes made by the predictive model and to easily search for and add entities manually, it could drastically shorten this time-consuming task. With this in mind, we developed a web application that can be integrated with Swallow or used independently. The application uses a Python back end that takes care of the interactions with dbpedia-spotlight and Wikidata.org and exposes the different methods as web services using Flask. The front end is an easy-to-use, JavaScript-based user interface. We hope that tools like the one we are proposing will encourage catalogue administrators to include semantic annotations in their records and connect more collections to the linked data cloud.
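As a rough illustration of the back-end interactions described above, the following sketch tags a text with DBpedia Spotlight and then follows each DBpedia resource's owl:sameAs links to a Wikidata identifier. It is not the tool's actual code: the function names are hypothetical, the public Spotlight endpoint and the `confidence` default are assumptions (the application may run its own instance with different settings), and only the Python standard library is used rather than Flask.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Spotlight service; the real tool may use another instance.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
WIKIDATA_PREFIX = "http://www.wikidata.org/entity/"


def spotlight_uris(response):
    """Extract the DBpedia URIs from a parsed Spotlight JSON response."""
    return [resource["@URI"] for resource in response.get("Resources", [])]


def wikidata_ids(resource_json, dbpedia_uri):
    """Return the Wikidata IDs a DBpedia resource links to via owl:sameAs."""
    same_as = resource_json.get(dbpedia_uri, {}).get(OWL_SAME_AS, [])
    return [
        obj["value"][len(WIKIDATA_PREFIX):]
        for obj in same_as
        if obj["value"].startswith(WIKIDATA_PREFIX)
    ]


def fetch_json(url, params=None):
    """GET a URL and parse the JSON reply (requires network access)."""
    if params:
        url += "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as reply:
        return json.load(reply)


def annotate(text, confidence=0.5):
    """Step 1: tag the text. Step 2: map each DBpedia URI to Wikidata IDs."""
    tagged = fetch_json(SPOTLIGHT_URL, {"text": text, "confidence": confidence})
    result = {}
    for uri in spotlight_uris(tagged):
        # DBpedia serves a resource's triples as JSON under /data/<name>.json.
        data_url = uri.replace("/resource/", "/data/", 1) + ".json"
        result[uri] = wikidata_ids(fetch_json(data_url), uri)
    return result
```

Calling `annotate("some catalogue description")` needs network access and returns a dictionary from DBpedia URIs to lists of Wikidata IDs. Keeping the two parsing helpers separate from the network calls makes them easy to check offline, and in a supervised workflow like the one described, the returned suggestions would be shown to the user for correction rather than saved directly.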

Twitter hashtag: #CULibraryForum
