(1) Overview

Repository location

The description of the workflows is part of an extended report about the outcomes of the CLARIN-funded project “A new CLARIN Resource Family for lexical semantic change research”. The report and the workflows are stored in a Zenodo repository with DOI 10.5281/zenodo.8156199.

The workflow for lexical semantic change research in lexicology is stored in the Social Sciences and Humanities (SSH) Open Marketplace as a “Workflow” item. The workflow is called “Semantic change analysis for lexicological studies”, and it is available at https://marketplace.sshopencloud.eu/workflow/yStoh2.

Context

The evolution of word meanings (lexical semantic change) is a highly relevant subject for linguists, lexicographers, and scholars across the humanities. Take the word “snowflake”: this word initially referred to ice crystals but has evolved to describe unique individuals or those (perceived to be) easily offended, reflecting shifts in culture and society.

In recent years, Natural Language Processing (NLP) researchers have developed algorithms that predict lexical semantic change. Moreover, annotated texts provide valuable context for word meanings, and some researchers have proposed models that represent the semantic change information contained in lexical resources such as dictionaries as linked data. However, accessing resources and tools for semantic change research remains challenging because they are fragmented across various domains.

To address this challenge, the CLARIN-funded project “A new CLARIN Resource Family for lexical semantic change research” aimed to centralise essential resources for semantic change research, encompassing datasets with word sense annotations, word embeddings derived from diachronic corpora, automated algorithms for detecting semantic change, and lexical resources.

As part of this project, we have developed workflows modelled on those of the SSH Open Marketplace. These workflows guide users through a series of manageable steps, connecting each step with relevant datasets, manuals, and digital tools. We chose the SSH Open Marketplace as a model because of its broad scope across the whole of the SSH and its robust infrastructure, which is supported by three prominent European Research Infrastructure Consortia in the SSH: CESSDA, CLARIN, and DARIAH. The workflows that we have developed in the context of this project are designed not only to apply to various research domains but also to be language-independent. The SSH Open Marketplace discovery platform allows resources to be linked to each step, and to the workflow as a whole, provided that the resource in question is part of the platform. In this way, any resource for any language can be linked to the different steps, as long as it is available on the platform.

(2) Method

The SSH Open Marketplace website offers guidance on the creation of workflows, which involves the following steps: identifying and describing the workflows and associating metadata with them; describing the various steps involved in each workflow; and, finally, identifying relevant resources. Our choice of workflows was directly inspired by the previous research carried out by the first two authors in semantic change detection, together with our knowledge of the current literature on the topic. Compared to existing workflows that focus particularly on the NLP processing steps (such as LSCDetection), we were interested in covering, as far as possible, the information needs of the various disciplines that might make use of the results of semantic change detection.

Once we had a first draft of our workflows, we contacted a number of experts, chosen from among the authors of the articles we consulted during the initial literature review (references to the articles are provided with the workflows in Zenodo). We asked them to give feedback on the workflows based on their own experience in the field. We also endeavoured to include references to CLARIN Resource Families (CRFs) in our workflows and to base the workflows mainly on the different categories of resources featured in the CRFs. The intention was to make our workflows as interoperable as possible by citing categories of resources hosted by one of the major European SSH infrastructures, which could serve as a point of reference across the different SSH disciplines. These categories, moreover, are periodically updated with new resources.

Figures 1 and 2 show the workflow “Semantic change analysis for lexicological studies” that we created within the SSH Open Marketplace. Each workflow is introduced by describing the task and the type of audience targeted. A list of steps follows (Figure 1), which the user can expand to find additional information and examples of resources available on the platform that can be used to complete each specific step (Figure 2).

Figure 1 

Overview of the steps of the workflow “Semantic change analysis for lexicological studies” in the SSH Open Marketplace.

Figure 2 

Step 1 of the workflow “Semantic change analysis for lexicological studies”, as it appears in the SSH Open Marketplace.
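The workflow structure described in this section (a task description, a target audience, and an ordered list of steps, each linked to resources) can be sketched as a small data model. The following Python sketch is purely illustrative and is not the data model of the SSH Open Marketplace: the class names, field names, resource categories, step titles, and resource labels are all assumptions made for the example. Only the workflow title is taken from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A resource linked to a step (hypothetical model, not the platform's)."""
    label: str
    category: str  # assumed categories, e.g. "dataset", "tool", "manual"
    languages: tuple[str, ...] = ()  # empty tuple means language-independent


@dataclass
class Step:
    """One step of a workflow, with the resources that support it."""
    title: str
    description: str
    resources: list[Resource] = field(default_factory=list)


@dataclass
class Workflow:
    """A workflow: task, target audience, and an ordered list of steps."""
    title: str
    task: str
    audience: str
    steps: list[Step] = field(default_factory=list)

    def resources_for(self, language: str) -> dict[str, list[str]]:
        """Map each step title to the labels of resources usable for `language`."""
        return {
            step.title: [
                r.label
                for r in step.resources
                if not r.languages or language in r.languages
            ]
            for step in self.steps
        }


# Placeholder content: the step titles and resources below are invented
# for illustration and do not reproduce the published workflow.
workflow = Workflow(
    title="Semantic change analysis for lexicological studies",
    task="Placeholder task description",
    audience="Placeholder audience description",
    steps=[
        Step(
            title="Placeholder step 1",
            description="Placeholder description",
            resources=[
                Resource("Example Latin corpus", "dataset", ("Latin",)),
                Resource("Example language-independent tool", "tool"),
            ],
        ),
        Step(
            title="Placeholder step 2",
            description="Placeholder description",
            resources=[
                Resource("Example English dataset", "dataset", ("English",)),
            ],
        ),
    ],
)

latin_resources = workflow.resources_for("Latin")
```

In this sketch, a step with no resources for the requested language is kept in the result with an empty list, which makes gaps in resource coverage for a given language visible step by step.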

(3) Dataset Description

Object name

“Workflows for lexical semantic change” (in Zenodo); “Semantic change analysis for lexicological studies” (in SSH Open Marketplace).

Format names and versions

The workflows stored in Zenodo are presented as a PDF file that follows the structure suggested by the SSH Open Marketplace. The workflow in the SSH Open Marketplace is available as an online resource on the platform.

Creation dates

Start date: 2022-11-01

End date: 2023-10-31

Dataset creators

Barbara McGillivray (Department of Digital Humanities, King’s College London, London, United Kingdom); Paola Marongiu (Institut des sciences du langage (ISLa), University of Neuchâtel, Neuchâtel, Switzerland); Fahad Khan (Istituto di Linguistica Computazionale, Consiglio Nazionale delle Ricerche, Pisa, Italy).

Language

English, Latin, Ancient Greek

License

Creative Commons Attribution 4.0 International

Repository name

Zenodo; Social Sciences and Humanities Open Marketplace https://marketplace.sshopencloud.eu/

Publication date

2023-10-31

(4) Reuse Potential

We have presented workflows that streamline and facilitate research in semantic change by simplifying access to relevant language resources and tools scattered across different repositories and platforms. The workflows are intended for researchers from different backgrounds who explore lexical semantic change as a means of addressing their respective research questions, whether in history, linguistics (including lexicology, lexicography and semantics), or cultural studies. The lexicology workflow in particular can be used for any study that aims to analyse the evolution of a lexical or semantic field, either to test specific hypotheses or to relate language changes to large-scale social and historical events. Beyond research, the workflows can be employed in teaching and learning scenarios, offering students an opportunity to engage with advanced linguistic concepts and data analysis techniques and exposing them to interdisciplinary approaches to the study of semantic change.

However, some challenges may arise when the workflows are reused in research contexts. NLP research on semantic change is evolving rapidly, so the more computational components of the workflows will require frequent updates. More generally, the workflows have been designed to be adaptable to all of the disciplines mentioned earlier. While this ensures comprehensive coverage of various research scenarios, the resulting lack of granularity inevitably means that some intermediate steps specific to individual disciplines or research questions are overlooked. For instance, individual languages may lack some of the tools and resources assumed by a given workflow, in which case it may be necessary either to build them or to adapt the workflow to the resources that are available. We have chosen to forego granularity to ensure broader applicability of the workflows, fully aware that, in their current state, this may also represent a limitation of our work.