Paper Session #6
Session Chair: Elena Gigliarelli, ISPC CNR
Papers
The Writing on the Wall: Digitally Rediscovering Bulgaria's Post-Byzantine Heritage

St. Kliment Ohridski University of Sofia, Bulgaria

Text-bearing objects are among the cultural heritage items most thoroughly studied with digital methods and most widely encoded and published with digital tools. The TEI subset for inscriptions and papyri known as EpiDoc (https://sourceforge.net/p/epidoc/wiki/Home/) is under constant development by an active community of contributors. It is applied in a number of online epigraphic and papyrological collections, including Bulgaria's Telamon database of Greek inscriptions (https://telamon.uni-sofia.bg/), whose stages of development have been presented to the DARIAH community on several occasions. There is also the SigiDoc subset for seals and stamps (http://sigidoc.huma-num.fr/).

Among the most interesting historical inscriptions hitherto poorly covered by digital epigraphy are those accompanying church murals and icons, especially from the (post-)Byzantine world. Byzantine religious art left a rich heritage of artistic canons and conventions that lived on long after Byzantium itself was gone, sometimes in quite different linguistic and cultural contexts. A great number of texts written in (sometimes substandard) Byzantine Greek accompany various religious scenes in churches and monasteries throughout Bulgaria from the period of Ottoman rule (15th-19th centuries). The forms and functions of such inscriptions have rarely been the object of research beyond the scope of art-historical publications, where they are usually described as part of the image. However, such texts, even when extremely short and standardized, enter into a whole range of relations that require further study and proper representation. Firstly, there is the question of the place of the inscription in the context of the entire visual composition: a relation that can vary from the simple explanation of a scene to the subtle interplay between a saintly character and the quotation contained in the book or scroll (s)he holds. Then there is the issue of the intertextual relation between such quotations and the scriptural or liturgical traditions of which they were instances, echoes, etc. Finally, the roles of the inscriptions in the larger framework, not only of the particular religious building but of the whole culture of the period and the linguistic competence of its audiences, need examination.

The present paper will describe the methodology and workflow of a new project at the University of Sofia, Bulgaria, which aims to resolve such complex issues through the EpiDoc-based tools developed during the work on the Telamon collection. For the processing, indexing, and publication of the texts and images, our own AIAX EpiDoc front-end service is customized and applied. For the accurate and user-friendly representation of the different connections of the texts with their wider iconographical, intertextual, and cultural contexts, a conceptual model is currently being created. According to this model, different authority files will be linked to the metadata of the particular texts in ways that will allow the monuments to be searched and organized according to intersections of different classifying criteria. A demonstration of the online collection will accompany the presentation.
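To make the kind of encoding involved concrete, the sketch below embeds and parses a minimal EpiDoc (TEI XML) fragment for a scroll inscription. The fragment, its sample quotation, and the processing code are illustrative assumptions only; they are not taken from the Telamon collection or the AIAX service.

```python
# A minimal, hypothetical sketch: encoding a mural scroll inscription in
# EpiDoc (TEI XML) and extracting its edition text. Element names follow
# EpiDoc conventions; the document itself is invented for illustration.
from lxml import etree

EPIDOC_SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Scroll inscription, church mural (sample)</title></titleStmt>
      <publicationStmt><p>Unpublished sample</p></publicationStmt>
      <sourceDesc><p>Invented example, not a Telamon record</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <!-- A common scriptural quotation (John 8:12), used purely as an example -->
      <div type="edition" xml:lang="grc">
        <ab>Ἐγώ εἰμι τὸ φῶς τοῦ κόσμου</ab>
      </div>
    </body>
  </text>
</TEI>""".encode("utf-8")

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

root = etree.fromstring(EPIDOC_SAMPLE)
# Locate the edition division and report its language and text content,
# the kind of extraction a front end would perform for indexing.
edition = root.find(".//tei:div[@type='edition']", TEI_NS)
print(edition.get(XML_LANG))                # -> grc
print("".join(edition.itertext()).strip())  # -> the inscription text
```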
Collections as Data at the KB, the National Library of the Netherlands: Redesigning Data Services for the Future

KB, National Library of the Netherlands, The Netherlands

In less than 20 years, the collections of digitized materials from the KB, the national library of the Netherlands, have grown into fully-fledged, large-scale national collections, actively maintained and well established. They are supplemented on a regular basis, and access to them is provided through the now generally accepted primary and secondary access methods for digital cultural heritage: an online graphical search interface (Delpher) and a suite of services, in line with the 'Collections as Data imperative' first elaborated by Thomas Padilla and colleagues (2019). Based on ten years of experience, the KB is now in the process of rethinking and redesigning these Data Services. In this paper we offer a concise analysis of our experiences so far and discuss our plans to get Data Services ready for another ten years.

Data Services was launched in 2012, built on a set of APIs, to give access to the KB collections as data. It essentially consists of a coordinator and a handful of manuals on how to search and harvest our collections. Upon request, a license may be granted to access parts of the copyright-protected collections for research purposes. Data Services has been successful in opening up the Delpher collections to a variety of national and international research projects (Polimedia, Translantis, Nederlab, Media Suite and Impresso, to name a few), as well as to many individual researchers in the Netherlands and beyond. In tandem with the KB Lab (lab.kb.nl), launched in 2014, it has served as an inspiration for several European national libraries to give access to digital library collections as data.

Today we are reorganizing Data Services. The remake will put a stronger emphasis on the FAIR data principles, the documentation of data provenance, the transparency of selection workflows, and user-friendliness. On a practical level, it will include the introduction of a data registry to make the data more easily findable for both humans and machines, and a series of data sheets and/or data cards as standardized documentation, encouraging transparency and mitigating potential bias. We also aim to build a corpus selection tool, offering advanced data discovery and selection functionality to support the creation of research corpora, thereby functioning as a more intuitive user interface to the existing APIs. Finally, we want to address the need to give access to all in-copyright materials by creating an on-site mining facility and an online tools-to-data solution, providing ways to mine our collections without violating the rights of copyright owners.
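As one illustration of the style of machine access such data services offer, the sketch below shows a generic harvesting loop over OAI-PMH with resumption tokens. The endpoint URL and set name are placeholders, not the KB's actual service parameters, for which the Data Services manuals remain the authority.

```python
# A generic sketch of harvesting records over OAI-PMH. The endpoint and set
# name are placeholders; consult the Data Services manuals for real values.
import requests
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/",
      "dc": "http://purl.org/dc/elements/1.1/"}

def harvest_titles(endpoint, set_name, max_records=50):
    """Yield Dublin Core titles, following OAI-PMH resumption tokens."""
    params = {"verb": "ListRecords", "metadataPrefix": "oai_dc", "set": set_name}
    yielded = 0
    while True:
        root = ET.fromstring(requests.get(endpoint, params=params, timeout=30).text)
        for title in root.iterfind(".//dc:title", NS):
            yield title.text
            yielded += 1
            if yielded >= max_records:
                return
        # A resumption token signals that more pages are available.
        token = root.find(".//oai:resumptionToken", NS)
        if token is None or not (token.text or "").strip():
            return
        params = {"verb": "ListRecords", "resumptionToken": token.text}

for title in harvest_titles("https://example.org/oai", "sample-set"):
    print(title)
```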
The VAST methodology and workflows for experience digitisation

1: National Centre for Scientific Research "Demokritos", Greece; 2: Università degli Studi di Milano, Italy; 3: National and Kapodistrian University of Athens, Greece; 4: Museo Galileo – Istituto e Museo di Storia della Scienza, Italy; 5: NOVA University of Lisbon – School of Social Sciences and Humanities, Portugal; 6: Fairy Tale Museum, Cyprus; 7: Semantika Research, Slovenia

The interaction of aesthetic and moral values [1], as well as the role of art in moral education [2], are topics of debate. Nevertheless, artifacts often cannot help but embody the values of their times, of their creators, or of the stories and people they depict, at least in the perception of their audiences. Reactions to transmitted values can become as strong as the toppling of statues, like that of Edward Colston in Bristol, UK, in 2020. At the same time, observing and understanding the values of citizens and stakeholders is becoming increasingly important in many fields, from sociology and policy-making to technology and Artificial Intelligence. Taken together, these developments suggest that the systematic collection of the values borne by cultural artifacts, as perceived by audiences, can offer a 'valuable' source of research data in various fields. Unlike the measurable or objective properties of artifacts and historical metadata, people's experience during their exposure to an artifact is highly subjective. From spontaneous emotional reactions to personal biases, capturing the audience's perception of values is a challenging task.

VAST is a European H2020 research and innovation action that has been developing methodologies, tools, and a data infrastructure to capture and digitise the values of intangible cultural heritage (CH) artifacts, including narratives of ancient Greek theatre, 17th-century scientific texts, and European fairy tales. From text and visual content annotation to applications and educational activities, VAST has structured its methodology so that any data collected are scientifically sound, can populate VAST's ontology, and can stimulate research in the humanities. The VAST methodology, instilled in the ontological specifications, deals with three major aspects of a person's interaction with an artifact or experience: a) the participant's individual characteristics (e.g. demographic information, philosophical beliefs); b) the description of the artifact or experience; and c) the collection of the values perceived on the basis of the artifact or experience.

The 'born-digital' datasets collected through this workflow populate VAST's ontology and will be made available through the VAST platform for various uses. Besides supporting research from the humanities to AI (e.g. providing 'unbiased' datasets for value mining and Natural Language Processing), VAST datasets can be used for citizen-informed curation, as well as for promoting new ways of engagement with CH, including digital apps and post-activity interactions. In turn, these interactions with the VAST platform can be used to augment existing datasets or to produce new ones.

Figure: Example of the VAST workflow based on an educational activity. An excerpt of Sophocles' Antigone is used to trigger a discussion about the concept of 'values' and the values perceived in the play. Pre-activity, participants provide demographic information and fill in the Personal Values Questionnaire. After exposure to the play (the central experience), value perception is collected through text annotation and mind maps. The collected data are organised into the VAST ontology and then become available for post-activity use. Importantly, the methodology of each activity also becomes available, allowing the experience to be reproduced and the relevant dataset to be augmented.

[1] Ravasio, M. (2021). What is the Connection Between Art and Morality?
[2] Hospers, J. (2022). Art as a means to moral improvement.
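To make the three-part record structure of the VAST methodology tangible, the sketch below models a single collected annotation along the three aspects described above. All class and field names are hypothetical illustrations, not VAST's actual ontology terms.

```python
# A hypothetical sketch of a three-part annotation record: participant
# characteristics, artifact/experience description, and perceived values.
# Names and values are invented; they do not come from the VAST ontology.
from dataclasses import dataclass, field

@dataclass
class Participant:
    participant_id: str
    age_group: str
    country: str

@dataclass
class Experience:
    artifact: str   # e.g. an excerpt of a play
    activity: str   # e.g. an educational workshop

@dataclass
class ValueAnnotation:
    participant: Participant
    experience: Experience
    perceived_values: list[str] = field(default_factory=list)

record = ValueAnnotation(
    Participant("p-001", "25-34", "GR"),
    Experience("Sophocles, Antigone (excerpt)", "educational workshop"),
    perceived_values=["duty", "family loyalty", "civil disobedience"],
)
print(record.perceived_values)
```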
HTR in the BnF DataLab: first steps with researchers

Bibliothèque nationale de France, France

The BnF DataLab opened in October 2021 to welcome researchers working on the BnF's digital collections. It has been designed to facilitate access to the collections by providing physical and digital workspaces, but also support from a wide range of experts. More than a data supply service, the BnF DataLab is a coordination of services and expertise that accompanies research projects from the constitution of corpora to the valorisation of research results. It is a place of co-construction, but also a laboratory for experimenting with the tools of the future library, in the service not only of research projects but also of the library's own operations. To enable this dialogue between researchers and the library, and to ensure the follow-up of projects, partnerships have been set up with major players in the research world, such as the CNRS via the IR* Huma-Num, and the ObTIC project team of Sorbonne Université. These partnerships allow researchers to be hosted in residence at the BnF DataLab, provide staff services, and support co-sponsored projects.

The challenges facing heritage institutions and academic research are similar when it comes to FAIR data. We believe that the DataLabs flourishing in different national libraries can serve to forge relationships between heritage institutions, and between heritage institutions and research, in order to provide researchers with usable and interoperable data. By offering a place for experimenting with new technologies, while also ensuring best practices and methodological and technical standards, they can enhance the value of datasets and tools. This harmonization and exchange between researchers and cultural institutions can be illustrated by the HTR programmes hosted in the BnF DataLab.

The transcription of handwritten documents and the provision of robust models is a subject that mobilizes both researchers and heritage institutions, and it is one of the BnF's major projects. Indeed, HTR technology has improved considerably over the last few years, and the moment seems to have come for cultural heritage institutions to take advantage of it and start considering the processing of large sets of documents. But after a long period of using OCR software as a routine, the project of developing HTR raises some fundamental questions. None of the HTR engines that have emerged from scholarly research is capable of efficiently processing a huge variety of writings from widely different periods and hands, and nothing indicates that such a tool or generic model will be available soon. So if we do not want to use an API from one of the major IT actors, and if we need to set up the same kind of industrial workflow that we use for OCR, how do we do it? How do we retrieve all the models that are available? How do we test them to certify their efficiency? How do we interrogate their descriptions to match them to the relevant documents to be processed? And how do we give feedback to the creators of open-source models to help them improve their work?
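One possible answer to the testing question is a shared, transparent benchmark. The sketch below scores candidate HTR outputs against a ground-truth transcription using Character Error Rate; the model names and strings are invented for illustration, and this is a generic evaluation recipe, not the BnF's actual workflow.

```python
# A minimal sketch of benchmarking candidate HTR models: compare each model's
# output against a ground-truth transcription using Character Error Rate
# (Levenshtein edit distance divided by reference length).
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # deletion
                            curr[j - 1] + 1,              # insertion
                            prev[j - 1] + (ca != cb)))    # substitution
        prev = curr
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: lower is better; 0.0 is a perfect transcription."""
    return levenshtein(reference, hypothesis) / max(len(reference), 1)

# Hypothetical outputs from two candidate models on one ground-truth line.
ground_truth = "Paris, ce 12 mars 1788"
for model, output in {"model-A": "Paris, ce 12 mars 1788",
                      "model-B": "Panis, ce l2 mars 1788"}.items():
    print(model, round(cer(ground_truth, output), 3))
```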