Browsing by Author "Tuominen, Jouni, Dr., University of Helsinki and Aalto University, Finland"
Now showing 1 - 1 of 1
Results Per Page
Sort Options
Item From Text to Knowledge: Methods, Tools, and Applications for Digital Humanities Based on Linked Data(Aalto University, 2023) Tamper, Minna; Tuominen, Jouni, Dr., University of Helsinki and Aalto University, Finland; Mäkelä, Eetu, Assoc. Prof., University of Helsinki, Finland; Tietotekniikan laitos; Department of Computer Science; Semantic Computing Research Group; Perustieteiden korkeakoulu; School of Science; Hyvönen, Eero, Prof., Aalto University, Department of Computer Science, FinlandThe digitization of Cultural Heritage collections has enabled the use of computational methods such as Natural Language Processing (NLP) on textual collections. These methods have been used widely in Digital Humanities (DH) to study digitized contents with automated processes. The Semantic Web and linked data technologies have been applied to describe document collections and their metadata in library and museum collections. They provide infrastructure for connecting different collections by linking them using shared vocabularies that describe metadata values and fields. Linked data is also used in Finnish museum and library collections. It is commonly used to modeling document metadata, such as author, or title of a piece of work. Also, the content of a document in a collection is usually described using manually assigned keywords. Other information about the content is often scarce and finding documents related to an actor can be laborious. This thesis studies and presents novel models, methods, and tools for transforming and enriching document collections automatically to linked data. Linked data technology helps to link together documents of a collection based on their metadata, e.g., author, or publisher. It can be also used to link documents based on information extracted about the content, such as actors mentioned in text. The aim of this thesis is to study how the NLP methods and linked data can be used to study digitized document collections, such as biographies. Research in this thesis is conducted by designing, implementing, and evaluating proof-of-concept systems, tools, and data for real life use cases. The research follows the principles of the design science and action research. The thesis presents a toolkit that can be used to model, transform, and enrich biographical text document collections to linked data to improve collection's information retrieval and interoperability internally and with other collections. The data model for describing text document collection's content and features, e.g., keywords and mentioned names, creates a foundation for building intelligent services based on the linked data such as network or linguistic analysis. These services can be used to visualize the interlinked data by showing the relations between themes or actors. In addition, the linked-data-based datasets can be used as an input for NLP tools to create data analytical visualizations and applications. This approach can be also used to evaluate the quality and content of text document collections for DH research. The prototypes created for data transformation, enrichment, and information visualization can be also applied to other document collections.