Information extraction from historical collections
Gallery, Library, Archive, and Museum (GLAM) collections contain large troves of visual and textual information. Their digitisation is opening up new opportunities to automatically extract their contents and build knowledge graphs. Machine learning methods need to be adapted to the task, since challenges include input noise, source and domain variety, and a lack of linguistic resources. This open-ended project aims to design and develop machine learning methods for the automatic extraction of information from GLAM collections. Tasks of interest include text recognition, named entity recognition, object detection, and similarity search.
Methodologically, I work with transfer learning to re-use existing resources, active learning to leverage expert knowledge while training models, and eXplainable AI to open models to human inspection and understanding.