Multimodal NLP

Leveraging the Bloom Library to create a multilingual, multimodal dataset for building and testing language technology for low-resource languages

Working with SIL International, the Vision Lab is developing a dataset for multimodal and multilingual learning based on the Bloom Library, an online repository of books covering 456 languages. Many of these books combine text with visual and even audio components, enabling research into multimodal techniques for language technology that connect the modalities of sight, sound, and text. Grounded language learning across multiple modalities may offer a path toward more efficient and flexible Natural Language Processing, especially benefiting "low-resource" languages, which have few existing machine learning text datasets.
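To make the idea of a multimodal, multilingual record concrete, here is a minimal sketch of how one page of a Bloom Library book might be represented and filtered by available modalities. All class, field, and file names here are hypothetical illustrations, not the project's actual schema; the sample text snippets are likewise placeholders.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class BookRecord:
    """One page (or caption unit) from a book; schema is hypothetical."""
    language: str                      # ISO 639-3 code, e.g. "hau" for Hausa
    text: str                          # page text in the given language
    image_path: Optional[str] = None   # illustration file, if present
    audio_path: Optional[str] = None   # recorded narration, if present

    def modalities(self) -> Set[str]:
        """Return the set of modalities this record actually carries."""
        mods = {"text"}
        if self.image_path:
            mods.add("image")
        if self.audio_path:
            mods.add("audio")
        return mods


def filter_by_modalities(records: Iterable[BookRecord],
                         required: Iterable[str]) -> List[BookRecord]:
    """Keep only records that carry every required modality."""
    needed = set(required)
    return [r for r in records if needed <= r.modalities()]


# Example: three records with differing modality coverage.
records = [
    BookRecord("hau", "...", image_path="p1.png", audio_path="p1.wav"),
    BookRecord("hau", "...", image_path="p2.png"),
    BookRecord("swh", "..."),
]
trimodal = filter_by_modalities(records, {"text", "image", "audio"})
```

A filter like this would let an experiment select, say, only the fully tri-modal subset for grounded learning while still using text-only records for language modeling.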

Furthermore, the massively multilingual nature of this dataset may open up significant opportunities for other research directions, including cross-lingual transfer learning, again benefiting low-resource languages.

Seeing a language from many different perspectives, such as speech, text, visual grounding, and transfer from neighboring languages, may provide a clearer picture of how to use and understand it than any one of these alone, and this dataset will provide a basis for exploring that idea further.


Vision Lab, Dr. Vijayan Asari, Director

Kettering Laboratories
300 College Park
Dayton, Ohio 45469-0232