What do you get when medievalists team up with computer scientists? Smart software that reads Latin

In a move that could transform manuscript studies, U of T medievalists have partnered with a team in the United Kingdom to develop a program that can read and transcribe the handwritten Latin found in 13th-century legal manuscripts.

Medieval handwriting can look crabbed and unintelligible, with non-standardized spellings, hyphenations, abbreviations, calligraphic flourishes and any number of distinct “hands.”

While scholars have been making digital images of these manuscripts for years, an image is only a first step in making the contents accessible. The further steps of transcribing and comparing these texts are painstaking and tedious work that can take years or even decades to complete.

Artificial intelligence (AI) and machine-reading software like Transkribus are changing that.

Developed by an international consortium of scholars, scientists and archivists, the software can be trained to read any type of handwriting, theoretically in any language.

Transkribus not only digitizes manuscripts and transcribes their contents but “recognizes” idiosyncratic features across multiple manuscripts, thus enabling comparison.

Recent successes include the transcription of manuscripts from colonial Mexico, the Hanseatic League, and early 20th-century Ireland.

Back in 2016, when the software was still getting off the ground, it came to the attention of Michael Gervers, a professor of medieval social and economic history at U of T Scarborough who is also cross-appointed to the Centre for Medieval Studies in the Faculty of Arts & Science.

Gervers, who has worked with Latin manuscripts since the 1970s, put together a U of T team including a Department of Computer Science professor who works on natural language processing and a team member who is now a PhD student in history at Yale University.

They also joined forces with another team already working with Transkribus at University College London (UCL).

Scholars in UCL’s Bentham Project were teaching the software to read the handwritten papers of the 18th-century philosopher Jeremy Bentham. By sharing resources for software development, the two teams could train Transkribus more quickly and efficiently.

The teaching process, however, wasn’t easy. Transkribus learns by “looking” at a sample page and comparing it line by line with a pre-prepared transcription. Lloyd spent hours selecting text to feed the software, and the team ran into two major problems: hyphens and abbreviations.

Medieval scribes often saved valuable parchment space by abbreviating words, sometimes dramatically; they would also write up to the very border of the script area before arbitrarily hyphenating whatever word they were on when they ran out of space. Since Transkribus “reads” whole words rather than individual letters, it had to learn to recognize words even when abbreviated or hyphenated.

Clearing these hurdles is now paying off. The new Latin-reading Transkribus is capable of precisely transcribing the peculiar handwriting in 13th-century Latin legal documents.

Though the program is currently trained for Latin legal texts, it’s only a matter of time before it’s adapted to literary texts and more.

Gervers notes that Transkribus would be an ideal program for Ge’ez, an Ethiopic script he has worked with alongside Latin since the 1990s. Largely unchanged over its 2,000-year history, the Ge’ez script was used in one of the earliest known complete Gospel manuscripts and is still used in Ethiopia today.

Gervers says the script is “perfect for machine transcription”: Ge’ez has no abbreviations and conveniently puts colons at the ends of words and sentences.

With enough time and effort, Gervers thinks that Transkribus can be applied across medieval studies.

“When, rather than if, the process is successful, it will make an enormous difference to the way medievalists approach their subject.”

Launch and Demonstration:

The U of T team and the Bentham Project are holding a launch and demonstration as a Zoom seminar, with online registration. Hundreds of medievalists from around the world are already registered.


from Arts & Science News.