Can a Computer Recognise Dialect?

By Britt van Sloun, translated by Noor de Bruijn

27 April 2026, 3 min. reading time

Can a computer tell whether a sentence is written in Drents, Limburgs, or Gronings? Researchers at the Meertens Institute took up this question. Using over a thousand Dutch dialect novels and a specially trained language model, they show that artificial intelligence performs remarkably well at identifying regional speech. The results offer not only technical insights, but cultural ones too.

The library of the Meertens Institute holds over 1,100 dialect novels, written in a wide range of regional languages across different periods. “We knew that there was something special in that collection,” says Nikki Beyer, who started on the project as an intern and is now involved as a PhD student. “All those dialects preserved in writing, we felt there had to be more to discover.” The first step was digitising the books, a labour-intensive process that took over a year. Only then could the real research begin: Can a language model tell the difference between dialect and Standard Dutch?

Covers of dialect novels written by Bart Veenstra
© Meertens Institute

From BERTje to Meertje

For the experiment, Beyer and her colleagues used an existing language model: BERTje, developed at the University of Groningen. They retrained the model on tens of thousands of sentences taken from dialect novels. The result was a new model with a new name: Meertje. Meertje may look similar to generative AI systems like ChatGPT, but it works differently. “This isn’t a model that generates text,” Beyer stresses. “Meertje analyses language. It reads sentences and figures out whether they are written in dialect.”

To teach the model this, Beyer manually labelled around thirty thousand sentences as either ‘dialect’ or ‘no dialect’. The model was first tested on Drents, using texts from one specific writer: Bart Veenstra. It soon became clear that Meertje was starting to pick up on patterns that set Drents apart from Standard Dutch. Then came the real surprise: The model could identify other Dutch dialects too, from Gronings to Limburgs, with an accuracy rate of around 95 percent. “That was honestly a dream outcome,” says Beyer.
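An accuracy figure like this is the share of sentences for which the model's answer matches the manual label. The sketch below illustrates that calculation only. It is not the Meertens team's code: the function name `accuracy`, the toy predictor, and the two example rows are assumptions made for illustration, and a real evaluation would use held-out labelled sentences.

```python
from typing import Callable, List, Tuple

# A hand-labelled example: (sentence, True if labelled 'dialect').
LabelledSentence = Tuple[str, bool]


def accuracy(predict: Callable[[str], bool], data: List[LabelledSentence]) -> float:
    """Return the fraction of sentences where the prediction equals the label."""
    if not data:
        raise ValueError("need at least one labelled sentence")
    correct = sum(1 for sentence, label in data if predict(sentence) == label)
    return correct / len(data)


# Toy stand-in for a classifier (an assumption, NOT how Meertje works):
# it flags a sentence as dialect if it contains the word 'zult'.
def toy_predict(sentence: str) -> bool:
    return "zult" in sentence.lower().split()


# Illustrative rows only; the second sentence is a made-up placeholder.
toy_data: List[LabelledSentence] = [
    ("wat zult dat", True),
    ("dit is een zin", False),
]

score = accuracy(toy_predict, toy_data)
```

Swapping `toy_predict` for a trained model's prediction function would leave the calculation unchanged.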

Nikki Beyer, who joined the Dialect Novels project at the Meertens Institute as an intern and is now a PhD candidate on the project.

More than just spelling

What Meertje detects is more than just distinctive spelling. The model immediately identified a simple phrase such as ‘wat zult dat’ as dialect. That is because it also pays attention to structure, rhythm, and grammatical patterns. Vowel combinations that are rare in Standard Dutch but common in Drents also play a role. The same goes for subtle differences in verb forms, noun endings, and sentence structure. What stands out is that Meertje learns how dialect differs from Standard Dutch and can then use that knowledge to recognise other regional languages.
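One simple way to make letter combinations countable is to tally character n-grams, short runs of adjacent characters. The sketch below is an illustration under that assumption only. The article does not describe Meertje's internal features, and BERT-style models such as BERTje work on subword tokens rather than explicit n-gram counts. The function name `char_ngrams` is hypothetical.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count every run of n adjacent characters in the lower-cased text."""
    if n < 1:
        raise ValueError("n must be at least 1")
    text = text.lower()
    # Slide a window of width n across the text, one character at a time.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


# The phrase quoted in the article, split into character bigrams.
bigrams = char_ngrams("wat zult dat", n=2)
```

Comparing such counts between dialect sentences and Standard Dutch sentences would show which combinations are unusually frequent in one of the two.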

Dialect as a social signal

The research also yielded literary insights. In many of these novels, the main characters speak dialect, while doctors, officials, and other people in positions of power speak Standard Dutch. “Even among writers who speak dialect themselves, you can see that language is never socially neutral,” says Beyer. Dialect signals closeness, identity, and sometimes inferiority, whereas Standard Dutch can convey distance and authority. The language model makes these patterns visible at scale.

Dialect shows where you come from

Beyer first studied literature and later linguistics. This project brings those two worlds together. “It sits exactly at the intersection of what I love,” she says. “Using computational methods to understand something human and cultural.” That human aspect is more relevant than ever. Dialects are losing ground in everyday life, yet they are also making a comeback. “In Limburg and Friesland, young people are embracing their regional language with pride,” says Beyer. “Dialect is not outdated. It shows exactly where you come from.” Dialect novels are especially important because they express identity and culture through literature.

A large digital collection of dialect texts

The next research question is why Meertje identifies a sentence as dialect. Which features matter most: sounds, grammar, or word order? The answers should ultimately tell us more about the linguistic structure of dialects.

At the same time, the researchers are building a large digital collection of dialect texts. After removing digitisation errors, each text is enriched with metadata on location, time period, and language use. By linking these texts to dialect dictionaries and other databases, researchers will be better able to compare how written dialect relates to spoken language.
