17th March, John Percival Building, 5.00pm
Last week, Cardiff Digital Cultures Network attended the launch of CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes, The National Corpus of Contemporary Welsh), a brand new interdisciplinary project led by Dr Dawn Knight from Cardiff University’s School of English, Communication and Philosophy. With £1.8m in funding from the ESRC and AHRC, the project combines expertise from Computer Science, Applied Linguistics and Education, and aims to create the first large-scale, open-access word corpus of contemporary Welsh language use.
Additionally, the project will produce the first semantic tagger for Welsh, a tool that labels data according to its meaning; it will be the first corpus to rely on community crowdsourcing for data collection (via an app); and it will be the first user-defined corpus, integrating traditional corpus tools with bespoke applications. It is also designed with sustainability built in and will, the team hopes, provide a model of corpus construction for under-resourced languages.
The project has its origins back in 2011, when team members Dr Knight and Dr Tess Fitzpatrick were thinking through innovative ideas to help people learn Welsh in more interesting ways. My favourite of these ideas was the Twister dance mat connected to iPads: when players landed on a certain circle, they would have to speak some Welsh. It was around this point, however, that both Dr Knight and Dr Fitzpatrick realised that there was no Welsh corpus.
What, then, is a corpus? It is a collection of language data in use: spoken, written and digital. As Dr Knight notes, it ‘allows users to identify and explore language as it is actually used, rather than relying on intuition or prescriptive accounts of how it “should” be used.’ The work matters because corpora underpin the development of technologies such as predictive text production, word processing tools, machine translation, voice recognition and search tools. For example, the team are already in discussion with Apple about the creation of a Welsh Siri. Bendigedig, I say. ‘I’m sorry, I don’t understand that’, Siri currently says. But that will soon change if this tremendous project achieves its goals.
For more information about the CorCenCC project see: https://sites.cardiff.ac.uk/corcenc
– Michael Goodman