Title: Yr Amliadur: Frequency Lists for Contemporary Welsh (Version 1.0.0)
Citation
Knight D, Morris S, Tovey-Walsh B, et al. (2020). Yr Amliadur: Frequency Lists for Contemporary Welsh (Version 1.0.0). Cardiff University. https://doi.org/10.17035/d.2020.0120164107
Access Rights: Creative Commons Attribution Share Alike 4.0 International
Access Method: Email a request for this data to opendata@cardiff.ac.uk
Dataset Details
Publisher: Cardiff University
Date (year) of data becoming publicly available: 2020
Data format: .xls, .pdf
Estimated total storage size of dataset: Less than 100 megabytes
Number of Files In Dataset: 4
DOI: 10.17035/d.2020.0120164107
DOI URL: http://doi.org/10.17035/d.2020.0120164107
Related URL: https://www.corcencc.org
Yr Amliadur contains sample frequency lists of contemporary Welsh language usage. The lists are based on CorCenCC (Corpws Cenedlaethol Cymraeg Cyfoes, the National Corpus of Contemporary Welsh; Knight et al., 2020), which includes 14,338,149 tokens (circa 11.2 million words). The data in CorCenCC represent a wide range of contexts, genres and topics; they have, as far as possible, been anonymised using a combination of manual and automated techniques, and are fully tagged for part of speech (POS) and semantic category.
The research on which this frequency list dataset is based was funded by the UK Economic and Social Research Council (ESRC) and the Arts and Humanities Research Council (AHRC) as the project "Corpws Cenedlaethol Cymraeg Cyfoes (The National Corpus of Contemporary Welsh): A community driven approach to linguistic corpus construction" (Grant Number ES/M011348/1).
All outputs from the CorCenCC project are licensed under Creative Commons CC-BY-SA v4 and are thus freely available for use by professional communities and individuals with an interest in language. Bespoke applications and instructions are provided for each tool. When reporting information derived from the CorCenCC corpus data and/or tools, CorCenCC should be appropriately acknowledged.
Description
Related Projects