NordFA: Forced alignment for Nordic languages
The study of sound change in the English‐speaking world has benefited tremendously from forced alignment, permitting large quantitative studies to be conducted by small teams or even solo researchers. The Nordic languages, on the other hand, have been neglected. To aid my dissertation work, I have created Forced Alignment for Nordic Languages, aka NordFA. The program currently works for Danish and Swedish. My colleague Michael McGarrah has also built a generalized version of FAVE that can align more than one language within the same program.
NordFA is not scheduled for public release, because both my PhD and this project are currently self-funded. For the time being (until I can secure an academic position), I am available for paid consulting work whereby I align your files from my computer and send them back to you. Kindly contact me at n8.young at gmail.com for a free trial and pricing details.
What is forced alignment?
It is software that automatically locates the boundaries between the sounds corresponding to the phones that make up a fragment of speech. NordFA's input is a sound file and its orthographic transcription. Its output is a segmented file that is readable in Praat (Boersma & Weenink, 2017).
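To make the output format concrete, here is a minimal Python sketch that writes phone intervals as a one-tier Praat TextGrid in the long text format. It is only an illustration, not NordFA's own code: the function name `to_textgrid`, the tier name `phone`, and the boundary times for the word "hej" are all made up for the example.

```python
def to_textgrid(intervals, tier_name="phone"):
    """Render sorted, contiguous (start_s, end_s, label) tuples as a
    one-tier Praat TextGrid string (long text format)."""
    xmin, xmax = intervals[0][0], intervals[-1][1]
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        f"xmin = {xmin}",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "IntervalTier"',
        f'        name = "{tier_name}"',
        f"        xmin = {xmin}",
        f"        xmax = {xmax}",
        f"        intervals: size = {len(intervals)}",
    ]
    for i, (start, end, label) in enumerate(intervals, start=1):
        text = label.replace('"', '""')  # Praat escapes quotes by doubling them
        lines += [
            f"        intervals [{i}]:",
            f"            xmin = {start}",
            f"            xmax = {end}",
            f'            text = "{text}"',
        ]
    return "\n".join(lines) + "\n"


# Made-up boundaries (in seconds) for the Swedish word "hej"
hej = [(0.00, 0.08, "h"), (0.08, 0.21, "e"), (0.21, 0.30, "j")]
textgrid = to_textgrid(hej)
```

Saving `textgrid` to a file ending in `.TextGrid` (UTF-8 encoded) should give something Praat can open alongside the sound file.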
What software is NordFA adapted from?
NordFA is built on the original architecture of Forced Alignment and Vowel Extraction (FAVE; Rosenfelder, Fruehwald, Evanini, & Yuan, 2011), which is a Python-based program that incorporates the Hidden Markov Model Toolkit (HTK; Young, Woodland & Byrne, 1993).
What type of speech can it process?
It can segment natural speech in long sound files. Unlike other programs, NordFA does not require sound files to be broken into smaller pieces. It can segment files that are hours long (although the segmentation for a 90-minute file can take up to 24 hours!).
What languages does it segment?
This edition includes Swedish (SweFA) and Danish (DanFA). Norwegian is not part of this release but is planned for the future.
Sounds great, but is anyone else using it?
Currently, Aarhus University's The Puzzle of Danish is using DanFA. Nobody is using SweFA yet except me.
How much time will this save me?
In manual segmentation, 75% of the time goes to building and populating the TextGrid cells, and 25% goes to deciding where the boundaries go. Because NordFA builds and populates the cells automatically, even its most primitive version reduces the time spent on segmentation by 75%. See the figure to the right.
Doesn't something like this exist already?
As far as I am aware, no forced aligner for Danish is in circulation. For Swedish, only a "build-framework" is available via The Montreal Forced Aligner. Therefore, I believe anyone who works in Nordic phonetics can benefit from this software. It has helped my own analysis greatly.
How do I know its pronunciation dictionary is correct?
SweFA and DanFA’s lexicons come from three sources: (1) data seized from the insolvent Nordic Language Technology Holdings Inc. via the National Library of Norway, (2) the online Swedish slang dictionary slangopedia.se, and (3) manual entries from my research corpus, which contains 115 hours of Stockholm Swedish.
But how accurate are the alignments?
It is accurate enough that it has saved me thousands of hours on my dissertation project. But it still needs improvement!
SweFA is the more developed of the two and currently exists for Central Swedish. Its phonetic dictionary contains more than 2.8 million entries, which include elided and syncopated pronunciations (konstnärerna >> konsnärna), inflected forms (prata, pratar), and most common compound words (otrevlig, jättetrevlig). Moreover, it has multiethnolectal entries. It also has a “powersandher” that identifies retroflex coalescence (för sig >> fö rsig) and apocopes (ringde >> ringd). It codes vowels for lexical pitch accent 1, lexical pitch accent 2, and compound-word pitch accent 2.
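As a toy illustration of the retroflex coalescence example above (för sig >> fö rsig), the Python sketch below moves a word-final r onto a following word that begins with t, d, n, s, or l. This is an assumption-laden simplification that works on spelling only. It is not how SweFA is implemented, and the names `coalesce_r` and `RETROFLEX_TRIGGERS` are invented for the example.

```python
# Dental consonants that merge with a preceding /r/ into a retroflex
# in Central Swedish (toy, spelling-based approximation).
RETROFLEX_TRIGGERS = set("tdnsl")


def coalesce_r(words):
    """Move a word-final 'r' onto the next word when that word starts
    with a triggering consonant, e.g. ['för', 'sig'] -> ['fö', 'rsig']."""
    out = list(words)
    for i in range(len(out) - 1):
        current, following = out[i], out[i + 1]
        if (
            len(current) > 1
            and current.endswith("r")
            and following[:1] in RETROFLEX_TRIGGERS
        ):
            out[i] = current[:-1]
            out[i + 1] = "r" + following
    return out


example = coalesce_r(["för", "sig"])
```

A real implementation would operate on phone strings from the pronunciation dictionary rather than on orthography, and would have to handle exceptions that this sketch ignores.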
Tested on a casual speech recording of young multiethnolectal men in Stockholm, the phonetic dictionary covered 99.8% of all words (n=6284). Compared with manual alignment for 606 monophones, mean boundary displacements were 0.021 seconds at onsets and 0.020 seconds at offsets. Root mean square deviations were 0.030 seconds for onsets and 0.029 seconds for offsets.
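For readers who want to run the same kind of comparison on their own data, the two measures can be sketched in Python as follows. I am assuming here that "displacement" means the absolute difference between the manual and the automatic boundary. The function name `boundary_errors` and the three boundary times are invented for illustration and are not taken from the test above.

```python
import math


def boundary_errors(manual, automatic):
    """Return (mean absolute displacement, root mean square deviation)
    in seconds for paired manual and automatic boundary times."""
    if len(manual) != len(automatic) or not manual:
        raise ValueError("need two non-empty lists of equal length")
    diffs = [a - m for m, a in zip(manual, automatic)]
    mean_abs = sum(abs(d) for d in diffs) / len(diffs)
    rmsd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mean_abs, rmsd


# Invented onset times (seconds) for three phones
manual_onsets = [0.10, 0.50, 0.90]
auto_onsets = [0.12, 0.47, 0.90]
mean_abs, rmsd = boundary_errors(manual_onsets, auto_onsets)
```

The same function can be called once for onsets and once for offsets. The root mean square deviation penalizes large misses more heavily than the mean does, so it is never smaller than the mean absolute displacement.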
DanFA’s pronunciation dictionary contains over 200,000 entries and covers 99.5% (n=53,976) of the dialogue transcriptions in DanPASS (Grønnum, 2009). Multiethnolectal slang and schwa-assimilated pronunciations are not yet included but will be part of the next release (spring 2018). The prototype’s test of 144 Copenhagen monophones has rendered promising results.
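Dictionary coverage figures like the ones above are simply the share of transcribed word tokens that have an entry in the pronunciation dictionary. A minimal Python sketch, assuming a lowercased whitespace-tokenized transcript and a lexicon held as a set of words (the function name `coverage` and the four-word sample are hypothetical):

```python
def coverage(tokens, lexicon):
    """Return (share of tokens found in the lexicon, sorted missing words)."""
    tokens = [t.lower() for t in tokens]
    if not tokens:
        raise ValueError("no tokens to check")
    missing = sorted({t for t in tokens if t not in lexicon})
    covered = sum(1 for t in tokens if t in lexicon)
    return covered / len(tokens), missing


# Tiny made-up lexicon and transcript; "blorf" is a deliberate non-word
lexicon = {"prata", "pratar", "otrevlig", "jättetrevlig"}
transcript = ["Prata", "pratar", "otrevlig", "blorf"]
share, missing = coverage(transcript, lexicon)
```

The list of missing words is the useful by-product: those are the entries that have to be added by hand before the file can be aligned.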
Bailey, G. (2016). Automatic Detection of Sociolinguistic Variation Using Forced Alignment. University of Pennsylvania Working Papers in Linguistics, 22(2), 3.
Boersma, P., & Weenink, D. (2017). Praat: doing phonetics by computer [Computer program]. Version 6.0.29, retrieved 24 May 2017 from http://www.praat.org/
Grønnum, N. (2009). A Danish phonetically annotated spontaneous speech corpus (DanPASS). Speech Communication, 51(7), 594-603.
Riad, T. (2015). Prosodi i svenskans morfologi [Prosody in the morphology of Swedish]. Stockholm: Morfem förlag.
Rosenfelder, I., Fruehwald, J., Evanini, K., & Yuan, J. (2011). FAVE (Forced Alignment and Vowel Extraction) Program Suite. Retrieved from http://fave.ling.upenn.edu.
Young, S. J., Woodland, P. C., & Byrne, W. J. (1993). HTK Version 1.5: User, Reference and Programmer Manual. Washington DC: Entropic Research Laboratories.