Documentation

Few items are required for recording. The most important prerequisite is a quiet room without reverberation. Although many speech features, especially measurements of the rate or duration of segments such as the diadochokinetic rate, can be quite robust, several features, such as the harmonics-to-noise ratio, are sensitive to environmental noise. Therefore, we recommend closing the windows, switching off all cell phones, and checking for other sources of noise if necessary. The reverberation of larger rooms can influence the results when reflected waves interfere with the speaker’s direct acoustic signal. A well-placed curtain can significantly reduce the reverberation.

Recording systems

We live in an era in which the transmission of audio signals has become part of our daily life. Despite the availability and rising popularity of compact recording systems such as USB microphones and smartphones, acoustic analysis still relies on the traditional approach based on a microphone and a recording device connected as individual pieces of hardware. The rationale is that compact all-in-one systems are usually designed as inexpensive solutions for broadcasting, audio or video blogging, or voice communication. Indeed, a curious user can hardly find a detailed datasheet for his or her smartphone microphone. In contrast, the datasheets of professional microphones list various characteristics and their tolerance ranges. We do not want to say that USB microphones and smartphones necessarily provide lower-grade analysis, since recent research showed that some acoustic features can be calculated from a signal recorded by a smartphone (see Rusz et al. 2018). However, this holds only for a subset of features, and the findings cannot be generalized to all processing methods and smartphones. Evidence that cheap microphones work equally well is too scarce. Therefore, we suggest using professional high-quality microphones and recorders.

Additionally, we highly recommend using the same type of recording system for the examination as the one used for recording the normative data. The normative data incorporated into our system were recorded using a Beta 53 microphone by Shure (Niles, Illinois, USA) and a DR40 V2 recorder by TASCAM (Santa Fe Springs, California, USA). We cannot guarantee the interchangeability of results for all the features. If a user would like to use a different recording system, we can help and incorporate custom normative data into the software upon request. In the following subchapters, we briefly describe microphone characteristics that may help with selecting appropriate hardware, considering the mechanical construction, the characteristics of the transducer, and electrical compatibility.

Microphones

We all know that a whisper into someone's ear can sound louder than a shout from the other side of the street. The closer we get to the source, the more "effectively" we can capture the sound. Although placing the microphone on the table in front of the speaker may look like a good idea, this approach is mostly seen in acoustically isolated broadcast studios. In a clinical setting, we must take more advantage of close-distance recording, since the environmental noise in a hospital or office can be higher than in a studio. The best and most popular solution is to use a headset microphone and place the capsule around 5 cm (approximately 2 inches) from the speaker's mouth.
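The benefit of a short distance can be estimated with a small calculation. The following Python sketch assumes that the mouth behaves roughly like a point source in a free field, so the direct speech level drops by about 6 dB per doubling of distance, and that the environmental noise level is about the same at both microphone positions. The function name and the 0.5 m table distance are hypothetical and serve only as an illustration.

```python
import math


def level_gain_db(far_distance_m: float, near_distance_m: float) -> float:
    """Gain in direct speech level (dB) when the microphone is moved from
    far_distance_m to near_distance_m.

    Assumption: free-field, inverse-distance decay of sound pressure.
    """
    if far_distance_m <= 0 or near_distance_m <= 0:
        raise ValueError("distances must be positive")
    return 20.0 * math.log10(far_distance_m / near_distance_m)


# Hypothetical comparison: table microphone at 0.5 m vs. headset at 0.05 m.
gain = level_gain_db(0.5, 0.05)
print(f"Speech level gain: {gain:.1f} dB")  # Speech level gain: 20.0 dB
```

Under these assumptions, the speech-to-noise ratio improves by roughly the same amount, because only the speech level changes with the microphone position. Real rooms deviate from this idealization because of reflections.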

The headset ensures that the distance between the mouth and the microphone remains stable even when the patient moves, which keeps the ratio between the level of the speech signal and the environmental noise high. A very important parameter that determines the suitability of the microphone is the frequency characteristic. The frequency characteristic describes how sensitively each frequency is captured. Loosely speaking, you can imagine a change in the frequency characteristic as the change in your hearing when you pull a hat over your ears and everything sounds different. We prefer the frequency characteristic to be as flat (deviation of about 2 dB) and as broad (a range from 50 to 20000 Hz is ideal) as possible. The next parameter to consider is directionality, and it is even more critical since it has implications for the frequency characteristic. When the microphone is directional, its noise performance is better because sound arriving from outside its direction is captured less sensitively. However, directionality makes the microphone change its frequency characteristic with the distance from the source. This phenomenon is called the proximity effect, and it is one of the reasons why some microphones are equipped with a low-cut button. You must place the microphone at the distance that gives the flattest frequency characteristic, which is usually specified in the datasheet. The so-called omnidirectional microphones show a similar frequency characteristic in all directions and at all distances from the source, and are therefore the more convenient choice despite their increased sensitivity to environmental noise.

These requirements are not the only reasons to favor condenser microphones over the cheaper and more common dynamic microphones. Since the diaphragm of a condenser microphone can be lighter, acoustic waves are transduced with increased sensitivity. Condenser microphones can also handle higher sound pressure and, unlike passively driven dynamic microphones, do not change their performance under various load impedances. The term passively driven indicates that the microphone does not require a power source. The majority of condenser microphones are actively driven, i.e., they require an extra power supply to amplify the signal of the electret condenser capsule before transmission (note that other condenser microphones require a voltage in principle even to capture acoustic waves). This extra voltage is usually transmitted hidden along with the signal (hence the name phantom voltage). Phantom voltage typically ranges from 3 to 50 V. A higher voltage is typically associated with a higher quality of the recording.

The last factor to consider is the cable that connects the microphone to the recorder. It has to be at least 1 m (3.3 feet) long to allow the speaker to move. An optional feature of the microphone is a balanced output, which reduces electrical interference. Not every recorder can utilize it, but this does not limit the usability of the microphone with the given recorder.

In summary, there are three keywords to look for when selecting a professional microphone: headset, condenser, and omnidirectional. Other parameters can also be important, but it would be outside the scope of these pages to walk through all of them. For more information on the topic, we recommend the article by Švec and Granqvist (2010).

Recorder

First and foremost, the recorder parameters must match the parameters of the microphone used. The connector has to be of the same type as that of the microphone; otherwise, a converter will be required. The phantom voltage of the recorder must fit the range required by the microphone; beware that exceeding the microphone's maximum can cause damage. Almost all current recorders exploit the advantages of digitization and record the data into a file in rewritable memory. The built-in analog-to-digital converter has to have a resolution of at least 16 bits and a sampling rate of at least 44100 Hz. The data format is commonly wav, which stores the raw waveform. Although lossy compression into mp3 and other formats is incorporated into many recorders to reduce the size of the recording files, lossy compression can destroy the temporal features of the acoustic signal. Lossless compression, such as Apple Lossless (ALAC) stored in m4a files, can be used because the raw recorded waveform can be restored from it exactly; note, however, that m4a files may also contain lossy AAC audio. To avoid confusion, we suggest always using raw audio formats.
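Whether a recording meets the minimum of 16 bits and 44100 Hz can be verified from the file header. The following Python sketch uses only the standard `wave` module, which reads uncompressed PCM wav files. The function names, the thresholds expressed as constants, and the synthetic demo files are illustrative assumptions and are not part of the software described on this page.

```python
import os
import tempfile
import wave

MIN_SAMPLE_RATE_HZ = 44100
MIN_BITS_PER_SAMPLE = 16


def check_wav(path):
    """Return a list of problems found in the header of a PCM wav file."""
    problems = []
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        bits = 8 * wav.getsampwidth()  # sample width is reported in bytes
    if sample_rate < MIN_SAMPLE_RATE_HZ:
        problems.append(
            f"sampling rate {sample_rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if bits < MIN_BITS_PER_SAMPLE:
        problems.append(
            f"resolution {bits} bits is below {MIN_BITS_PER_SAMPLE} bits")
    return problems


def write_placeholder_wav(path, sample_rate, sample_width_bytes, seconds=0.1):
    """Demo helper only: write a short mono file filled with zero bytes."""
    n_frames = int(sample_rate * seconds)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_width_bytes)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00" * (n_frames * sample_width_bytes))


# Demo on two synthetic files with hypothetical names.
with tempfile.TemporaryDirectory() as folder:
    good_path = os.path.join(folder, "good.wav")
    bad_path = os.path.join(folder, "bad.wav")
    write_placeholder_wav(good_path, 48000, 2)  # 48 kHz, 16 bits
    write_placeholder_wav(bad_path, 16000, 1)   # 16 kHz, 8 bits
    good_report = check_wav(good_path)
    bad_report = check_wav(bad_path)

print(good_report)  # []
print(bad_report)   # two messages: sampling rate and resolution
```

The check inspects only the header. It cannot reveal whether the audio was previously stored in a lossy format and converted to wav afterwards.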

Most recorders allow copying the audio files to a computer via a USB cable or a removable memory card. Some systems may also provide optional Bluetooth or other wireless technologies, whose transmission protocols may not be safe for medical data. We do not recommend using these technologies, to avoid possible legal implications.

Pocket recorders are usually battery powered, and some can be charged via a USB cable. These are convenient details that can make your recording easier.

Additionally, there are professional sound cards such as the Computerized Speech Lab (KayPentax, Pine Brook, New Jersey, USA), which incorporates a dynamic vocal microphone, a sound card, and a computer into one product.

Manual editing

Manual editing and supervision are still a necessity for all credible methods of acoustic analysis. The rationale for supervision is strongly related to technological limitations. Manual editing allows you to perform a quality check that an automated decision cannot match. When you listen to your recording or visually inspect its oscillogram or spectrogram, you can easily judge whether or not your data are worthy of analysis. You may think that because you were present during the recording, you can judge it from memory. However, many sounds, such as an ambulance siren, are so common in many clinics that they hardly catch your attention. Also, the microphone connector may get loose or break during the recording session, and you do not want to analyze such data. Finally, some repetitions of the task may not have been performed or instructed well, and you want to check them and select the good ones. Despite huge developments in the field of "artificial intelligence", there are no technologies that could mimic the common sense required to edit and check the file.

You can use almost any audio-editing software to check and trim the recordings because the oscillogram and spectrogram, as well as a trimming function, have become standard features of audio editors. Each program has a slightly different interface, but the principle is always the same: 1. select the interval of interest, 2. play it, inspect it, and adjust the boundaries, 3. save the interval into another audio file. The last point has to be stressed since some programs allow you to perform the analysis within the selected interval without saving it. In the long term, you want to know where your values come from. A different part of the recording may yield different data, so the saved interval is necessary evidence for any reliable longitudinal follow-up. The accompanying illustration shows the essence of trimming. For more information about trimming in other programs, we refer to their respective manuals and tutorials.
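The third step, saving the selected interval into a separate file, can also be scripted once the boundaries have been chosen by listening and visual inspection. The following Python sketch uses only the standard `wave` module and works on uncompressed PCM wav files. The function name, file names, and boundaries are hypothetical, and the sketch does not replace the manual inspection described above.

```python
import os
import tempfile
import wave


def trim_wav(source_path, target_path, start_s, end_s):
    """Copy the interval from start_s to end_s (in seconds) of a PCM wav
    file into a new file and return the number of frames written."""
    if not 0 <= start_s < end_s:
        raise ValueError("require 0 <= start_s < end_s")
    with wave.open(source_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Clamp both boundaries to the length of the recording.
        start_frame = min(int(round(start_s * rate)), total)
        end_frame = min(int(round(end_s * rate)), total)
        src.setpos(start_frame)
        frames = src.readframes(end_frame - start_frame)
    with wave.open(target_path, "wb") as dst:
        dst.setparams(params)    # same channels, resolution, sampling rate
        dst.writeframes(frames)  # the frame count in the header is corrected
    return end_frame - start_frame


# Demo: one second of placeholder audio, trimmed to the 0.25-0.75 s interval.
with tempfile.TemporaryDirectory() as folder:
    full_path = os.path.join(folder, "recording.wav")
    part_path = os.path.join(folder, "recording_0.25-0.75s.wav")
    with wave.open(full_path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)       # 16 bits
        wav.setframerate(44100)
        wav.writeframes(b"\x00\x00" * 44100)
    written = trim_wav(full_path, part_path, 0.25, 0.75)
    with wave.open(part_path, "rb") as wav:
        trimmed_frames = wav.getnframes()

print(written, trimmed_frames)  # 22050 22050
```

Storing the boundaries in the name of the saved file, as in the demo, is one possible way to keep track of where the analyzed interval came from.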

Automated processing

The software is currently written in MATLAB (version 2015 or higher) using the "Signal Processing Toolbox" and the "Statistics and Machine Learning Toolbox". After you download extract_dysarthria_analyzer.exe and install_dysarthria_analyzer.exe (for Windows), copy these files into your preferred location and click on install_dysarthria_analyzer. The installer extracts all the required files into the ./dysarthria_analyzer folder and an example database into the ./database folder, and then cleans up the installation files.

Before you start the app, please load your data into the ./database directory following the specific naming and storage convention. To start the app, click on dysarthria analyzer.bat, which starts minimal MATLAB in the background and shows a graphical interface. The following tutorial illustrates how to use the app:

Full documentation for the software, including loading of the data, is described in the doctoral thesis "Automated analysis of speech disorders in neurodegenerative diseases" by Jan Hlavnička.

References

Rusz, J., Hlavnička, J., Tykalová, T., Novotný, M., Dušek, P., Šonka, K., & Růžička, E. (2018). Smartphone allows capture of speech abnormalities associated with high risk of developing Parkinson’s disease. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 26, 1495-1507.

Švec, J. G., & Granqvist, S. (2010). Guidelines for selecting microphones for human voice production research. American Journal of Speech-Language Pathology, 19, 356-368.