Voice is one of the mainstream modes of human-computer interaction and has distinctive advantages. Even a short recording carries more than the words the speaker wants to convey: it also reveals the speaker’s identity, language, and emotional state, as well as the environment in which they are speaking.
What is Voice Annotation?
Speech annotation (also called voice annotation) is a common type of annotation in the data annotation industry. The annotator first “extracts” the text and the various sounds contained in a recording, and then transcribes or synthesizes them. The annotated data is mainly used to train machine learning models. In effect, it gives a computer system “ears”, so that it can “hear” and perform accurate speech recognition.
What are the voice annotation methods?
1. Voice cleaning
Voice cleaning is the process of re-examining and verifying the voice data. It is the first step in voice data preprocessing and is important for ensuring that subsequent results are correct.
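As a rough illustration of what such a verification pass might look like, the sketch below flags clips that are too short, nearly silent, or heavily clipped. The function names and every threshold are assumptions chosen for the example, not values taken from any particular annotation pipeline, and the audio is assumed to be already decoded into float samples in the range [-1, 1].

```python
import math

def rms(samples):
    """Root-mean-square amplitude of float samples in [-1, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def clean_check(samples, sample_rate,
                min_seconds=0.5,         # assumed minimum useful duration
                min_rms=0.01,            # assumed "near silence" level
                max_clipped_ratio=0.01): # assumed tolerance for clipping
    """Return a list of reasons to reject the clip; an empty list means it passes."""
    problems = []
    duration = len(samples) / sample_rate
    if duration < min_seconds:
        problems.append("too_short")
    if rms(samples) < min_rms:
        problems.append("near_silent")
    # Samples at (or very near) full scale suggest the recording was clipped.
    clipped = sum(1 for s in samples if abs(s) >= 0.999)
    if samples and clipped / len(samples) > max_clipped_ratio:
        problems.append("clipped")
    return problems
```

A clip that returns an empty list would move on to annotation; anything else would be sent back for review or discarded, depending on project rules.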
2. ASR voice transcription
ASR (automatic speech recognition) is a technology that converts human speech into text. Speech transcription is the process of transcribing speech data into text data, and it is a common form of labeling in the data annotation field.
3. Emotional judgment
Emotional information in speech is an important behavioral signal that reflects human emotions, and recognizing it is an important part of achieving natural human-computer interaction. In emotion judgment, the annotator judges the emotional intent of what a speaker says in the audio, typically for dialogue data: for example, whether the speaker is asking a question, expressing a need, making a complaint, or offering a suggestion.
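One way to picture the output of this kind of labeling is as a small record per utterance. The sketch below is only an assumed format: the class name, field names, and label strings are hypothetical, with the label set taken from the categories mentioned above (questions, needs, complaints, suggestions).

```python
from dataclasses import dataclass

# Assumed label set, based on the categories named in the text.
ALLOWED_INTENTS = {"question", "need", "complaint", "suggestion"}

@dataclass
class UtteranceAnnotation:
    audio_id: str     # hypothetical identifier of the source recording
    start_s: float    # utterance start time in seconds
    end_s: float      # utterance end time in seconds
    transcript: str   # transcribed text of the utterance
    intent: str       # one of ALLOWED_INTENTS

def validate(ann):
    """Return a list of problems with the annotation; empty means it looks valid."""
    errors = []
    if ann.end_s <= ann.start_s:
        errors.append("end must be after start")
    if not ann.transcript.strip():
        errors.append("empty transcript")
    if ann.intent not in ALLOWED_INTENTS:
        errors.append(f"unknown intent: {ann.intent}")
    return errors
```

Restricting labels to a fixed set like this makes it easier to catch typos and inconsistent labels before the data is used for training.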
4. Voice cutting
Voice cutting, also called speech segmentation, is the process of identifying the boundaries between words, syllables, or phonemes in spoken natural language. It is an important subproblem in speech recognition.
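Finding word, syllable, or phoneme boundaries generally needs trained models or human annotators, but a much coarser version of the idea can be sketched with a simple energy threshold that splits a recording at pauses. The function below is a simplified stand-in under that assumption, not a phoneme-level segmenter; its name, frame size, and thresholds are all illustrative choices.

```python
def segment_by_energy(samples, sample_rate, frame_ms=20,
                      threshold=0.02, min_silence_frames=5):
    """Split audio at pauses; return a list of (start_s, end_s) tuples.

    samples: float samples in [-1, 1]. A frame counts as speech when its
    mean absolute amplitude is at or above `threshold` (an assumed value).
    A segment ends after `min_silence_frames` consecutive quiet frames.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    n_frames = len(samples) // frame_len
    segments = []
    start = None      # index of the first frame of the current segment
    silent_run = 0    # consecutive quiet frames seen inside a segment
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(abs(s) for s in frame) / frame_len
        if energy >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                end = i - silent_run + 1  # first quiet frame after the speech
                segments.append((start * frame_len / sample_rate,
                                 end * frame_len / sample_rate))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended while still inside a segment; drop trailing quiet frames.
        end = n_frames - silent_run
        segments.append((start * frame_len / sample_rate,
                         end * frame_len / sample_rate))
    return segments
```

In practice an annotator would usually treat boundaries like these as a first draft and adjust them by ear.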
5. Voiceprint recognition
Voiceprint recognition is a biometric recognition technology. It analyzes the characteristics of one or more voice signals in order to identify an unknown voice. Simply put, it is a technology for determining whether a given utterance was spoken by a particular person.
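The decision step of such a system can be illustrated with a short sketch. It assumes that some upstream model has already turned each recording into a fixed-length feature vector (often called a speaker embedding); that model, the function names, and the 0.8 threshold are all assumptions made for the example. Cosine similarity is used here as one common way to compare such vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)

def same_speaker(enrolled, candidate, threshold=0.8):
    """Guess whether two voice feature vectors come from the same person.

    `threshold` is an assumed value; a real system would tune it on
    labeled data to balance false accepts against false rejects.
    """
    return cosine_similarity(enrolled, candidate) >= threshold
```

Annotated voiceprint data, where each recording is labeled with who is speaking, is what makes it possible to tune a threshold like this.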
6. Phoneme labeling
A phoneme is the smallest unit of speech, divided according to the natural properties of speech. Phonemes are analyzed in terms of the articulatory actions within a syllable: each action constitutes one phoneme.
7. Prosody labeling
Prosodic annotation in speech synthesis systems generally predicts prosody from text information. Taking Chinese as an example, prosody is predicted from the text, and the prediction is usually determined from information such as initials, finals, words, phrases, and paragraphs.
8. Pronunciation proofreading
Pronunciation proofreading is the process of collecting data throughout spoken-language training and correcting non-standard pronunciation.