On the Identification of FOSD-based Non-zero Onset Speech Dataset

Tran, D.C. and Ibrahim, R. (2020) On the Identification of FOSD-based Non-zero Onset Speech Dataset. In: UNSPECIFIED.

Full text not available from this repository.

Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Abstract

Recent trends in voicebot and chatbot application development have driven the use of speech-to-text (STT) and text-to-speech (TTS) generation techniques. To develop such TTS or STT engines, the text and the corresponding recorded speech in each audio file used for training, validation and testing must be aligned, so that the resulting engines achieve the desired conversion quality. An audio alignment tool is used to align speech and text. Such tools often rely on onset detection algorithms to label the start and end times of speech in the audio file, and this information is then stored together with the file's transcript. In this work, an open non-zero onset Vietnamese speech dataset is provided. The dataset contains 348 audio files filtered from over 25,000 (approximately 30 hours) Vietnamese speech recordings released publicly by FPT Corporation, Vietnam, in 2018. This amount of labeled data is considered more than sufficient for typical research on onset detection algorithms. © 2020 IEEE.
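
The labeling step described in the abstract can be illustrated with a minimal sketch. This is not the authors' method and not the tool used to build the dataset: it is a plain frame-energy threshold, shown only to make the idea of "speech start and end times" concrete. The function names `detect_speech_bounds` and `has_nonzero_onset`, the 10 ms frame length, and the -40 dB threshold are all assumptions chosen for illustration, as is the reading of "non-zero onset" as "speech that begins after time zero in the file".

```python
import math


def detect_speech_bounds(samples, sample_rate, frame_ms=10.0, threshold_db=-40.0):
    """Return (start_s, end_s) of the span of frames whose RMS level exceeds
    threshold_db (relative to full scale 1.0), or None if no frame does.

    Hypothetical helper: a simple energy threshold, not a published algorithm.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000.0))
    active_starts = []
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        level_db = 20.0 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db > threshold_db:
            active_starts.append(i)
    if not active_starts:
        return None
    start_s = active_starts[0] / sample_rate
    # End of the last active frame, clipped to the signal length.
    end_s = min(active_starts[-1] + frame_len, len(samples)) / sample_rate
    return start_s, end_s


def has_nonzero_onset(samples, sample_rate, **kwargs):
    """True if speech is detected and its start time is after 0 s (assumed
    interpretation of 'non-zero onset')."""
    bounds = detect_speech_bounds(samples, sample_rate, **kwargs)
    return bounds is not None and bounds[0] > 0.0


# Synthetic stand-in for a recording: 0.2 s silence, 0.5 s tone, 0.3 s silence.
SAMPLE_RATE = 1000  # Hz, kept small so the example runs instantly
tone = [0.5 * math.sin(2 * math.pi * 100 * n / SAMPLE_RATE) for n in range(500)]
signal = [0.0] * 200 + tone + [0.0] * 300

bounds = detect_speech_bounds(signal, SAMPLE_RATE)
print(bounds, has_nonzero_onset(signal, SAMPLE_RATE))
```

In practice the detected start and end times would be stored next to each file's transcript, as the abstract describes. Real recordings contain background noise, so a fixed threshold like the one assumed here would usually need tuning or replacement by a more robust detector.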

Item Type:	Conference or Workshop Item (UNSPECIFIED)
Additional Information:	Cited by 0; Conference: 2020 IEEE Student Conference on Research and Development, SCOReD 2020; Conference Date: 27 September 2020 through 28 September 2020; Conference Code: 164976
Depositing User:	Mr Ahmad Suhairi UTP
Date Deposited:	10 Nov 2023 03:27
Last Modified:	10 Nov 2023 03:27
URI:	https://khub.utp.edu.my/scholars/id/eprint/12727
