Speech Recognizing Comparisons Between Web Speech API and FPT.AI API

Tran, D.C. and Nguyen, D.L. and Ha, H.S. and Hassan, M.F. (2022) Speech Recognizing Comparisons Between Web Speech API and FPT.AI API. Lecture Notes in Electrical Engineering, 770. pp. 853-865. ISSN 18761100

Full text not available from this repository.
Official URL: https://www.scopus.com/inward/record.uri?eid=2-s2....

Abstract

People now use speech recognition services for many everyday purposes, such as learning foreign languages and communicating, so they need to decide which service to use. A service with high accuracy and short processing time improves work efficiency by reducing both the time spent re-checking output and the delay between recognition tasks. For Vietnamese speech recognition, Web Speech API and FPT.AI API are popular. Web Speech API supports multiple languages, while FPT.AI API focuses on Vietnamese, as FPT.AI's products are developed exclusively for the Vietnamese market. To help people choose a suitable Vietnamese speech recognition service, this paper compares the recognition accuracy and processing time of Web Speech API and FPT.AI API. A total of 307 audio files containing Vietnamese speech, obtained from the FPT Open Speech Dataset, were used to test the accuracy and processing time of both APIs. In the accuracy test, FPT.AI API was 0.57 more precise than Web Speech API. However, in the processing time test, Web Speech API was 50.99 faster than FPT.AI API. Web Speech API was most accurate on audio files 12 to 14 seconds long, while FPT.AI API performed best on audio files 2 to 4 seconds long. Audio files with durations between 2 and 8 seconds are optimal for both APIs when performing speech-to-text (STT) conversion. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
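
The abstract does not say which accuracy metric was used or how results were grouped by duration. As an illustration only, the sketch below assumes a word-level accuracy defined as 1 minus the word error rate, and groups results into 2-second duration bins like the ranges quoted above. The function names `word_error_rate` and `mean_accuracy_by_duration` and the `(duration, reference, hypothesis)` tuple layout are hypothetical and are not taken from the paper. Neither API is called here; the transcripts are assumed to have been collected already.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length (assumed metric)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0 if not hyp else 1.0
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def mean_accuracy_by_duration(results, bin_seconds=2):
    """Average (1 - WER), floored at 0, per duration bin.

    `results` is an iterable of (duration_seconds, reference, hypothesis).
    """
    buckets = defaultdict(list)
    for duration, reference, hypothesis in results:
        start = int(duration // bin_seconds) * bin_seconds
        accuracy = max(0.0, 1.0 - word_error_rate(reference, hypothesis))
        buckets[(start, start + bin_seconds)].append(accuracy)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}
```

Processing time could be measured in the same loop by wrapping each recognition request with `time.perf_counter()`. How the paper itself timed the two APIs is not stated in the abstract.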

Item Type: Article
Additional Information: Cited by 0; Conference: 12th National Technical Seminar on Unmanned System Technology, NUSYS 2020; Conference Date: 24 November 2020 through 25 November 2020; Conference Code: 266059
Uncontrolled Keywords: Application programming interfaces (API); Character recognition; Machine learning; Speech; Statistical tests, Applications programming interfaces; Audio files; Daily lives; Foreign language; FPT.; Processing time; Speech-to-text; Vietnamese; Vietnamese speech; Web speech, Speech recognition
Depositing User: Mr Ahmad Suhairi UTP
Date Deposited: 19 Dec 2023 03:24
Last Modified: 19 Dec 2023 03:24
URI: https://khub.utp.edu.my/scholars/id/eprint/17841
