eprintid: 17841
rev_number: 2
eprint_status: archive
userid: 1
dir: disk0/00/01/78/41
datestamp: 2023-12-19 03:24:09
lastmod: 2023-12-19 03:24:09
status_changed: 2023-12-19 03:08:46
type: article
metadata_visibility: show
creators_name: Tran, D.C.
creators_name: Nguyen, D.L.
creators_name: Ha, H.S.
creators_name: Hassan, M.F.
title: Speech Recognizing Comparisons Between Web Speech API and FPT.AI API
ispublished: pub
keywords: Application programming interfaces (API); Character recognition; Machine learning; Speech; Statistical tests, Applications programming interfaces; Audio files; Daily lives; Foreign language; FPT.; Processing time; Speech-to-text; Vietnamese; Vietnamese speech; Web speech, Speech recognition
note: cited By 0; Conference of 12th National Technical Seminar on Unmanned System Technology, NUSYS 2020 ; Conference Date: 24 November 2020 Through 25 November 2020; Conference Code:266059
abstract: Nowadays, people use speech recognition services for many purposes in their daily lives, such as learning foreign languages and communicating, so they need to decide which service to use. A speech recognition service with high accuracy and short processing time improves work efficiency by reducing both the time spent re-checking output and the delay between recognition tasks. For Vietnamese speech recognition, Web Speech API and FPT.AI API are popular. Web Speech API supports multiple languages, while FPT.AI API focuses on Vietnamese, as FPT.AI's products are developed exclusively for the Vietnamese market. To help people choose a suitable Vietnamese speech recognition service, this paper compares the recognition accuracy and processing time of Web Speech API and FPT.AI API. 307 audio files containing Vietnamese speech, obtained from the FPT Open Speech Dataset, were chosen to test the accuracy and processing time of both APIs.
In the accuracy test, FPT.AI API was 0.57 more precise than Web Speech API. However, in the processing time test, Web Speech API was 50.99 faster than FPT.AI API. Web Speech API was most accurate on audio files 12–14 seconds long, while FPT.AI API did best on audio files 2–4 seconds long. Audio files with durations between 2 and 8 seconds are optimal for speech-to-text (STT) conversion with both APIs. © 2022, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
date: 2022
publisher: Springer Science and Business Media Deutschland GmbH
official_url: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85116480522&doi=10.1007%2f978-981-16-2406-3_64&partnerID=40&md5=a1ffad457cfbb1bb01be650287375c0d
id_number: 10.1007/978-981-16-2406-3_64
full_text_status: none
publication: Lecture Notes in Electrical Engineering
volume: 770
pagerange: 853-865
refereed: TRUE
isbn: 9789811624056
issn: 18761100
citation: Tran, D.C. and Nguyen, D.L. and Ha, H.S. and Hassan, M.F. (2022) Speech Recognizing Comparisons Between Web Speech API and FPT.AI API. Lecture Notes in Electrical Engineering, 770. pp. 853-865. ISSN 18761100