Tutorial: Using the speechmatics npm Package

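Introduction

Speechmatics is a cloud-based speech recognition service that transcribes audio into text. The speechmatics npm package provides an interface to that service, so a front-end application can obtain transcripts quickly and conveniently. This article introduces the package and walks through how to use it.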

Installation

Before speechmatics can be imported in code, install it with npm:

npm install speechmatics

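Usage

Using speechmatics means calling the Speechmatics back-end API, so we first need an Access Token and a User ID. Both are available after logging in on the Speechmatics website.

With the Access Token and User ID in hand, we can use the speech recognition service.

The steps are as follows:

Importing speechmatics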

const Speechmatics = require('speechmatics');

Initializing the Speechmatics object

// Placeholder credentials; replace them with your own User ID and Access Token.
const speechmaticsOptions = {
  user_id: 'your_user_id',
  token: 'your_access_token',
  model: 'en-US'
};

const speechmatics = new Speechmatics(speechmaticsOptions);

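The following parameters are passed when initializing speechmatics:

user_id: the User ID, required
token: the Access Token, required
model: the language model, optional; defaults to 'en-US' (English)

The Access Token must be valid; otherwise the speechmatics service cannot be used.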
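
As a minimal sketch of the parameter rules above, the helper below assembles the options object, rejects a missing user_id or token, and falls back to 'en-US' when model is omitted. The name buildSpeechmaticsOptions is hypothetical and not part of the speechmatics package; it only builds a plain object and cannot check whether the token is actually valid.

```javascript
// Hypothetical helper (not part of the speechmatics package): builds the
// options object described above and enforces the two required fields.
function buildSpeechmaticsOptions({ user_id, token, model } = {}) {
  if (typeof user_id !== 'string' || user_id.trim() === '') {
    throw new Error('user_id is required');
  }
  if (typeof token !== 'string' || token.trim() === '') {
    throw new Error('token is required');
  }
  // model is optional; the article gives 'en-US' as the default.
  return { user_id, token, model: model || 'en-US' };
}

// Usage with placeholder credentials.
const exampleOptions = buildSpeechmaticsOptions({
  user_id: 'your_user_id',
  token: 'your_access_token',
});
console.log(exampleOptions.model); // prints en-US
```

Failing early on a missing field keeps the error close to its cause, rather than surfacing later as a rejected request.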

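Running speech recognition

Before recognition can start, we need the input audio data. Here we record with the web-audio-recorder-js library and then submit the recording to speechmatics for recognition.

The following code records audio and submits it to speechmatics: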

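// audioInput is the audio source handed to the recorder (see the notes below).
const recorder = new WebAudioRecorder(audioInput, {
  workerDir: 'js/', // placeholder: directory that holds the library's worker files
  encoding: 'mp3',
  numChannels: 2,
  onEncoderLoading: (recorder, encoding) => {},
  onEncoderLoaded: (recorder, encoding) => {},
  onTimeout: (recorder) => {},
  onComplete: (recorder, blob) => {
    const params = {
      duration: recordingSeconds, // placeholder: recording length in seconds, measured elsewhere
      user_token: 'my-tag', // placeholder: optional custom identifier
      model: speechmaticsOptions.model,
      data_file: blob
    };

    // "recognize" is a placeholder name for the package's recognition method.
    speechmatics.recognize(params)
      .then((data) => {
        console.log(data.transcription[0].text);
      })
      .catch((err) => console.error(err));
  }
});

recorder.startRecording();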

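A few notes on this code:

audioInput is the audio source to record. web-audio-recorder-js expects a Web Audio source node, so an audio element obtained with document.querySelector('audio') has to be wrapped first, for example with audioContext.createMediaElementSource(element)
workerDir is a parameter required by web-audio-recorder-js; it names the directory that contains the worker files
encoding is the recording format, mp3 in this example
numChannels is the number of recording channels, 2 in this example
onComplete receives the recorded blob and submits it to speechmatics for recognition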

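The params object passed to the recognition call contains the following fields:

duration: the length of the audio, in seconds
user_token: a user-defined identifier, optional
model: the language model, optional; defaults to the value passed at initialization
data_file: the blob data of the recording, required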
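
To make the field list above concrete, here is a minimal sketch that builds params from a recorded blob and two timestamps. The helper name buildRecognitionParams and the idea of deriving duration from start and end timestamps are assumptions for illustration; the article does not say how duration is measured.

```javascript
// Hypothetical helper: builds the params object from a recorded blob.
// Field names follow the list above; timestamps are in milliseconds.
function buildRecognitionParams(blob, startedAtMs, endedAtMs, extra = {}) {
  if (!blob) {
    throw new Error('data_file is required');
  }
  const durationMs = endedAtMs - startedAtMs;
  if (!Number.isFinite(durationMs) || durationMs < 0) {
    throw new Error('invalid recording timestamps');
  }
  const params = {
    duration: durationMs / 1000, // the service expects seconds
    data_file: blob,
  };
  // Optional fields are only set when the caller supplies them.
  if (extra.user_token !== undefined) params.user_token = extra.user_token;
  if (extra.model !== undefined) params.model = extra.model;
  return params;
}

// Usage: a plain object stands in for the Blob delivered to onComplete.
const fakeBlob = { size: 1024, type: 'audio/mpeg' };
const exampleParams = buildRecognitionParams(fakeBlob, 1000, 3500, { user_token: 'demo' });
console.log(exampleParams.duration); // prints 2.5
```

Leaving model unset here lets the value chosen at initialization apply, as the list above describes.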

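Results

The result returned by the speechmatics service includes the recognized text, start time, end time, number of segments, and other information. The text field holds the transcribed content. The result can be read with the following code: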

// "recognize" is a placeholder name for the package's recognition method.
speechmatics.recognize(params)
  .then((data) => {
    console.log(data.transcription[0].text);
  })
  .catch((err) => console.error(err));

When printing the result, the recognized text is read from data.transcription[0].text.
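
Reading data.transcription[0].text directly throws if the response has no segments. The sketch below is a more defensive variant that joins the text of every segment. It assumes the response shape implied by the article, an object with a transcription array whose items carry a text field; that shape is not verified against the service, and extractTranscript is a hypothetical name.

```javascript
// Hypothetical helper: collects the text of every segment in a response
// shaped like { transcription: [{ text: '...' }, ...] } (assumed shape).
function extractTranscript(data) {
  const segments =
    data && Array.isArray(data.transcription) ? data.transcription : [];
  return segments
    .map((segment) =>
      segment && typeof segment.text === 'string' ? segment.text : ''
    )
    .filter((text) => text.length > 0)
    .join(' ');
}

// Usage with a made-up response object.
const sampleResponse = {
  transcription: [{ text: 'hello' }, { text: 'world' }],
};
console.log(extractTranscript(sampleResponse)); // prints hello world
```

An empty string is returned when nothing was recognized, so callers can test the result without wrapping the call in try/catch.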

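Conclusion

This article covered how to install the speechmatics npm package and how to use it to add speech-to-text to a front-end application. We hope it helps readers get comfortable with speechmatics and improve both development efficiency and user experience in real projects.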

Source: JavaScript中文网. Please contact the site administrator before reposting. Original article: https://www.javascriptcn.com/post/6006709f8ccae46eb111f06a