[GCP] Google Cloud Speech API

keywords: `cloud`, `GCP`

Cloud Speech-to-text @ Google Docs

Cloud Speech: Node.js Client @ googleapis doc

Transcribing audio files longer than 1 minute

Transcribing long audio files @ Google Docs > Cloud Speech-to-Text

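To transcribe an audio file longer than 1 minute, the file must first be uploaded to Google Cloud Storage (GCS).
Because audio longer than 1 minute takes longer to process, the work is split into two requests: the first starts the transcription operation, and the second retrieves the result. The result may not be ready immediately; the transcript is available only once done is true. (With the JavaScript SDK, a single async function can await the finished result before continuing.)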
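As a minimal sketch of the SDK flow mentioned above, assuming the `@google-cloud/speech` Node.js client: `longRunningRecognize` starts the operation and `operation.promise()` resolves once it is done. The function name `transcribeLongAudio` and the injected `client` parameter are illustrative choices rather than part of the SDK, and the config values simply mirror the ones used elsewhere in this note.

```javascript
// Sketch: transcribe a long audio file stored on GCS with the Node.js client.
// `client` is expected to be a SpeechClient from @google-cloud/speech;
// it is passed in so the function can be exercised without credentials.
async function transcribeLongAudio(client, gcsUri) {
  const request = {
    audio: { uri: gcsUri },
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 48000,
      languageCode: 'zh-TW',
    },
  };

  // First step: start the long-running operation.
  const [operation] = await client.longRunningRecognize(request);

  // Second step: wait until the operation reports that it is done.
  const [response] = await operation.promise();

  // Join the top alternative of each result into one transcript.
  return response.results
    .map((result) => result.alternatives[0].transcript)
    .join('\n');
}
```

With the real client, this would be called with `new SpeechClient()` (imported from `@google-cloud/speech`) and a URI such as `gs://cloud-speech-videos/long-audio.wav`.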

Starting the transcription

This request asks Google Cloud to start transcribing an audio file and returns an operation name, which is later used to check progress and fetch the transcript:

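// The access token can be obtained from the gcloud CLI:
// gcloud auth application-default print-access-token
// A shell substitution such as $(...) is not expanded inside a JavaScript string,
// so the command is run here with execSync instead.

const { execSync } = require('child_process');
const request = require('request');

const accessToken = execSync('gcloud auth application-default print-access-token')
  .toString()
  .trim();

const options = {
  method: 'POST',
  url: 'https://speech.googleapis.com/v1/speech:longrunningrecognize',
  headers: {
    Authorization: `Bearer ${accessToken}`,
    'Content-Type': 'application/json',
  },
  body: {
    audio: { uri: 'gs://cloud-speech-videos/long-audio.wav' },
    config: {
      encoding: 'LINEAR16',
      sampleRateHertz: 48000,
      languageCode: 'zh-TW',
      enableWordTimeOffsets: true,
    },
  },
  json: true, // serialize the body object as JSON and parse the JSON response
};

request(options, function (error, response, body) {
  if (error) throw error;

  console.log(body);
});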

The response contains the operation name needed for the follow-up request:

{
  "name": "8662420302733843496"
}

Checking the transcription status and fetching the result

The operation name goes into the request URL, for example:

https://speech.googleapis.com/v1/operations/<operation-name>

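const { execSync } = require('child_process');
const request = require('request');

// $(...) is shell syntax and is not expanded inside a JavaScript string,
// so the token is fetched by running the gcloud CLI with execSync.
const accessToken = execSync('gcloud auth application-default print-access-token')
  .toString()
  .trim();

const options = {
  method: 'GET',
  url: 'https://speech.googleapis.com/v1/operations/8662420302733843496',
  headers: {
    Authorization: `Bearer ${accessToken}`,
    'Content-Type': 'application/json; charset=utf-8',
  },
  json: true, // parse the JSON response into an object
};

request(options, function (error, response, body) {
  if (error) throw error;

  console.log(body);
});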

Before the transcription finishes, the response looks like this:

{
  "name": "7219429054181291819",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "startTime": "2019-08-12T06:50:00.776262Z",
    "lastUpdateTime": "2019-08-12T06:50:04.302192Z"
  }
}

Once the transcription finishes, the response has done set to true:

{
  "name": "7219429054181291819",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeMetadata",
    "progressPercent": 100,
    "startTime": "2019-08-12T06:50:00.776262Z",
    "lastUpdateTime": "2019-08-12T06:50:17.442272Z"
  },
  "done": true,
  "response": {
    "@type": "type.googleapis.com/google.cloud.speech.v1.LongRunningRecognizeResponse",
    "results": [
      // ...
    ]
  }
}
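
The request/poll cycle above can be wrapped in a small loop. The sketch below is one possible shape, not an official pattern: `waitForOperation`, `fetchOperation`, `intervalMs`, and `maxAttempts` are hypothetical names, and `fetchOperation` stands for any function that performs the GET request and resolves to the parsed JSON body.

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll until the operation has `done: true`, then return its `response`.
// `fetchOperation` (hypothetical) must resolve to the parsed operation JSON.
async function waitForOperation(
  fetchOperation,
  { intervalMs = 5000, maxAttempts = 60 } = {}
) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const operation = await fetchOperation();

    if (operation.done) {
      // A finished operation carries either `error` or `response`.
      if (operation.error) {
        throw new Error(`Transcription failed: ${operation.error.message}`);
      }
      return operation.response;
    }

    // Wait before asking again, except after the final attempt.
    if (attempt < maxAttempts) await sleep(intervalMs);
  }

  throw new Error(`Operation not done after ${maxAttempts} attempts`);
}
```

Capping the number of attempts keeps a stuck operation from blocking the caller forever; the defaults here are arbitrary and would need tuning to the length of the audio.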

Miscellaneous

Getting word timestamps

keywords: `time offset`, `timestamp`

Getting word timestamps @ Google Doc - Cloud Speech-to-Text

To get timestamps, set enableWordTimeOffsets to true in config:

const config = {
  // ...
  enableWordTimeOffsets: true,
};

The result then includes a timestamp for each word, reported in 100 ms increments.
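
To illustrate how these timestamps might be consumed: in the REST JSON response, each entry in `words` carries `startTime` and `endTime` as duration strings such as `"1.300s"`. The helper names `durationToMs` and `listWordTimings` are made up for this sketch, which only looks at the top alternative of each result.

```javascript
// Convert a duration string such as "1.300s" to whole milliseconds.
function durationToMs(duration) {
  return Math.round(parseFloat(duration) * 1000);
}

// Flatten the top alternative of every result into { word, startMs, endMs }.
function listWordTimings(response) {
  return response.results.flatMap((result) =>
    (result.alternatives[0].words || []).map((info) => ({
      word: info.word,
      startMs: durationToMs(info.startTime),
      endMs: durationToMs(info.endTime),
    }))
  );
}
```

Passing the `response` object of a finished operation to `listWordTimings` yields a flat list that is convenient for building subtitles or seeking within the audio.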
