{"id":5119,"date":"2016-10-19T02:40:44","date_gmt":"2016-10-19T02:40:44","guid":{"rendered":"http:\/\/www.garysieling.com\/blog\/?p=5119"},"modified":"2016-10-19T02:40:44","modified_gmt":"2016-10-19T02:40:44","slug":"getting-started-google-cloud-speech-api","status":"publish","type":"post","link":"https:\/\/www.garysieling.com\/blog\/getting-started-google-cloud-speech-api\/","title":{"rendered":"Getting started with the Google Cloud Speech API"},"content":{"rendered":"<p>Google has a &#8220;speech to text&#8221; API. At the moment, they are advertising a $300 credit for new accounts, so I thought this might be a good fit for an app I&#8217;m working on to <a href=\"https:\/\/www.findlectures.com\">search\/discover standalone lectures<\/a>. In this article I&#8217;ll walk through setting up a proof of concept. My thoughts\/opinions on the experience are at the end. <\/p>\n<p>The Google Speech API is part of Google&#8217;s larger platform, so if you&#8217;re doing this for the first time, you&#8217;ll need to follow a series of steps. While this may look like a lot of work, it&#8217;s still an order of magnitude easier than setting up the open source projects used for audio transcription (Kaldi\/Sphinx).<\/p>\n<p>1. Create an account<br \/>\n2. Enable Google Speech API<br \/>\n3. Create a service account<br \/>\n4. Install GCloud SDK<br \/>\n5. Activate GCloud SDK<br \/>\n6. Create a bucket<br \/>\n7. Install SoX<sup><a href=\"#footnote_0_5119\" id=\"identifier_0_5119\" class=\"footnote-link footnote-identifier-link\" title=\"http:\/\/sox.sourceforge.net\/\">1<\/a><\/sup><br \/>\n8. Convert MP3s to RAW file format using SoX<br \/>\n9. Upload converted files (command line &#8211; gsutil has an rsync command)<br \/>\n10. Download the service account&#8217;s credentials (a JSON key file)<br \/>\n11. Set an environment variable to point to the credentials file<br \/>\n12. 
Use Google&#8217;s Python demo example to transcribe your file<sup><a href=\"#footnote_1_5119\" id=\"identifier_1_5119\" class=\"footnote-link footnote-identifier-link\" title=\"https:\/\/github.com\/GoogleCloudPlatform\/python-docs-samples\/tree\/master\/speech\">2<\/a><\/sup><\/p>\n<p>Some of these steps are well-documented in Google&#8217;s docs, so I&#8217;m going to cover the areas that tripped me up. It appears that the Python examples are updated more regularly than those for other languages, so I recommend those. Whichever language you choose, Google&#8217;s support does monitor GitHub tickets, which is very helpful.<\/p>\n<p>A note about SoX: this is used for converting between audio formats. Unlike some of the competing products, Google forces you to convert your files into one of a small set of supported formats, which unfortunately offloads a lot of work onto you as a consumer of the API. When I first did this, I missed a step in the instructions: the converted files played fine for me, but the API failed without logging anything. <\/p>\n<p>SoX can detect clipping in audio files, and will advise you when you can potentially fix the issue by reducing the audio volume. For my lecture search engine, this is a great finding, because I can use the presence of clipping to affect the ranking of files.<\/p>\n<p>To convert a mass of files, you&#8217;ll need to write a small script (make sure you have a lot of disk space &#8211; these files get big). The one below is a Windows batch file:<\/p>\n<pre lang=\"bash\">\nrem Convert each WAV to 16 kHz, 16-bit, mono raw audio; -v 0.98 lowers the volume slightly to avoid clipping\nfor %%f in (wav\\*.wav) do (\n  sox -v 0.98 wav\\%%~nf.wav --rate 16k --bits 16 --channels 1 d:\\data\\flac\\%%~nf.raw\n)\n<\/pre>\n<p>For a single file, I&#8217;d upload it to a bucket through the UI, but for a bunch, you can use rsync:<\/p>\n<pre lang=\"bash\">\ngsutil rsync -d d:\\Data\\raw gs:\/\/gsieling-flac\n<\/pre>\n<p>Set your credentials:<\/p>\n<pre lang=\"bash\">\nexport GOOGLE_APPLICATION_CREDENTIALS=\/d\/Data\/search-ff2e0539de94.json\n<\/pre>\n<p>If you get errors about missing libraries, you may need to install some missing dependencies. 
It&#8217;s very important to use the versions specified in the requirements.txt of the sample project &#8211; some of these have newer versions with large breaking changes.<\/p>\n<pre lang=\"bash\">\npip install gcloud==0.18.2\npip install grpcio==1.0.0\npip install PyAudio==0.2.9\npip install grpc-google-cloud-speech-v1beta1==1.0.1\npip install six==1.10.0\n<\/pre>\n<p>Then transcribe a file<sup><a href=\"#footnote_2_5119\" id=\"identifier_2_5119\" class=\"footnote-link footnote-identifier-link\" title=\"https:\/\/github.com\/GoogleCloudPlatform\/python-docs-samples\/blob\/master\/speech\/api-client\/transcribe_async.py\">3<\/a><\/sup>:<\/p>\n<pre lang=\"bash\">\npython transcribe_async.py --encoding LINEAR16 gs:\/\/gsieling-flac\/14823.raw\n<\/pre>\n<p>If you get the following error, you probably skipped the environment variable:<\/p>\n<pre lang=\"bash\">\ngrpc.framework.interfaces.face.face.AbortionError: \nAbortionError(code=StatusCode.PERMISSION_DENIED, \ndetails=\"Google Cloud Speech API has not been used in\n project google.com:cloudsdktool before or it is disabled. 
\nEnable it by visiting \nhttps:\/\/console.developers.google.com\/apis\/api\/speech.googleapis.com\/overview?project=google.com:cloudsdktool \nthen retry.\n\nIf you enabled this API recently, wait a few minutes \nfor the action to propagate to our systems and retry.\")\n<\/pre>\n<p>If you get a &#8220;resource exhausted&#8221; error, it actually indicates that you didn&#8217;t convert your files to the correct format:<\/p>\n<pre>\nTraceback (most recent call last):\nFile \"D:\\Software\\Anaconda3\\lib\\site-packages\\grpc\\beta_client_adaptations.py\", \nline 201, in blocking_unary_unary\n\ncredentials=credentials(protocol_options))\nFile \"D:\\Software\\Anaconda3\\lib\\site-packages\\grpc_channel.py\", \nline 481, in __call\n\nreturn _end_unary_response_blocking(state, False, deadline)\nFile \"D:\\Software\\Anaconda3\\lib\\site-packages\\grpc_channel.py\", \nline 432, in _end_unary_response_blocking\n\nraise _Rendezvous(state, None, None, deadline)\ngrpc._channel._Rendezvous: <_Rendezvous of RPC that terminated with\n(StatusCode.RESOURCE_EXHAUSTED, Quota exceeded.)>\n<\/pre>\n<p>Once you get this to work, you&#8217;ll get some output like so:<\/p>\n<pre lang=\"javascript\">\nWaiting for server processing...\nWaiting for server processing...\nWaiting for server processing...\nWaiting for server processing...\nresults {\n  alternatives {\n    transcript: \"forced migration of you issue 44 September 2013\"\n    confidence: 0.787492573261\n  }\n}\nresults {\n  alternatives {\n    transcript: \"voices from inside Australia\\'s detention centres is Melissa Phillips\"\n    confidence: 0.87813615799\n  }\n}\nresults {\n  alternatives {\n    transcript: \"the Harvest Island Beach Australia there is little sense of individual in question\"\n    confidence: 0.745137870312\n  }\n}\n<\/pre>\n<p>Google&#8217;s API seems to envision three major use cases &#8211; streaming audio with commands (e.g. 
from an app &#8211; they let you specify words you anticipate), short transcriptions, and long, asynchronous transcripts. When I was corresponding with support, they referred to my 15-minute file as &#8220;long&#8221;, which is unfortunate considering most of my files are about an hour long. The docs also advise against using MP3s because the compression can lose information. Unfortunately there is an enormous amount of MP3 audio out there, so if that is what you have, this may not be the best API for you.<\/p>\n<p>Performance-wise, I found that this API takes a very long time to complete (maybe ~1\/3 of the audio&#8217;s duration), but your mileage may vary.<\/p>\n<p>One thing that surprises me about this API is that while Google has &#8220;buckets&#8221; for storage, you can&#8217;t have the output of your long-running jobs stored there when they finish &#8211; the results go into some mystery location that you have to poll until the job finishes. If you don&#8217;t, they disappear (but still count against your bill). <\/p>\n<ol class=\"footnotes\"><li id=\"footnote_0_5119\" class=\"footnote\">http:\/\/sox.sourceforge.net\/<span class=\"footnote-back-link-wrapper\"> [<a href=\"#identifier_0_5119\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/span><\/li><li id=\"footnote_1_5119\" class=\"footnote\">https:\/\/github.com\/GoogleCloudPlatform\/python-docs-samples\/tree\/master\/speech<span class=\"footnote-back-link-wrapper\"> [<a href=\"#identifier_1_5119\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/span><\/li><li id=\"footnote_2_5119\" class=\"footnote\">https:\/\/github.com\/GoogleCloudPlatform\/python-docs-samples\/blob\/master\/speech\/api-client\/transcribe_async.py<span class=\"footnote-back-link-wrapper\"> [<a href=\"#identifier_2_5119\" class=\"footnote-link footnote-back-link\">&#8617;<\/a>]<\/span><\/li><\/ol>","protected":false},"excerpt":{"rendered":"<p>Setting up the Google Speech 
API<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[12],"tags":[67,352,520],"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/5119"}],"collection":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/comments?post=5119"}],"version-history":[{"count":0,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/5119\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/media?parent=5119"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/categories?post=5119"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/tags?post=5119"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}