{"id":4553,"date":"2016-06-21T01:59:20","date_gmt":"2016-06-21T01:59:20","guid":{"rendered":"http:\/\/www.garysieling.com\/blog\/?p=4553"},"modified":"2016-06-21T01:59:20","modified_gmt":"2016-06-21T01:59:20","slug":"get-subtitles-youtube-srt-format","status":"publish","type":"post","link":"https:\/\/www.garysieling.com\/blog\/get-subtitles-youtube-srt-format\/","title":{"rendered":"Get subtitles from YouTube in SRT format"},"content":{"rendered":"<p>To download subtitles from YouTube, you can use youtube-dl (this command also gets the audio, images, and metadata):<\/p>\n<pre lang=\"bash\">\nyoutube-dl -o .\/data\/mXC3xGZWo_M\"\/%(id)s.%(ext)s\" -x --sub-lang en \\\n  --write-sub --sub-format vtt --convert-subtitles srt --write-auto-sub \\\n  --continue --write-info-json --write-description --write-annotations \\\n  --min-filesize 50k --ignore-errors --write-all-thumbnails --no-call-home \\\n  --audio-format mp3 mXC3xGZWo_M\n<\/pre>\n<p>This set of arguments will do its best to get you something: it gets SRT-formatted subtitles if available and, if not, tries to convert them to SRT. It also tries to get real closed captions if available; if not, it pulls the automatically generated ones from the speech-to-text software YouTube runs.<\/p>\n<p>In doing this, I found that youtube-dl did not handle the conversion properly. 
I much prefer the SRT format, as it&#8217;s simpler to parse.<\/p>\n<pre lang=\"bash\">\n[youtube] mXC3xGZWo_M: Downloading webpage\n[youtube] mXC3xGZWo_M: Downloading video info webpage\n[youtube] mXC3xGZWo_M: Extracting video information\n[youtube] mXC3xGZWo_M: Looking for automatic captions\n[youtube] mXC3xGZWo_M: Searching for annotations.\n[youtube] mXC3xGZWo_M: Downloading MPD manifest\n[info] Writing video description to: data\\mXC3xGZWo_M\\mXC3xGZWo_M.description\n[info] Writing video annotations to: data\\mXC3xGZWo_M\\mXC3xGZWo_M.annotations.xml\n[info] Writing video subtitles to: data\\mXC3xGZWo_M\\mXC3xGZWo_M.en.vtt\n[info] Writing video description metadata as JSON to: data\\mXC3xGZWo_M\\mXC3xGZWo_M.info.json\n[youtube] mXC3xGZWo_M: Downloading thumbnail ...\n[youtube] mXC3xGZWo_M: Writing thumbnail to: data\\mXC3xGZWo_M\\mXC3xGZWo_M.jpg\n[download] Destination: data\\mXC3xGZWo_M\\mXC3xGZWo_M.m4a\n[download] 100% of 46.63MiB in 00:12\n[ffmpeg] Correcting container in \"data\\mXC3xGZWo_M\\mXC3xGZWo_M.m4a\"\n[ffmpeg] Destination: data\\mXC3xGZWo_M\\mXC3xGZWo_M.mp3\nDeleting original file data\\mXC3xGZWo_M\\mXC3xGZWo_M.m4a (pass -k to keep)\n[ffmpeg] Converting subtitles\n\nWARNING: video doesn't have subtitles\nERROR: file:data\\mXC3xGZWo_M\\mXC3xGZWo_M.en.vtt: Invalid data found when processing input\n<\/pre>\n<p>You can invoke ffmpeg directly to convert the file (I found this worked better than youtube-dl):<\/p>\n<pre lang=\"bash\">\nffmpeg.exe -i mXC3xGZWo_M.en.vtt mXC3xGZWo_M.en.srt\n<\/pre>\n<p>This works like a charm:<\/p>\n<pre lang=\"bash\">\nffmpeg version N-80386-g5f5a97d Copyright (c) 2000-2016 the FFmpeg developers\n  built with gcc 5.4.0 (GCC)\n  configuration: --enable-gpl --enable-version3 --disable-w32threads --enable-nvenc --enable-avisynth --enable-bzlib --enable-fontconfig --enable-frei0r --enable-gnutls --enable-iconv --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libfreetype --enable-libgme 
--enable-libgsm --enable-libilbc --enable-libmodplug --enable-libmfx --enable-libmp3lame --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libopus --enable-librtmp --enable-libschroedinger --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvo-amrwbenc --enable-libvorbis --enable-libvpx --enable-libwavpack --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxavs --enable-libxvid --enable-libzimg --enable-lzma --enable-decklink --enable-zlib\n  libavutil      55. 24.100 \/ 55. 24.100\n  libavcodec     57. 46.100 \/ 57. 46.100\n  libavformat    57. 38.100 \/ 57. 38.100\n  libavdevice    57.  0.101 \/ 57.  0.101\n  libavfilter     6. 46.101 \/  6. 46.101\n  libswscale      4.  1.100 \/  4.  1.100\n  libswresample   2.  1.100 \/  2.  1.100\n  libpostproc    54.  0.100 \/ 54.  0.100\nInput #0, webvtt, from 'mXC3xGZWo_M.en.vtt':\n  Duration: N\/A, bitrate: N\/A\n    Stream #0:0: Subtitle: webvtt\nFile 'mXC3xGZWo_M.en.srt' already exists. Overwrite ? [y\/N] y\n[srt @ 05140720] Using AVStream.codec to pass codec parameters to muxers is deprecated, use AVStream.codecpar instead.\nOutput #0, srt, to 'mXC3xGZWo_M.en.srt':\n  Metadata:\n    encoder         : Lavf57.38.100\n    Stream #0:0: Subtitle: subrip (srt)\n    Metadata:\n      encoder         : Lavc57.46.100 srt\nStream mapping:\n  Stream #0:0 -> #0:0 (webvtt (native) -> subrip (srt))\nPress [q] to stop, [?] 
for help\nsize=     261kB time=00:51:16.69 bitrate=   0.7kbits\/s speed=1.26e+004x\nvideo:0kB audio:0kB subtitle:143kB other streams:0kB global headers:0kB muxing overhead: 83.253769%\n<\/pre>\n<p>The only downside is that you then get duplicated lines of text in the SRT file.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Converting VTT transcripts to SRT with ffmpeg<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[4],"tags":[226,569,607],"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/4553"}],"collection":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/comments?post=4553"}],"version-history":[{"count":0,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/4553\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/media?parent=4553"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/categories?post=4553"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/tags?post=4553"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}