{"id":584,"date":"2012-09-17T00:20:34","date_gmt":"2012-09-17T00:20:34","guid":{"rendered":"http:\/\/garysieling.com\/blog\/?p=584"},"modified":"2012-09-17T00:20:34","modified_gmt":"2012-09-17T00:20:34","slug":"scraping-videos-with-phantomjs","status":"publish","type":"post","link":"https:\/\/www.garysieling.com\/blog\/scraping-videos-with-phantomjs\/","title":{"rendered":"Scraping Videos with PhantomJS"},"content":{"rendered":"<p>I&#8217;ve been using PhantomJS, a headless WebKit browser packaged to run JavaScript scripts, for some scraping projects. Some of my family are still on a slow connection with a low monthly bandwidth cap, so they can&#8217;t watch many videos. This is unfortunate given the number of training classes available online (pattern drafting, in this case).<\/p>\n<p>Unfortunately, downloading video isn&#8217;t a supported use case for PhantomJS. It appears that the primary goal of PhantomJS is automated testing (like Selenium), and its developers don&#8217;t want to include the code needed to render videos, as that could mean dealing with many codecs.<\/p>\n<p>Fortunately, there is an alternative project that works well: <a href=\"https:\/\/github.com\/rg3\/youtube-dl\">youtube-dl<\/a> (GitHub link). This is a pre-packaged Python project which lets you download YouTube videos by channel, search result, playlist, etc. It also supports Google Video, Photobucket, Yahoo! Video, Dailymotion, blip.tv, DepositFiles, Vimeo, and more.<\/p>\n<p>Setup is simple:<\/p>\n<pre>\ngit clone https:\/\/github.com\/rg3\/youtube-dl\n<\/pre>\n<p>Python 2.x must be installed and, on Windows, available on the PATH.<\/p>\n<p>You can run it easily, like so:<\/p>\n<pre>\nyoutube-dl.exe -o %(stitle)s.%(ext)s http:\/\/youtube.com\/user\/DonMcCunn\n<\/pre>\n<p>Notably, the command-line arguments allow you to modify the output filenames. Other arguments allow audio extraction, specifying the desired format, simulation options, and authentication. 
Typical file sizes are 5-10 MB per 5-minute video.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve been using PhantomJS for some scraping projects &#8211; PhantomJS is a headless WebKit, packaged to run JavaScript scripts. Some of my family are still on a slow connection with a low monthly bandwidth cap, so they can&#8217;t watch many videos. This is unfortunate given the number of training classes available online (e.g., pattern drafting, in &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.garysieling.com\/blog\/scraping-videos-with-phantomjs\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Scraping Videos with PhantomJS&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[4],"tags":[495,607],"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/584"}],"collection":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/comments?post=584"}],"version-history":[{"count":0,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/584\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/media?parent=584"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/categories?post=584"},{"taxonomy":"p
ost_tag","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/tags?post=584"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}