{"id":5818,"date":"2018-10-27T13:55:00","date_gmt":"2018-10-27T13:55:00","guid":{"rendered":"http:\/\/www.garysieling.com\/blog\/?p=5818"},"modified":"2020-03-30T02:29:50","modified_gmt":"2020-03-30T02:29:50","slug":"aws-lambda-experiment-results","status":"publish","type":"post","link":"https:\/\/www.garysieling.com\/blog\/aws-lambda-experiment-results\/","title":{"rendered":"AWS Lambda Experiment results"},"content":{"rendered":"<p>I set up an AWS Lambda function to update video metadata for about 75% of <a href=\"https:\/\/www.findlectures.com\">FindLectures.com<\/a>. For version two, I want to provide an option to sort based on confidence intervals within a topic (i.e., given the number of up\/downvotes, how likely the video is to be good).<\/p>\n<p>The architecture of the update was as follows:<\/p>\n<ul>\n<li>For every video that could be updated, make a record in DynamoDB<\/li>\n<li>For each insert to DynamoDB, trigger one Lambda<\/li>\n<li>For a given video, run youtube-dl to get just the video metadata<\/li>\n<li>Save the video metadata to S3<\/li>\n<\/ul>\n<p>Ultimately this did not work as well as I hoped. I inserted 150,000 rows into DynamoDB. AWS ran at most 15 concurrent Lambdas, and I got only 3,000 successful runs of the job. 
There wasn&#8217;t any evidence of where the concurrency problem lay (the Lambda &amp; account max are at 1000, and I gave DynamoDB more readers, just in case).<\/p>\n<p>Ultimately this experiment was nearly free:<\/p>\n<pre>AWS Lambda - Compute Free Tier - 400,000 GB-Seconds - US East (Northern Virginia) 42,760.113 seconds $0.00\n\nAWS Lambda Request $0.00\nAWS Lambda - Requests Free Tier - 1,000,000 Requests - US East (Northern Virginia) 17,651 Requests $0.00\n\nAmazon Simple Storage Service Requests-Tier1 $0.08\n$0.005 per 1,000 PUT, COPY, POST, or LIST requests 16,945 Requests $0.08\n\nAmazon Simple Storage Service Requests-Tier2 $0.00\n$0.004 per 10,000 GET and all other requests 777 Requests $0.00\n\nAmazon Simple Storage Service TimedStorage-ByteHrs $0.10\n$0.023 per GB - first 50 TB \/ month of storage used 4.350 GB-Mo $0.10\n<\/pre>\n<p>Lessons learned:<\/p>\n<ul>\n<li>If you naively use non-batch APIs (e.g. to upload \/ retrieve from DynamoDB) you lose a ton of time, even inside the AWS datacenters<\/li>\n<li>You need to replicate the AWS environment as much as you can for your own testing. E.g. 
your first step should be to get a sample JSON payload from DynamoDB.<\/li>\n<li>There are other tools to help replicate AWS locally (<a href=\"https:\/\/github.com\/intoli\/exodus\">Exodus bundler<\/a>, for getting binaries for Amazon Linux, and <a href=\"https:\/\/github.com\/lambci\/docker-lambda\">Docker containers<\/a>)<\/li>\n<li>The free tier sounds like a lot of requests, but note that it is also metered by compute time, a limit that is easier to hit in an ETL process like this.<\/li>\n<li>Lambdas with DynamoDB are a really powerful form of stored procedures &#8211; any language, outside of a database transaction.<\/li>\n<li>If you choose to upload a zip of your Lambda, you can run just about any binary, so long as it runs on Amazon Linux and doesn&#8217;t write to any filesystem other than \/tmp<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Lessons learned from an early AWS Lambda prototype<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"om_disable_all_campaigns":false,"_monsterinsights_skip_tracking":false,"_monsterinsights_sitenote_active":false,"_monsterinsights_sitenote_note":"","_monsterinsights_sitenote_category":0,"footnotes":""},"categories":[8,22],"tags":[71,334],"aioseo_notices":[],"amp_enabled":true,"_links":{"self":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/5818"}],"collection":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/comments?post=5818"}],"version-history":[{"count":1,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/5818\/revisions"}],"predecessor-version":[{"id":6428,"href":
"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/posts\/5818\/revisions\/6428"}],"wp:attachment":[{"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/media?parent=5818"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/categories?post=5818"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.garysieling.com\/blog\/wp-json\/wp\/v2\/tags?post=5818"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}