AWS Lambda Experiment Results

I set up an AWS Lambda to update video metadata for about 75% of the videos. For version two, I want to provide an option to sort based on confidence intervals within a topic (i.e., based on the number of up/downvotes, how likely this video is to be good).
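The post doesn't say which interval it will use. One common choice for up/down ratings is the lower bound of the Wilson score interval, which ranks a video with many mostly-positive votes above one with only a handful. A minimal sketch, assuming per-video upvote and downvote counts are available and a 95% confidence level (z = 1.96); `rank_videos` and the tuple layout are hypothetical:

```python
import math


def wilson_lower_bound(upvotes: int, downvotes: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for the fraction of upvotes.

    z = 1.96 corresponds to roughly 95% confidence. Videos with no votes get 0.0.
    """
    n = upvotes + downvotes
    if n == 0:
        return 0.0
    p_hat = upvotes / n
    z2 = z * z
    centre = p_hat + z2 / (2 * n)
    margin = z * math.sqrt((p_hat * (1 - p_hat) + z2 / (4 * n)) / n)
    return (centre - margin) / (1 + z2 / n)


def rank_videos(videos):
    """Sort hypothetical (video_id, upvotes, downvotes) tuples best-first."""
    return sorted(videos, key=lambda v: wilson_lower_bound(v[1], v[2]), reverse=True)
```

With this scoring, a video at 500 up / 100 down ranks above one at 10 up / 0 down, which in turn ranks above one at 1 up / 0 down, even though the last two have a perfect raw ratio.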

The architecture of the update was as follows:

  • For every video that could be updated, make a record in DynamoDB
  • For each insert into DynamoDB, trigger one Lambda invocation
  • For a given video, run youtube-dl to fetch only the video metadata
  • Save the video metadata to S3
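A minimal sketch of the per-video step, assuming the table's stream delivers new images, the video ID lives in a string attribute called `video_id`, the `youtube-dl` binary is on the `PATH` inside the deployment package, and results go to a bucket named `my-video-metadata-bucket` (all four names are hypothetical). `fetch` and `store` are injectable so the handler can be exercised locally with a sample DynamoDB stream payload:

```python
import json
import subprocess

# Hypothetical names: the DynamoDB attribute holding the video ID and the S3 bucket.
VIDEO_ID_ATTRIBUTE = "video_id"
BUCKET = "my-video-metadata-bucket"


def extract_video_ids(event):
    """Return video IDs from INSERT records in a DynamoDB stream event.

    Assumes the stream view type includes new images (NEW_IMAGE or NEW_AND_OLD_IMAGES).
    """
    ids = []
    for record in event.get("Records", []):
        if record.get("eventName") != "INSERT":
            continue  # ignore MODIFY/REMOVE records
        new_image = record["dynamodb"]["NewImage"]
        ids.append(new_image[VIDEO_ID_ATTRIBUTE]["S"])
    return ids


def fetch_metadata(video_id):
    """Run youtube-dl in simulate mode: --dump-json prints JSON and downloads nothing."""
    url = "https://www.youtube.com/watch?v=" + video_id
    result = subprocess.run(
        # --no-cache-dir avoids writing a cache outside /tmp, which Lambda disallows
        ["youtube-dl", "--dump-json", "--no-cache-dir", url],
        capture_output=True, text=True, check=True, timeout=60,
    )
    return json.loads(result.stdout)


def store_metadata(video_id, metadata):
    """Write the metadata JSON to S3."""
    import boto3  # imported lazily so the parsing logic can be tested without it

    boto3.client("s3").put_object(
        Bucket=BUCKET,
        Key=f"metadata/{video_id}.json",
        Body=json.dumps(metadata).encode("utf-8"),
        ContentType="application/json",
    )


def handler(event, context, fetch=fetch_metadata, store=store_metadata):
    """Lambda entry point: fetch and store metadata for each inserted video."""
    video_ids = extract_video_ids(event)
    for video_id in video_ids:
        store(video_id, fetch(video_id))
    return {"processed": len(video_ids)}
```

Locally, you can call `handler(sample_event, None, fetch=..., store=...)` with a captured stream payload and stub functions, so nothing touches YouTube or S3.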

This did not work as well as I had hoped. I inserted 150,000 rows into DynamoDB, but AWS ran at most 15 Lambdas concurrently, and in the end only 3,000 runs of the job succeeded. There wasn't any evidence of where the concurrency bottleneck lay (the Lambda and account concurrency limits are both 1,000, and I gave DynamoDB more read capacity, just in case).
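One factor worth checking, offered as an assumption rather than something the results above confirm: when a Lambda is triggered by a DynamoDB stream, it processes each stream shard with one concurrent batch by default, so the effective ceiling is the number of shards times the event source mapping's `ParallelizationFactor` (1 to 10). The 1,000 function and account limits only cap that number; they don't raise it. Also, by default a batch that keeps failing is retried until its records expire, which blocks that shard. The sketch below computes the ceiling and builds keyword arguments for boto3's `create_event_source_mapping`; the stream ARN, function name, batch size, and retry settings are hypothetical:

```python
def stream_concurrency_ceiling(shard_count: int, parallelization_factor: int = 1) -> int:
    """Max concurrent invocations for a Lambda reading a DynamoDB stream.

    Each shard is processed by up to `parallelization_factor` concurrent batches
    (default 1, maximum 10). Account/function limits can only lower this value.
    """
    if not 1 <= parallelization_factor <= 10:
        raise ValueError("ParallelizationFactor must be between 1 and 10")
    return shard_count * parallelization_factor


def event_source_mapping_params(stream_arn, function_name, parallelization_factor=10):
    """Keyword arguments for boto3.client("lambda").create_event_source_mapping."""
    return {
        "EventSourceArn": stream_arn,
        "FunctionName": function_name,
        "StartingPosition": "TRIM_HORIZON",
        "BatchSize": 10,
        "ParallelizationFactor": parallelization_factor,
        # Cap retries and split failing batches so one bad record can't stall a shard.
        "MaximumRetryAttempts": 2,
        "BisectBatchOnFunctionError": True,
    }
```

For example, a stream with a hypothetical 15 shards would top out at 15 concurrent invocations with the default factor, and at 150 with a factor of 10.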

The experiment was nearly free:

  • AWS Lambda compute, free tier (400,000 GB-seconds), US East (Northern Virginia): 42,760.113 seconds, $0.00
  • AWS Lambda requests, free tier (1,000,000 requests), US East (Northern Virginia): 17,651 requests, $0.00
  • Amazon S3 Tier 1 requests, $0.005 per 1,000 PUT, COPY, POST, or LIST requests: 16,945 requests, $0.08
  • Amazon S3 Tier 2 requests, $0.004 per 10,000 GET and all other requests: 777 requests, $0.00
  • Amazon S3 storage (TimedStorage-ByteHrs), $0.023 per GB for the first 50 TB/month: 4.350 GB-months, $0.10
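The S3 charges follow directly from the unit prices on the bill above, and Lambda compute is metered as billed duration times configured memory. A small sketch of that arithmetic; the quantities and prices are copied from the bill, while the 512 MB memory size used in the usage note is hypothetical:

```python
# Unit prices and quantities copied from the S3 lines of the bill above.
S3_LINE_ITEMS = {
    # name: (quantity, price in USD, number of units the price covers)
    "tier1_requests": (16_945, 0.005, 1_000),
    "tier2_requests": (777, 0.004, 10_000),
    "storage_gb_months": (4.350, 0.023, 1),
}


def line_cost(quantity, price, per):
    """Cost of one bill line: quantity / units-per-price * price."""
    return quantity / per * price


def s3_bill():
    """Return each S3 line item rounded to cents, as it appears on the bill."""
    return {name: round(line_cost(*item), 2) for name, item in S3_LINE_ITEMS.items()}


def gb_seconds(duration_seconds, memory_mb):
    """Lambda compute usage: billed duration multiplied by configured memory in GB."""
    return duration_seconds * memory_mb / 1024
```

At a hypothetical 512 MB, each second of run time consumes 0.5 GB-seconds, so the 400,000 GB-second free tier covers 800,000 seconds of execution; doubling the memory halves that.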

Lessons learned:

  • If you naively use non-batch APIs (e.g., to upload to or retrieve from DynamoDB), you lose a lot of time, even inside AWS data centers.
  • For your own testing, replicate the AWS environment as closely as you can. For example, your first step should be to capture a sample JSON event payload from DynamoDB.
  • Other tools help replicate AWS locally, such as the Exodus bundler (for bundling binaries so they run on Amazon Linux) and Docker containers.
  • The free tier sounds like a lot of requests, but it is also metered by compute time (GB-seconds), which is easier to exhaust in an ETL process like this one.
  • Lambdas triggered by DynamoDB are a powerful form of stored procedure: any language, running outside of a database transaction.
  • If you upload your Lambda as a zip, you can run just about any binary, as long as it runs on Amazon Linux and doesn't write to any filesystem other than /tmp.
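To illustrate the batching lesson: DynamoDB's `BatchWriteItem` accepts up to 25 put or delete requests per call and may return some of them under `UnprocessedItems`, which the caller is expected to retry. A minimal sketch, assuming a boto3 DynamoDB client and items in the low-level typed format; the table name `videos` and attribute `video_id` are hypothetical. (boto3's higher-level `Table.batch_writer()` handles the chunking and retries for you.)

```python
import time

MAX_BATCH = 25  # BatchWriteItem accepts at most 25 put/delete requests per call


def chunked(items, size=MAX_BATCH):
    """Yield successive lists of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def batch_put(client, table_name, items, max_attempts=5):
    """Write items with BatchWriteItem, retrying whatever DynamoDB leaves unprocessed.

    `client` is a boto3 DynamoDB client; items use the typed format,
    e.g. {"video_id": {"S": "abc123"}}. Returns the number of API calls made.
    """
    calls = 0
    for chunk in chunked(items):
        request = {table_name: [{"PutRequest": {"Item": item}} for item in chunk]}
        for attempt in range(max_attempts):
            response = client.batch_write_item(RequestItems=request)
            calls += 1
            request = response.get("UnprocessedItems") or {}
            if not request:
                break
            if attempt + 1 < max_attempts:
                time.sleep(0.05 * 2 ** attempt)  # exponential backoff before retrying
        else:
            raise RuntimeError("items still unprocessed after retries")
    return calls
```

Writing 150,000 rows this way takes 6,000 calls instead of 150,000 single-item `PutItem` calls.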