I recently had to write some code which uses the [AWS SDK] to
list all the keys in an S3 bucket which potentially contains many
objects (currently over 80,000 in production). The listObjects
API will only return up to 1,000 keys at a time so you have to make
multiple calls, setting the Marker field to page through all the
keys.

It turns out there are a lot of sub-optimal examples out there for how
to do this which often involve global state and complicated recursive
callbacks. I’m also a fan of the clarity of JavaScript’s newer
async/await feature for handling asynchronous code so I was keen on a
solution which uses that style.

Here’s what I came up with:

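async function allBucketKeys(s3, bucket) {
  const params = {
    Bucket: bucket,
  };

  const keys = [];
  for (;;) {
    // Each call returns at most 1,000 keys.
    const data = await s3.listObjects(params).promise();

    data.Contents.forEach((elem) => {
      keys.push(elem.Key);
    });

    if (!data.IsTruncated) {
      break;
    }
    // NextMarker is only returned when a Delimiter is specified, so
    // fall back to the last key seen so far as the next Marker.
    params.Marker = data.NextMarker || keys[keys.length - 1];
  }

  return keys;
}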

It’s called like this:

// Remember to catch exceptions somewhere...
const s3 = connectToS3Somehow();
var keys = await allBucketKeys(s3, "my_bucket");

This solution is clean, concise and hopefully straightforward.

What makes this solution possible is that AWS SDK requests can return a
Promise (via .promise()) which can then be used with await. Given the
need to conditionally call listObjects multiple times, a plain loop with
await gives an arguably clearer code structure than callbacks.
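
If you want to watch the paging loop work without AWS credentials, here is a minimal sketch that drives it with an in-memory stand-in. The makeFakeS3 helper is hypothetical and not part of the AWS SDK; it only mimics the listObjects(params).promise() call shape and the Marker/IsTruncated fields used above. It never sets NextMarker, which matches what S3 does when no Delimiter is given, so the loop falls back to the last key it has seen.

```javascript
// Hypothetical in-memory stand-in for the S3 client (illustration only).
// Returns at most pageSize keys per call, in sorted order, like a tiny
// version of listObjects.
function makeFakeS3(allKeys, pageSize) {
  const sorted = [...allKeys].sort();
  return {
    listObjects(params) {
      // Only keys strictly after Marker are listed.
      const remaining = sorted.filter(
        (key) => params.Marker === undefined || key > params.Marker
      );
      const page = remaining.slice(0, pageSize);
      const data = {
        Contents: page.map((key) => ({ Key: key })),
        IsTruncated: remaining.length > pageSize,
      };
      return { promise: () => Promise.resolve(data) };
    },
  };
}

// The same paging loop, repeated here so the sketch runs on its own.
async function allBucketKeys(s3, bucket) {
  const params = { Bucket: bucket };
  const keys = [];
  for (;;) {
    const data = await s3.listObjects(params).promise();
    for (const elem of data.Contents) {
      keys.push(elem.Key);
    }
    if (!data.IsTruncated) {
      break;
    }
    // No NextMarker from the fake, so continue from the last key seen.
    params.Marker = data.NextMarker || keys[keys.length - 1];
  }
  return keys;
}

// Five keys with a page size of two means three calls are needed.
const fake = makeFakeS3(["e", "a", "c", "b", "d"], 2);
allBucketKeys(fake, "my_bucket").then((keys) => console.log(keys));
```

Swapping the fake for a real client should need no changes to allBucketKeys, since it only relies on the fields shown here.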


