Publishing to S3

Now that you can access your files and convert them into JSON, the next step is to publish them to S3.

To start, you'll need to have or sign up for an Amazon Web Services account.

Setting Up S3

Create an S3 bucket

Log in to the AWS console, and go to the S3 home. (S3 stands for for Simple Storage Service. You can think of it as a filesystem in the cloud.)

Click on Create Bucket.

Click on Create Bucket

This bucket needs to have a globally unique name. It's common to name buckets after a domain name you control, for example, assets.nytimes.com.

Create an IAM user for Driveshaft

Go to the IAM > Users section of the console. (IAM is short for "Identity and Access Management," essential it's user account for your app).

Click on Create New Users.

Click on Create New Users

Give your Driveshaft user a name, and click Create. Then search for and click on your new account on the Users page.

Enter a user id

Click on Manage Access Keys, and then Create Access Key.

Click on Manage Access Keys

Download the credentials JSON and store in a safe place.

Note your IAM credentials

Note the two keys: an access key and a secret access key. Enter them in the following environmental variables:

AWS_ACCESS_KEY_ID="*********"
AWS_SECRET_ACCESS_KEY="*********"
AWS_REGION="us-east-1" # or another region

Authorize the user to access your S3 bucket

Click on the Create User Policy button.

Click on Create User Policy

A user policy says which S3 buckets buckets and API actions the user is allowed to access. This policy should allow access to the new bucket you created.

The syntax for these rules (or "policies") is a bit... esoteric. So we have an sample policy here that you can enter in the Custom Policy section that gives access to a single S3 bucket.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:Put*",
        "s3:List*",
        "s3:Get*",
        "s3:Delete*"
      ],
      "Resource": [
        "arn:aws:s3:::assets.nytimes.com/*",
        "arn:aws:s3:::assets.nytimes.com"
      ]
    }
  ]
}

Just replace assets.nytimes.com with the name of your bucket.

Check that it works

Driveshow should now be able to publish to your S3 bucket. Try it out with with the following spreadsheet.

Enter the following into the form on the main page, (again replacing assets.nytimes.com with your bucket) and hit Submit:

Enter an S3 destination

Next, click on the red Publish to... button.

Enter an S3 destination

If it works, you'll see a log of the first published version when the page reloads.

Enter an S3 destination

Click on any Restore button to revert to a previous version of the json.

On S3, the canonical or latest version of a file is stored at the destination you specify (like s3://assets.nytimes.com/spreadsheet.json). But a second timestamped file is kept for every version:

s3://assets.nytimes.com/spreadsheet-20150515-120000.json

When you publish a file, the canonical version is updated, and a new timestamped version is created. When you restore a version, the canonical version is replaced with the version you specify.