HOWTO access Amazon DynamoDB with Ruby

Amazon DynamoDB is a fully managed NoSQL database service. This guide will show you how to access the Amazon DynamoDB API with Ruby and version 3 of the official AWS SDK for Ruby.

Finding your credentials

First you’ll need to get your access key credentials from the AWS Management Console.

IAM user access keys are the right choice for applications, and can be managed from the “Security credentials” tab of the IAM user page. It is also possible to create access keys for the root user from the account-wide “Security credentials” page, but AWS advises against it: root keys grant unrestricted access to everything in your AWS account. Even if you’re new to AWS, it’s safer to create an IAM user with only the DynamoDB permissions you need.

You’ll also need to pick a region (see the AWS General Reference for a list).

Setting up your credentials

For a production app it’s best to store your credentials in environment variables (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_REGION). The AWS SDK will pick up the credentials from the environment variables automatically.

For development and quick scripting, alternatives are to store your credentials in an INI-formatted shared credentials file (~/.aws/credentials), or to paste them directly into your code (take care not to commit them to source control).
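If you rely on environment variables, it can help to fail fast at startup when one is missing. The sketch below uses a hypothetical helper, missing_aws_env_vars (not part of the SDK), that only inspects a Hash such as ENV; it doesn’t validate the credentials themselves, and it isn’t relevant if you use the shared credentials file instead.

```ruby
# Hypothetical helper (not part of the AWS SDK): list which of the
# environment variables mentioned above are missing or blank.
REQUIRED_AWS_ENV_VARS = %w[AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_REGION].freeze

def missing_aws_env_vars(env = ENV)
  # nil.to_s is "", so unset and blank variables are both reported.
  REQUIRED_AWS_ENV_VARS.select { |name| env[name].to_s.strip.empty? }
end
```

You might call it before creating the client and abort with the returned names if the list isn’t empty.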

Installing the aws-sdk-dynamodb gem

Make sure you have the aws-sdk-dynamodb gem installed:

gem install aws-sdk-dynamodb

Alternatively, add the gem to your Gemfile and install it with Bundler.

Creating your first table

Get started by initializing a client object and creating your first table:

require 'aws-sdk-dynamodb'

dynamodb = Aws::DynamoDB::Client.new

dynamodb.create_table({
  table_name: 'Products',
  attribute_definitions: [
    {attribute_name: 'account_id', attribute_type: 'S'},
    {attribute_name: 'product_id', attribute_type: 'S'}
  ],
  key_schema: [
    {attribute_name: 'account_id', key_type: 'HASH'},
    {attribute_name: 'product_id', key_type: 'RANGE'}
  ],
  provisioned_throughput: {
    read_capacity_units: 1,
    write_capacity_units: 1
  }
})

There’s a lot going on here, so let’s go through some of the details:

The key_schema parameter defines the primary key, in this example a composite primary key composed of a partition key (account_id) and a sort key (product_id)
The attribute_definitions parameter defines the type of both the partition key and the sort key as String (DynamoDB also supports Number and Binary key attributes)
The provisioned_throughput parameter defines how much throughput capacity is allocated for reads and writes (you’ll want to adjust this for a production application, but 1 unit for each should be adequate for testing)
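If you create several tables with the same shape (say, in test setup), a small builder keeps the key schema and the attribute definitions in step. This is a sketch under that assumption; create_table_params is a hypothetical helper, not part of the SDK. Note that attribute_definitions only lists key attributes: other attributes don’t need declaring.

```ruby
# Hypothetical helper (not part of the SDK): build #create_table parameters
# for a table with a String partition key and a String sort key.
def create_table_params(table_name, partition_key:, sort_key:, read_units: 1, write_units: 1)
  {
    table_name: table_name,
    # Only the attributes used in the key schema are declared here.
    attribute_definitions: [
      { attribute_name: partition_key, attribute_type: 'S' },
      { attribute_name: sort_key, attribute_type: 'S' }
    ],
    key_schema: [
      { attribute_name: partition_key, key_type: 'HASH' },
      { attribute_name: sort_key, key_type: 'RANGE' }
    ],
    provisioned_throughput: {
      read_capacity_units: read_units,
      write_capacity_units: write_units
    }
  }
end
```

You would pass the result straight to dynamodb.create_table. Table creation is asynchronous, so before writing items you can block with the SDK’s waiter, dynamodb.wait_until(:table_exists, table_name: 'Products').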

Adding some items

Once you have a table you can add some items:

require 'set'

dynamodb.put_item({
  table_name: 'Products',
  item: {
    'account_id' => '93d0',
    'product_id' => '0001',
    'title' => 'Blue & Black Dress',
    'colours' => Set.new(['blue', 'black']),
    'likes' => 0
  }
})

dynamodb.put_item({
  table_name: 'Products',
  item: {
    'account_id' => '93d0',
    'product_id' => '0002',
    'title' => 'White & Gold Dress',
    'colours' => Set.new(['white', 'gold']),
    'likes' => 0
  }
})

DynamoDB is “schemaless”: apart from the primary key attributes, items can have arbitrary attributes, unlike traditional SQL databases which require all columns to be specified upfront. This makes it easier to work with evolving data models and data that varies in shape.

DynamoDB supports a number of data types: String, Number, Binary, Boolean, Null, List, Map (similar to a Ruby Hash or JSON object), String Set, Number Set, Binary Set. These data types are encoded and decoded into equivalent Ruby types by the SDK. Number values are represented as BigDecimal objects (which you may need to convert to/from floats or integers), and Binary values are represented as StringIO objects.
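Here’s a sketch of converting those Ruby types into plain values. The item is built by hand to resemble what the SDK returns (no AWS call is made), and the price and thumb attributes are hypothetical additions for illustration.

```ruby
require 'bigdecimal'
require 'set'
require 'stringio'

# Hand-built values shaped like SDK results for Number, String Set and
# Binary attributes ('price' and 'thumb' are hypothetical attributes).
item = {
  'likes'   => BigDecimal('3'),
  'price'   => BigDecimal('19.99'),
  'colours' => Set.new(%w[blue black]),
  'thumb'   => StringIO.new([0x89, 0x50, 0x4E, 0x47].pack('C*'))
}

likes   = item['likes'].to_i         # BigDecimal -> Integer
price   = item['price'].to_f         # BigDecimal -> Float (may lose precision)
colours = item['colours'].to_a.sort  # Set -> sorted Array
bytes   = item['thumb'].read         # StringIO -> binary String
```

For money-like values it’s usually better to keep the BigDecimal rather than converting to a Float.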

Fetching items by primary key

Fetching a single item by primary key is straightforward:

response = dynamodb.get_item({
  table_name: 'Products',
  key: {
    'account_id' => '93d0',
    'product_id' => '0001'
  }
})

item = response.item

The value of response.item will be nil if no item exists with the given primary key.
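Because a missing item shows up as nil rather than an exception, you may want a wrapper that makes the not-found case explicit. This is a sketch; fetch_product! is a hypothetical helper, not part of the SDK, and it accepts any client object with a #get_item method.

```ruby
# Hypothetical wrapper (not part of the SDK): fetch a product, raising
# KeyError when #get_item finds nothing (response.item is then nil).
def fetch_product!(dynamodb, account_id, product_id)
  response = dynamodb.get_item(
    table_name: 'Products',
    key: { 'account_id' => account_id, 'product_id' => product_id }
  )
  response.item || raise(KeyError, "no product #{account_id}/#{product_id}")
end
```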

Conditional writes

Calling #put_item will overwrite data if an item with the same primary key already exists. Sometimes this is what you want, sometimes it isn’t.

Conditional writes can be used to specify that item attributes should meet one or more expected conditions for the write to succeed. For example, attempting to re-add the second item from above, on the condition that no item with that primary key exists yet (in which case the product_id attribute would not exist):

dynamodb.put_item({
  table_name: 'Products',
  condition_expression: 'attribute_not_exists(product_id)',
  item: {
    'account_id' => '93d0',
    'product_id' => '0002',
    'title' => 'White & Gold Dress',
    'colours' => Set.new(['white', 'gold']),
    'likes' => 0
  }
})

Given that the item already exists, the condition will not be met, and the call will raise an Aws::DynamoDB::Errors::ConditionalCheckFailedException error. This is similar to the duplicate key error you’re probably used to from an SQL INSERT.

Conditional writes can also be useful for implementing optimistic concurrency control, similar to the optimistic locking functionality provided by ActiveRecord in Rails.
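As a sketch of how that might look, the hypothetical helper below (not part of the SDK) stores a lock_version attribute, borrowing the name of ActiveRecord’s column, and only lets the write succeed if the stored version still matches the one you read.

```ruby
# Hypothetical helper (not part of the SDK): #put_item parameters for an
# optimistic-locking write using a 'lock_version' attribute.
def versioned_put_params(item, expected_version)
  params = {
    table_name: 'Products',
    # nil.to_i is 0, so the first write stores lock_version 1.
    item: item.merge({ 'lock_version' => expected_version.to_i + 1 })
  }
  if expected_version.nil?
    # First write: no item with this primary key may exist yet.
    params[:condition_expression] = 'attribute_not_exists(product_id)'
  else
    # Later writes: the stored version must be the one we read.
    params[:condition_expression] = 'lock_version = :expected'
    params[:expression_attribute_values] = { ':expected' => expected_version }
  end
  params
end
```

If another process has written in the meantime, dynamodb.put_item(versioned_put_params(item, 3)) raises Aws::DynamoDB::Errors::ConditionalCheckFailedException; you would typically rescue it, re-read the item, and retry.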

Partially updating an item

The #update_item method can be used to update a subset of item attributes, instead of replacing all attributes as #put_item does. For example, updating a product title:

dynamodb.update_item({
  table_name: 'Products',
  key: {
    'account_id' => '93d0',
    'product_id' => '0001'
  },
  update_expression: 'SET title = :title',
  expression_attribute_values: {
    ':title' => 'Blue & Black Dress ON SALE NOW'
  }
})

Here are some of the details:

The key parameter specifies the primary key in the same way as with #get_item
The update_expression parameter specifies the attributes you want to update and how to update them—in this case just setting the title attribute to a new value
The expression_attribute_values parameter defines the values to be substituted into the update expression—in this case just the new value of the title attribute

This may seem overly verbose just for updating a single attribute, but the benefits of this approach are clearer when you need to perform more complex updates.
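For instance, updating several attributes at once is just a longer SET clause. The sketch below uses a hypothetical helper, set_attributes_params (not part of the SDK), that builds one from a Hash, using expression_attribute_names placeholders so that attribute names which happen to be DynamoDB reserved words still work.

```ruby
# Hypothetical helper (not part of the SDK): build #update_item parameters
# that SET every attribute in the given Hash in a single call.
def set_attributes_params(table_name, key, attributes)
  raise ArgumentError, 'no attributes to update' if attributes.empty?

  names = {}
  values = {}
  clauses = attributes.each_with_index.map do |(name, value), i|
    names["#a#{i}"] = name   # e.g. '#a0' => 'title'
    values[":v#{i}"] = value # e.g. ':v0' => 'New title'
    "#a#{i} = :v#{i}"
  end
  {
    table_name: table_name,
    key: key,
    update_expression: "SET #{clauses.join(', ')}",
    expression_attribute_names: names,
    expression_attribute_values: values
  }
end
```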

Updating counters & sets

As well as updating multiple attributes, update expressions can modify attributes in different ways. For example, let’s say you need to implement a “like” feature for the products. You can use an update expression to increment the likes attribute without knowing its existing value like this:

dynamodb.update_item({
  table_name: 'Products',
  key: {
    'account_id' => '93d0',
    'product_id' => '0001'
  },
  update_expression: 'SET likes = likes + :n',
  expression_attribute_values: {
    ':n' => 1
  }
})

Similarly, values can be added to or removed from set attributes. For example, adding a colour to a product like this:

dynamodb.update_item({
  table_name: 'Products',
  key: {
    'account_id' => '93d0',
    'product_id' => '0001'
  },
  update_expression: 'ADD colours :colour',
  expression_attribute_values: {
    ':colour' => Set.new(['brown'])
  }
})
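Removing values works the same way with DELETE, and if_not_exists lets a counter start from zero (SET likes = likes + :n fails if the item has no likes attribute yet). A sketch of both; remove_colours_params and like_params are hypothetical helpers, not part of the SDK.

```ruby
require 'set'

# Hypothetical helper: remove one or more colours from the colours set.
def remove_colours_params(key, *colours)
  {
    table_name: 'Products',
    key: key,
    update_expression: 'DELETE colours :colours',
    expression_attribute_values: { ':colours' => Set.new(colours) }
  }
end

# Hypothetical helper: increment likes, treating a missing attribute as 0.
def like_params(key, n = 1)
  {
    table_name: 'Products',
    key: key,
    update_expression: 'SET likes = if_not_exists(likes, :zero) + :n',
    expression_attribute_values: { ':zero' => 0, ':n' => n }
  }
end
```

Usage would be, for example, dynamodb.update_item(remove_colours_params({ 'account_id' => '93d0', 'product_id' => '0001' }, 'brown')).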

So now you know how to create tables and work with individual items. What about selecting multiple items? What if you need a list of all the products in the table? Or products for a specific account or category?

Full table scans

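The #scan method can be used to perform a full table scan, which reads every item in a table. Each call returns at most 1 MB of data, so for larger tables you’ll need to make repeated calls (see the last_evaluated_key attribute of the response). For example, listing the product items like this: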

response = dynamodb.scan(table_name: 'Products')

items = response.items

Accessing every item becomes increasingly expensive, both in performance and in consumed read capacity, as a dataset grows, so full table scans should generally be avoided in a production application. However, they can be useful for small datasets, or for exporting a dataset in full.
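When a scan spans more than one page, response.last_evaluated_key is set, and you pass it back as exclusive_start_key to continue. A minimal sketch of that loop; scan_all is a hypothetical helper, not part of the SDK.

```ruby
# Hypothetical helper (not part of the SDK): collect every item from a
# table by following last_evaluated_key from page to page.
def scan_all(dynamodb, table_name)
  items = []
  start_key = nil
  loop do
    params = { table_name: table_name }
    params[:exclusive_start_key] = start_key if start_key
    response = dynamodb.scan(params)
    items.concat(response.items)
    start_key = response.last_evaluated_key
    break if start_key.nil? || start_key.empty? # no more pages
  end
  items
end
```

The SDK’s paginated responses should also offer #each_page, which follows last_evaluated_key for you; the explicit loop just makes the mechanism visible. Collecting everything into one array is only sensible for small tables.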

Primary key queries

The #query method can be used to search for items based on their primary key: an exact match on the partition key, optionally combined with a condition on the sort key. For example, listing all of the products for a given account like this:

response = dynamodb.query({
  table_name: 'Products',
  key_condition_expression: 'account_id = :account_id',
  expression_attribute_values: {
    ':account_id' => '93d0'
  }
})

items = response.items

The key_condition_expression parameter defines the query (the partition key must be matched exactly, but various comparison operators are supported for the sort key), and the expression_attribute_values parameter defines the values to be substituted into the different expression parameters (in this case just key_condition_expression). The response object is a struct, which contains some metadata relating to the query in addition to the items themselves.
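Sort key conditions can use =, <, <=, >, >=, BETWEEN and begins_with. As a sketch, here are query parameters for products whose IDs start with a given prefix, with scan_index_forward controlling the sort order; products_with_prefix_params is a hypothetical helper, not part of the SDK.

```ruby
# Hypothetical helper (not part of the SDK): query one account's products
# whose product_id starts with the given prefix.
def products_with_prefix_params(account_id, prefix, descending: false)
  {
    table_name: 'Products',
    key_condition_expression: 'account_id = :account_id AND begins_with(product_id, :prefix)',
    expression_attribute_values: { ':account_id' => account_id, ':prefix' => prefix },
    # Results come back in sort key order; false reverses it.
    scan_index_forward: !descending
  }
end
```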

Count queries

The select parameter can be used to return the number of items matching the query, instead of the matching items themselves:

response = dynamodb.query({
  table_name: 'Products',
  select: 'COUNT',
  key_condition_expression: 'account_id = :account_id',
  expression_attribute_values: {
    ':account_id' => '93d0'
  }
})

count = response.count

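This is similar to SELECT COUNT(*) with a WHERE clause in SQL. As with item results, the count only covers the items read by a single call (at most 1 MB of data), so counting a large result set can take several calls.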
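A sketch of summing the count across pages; count_items is a hypothetical helper, not part of the SDK, and it reuses the same last_evaluated_key / exclusive_start_key pairing as item pagination.

```ruby
# Hypothetical helper (not part of the SDK): total the counts of every page
# of a COUNT query.
def count_items(dynamodb, params)
  total = 0
  start_key = nil
  loop do
    page_params = params.merge(select: 'COUNT')
    page_params[:exclusive_start_key] = start_key if start_key
    response = dynamodb.query(page_params)
    total += response.count
    start_key = response.last_evaluated_key
    break if start_key.nil? || start_key.empty? # last page reached
  end
  total
end
```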

Filtered queries

The filter_expression parameter can be used to further filter the query results. For example, listing all of the products from a specific account, in a specific colour:

response = dynamodb.query({
  table_name: 'Products',
  key_condition_expression: 'account_id = :account_id',
  filter_expression: 'contains(colours, :colour)',
  expression_attribute_values: {
    ':account_id' => '93d0',
    ':colour' => 'black'
  }
})

items = response.items

Note that the same throughput is consumed whether or not the filter expression is specified, because the filter is applied after the matching items have been read. To query more efficiently you’ll be better off using secondary indexes.
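You can see the effect in the response: scanned_count is the number of items read before the filter was applied, and count is the number that matched. A sketch of comparing the two; filter_efficiency is a hypothetical helper, not part of the SDK.

```ruby
# Hypothetical helper (not part of the SDK): fraction of the items read by a
# filtered query or scan that were actually returned.
def filter_efficiency(response)
  # Nothing read means nothing wasted.
  return 1.0 if response.scanned_count.zero?

  response.count.fdiv(response.scanned_count)
end
```

A consistently low ratio is a sign that a secondary index on the filtered attribute may serve the query better.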

Secondary index queries

For more efficient querying DynamoDB supports secondary indexes. For example, the #update_table method can be used to add a global secondary index on the existing products table to support queries on the product_category attribute:

dynamodb.update_table({
  table_name: 'Products',
  attribute_definitions: [
    {attribute_name: 'product_category', attribute_type: 'S'},
    {attribute_name: 'title', attribute_type: 'S'}
  ],
  global_secondary_index_updates: [
    create: {
      index_name: 'ProductCategoryIndex',
      key_schema: [
        {attribute_name: 'product_category', key_type: 'HASH'},
        {attribute_name: 'title', key_type: 'RANGE'}
      ],
      projection: {
        projection_type: 'KEYS_ONLY'
      },
      provisioned_throughput: {
        read_capacity_units: 1,
        write_capacity_units: 1
      }
    }
  ]
})

Parameters are similar to those used in #create_table:

The key_schema parameter defines the attributes used to key the index: in this example a key composed of a partition key (product_category) and a sort key (title)
The attribute_definitions parameter defines the types of the attributes used in the key schema—in this case both product_category and title are String keys
The provisioned_throughput parameter defines the throughput for the index

In addition, the projection parameter defines which attributes to include in the index. This example specifies that only the keys are to be included, which means that any results returned from querying the index will include the index key attributes together with the primary key attributes of each item. Other options which might be more appropriate depending on the application are projection_type: 'INCLUDE' (for including additional attributes), and projection_type: 'ALL' (for including all item attributes).
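With INCLUDE, the extra attributes are listed in non_key_attributes. Since an index’s projection can’t be changed after it’s created, this sketch takes the index name as a parameter, assuming you’d create a new index; gsi_create_entry and the example index name are hypothetical.

```ruby
# Hypothetical helper (not part of the SDK): a `create` entry for
# global_secondary_index_updates that also projects some non-key attributes.
def gsi_create_entry(index_name, non_key_attributes)
  {
    create: {
      index_name: index_name,
      key_schema: [
        { attribute_name: 'product_category', key_type: 'HASH' },
        { attribute_name: 'title', key_type: 'RANGE' }
      ],
      projection: {
        projection_type: 'INCLUDE',
        non_key_attributes: non_key_attributes
      },
      provisioned_throughput: { read_capacity_units: 1, write_capacity_units: 1 }
    }
  }
end
```

The entry would go inside the global_secondary_index_updates array of an #update_table call, alongside the same attribute_definitions as before.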

The index will be created asynchronously and typically won’t be available immediately. Once the index is active it can be queried like so:

response = dynamodb.query({
  table_name: 'Products',
  index_name: 'ProductCategoryIndex',
  select: 'ALL_PROJECTED_ATTRIBUTES',
  key_condition_expression: 'product_category = :product_category',
  expression_attribute_values: {
    ':product_category' => 'dresses'
  }
})

items = response.items

The index_name parameter specifies which index to query, and the select parameter specifies that all the projected attributes should be returned. Other parameters are used the same as when querying the table directly.
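To find out when a new index is ready, you can poll #describe_table and check the index_status of the index (CREATING while it’s being built, ACTIVE once it can be queried). A sketch; index_active? and wait_for_index are hypothetical helpers, not part of the SDK, and the delay and attempt limits are arbitrary.

```ruby
# Hypothetical helper (not part of the SDK): true once the named global
# secondary index reports ACTIVE.
def index_active?(dynamodb, table_name, index_name)
  table = dynamodb.describe_table(table_name: table_name).table
  index = (table.global_secondary_indexes || []).find { |i| i.index_name == index_name }
  !index.nil? && index.index_status == 'ACTIVE'
end

# Hypothetical helper: poll until the index is active, or give up.
def wait_for_index(dynamodb, table_name, index_name, delay: 10, max_attempts: 60)
  max_attempts.times do
    return true if index_active?(dynamodb, table_name, index_name)
    sleep(delay)
  end
  false
end
```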

Running DynamoDB locally

Amazon provides a downloadable implementation of the DynamoDB API for development and testing—no fees, and no network connection required!

Instructions for how to download and run the local web service can be found in Running DynamoDB on Your Computer. You’ll then need to specify the endpoint config parameter so that your code knows to connect to the local web service.

You can specify the endpoint parameter in the global config like this:

Aws.config[:dynamodb] = {endpoint: 'http://localhost:8000'}

Alternatively you can specify the parameter when initializing the client, like this:

dynamodb = Aws::DynamoDB::Client.new(endpoint: 'http://localhost:8000')

You’ll still need to supply credentials and a region (access key, secret access key, and region) in one of the ways described earlier, but they don’t have to be your production credentials, so you can use dummy placeholder values.
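A sketch of keeping the local settings in one place; local_dynamodb_options and the dummy values are hypothetical. The endpoint, region, access_key_id and secret_access_key options are ones the client constructor accepts. DynamoDB local uses the access key ID and region to pick its local database file (unless it’s started with -sharedDb), so it’s worth keeping the dummy values consistent between runs.

```ruby
# Hypothetical helper (not part of the SDK): client options for DynamoDB
# local with dummy credentials. The values only need to be consistent.
def local_dynamodb_options(port: 8000)
  {
    endpoint: "http://localhost:#{port}",
    region: 'us-east-1',              # dummy region
    access_key_id: 'fakeAccessKeyId', # dummy credentials
    secret_access_key: 'fakeSecretAccessKey'
  }
end
```

With the gem loaded and DynamoDB local running, you would then create the client with Aws::DynamoDB::Client.new(local_dynamodb_options).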