dynaglue: Getting Started Guide

dynaglue is a small JavaScript library for DynamoDB I started putting together after developing an interest in single-table designs. The guidance for using them is still quite new and hard to understand, so I wanted to make it easier for others to use them.

Developing it was also a learning process for me. I come from a relational database background, and using a NoSQL database brings a multitude of new concepts on its own, especially one as rigid as DynamoDB (the limitations it imposes matter: they are what give you fast and predictable performance). Single-table designs add yet another set of access patterns to learn.

Coding for single-table designs leans heavily on the developer to perform a number of very routine tasks, like encoding table names into keys or remembering to update indexes at the same time as objects. Relying on the discipline of developers is a recipe for bugs in my experience.

Where is it useful?

dynaglue will be handy for those who have studied single-table designs on DynamoDB, and perhaps even built one or two manually, but want a library to help them scale up to multiple entities and/or provide a pleasant API for implementing them.

dynaglue will be difficult to adopt in an existing project running on DynamoDB, because it makes strong assumptions about how data is stored and indexed, and these will probably not match what you already have.

Prerequisites

dynaglue runs on Node.js, so your project will need to be written in JavaScript (or TypeScript). It depends on the aws-sdk library being available in order to execute DynamoDB requests.

Getting Started

Install dynaglue into your project with npm:

npm install dynaglue -S

Next, we need to set up a context for dynaglue to work with. A context specifies the layout of our DynamoDB tables and the collections (types of entities) we will be storing in them. With these, we can define access patterns for indexing our collections for queries.

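In a new JavaScript or TypeScript file, import DynamoDB from the aws-sdk, along with the dynaglue functions used in this guide:

import DynamoDB from 'aws-sdk/clients/dynamodb'
import { createContext, insert, findById, deleteById, updateById, find, findChildren, findChildById, deleteChildById } from 'dynaglue'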

Next, we want to define our layout, which specifies how data maps onto the DynamoDB table. At a minimum, it needs the table name and the names of the table's partition key and sort key (dynaglue requires a table with both a partition key and a sort key to work properly).

For example:

const myAppTableLayout = {
  tableName: 'my-app-table',
  primaryKey: {
    partitionKey: 'pk0',
    sortKey: 'sk0',
  },
}

Both of the keys need to be strings (this restriction may be relaxed in later versions).

The table may have a CloudFormation definition like the following:

Resources:
  MyAppTable:
    Type: AWS::DynamoDB::Table
    Properties:
      TableName: my-app-table
      BillingMode: PAY_PER_REQUEST
      AttributeDefinitions:
        - { AttributeName: 'pk0', AttributeType: 'S' }
        - { AttributeName: 'sk0', AttributeType: 'S' }
      KeySchema:
        - { AttributeName: 'pk0', KeyType: 'HASH' } # HASH = partition key
        - { AttributeName: 'sk0', KeyType: 'RANGE' } # RANGE = sort key

Note that the names of our keys don't matter. dynaglue will map data from your entities onto whatever key names you choose, and because you can have multiple entity types in the same table, generic names are more natural to work with.

This layout enables us to do simple storage and retrieval of objects by their ID field. With it, we can already store multiple types of objects by defining a collection for each entity type. Let's define a top-level collection for storing users:

const usersCollection = {
  name: 'users',
  layout: myAppTableLayout, // reference the layout we defined previously
}

Lastly, we need to initialise a context with a DynamoDB instance from the aws-sdk and our collections:

const ddb = new DynamoDB({ region: 'us-east-1' })
const ctx = createContext(ddb, [usersCollection])

Now we have everything we need to start writing and querying our users collection!

Storing and retrieving data by ID

We can add user objects to the users collection with insert():

const user = {
  email: 'user1@example.com',
  name: 'Leila Kent',
  profile: {
    type: 'guest',
  },
}

const result = await insert(ctx, 'users', user)
console.log(result) // ==> { _id: '5e203ab4c9762f6f0f24e5b8', email: 'user1@example.com', ... }

Inserted objects are automatically populated with an _id field, which is used to reference the entity in other calls.

The IDs in dynaglue are generated automatically with MongoDB's BSON ObjectId algorithm, but you can opt to add your own identifier before you insert the object.
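For the curious, the sketch below shows the layout of a BSON ObjectId: 12 bytes, printed as 24 hex characters, made of a 4-byte timestamp, 5 random bytes and a 3-byte counter. It illustrates the format only and is not dynaglue's implementation; Math.random stands in for a proper random source, and in a real project you would use an existing implementation such as the ObjectId class from the bson package.

```javascript
// Illustration of the BSON ObjectId format only (not dynaglue's code).
// Layout: 4-byte timestamp | 5 random bytes | 3-byte counter = 24 hex chars.

// Per-process random bytes and a counter starting at a random value.
const processRandom = Array.from({ length: 5 }, () => Math.floor(Math.random() * 256));
let counter = Math.floor(Math.random() * 0xffffff);

// Render a number as a fixed-width, zero-padded hex string.
function toHex(value, bytes) {
  return value.toString(16).padStart(bytes * 2, '0');
}

function generateObjectId(date = new Date()) {
  const seconds = Math.floor(date.getTime() / 1000); // seconds since the Unix epoch
  counter = (counter + 1) % 0x1000000;               // 3-byte counter wraps at 2^24
  return (
    toHex(seconds, 4) +
    processRandom.map((byte) => toHex(byte, 1)).join('') +
    toHex(counter, 3)
  );
}

// The creation time can be read back from the first 8 hex characters.
function objectIdTimestamp(id) {
  return new Date(parseInt(id.slice(0, 8), 16) * 1000);
}
```

Because the leading bytes are a timestamp, IDs come out roughly in creation order, and the sample _id shown earlier (5e203ab4c9762f6f0f24e5b8) decodes to 16 January 2020.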

We can retrieve our object again with the findById() call:

const userId = result._id
const object = await findById(ctx, 'users', userId)

or remove it with the deleteById() call:

await deleteById(ctx, 'users', userId)

This is all basic, and provides little more than a simple wrapper around DynamoDB's PutItem, GetItem and DeleteItem calls. In the real world, we want to index data on more than its _id field, which is harder to do with DynamoDB in a single-table layout, but much easier with dynaglue.

Adding an access pattern

An access pattern lets you define how to index the properties of objects in a collection. It references a Global Secondary Index (GSI) in DynamoDB, and tells dynaglue how to fill out that index for your collection.

As with the primary key, the attribute names you choose for your GSI don't matter, even if they clash with properties in your object. And unlike plain DynamoDB, an access pattern can reference nested properties and can automatically maintain composite keys.

Firstly, we must update our layout to specify the partition key of the GSI and (optional) sort key:

const myAppTableLayout = {
  tableName: 'my-app-table',
  primaryKey: {
    partitionKey: 'pk0',
    sortKey: 'sk0',
  },
  findKeys: [{ indexName: 'gs1', partitionKey: 'gspk1', sortKey: 'gssk1' }],
}

Above, we have extended our previous layout to reference a GSI called gs1 that takes the top-level property gspk1 as its partition key and gssk1 as its sort key.

Why doesn't it matter what property names we choose?

dynaglue stores your object value under the value sub-property, leaving the rest of the top-level key namespace free for your partition and sort keys (this means you can use anything but value as a key name!)
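To make this concrete, here is a hypothetical sketch of a stored item. The key attribute values are placeholders (dynaglue's real primary key encoding isn't covered here), and toStoredItem is an illustrative helper, not part of dynaglue's API. It shows why an entity property named gspk1 cannot clash with the index attribute of the same name.

```javascript
// Hypothetical illustration (not dynaglue's code): the entity is nested under
// `value`, and key attributes sit beside it at the top level of the item.
function toStoredItem(keyAttributes, entity) {
  // `value` is the one top-level name that key attributes cannot use.
  if ('value' in keyAttributes) {
    throw new Error("'value' is reserved for the entity itself");
  }
  return { ...keyAttributes, value: entity };
}

// Placeholder key values; only the shape of the item matters here.
const item = toStoredItem(
  { pk0: 'placeholder-pk', sk0: 'placeholder-sk', gspk1: 'placeholder-gsi-pk' },
  { _id: 'abc', gspk1: 'an entity property that shares the GSI key name' },
);
```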

With this layout, we can now add one access pattern to each of our collections.

For our users collection, we'd like to add an access pattern to efficiently find all users by their profile.type field.

const usersCollection = {
  name: 'users',
  layout: myAppTableLayout,
  accessPatterns: [
    {
      indexName: 'gs1',
      partitionKeys: [['profile', 'type']],
      sortKeys: [],
    },
  ],
}

We reference the index that we added to the layout above, and then specify the partition and sort key paths to put in the index. Both are arrays of key paths to the properties we want to index (in this case we only need a partition key, so we can leave the sort keys empty; dynaglue will take care of the index's sort key based on the layout).

A key path is the sequence of nested object keys used to retrieve your index value.

The above partition key path instructs dynaglue to copy the profile.type value of each object into the top-level gspk1 attribute, so it can be looked up with a DynamoDB Query call.
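A key path is resolved by walking the object one key at a time. The following is a minimal sketch of that lookup; getValueAtKeyPath is a hypothetical helper, not a dynaglue export.

```javascript
// Minimal sketch (not dynaglue's code): follow each key in turn and return
// undefined as soon as a step is missing or is not an object.
function getValueAtKeyPath(obj, keyPath) {
  let current = obj;
  for (const key of keyPath) {
    if (current === null || typeof current !== 'object') return undefined;
    current = current[key];
  }
  return current;
}

const user = { email: 'user1@example.com', name: 'Leila Kent', profile: { type: 'guest' } };
getValueAtKeyPath(user, ['profile', 'type']);         // 'guest'
getValueAtKeyPath(user, ['profile', 'missing', 'x']); // undefined
```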

An important consequence of this is that once you've defined an access pattern and started using it in production, it cannot be changed or mapped to a different index without a data migration. If you were to change the mapping and redeploy, existing objects would keep the old key values, while new or updated objects would be written with the new ones.

Additionally, because we have put profile.type in a partition key field, we are committed to ensuring that value is defined for every object we add to the users collection (the same restriction does not apply to sort keys, as we will see).

Another dynaglue restriction is that all key values must be strings - numbers (although supported by DynamoDB) have not been added just yet.

To ensure these constraints are not violated in your own code, it is recommended that you validate objects with a JSON Schema tool like ajv (or use a strongly typed language like TypeScript) before persisting them.
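If you'd rather not pull in a schema library, a hand-rolled check can cover the two constraints above. This is a sketch under our own assumptions: assertIndexableKeys and getValueAtKeyPath are hypothetical helpers you would call before insert(), not part of dynaglue.

```javascript
// Hypothetical pre-insert check (a stand-in for an ajv schema, not dynaglue's code).
function getValueAtKeyPath(obj, keyPath) {
  let current = obj;
  for (const key of keyPath) {
    if (current === null || typeof current !== 'object') return undefined;
    current = current[key];
  }
  return current;
}

// Partition key values must be present strings; sort key values must be strings if set.
function assertIndexableKeys(entity, accessPatterns) {
  for (const { indexName, partitionKeys, sortKeys } of accessPatterns) {
    for (const keyPath of partitionKeys) {
      if (typeof getValueAtKeyPath(entity, keyPath) !== 'string') {
        throw new Error(`${indexName}: partition key ${keyPath.join('.')} must be a string`);
      }
    }
    for (const keyPath of sortKeys) {
      const value = getValueAtKeyPath(entity, keyPath);
      if (value !== undefined && typeof value !== 'string') {
        throw new Error(`${indexName}: sort key ${keyPath.join('.')} must be a string if set`);
      }
    }
  }
}

const userAccessPatterns = [{ indexName: 'gs1', partitionKeys: [['profile', 'type']], sortKeys: [] }];
assertIndexableKeys({ name: 'Leila Kent', profile: { type: 'guest' } }, userAccessPatterns); // passes
```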

Updates

Top-level entities can be partially updated with the updateById() call. You pass an object that maps key paths to the values to set on your target object (if a path has multiple keys, separate them with .).

For example:

await updateById(ctx, 'users', user1Id, {
  'profile.type': 'guest',
  name: 'J Smith',
})

DynamoDB supports update operations other than SET, but these are yet to be added to dynaglue.
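A partial update like this maps naturally onto a DynamoDB UpdateItem SET expression. The sketch below is our own illustration, not dynaglue's code. It assumes the entity is stored under the value attribute, as described earlier. It uses placeholder names so that reserved words such as name, type and value are safe, and it returns values in the plain form accepted by the aws-sdk DocumentClient. A real implementation would also need to refresh any index attributes derived from the changed fields.

```javascript
// Hypothetical sketch: turn { 'a.b': v, ... } into UpdateItem expression parts.
function toSetExpression(updates) {
  const names = {};
  const values = {};
  const clauses = [];
  Object.entries(updates).forEach(([path, newValue], i) => {
    // Assumes the entity lives under the top-level `value` attribute.
    const segments = ['value', ...path.split('.')];
    // One #placeholder per path segment avoids clashes with reserved words.
    const placeholders = segments.map((segment, j) => {
      const name = `#u${i}_${j}`;
      names[name] = segment;
      return name;
    });
    values[`:u${i}`] = newValue;
    clauses.push(`${placeholders.join('.')} = :u${i}`);
  });
  return {
    UpdateExpression: `SET ${clauses.join(', ')}`,
    ExpressionAttributeNames: names,
    ExpressionAttributeValues: values,
  };
}

toSetExpression({ 'profile.type': 'guest', name: 'J Smith' });
// UpdateExpression: 'SET #u0_0.#u0_1.#u0_2 = :u0, #u1_0.#u1_1 = :u1'
```

Note that DynamoDB rejects a SET on a nested path such as value.profile.type if the parent map (value.profile) does not already exist on the item.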

Child Collections

Child collections let you store data that supplements a top-level object but isn't needed on every access (saving network bytes and reducing retrieval times), while still letting you fetch all of it with a single query when it is needed.

Declaration

Child collections are defined like root collections, but with additional properties referencing the parent collection. This gives you adjacency list support.

Let's say we want to define a new addresses collection, which stores zero or more addresses for each user in the users collection.

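const addressesCollection = {
  type: 'child',
  name: 'addresses',
  layout: myAppTableLayout, // reference the layout we defined previously
  parentCollectionName: 'users',
  foreignKeyPath: ['userId'], // each address stores the _id of its user here
}
// register it alongside the parent: createContext(ddb, [usersCollection, addressesCollection])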

The foreignKeyPath value specifies the path to an attribute on the child entity (addresses) which references the _id of the parent entity (users).

Storage and retrieval

You can conveniently retrieve all the children of a parent object with the findChildren API.

findChildren(ctx, 'addresses', userId)
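dynaglue manages the key layout for child collections itself, and this guide doesn't describe its format. As a hypothetical illustration of the adjacency list technique, the sketch below assumes each child item shares its parent's partition key value and has a sort key prefixed with the child collection name. Under that assumption, fetching all children becomes one Query with begins_with. findChildrenQueryParams and the key encoding are our own assumptions, and values are in the DocumentClient's plain form.

```javascript
// Hypothetical adjacency-list query (not dynaglue's real key encoding).
const SEPARATOR = '|-|';

function findChildrenQueryParams(layout, parentCollection, parentId, childCollection) {
  const { partitionKey, sortKey } = layout.primaryKey;
  return {
    TableName: layout.tableName,
    // Every item in the parent's partition whose sort key starts with the child collection name.
    KeyConditionExpression: '#pk = :pk AND begins_with(#sk, :skPrefix)',
    ExpressionAttributeNames: { '#pk': partitionKey, '#sk': sortKey },
    ExpressionAttributeValues: {
      ':pk': [parentCollection, parentId].join(SEPARATOR),
      ':skPrefix': childCollection + SEPARATOR,
    },
  };
}

const myAppTableLayout = { tableName: 'my-app-table', primaryKey: { partitionKey: 'pk0', sortKey: 'sk0' } };
const params = findChildrenQueryParams(myAppTableLayout, 'users', '5e203ab4c9762f6f0f24e5b8', 'addresses');
```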

Child entities are inserted the same way as root objects:

insert(ctx, 'addresses', address1);

Finding and deleting a child require separate *ChildById methods, which take the child object ID followed by the parent object ID (both are needed).

deleteChildById(ctx, 'addresses', addressId, userId)
findChildById(ctx, 'addresses', addressId, userId)

(support still needs to be added for doing partial updates to child objects, but the API will follow similar conventions)

Composite Keys

dynaglue makes it easy to use a group of keys to index your entities. It supports two main use cases:

  • Building an index composed of multiple fields, where all values must match; and
  • Arranging entities in a hierarchy. For example, in a collection of 'addresses', we may want to index the country, state, town and street fields so that we can look up everything in a matching country, or be more specific and search by country, state and town.

Some use cases fall in between - we will explain how that might work.

Multiple matching fields

When we defined an access pattern earlier, you may have noticed that we used an array of key paths (i.e. an array of arrays of strings). If there are multiple fields we want to put in our index, we just add multiple key path arrays.

For example, if we want to look up staff entities based on the combination of their department and their location code:

// A staff entity example
const staff1 = {
  name: 'Jeff Arnold',
  position: {
    role: 'Senior Manager',
    department: 'Marketing',
  },
  locationCode: 'AU-SYD',
};

const staffCollection = {
  name: 'staff',
  layout: myAppTableLayout,
  accessPatterns: [{
    indexName: 'gs1',
    partitionKeys: [['position', 'department'], ['locationCode']],
    sortKeys: []
  }],
};

Using it is then just a matter of calling:

find(ctx, 'staff', { 'position.department': 'Marketing', locationCode: 'AU-SYD' })

Note that because we have put both attributes in the partition key, both are mandatory for our lookup. We could instead have put locationCode in the sort key, which would have made it optional for a find() (internally a DynamoDB Query).

Similarly, if we specify any keys not matching the index, the request will fail, because dynaglue will be unable to find an index to use.
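To see why, here is a sketch of how a query could be matched to an access pattern. Every partition key path must be supplied, and every supplied key must belong to the pattern. selectAccessPattern is a hypothetical helper, and dynaglue's actual selection logic may differ.

```javascript
// Hypothetical sketch: pick the first access pattern that can serve a query.
function selectAccessPattern(accessPatterns, query) {
  const queryKeys = Object.keys(query);
  return accessPatterns.find((pattern) => {
    const partitionPaths = pattern.partitionKeys.map((keyPath) => keyPath.join('.'));
    const sortPaths = pattern.sortKeys.map((keyPath) => keyPath.join('.'));
    const allPaths = partitionPaths.concat(sortPaths);
    // Partition key values are mandatory in the query...
    const hasAllPartitionKeys = partitionPaths.every((path) => queryKeys.includes(path));
    // ...and the query may not mention keys the pattern doesn't index.
    const onlyKnownKeys = queryKeys.every((key) => allPaths.includes(key));
    return hasAllPartitionKeys && onlyKnownKeys;
  });
}

const staffPatterns = [
  { indexName: 'gs1', partitionKeys: [['position', 'department'], ['locationCode']], sortKeys: [] },
];
selectAccessPattern(staffPatterns, { 'position.department': 'Marketing', locationCode: 'AU-SYD' }); // the gs1 pattern
selectAccessPattern(staffPatterns, { 'position.department': 'Marketing' }); // undefined (locationCode missing)
selectAccessPattern(staffPatterns, { name: 'Jeff Arnold' }); // undefined (name is not indexed)
```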

In the index gs1 (which we defined with partition key gspk1 in the layout above) the value will be stored as:

staff|-|Marketing|-|AU-SYD

(|-| is the separator dynaglue uses)

dynaglue takes care of the key extraction and concatenation for you when inserting, updating and retrieving the value. It also validates that every entry has those keys upon insertion.
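The following sketch reproduces the key format from the staff example. It is not taken from dynaglue, and buildCompositeKey and getValueAtKeyPath are illustrative names. It extracts each key path in order, rejects missing or non-string values, and joins the parts after the collection name using |-|.

```javascript
// Illustrative sketch of composite key construction (not dynaglue's source).
const SEPARATOR = '|-|';

function getValueAtKeyPath(obj, keyPath) {
  let current = obj;
  for (const key of keyPath) {
    if (current === null || typeof current !== 'object') return undefined;
    current = current[key];
  }
  return current;
}

function buildCompositeKey(collectionName, entity, keyPaths) {
  const parts = keyPaths.map((keyPath) => {
    const value = getValueAtKeyPath(entity, keyPath);
    // Validation on insert: every indexed key must be present and a string.
    if (typeof value !== 'string') {
      throw new Error(`Missing or non-string value at ${keyPath.join('.')}`);
    }
    return value;
  });
  return [collectionName, ...parts].join(SEPARATOR);
}

const staff1 = {
  name: 'Jeff Arnold',
  position: { role: 'Senior Manager', department: 'Marketing' },
  locationCode: 'AU-SYD',
};
buildCompositeKey('staff', staff1, [['position', 'department'], ['locationCode']]);
// 'staff|-|Marketing|-|AU-SYD'
```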

Hierarchical indexes

We can also index multiple fields that exist in a hierarchy, e.g. the components of an address (country, state, town, street) or staff in a company (branch, department, team).

It's important that the key paths are specified in order, from least specific to most specific.

To be able to do partial matches, we index them in a sort key. When you make a query, the more specific values are optional, but you must supply every key path from the least specific one up to the most specific one you want to match, with no gaps.

If we take our address example, we would define our access pattern like this:

const accesspattern = {
  indexName: 'gs1',
  partitionKeys: [],
  sortKeys: [['country'], ['state'], ['town'], ['street']],
}

We can perform queries like:

find(ctx, 'addresses', { country: 'AU', state: 'NSW' }); // All in NSW, Australia
find(ctx, 'addresses', { country: 'AU' }); // All in Australia
find(ctx, 'addresses', { country: 'AU', state: 'NSW', town: 'Sydney' }); // All in Sydney, NSW, Australia

but we couldn't do queries like:

find(ctx, 'addresses', { town: 'Sydney' }); // Town in any country or state matching 'Sydney'
find(ctx, 'addresses', { country: 'AU', town: 'Sydney' }); // Town in Australia in any state matching 'Sydney'

If we wanted to run queries like these, we would need to create another access pattern containing just the keys they use (this limitation comes from DynamoDB and how the index is implemented: we use begins_with to match based on the specificity of the query).
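The sketch below simulates this with an in-memory list of sort keys. It assumes the keys are stored as the field values joined with |-| in hierarchy order, which may differ from dynaglue's exact encoding. String.prototype.startsWith stands in for DynamoDB's begins_with, and sortKeyPrefix is a hypothetical helper.

```javascript
// Hypothetical simulation of hierarchical matching with begins_with.
const SEPARATOR = '|-|';

function sortKeyPrefix(sortKeyPaths, query) {
  const paths = sortKeyPaths.map((keyPath) => keyPath.join('.'));
  const values = [];
  let gapSeen = false;
  for (const path of paths) {
    if (query[path] === undefined) {
      gapSeen = true;
    } else if (gapSeen) {
      // A prefix cannot skip a level, so this query cannot use the index.
      throw new Error(`Cannot match ${path} without the less specific keys before it`);
    } else {
      values.push(query[path]);
    }
  }
  if (values.length === 0) return ''; // nothing to match on: every entry qualifies
  const prefix = values.join(SEPARATOR);
  // A trailing separator (a choice made in this sketch) stops 'Sydney' from
  // also matching 'Sydney Olympic Park'.
  return values.length < paths.length ? prefix + SEPARATOR : prefix;
}

// In-memory stand-in for the index.
const sortKeyPaths = [['country'], ['state'], ['town'], ['street']];
const storedSortKeys = [
  ['AU', 'NSW', 'Sydney', 'George St'],
  ['AU', 'NSW', 'Sydney Olympic Park', 'Dawn Fraser Ave'],
  ['AU', 'VIC', 'Melbourne', 'Collins St'],
].map((parts) => parts.join(SEPARATOR));

function matching(query) {
  const prefix = sortKeyPrefix(sortKeyPaths, query);
  return storedSortKeys.filter((sortKey) => sortKey.startsWith(prefix));
}

matching({ country: 'AU', state: 'NSW' }).length; // 2
```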

Using partition and sort keys correctly

The two above use cases are both extremes, and most use cases will fall somewhere between. Remember that:

  • Partition keys are mandatory for a query using find(), so use them for key paths that are always available in the access pattern.
  • Sort keys are optional, so use them for values which are sometimes part of the access pattern, and order them from most general to most specific to take advantage of hierarchical lookups.

In practice, you want to put as much as you can in the partition keys: it helps DynamoDB spread the load from different queries across partitions and avoid bottlenecks from hot partitions.

An example of where we may want to split over partition and sort keys with optional values is the location hierarchy example given before.

The example given allowed everything to be optional (so you could search every location), but the downside is that all the requests would go to the same partition, which could affect performance in very high load scenarios.

If we predominantly ship within continental Europe, a good compromise would be making the country code mandatory for lookups. This just means putting the country value in a partition key.

const accesspattern = {
  indexName: 'gs1',
  partitionKeys: [['country']],
  sortKeys: [['state'], ['town'], ['street']],
}

Queries on this would not change - find() is smart enough to know where to put each value from the query and to enforce mandatory values.
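The function below is a sketch of that splitting. It turns a query into DynamoDB Query parameters for the country/state/town/street pattern above. The partition value follows the collection|-|value format shown earlier, while the sort key encoding and the buildFindQuery name are our own assumptions. Values are in the plain form accepted by the aws-sdk DocumentClient.

```javascript
// Hypothetical sketch: split query values into a mandatory partition key
// condition and an optional begins_with sort key condition.
const SEPARATOR = '|-|';

function buildFindQuery(tableName, findKey, collectionName, pattern, query) {
  const read = (keyPath) => query[keyPath.join('.')];

  // Partition key values are mandatory: without them there is no partition to query.
  const partitionValues = pattern.partitionKeys.map((keyPath) => {
    const value = read(keyPath);
    if (value === undefined) throw new Error(`${keyPath.join('.')} is required for this query`);
    return value;
  });

  // Sort key values are optional, but must run unbroken from the least specific key.
  const supplied = pattern.sortKeys.map(read);
  const firstMissing = supplied.indexOf(undefined);
  const used = firstMissing === -1 ? supplied : supplied.slice(0, firstMissing);
  if (supplied.slice(used.length).some((value) => value !== undefined)) {
    throw new Error('Sort key values must be given from least to most specific, without gaps');
  }

  const params = {
    TableName: tableName,
    IndexName: findKey.indexName,
    KeyConditionExpression: '#pk = :pk',
    ExpressionAttributeNames: { '#pk': findKey.partitionKey },
    ExpressionAttributeValues: { ':pk': [collectionName, ...partitionValues].join(SEPARATOR) },
  };
  if (used.length > 0) {
    params.KeyConditionExpression += ' AND begins_with(#sk, :sk)';
    params.ExpressionAttributeNames['#sk'] = findKey.sortKey;
    params.ExpressionAttributeValues[':sk'] = used.join(SEPARATOR);
  }
  return params;
}

const findKey = { indexName: 'gs1', partitionKey: 'gspk1', sortKey: 'gssk1' };
const addressPattern = { indexName: 'gs1', partitionKeys: [['country']], sortKeys: [['state'], ['town'], ['street']] };
buildFindQuery('my-app-table', findKey, 'addresses', addressPattern, { country: 'AU', state: 'NSW' });
```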

Similarly, if we work mostly within one country, making both the country and state mandatory will help break up the queries across more partitions than just the country alone.

Sparse Indexes

dynaglue also lets you implement sparse indexes. A sparse index is useful when you only want to index the entities in a collection that have a particular value specified; entities that don't specify it are left out of the index.

Sparse indexes are a more advanced use case, but they make it easy to index documents with an optional field, so that you don't need lots of inefficient and time-consuming filtering on unpopulated values.

In DynamoDB, these are implemented by creating a GSI with a sort key (sort keys are optional for indexes), but only populating the sort key value on the documents to be indexed.

dynaglue will recognise an entity with an unpopulated value on the sort key path, and automatically leave the index value blank.

This is different from the examples in the previous section, where you don't specify a key path for the sortKeys field in the access pattern. If you specify at least one sort key path, dynaglue will leave the sort key blank when the key path is undefined on an entity (giving you a sparse index).

On the other hand, when you don't specify any sort key paths in the access pattern, but the referenced index in the layout contains a sort key, it will populate it with the collection name so that the partition key values can still be found on any entity (an alternative to this would be to create a GSI without a sort key, but then it can't be used for sparse indexes).
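Putting both rules together, here is a hypothetical sketch of how a GSI's key attributes might be derived for an entity. Several pieces are illustrative assumptions: the gs2/gspk2/gssk2 index, the referralCode field, indexAttributes, and the sort key encoding. The point is which attributes get written and which are left out.

```javascript
// Hypothetical sketch of filling GSI key attributes (not dynaglue's code).
const SEPARATOR = '|-|';

function getValueAtKeyPath(obj, keyPath) {
  let current = obj;
  for (const key of keyPath) {
    if (current === null || typeof current !== 'object') return undefined;
    current = current[key];
  }
  return current;
}

function indexAttributes(collectionName, entity, findKey, pattern) {
  const partitionValues = pattern.partitionKeys.map((keyPath) => {
    const value = getValueAtKeyPath(entity, keyPath);
    if (value === undefined) throw new Error(`${keyPath.join('.')} is required`);
    return value;
  });
  const attributes = {
    [findKey.partitionKey]: [collectionName, ...partitionValues].join(SEPARATOR),
  };
  if (!findKey.sortKey) return attributes; // the index has no sort key to fill

  if (pattern.sortKeys.length === 0) {
    // No sort key paths: use the collection name so every entity is indexed.
    attributes[findKey.sortKey] = collectionName;
    return attributes;
  }
  // Keep values up to the first undefined one (a choice made in this sketch).
  const values = [];
  for (const keyPath of pattern.sortKeys) {
    const value = getValueAtKeyPath(entity, keyPath);
    if (value === undefined) break;
    values.push(value);
  }
  if (values.length > 0) attributes[findKey.sortKey] = values.join(SEPARATOR);
  // Otherwise the sort key is omitted, and DynamoDB leaves the item out of the GSI.
  return attributes;
}

const gs2 = { indexName: 'gs2', partitionKey: 'gspk2', sortKey: 'gssk2' };
const referralPattern = { indexName: 'gs2', partitionKeys: [], sortKeys: [['referralCode']] };
indexAttributes('users', { name: 'A', referralCode: 'ABC' }, gs2, referralPattern); // { gspk2: 'users', gssk2: 'ABC' }
indexAttributes('users', { name: 'B' }, gs2, referralPattern); // { gspk2: 'users' }: not in the index
```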

What's next?

dynaglue is still a 'work-in-progress', so the API may change from what is shown above. Additionally, it still needs to add support for a number of useful things, including:

  • Better marshalling of dates - at the moment, these are just converted to strings
  • Allowing you to store number index values (we don't support these properly right now)
  • Adding support for filter expressions
  • Changing the query operator for the find call
  • Extending updateById() with support for other operator types (including remove, SET with increment, ADD/REMOVE Set types) as well as child objects.
  • Small customisations, like changing the separator value, the name of the value field, pre-insert / update processors for entities and indexes etc.
  • Write sharding support - if you have an indexed value with low variability, queries can become bottlenecked in high-load scenarios without adding some randomness to the value to split the load in parallel query situations.

If you'd like to help prioritise what comes next, make a contribution, or report a bug, open an issue on the GitHub project's Issues page (especially before you write any code!).