Schema modeling with OrderedTable

In this tutorial, we'll walk through some best practices for OrderedTable schema modeling while writing a simple blogging app. We know that the blogging app will need to support querying logic by both an author and the most recent posts, but the database schema design is left to us.

Here's what we'll cover:

  • OrderedTable schema design anti-patterns
  • A better, query-driven approach to schema design
  • Using transactionality for data consistency

Cloud storage deep dive video

Prerequisites

Why use OrderedTable

The OrderedTable cloud storage type behaves like a Map; it stores typed key value pairs with CRUD operations. But OrderedTable is also sorted and data can be retrieved with range operations. This makes it an especially good choice for data that needs to be chunked or returned in a specific order, as in our blog post example.

What not to do...

If you’re used to modeling data in a relational context, you might not know where to start when creating your OrderedTable schema, or worse, you might define a schema which is incapable of supporting your application’s use-cases.

It might be tempting to hop right into defining the data models, i.e. an Author writes many Post entities with a unique OffsetDateTime. Maybe we only need one simple, wide table—something like OrderedTable Author [Post]:

type Author = {
  authorId : Text,
  name : Text
}

type Post = {
  title: Text,
  body : Text,
  timeStamp : OffsetDateTime,
}

antiPattern : '{Exception, Cloud} OrderedTable Author [Post]
antiPattern = do OrderedTable.named !Database.default "badSchema" Universal.ordering

After all, the Author could be the key to a List of Posts, and maybe we can add the equivalent of a SQL where or limit clause with ordering for some filtering logic based on time.

Here’s why that’s not ideal:

Storing data in an unbounded list in OrderedTable is generally an anti-pattern

Say you want to edit a single blog post inside the List of Post as a transaction. Do you want the entirety of the author’s blogposts to also be locked up while the update for a single post is taking place? Generally, no.

OrderedTable also has some hard-stop data size limits when it comes to reading and writing data, so especially prolific bloggers might eventually run into issues creating new posts!

As a rule, if your OrderedTable encloses a list as its value, it’s a sign that you may want to key the data differently, so that your List of Posts becomes something shaped like a OrderedTable SomeUniqueKey Post.

Query-driven schema design

Because we’re not handing off our data to something with a query plan optimizer, we need to start by defining precisely how our application will be accessing the data.

💡 Our data modeling concerns here are twofold, for database writes, the keys we use need to be unique enough to avoid inconsistencies, and for our reads, we want our business queries to work with the minimum amount of in-memory filtering.

Ideally, we also want the lowest possible factor of data duplication, but as we’ll see, the queries over our data will ultimately dictate that.

For the purposes of this example, these are the queries we want to support:

  • Get all the posts for a given author
  • Get the most recent N posts by a given author
  • Get the most recent N posts on the entire blogging website, including the author name

The only query that our earlier OrderedTable Author [Post] would support without in-memory scans is the first one.

Compound keys for better queries

Here’s a better schema for our app: we define two OrderedTables with compound (tuple) keys–one for queries first scoped by an Author in some way, and a second for queries that are filtering by time information first.

For convenience, an AppStorage type can contain both OrderedTable tables and the Database they’re stored in:

type AppStorage = {
  database : Database,
  postsByTimestamp : OrderedTable (OffsetDateTime, Author) Post,
  postsByAuthor : OrderedTable (Author, OffsetDateTime) Post
  }

The compound keys ensure that we’re writing unique entities since it’s unlikely the same author is composing two posts with sub-millisecond timing. The ordering of the elements in the tuple is also important, since the OrderedTable will be ordered by the first element in the tuple and then order by the second only if the first elements are equal.

OrderedTable (Author, OffsetDateTime) covers the first two desired queries. Getting all the posts by a given author and the latest N posts by a given author are both supported by variations of a call to the rangeClosed.prefix function. It queries the OrderedTable for all keys within the given range, using only the first element of the compound key.

getAllPostsByAuthor : AppStorage -> Author -> {Remote} [Post]
getAllPostsByAuthor appStorage author =
  postsByAuthor = Storage.postsByAuthor appStorage
  streamEntities = rangeClosed.prefix postsByAuthor prefixOrdering author author
  Stream.map (at2) streamEntities |> Stream.toList!

getLatestPostsByAuthor : AppStorage -> Author -> Nat -> {Remote} [Post]
getLatestPostsByAuthor appStorage author n =
  postsByAuthor = Storage.postsByAuthor appStorage
  streamEntities = rangeClosed.prefix postsByAuthor prefixOrdering author author
  Stream.take n streamEntities |> Stream.map (at2) |> Stream.toList!

Likewise, to support the final query across all authors over time, we’ll use the OrderedTable keyed by (OffsetDateTime, Author) and the same prefixQuery function.

getLatestPosts : AppStorage -> OffsetDateTime -> Nat -> {Remote} [Post]
getLatestPosts appStorage time n =
  postsByTimestamp = Storage.postsByTimestamp appStorage
  streamEntities = rangeClosed.prefix postsByTimestamp prefixOrdering time time
  Stream.take n streamEntities |> Stream.map (at2) |> Stream.toList!

Use transactionality to sync your data across OrderedTables

Keeping two OrderedTables around with the same content but different keys sounds like a recipe for inconsistency, but the Cloud’s durable storage layer has another mechanism for keeping data in sync and that is transactionality.

Represented by the Transaction ability in functions like OrderedTable.write.tx, we can use this ability to describe interactions with the cloud’s storage layer that need to complete together.

In our case, when we receive a new blogpost, we need to write to both OrderedTables in the same transactional block. Below, the transactional block is opened by the do keyword, and run as a Storage interaction with the transact.random function.

writePost : AppStorage -> Author -> OffsetDateTime -> Post -> {Remote, Storage, Exception, Random} ()
writePost appStorage author time newPost =
  postByAuthorOrderedTable = Storage.postsByAuthor appStorage
  postByTimestampOrderedTable = Storage.postsByTimestamp appStorage
  db = Storage.database appStorage
  Storage.transact.random db do
      OrderedTable.write.tx postByAuthorOrderedTable (author, time) newPost
      OrderedTable.write.tx postByTimestampOrderedTable (time, author) newPost

Run a test deployment

You can exercise each of these database interactions in local tests by writing posts with the writePost function and then querying them with your various read functions. Here's an example of a test run that writes two posts and then queries for all posts by a given author:

createStorage : '{Exception, Cloud} Storage
createStorage = do
  database = Database.named "posts"
  Database.assign database (!Environment.default)
  postsByTimestamp = OrderedTable.named database "postsByTimestamp" Universal.ordering
  postsByAuthor = OrderedTable.named database "postsByAuthor" Universal.ordering
  Storage database postsByTimestamp postsByAuthor

testGetAuthor : '{IO, Exception} [Post]
testGetAuthor = Cloud.main.local do
  postTime = !OffsetDateTime.current
  secondTime = addDuration postTime (Duration.seconds +10)
  appStorage = !createStorage
  Cloud.submit (!Environment.default) do
    writePost appStorage (Author "Alice") postTime (Post "Hello" "World" postTime)
    writePost appStorage (Author "Bob") secondTime (Post "Goodbye" "Computer" secondTime)
    getAllPostsByAuthor appStorage (Author "Alice")

Note that the local updates and reads to the database must take place in the same Cloud.main.local block. Entering run testGetAuthor in the UCM will return a list of posts by the author "Alice", not "Bob".

Summary

Working with OrderedTable requires slightly different thinking about your data modeling. If you’re ever stuck, remember:

  • Your OrderedTable should very seldom look like OrderedTable Key [ListOfValues]
  • Start by describing the queries your application will need to perform. It will help dictate what your data should be keyed by.
  • It’s not uncommon to coordinate multiple OrderedTables with Transactions in order to serve different access patterns for your data.

📚 Learn more about OrderedTable via the API docs