Querying an RDBMS directly in JSON

TLDR

What if this intent:

Fetch top 5 articles by rating for every category  

was this query:

Query Response
{
    "from"     : "category",
    "columns"  : [
        "name",
        "description",
        { "name"     : "articles",
          "columns"  : ["title", "rating"],
          "order_by" : "-rating",
          "limit"    : 5
        }
    ]
}
[ 
  { "category"    : "technology",
    "description" : "...",
    "articles"    : [
        { "title"  : "Hello",
          "rating" : 5
        },
        ..
    ]
  },
  ..
]

Why is this useful?

Imagine the server-side programming effort we would save, if we had an existing service that arbitrated a JSON interface to an RDBMS.

Technically, the required input/information for this service is only:

  1. The schema information: To compile JSON queries to SQL and to serialize the SQL response to JSON
  2. Row Level Security annotations: To prevent unauthorized access to data. Eg: one user should not be able to query other users' data

schema


Building the JSON interface

Consider this schema, for the discussions below:
schema

Every article has only one author, and only one category
We will discuss the following:

  1. SELECT query and response translation to JSON
  2. SELECT with JOINs, queries and responses
  3. JOIN with WHERE, queries and responses

1. Simple SELECT queries

1.1 Syntax

Except for the WHERE clause, a simple SQL SELECT query can be represented in JSON as follows:

SQL Desired JSON
SELECT title, rating  
  FROM article 
  WHERE published = '2015-11-04' AND rating > 4 
  ORDER BY rating DESC 
  LIMIT 10
  OFFSET 10
{
    "columns"  : ["title", "rating"],
    "from"     : "article",
    "where"    : ????
    "order_by" : "-rating",
    "limit"    : 10,
    "offset"   : 10
}

A where clause in SQL's SELECT is a boolean expression which can be evaluated to True or False for each row. We now need a syntax for boolean expressions in JSON

Taking inspiration from MongoDB:

SQL MongoDB-ish translation
COLUMN OP VALUE { "COLUMN" : { "OP" : VALUE } }
BE1 AND BE2 AND ... BEn { "$and" : [MBE1, MBE2 ... MBEn] }
BE1 OR BE2 OR ... BEn { "$or" : [MBE1, MBE2 ... MBEn] }
NOT BE { "$not" : MBE }

This basic boolean expression { "COLUMN" : { "OP" : VALUE } } can now be combined using $and, $or and $not to build increasingly complex boolean expressions.So, the where clause in the above SQL query can now be written in JSON as

{ 
    "$and" : [ { "published": { "$eq": "2015-11-04" } } 
             , { "rating"   : { "$gt": 4 } }
             ]
}

However, this could be further simplified with some syntactic sugar as follows :

{
    "published" : "2015-11-04",
    "rating"    : { "$gt" : 4 }
}

So finally:

SQL Desired JSON
SELECT title, rating  
  FROM article 
  WHERE published = '2015-11-04' AND rating > 4 
  ORDER BY rating DESC 
  LIMIT 10
  OFFSET 10
{
    "columns"  : ["title", "rating"],
    "from"     : "article",
    "where"    : {
        "published" : "2015-11-04",
        "rating"    : { "$gt" : 4 }
    },
    "order_by" : "-rating",
    "limit"    : 10,
    "offset"   : 10
}

1.2 Response

The response of an SQL SELECT statement is zero or more rows.
Each row can be mapped to an object in JSON, with the entire result being an array of objects.

SQL JSON
+---------+------------+
|  title  |   rating   |
+---------+------------+
|  Hello  |      5     |
+----------------------+
|  World  |     4.9    |
+----------------------+
[
    {
        "title"  : "Hello",
        "rating" : 5
    },
    {
        "title"  : "World",
        "rating" : 4.9
    }
]

2. SELECT queries with JOINs

Typically, JOINs in SQL are used to de-normalize data (in a sense, resolving data relationships captured by foreign key constraints). For example, to resolve the 'author' information of an article, the query would be as follows :

SELECT title, rating, author.id, author.name  
  FROM article 
  JOIN author
    ON article.author_id = author.id

Instead of coming up with an alternative syntax for JOINs, we'll approach JOINs more implicitly through the use-case of denormalization.

To begin, instead of looking at the query translation, let's look at ways to represent denormalized data in JSON.

2.1 Denormalized data in JSON

schema

Example denormalizations for this schema:

Article and it's author Author and his articles
{
    "title"  : "Hello",
    "rating" : 5,
    "author" : {
        "name" : "vamshi",
        "id"   : 1
    }
}
{
    "id"       : 1,
    "name"     : "vamshi",
    "articles" : [
        {
            "title"  : "Hello",
            "rating" : 5
        },
        {
            "title"  : "World",
            "rating" : 4.9
        }
    ]
}

As seen above, it seems natural to extend the object and array syntax, to represent relationships. This is already an established way of representing relationships in document stores.

If we call the author of an article an object_relationship, and articles of an author an array_relationship, the comparison with ORMs is as follows :

sqlalchemy rails JSON
ManyToOne belongs_to object_relationship
OneToMany has_many array_relationship

2.2 Denormalization queries in JSON

Object relationships Array relationships
Fetch articles and each one's author  
Fetch authors and each one's articles  
{
    "columns"  : [
        "title",
        "rating", 
        { "name"    : "author",
          "columns" : ["id", "name"] 
        }
    ],
    "from"     : "article"
}
{
    "columns"  : [
        "id",
        "name",
        { "name"    : "articles",
          "columns" : ["title", "rating"] 
        }
    ],
    "from"     : "author",
    "where"    : { "id" : 1 }
}

2.3 Arbitrary depth denormalization

Fetch 1 author, his articles, and the category for each article  
Request Response
{
    "columns"  : [
        "id",
        "name",
        { "name"    : "articles",
          "columns" : [
              "title",
              "rating",
               { "name"    : "category"
               , "columns" : ["id", "name", "desc"]
               }
          ]
        }
    ],
    "from"     : "author",
    "limit"    : 1
}
[{
    "id"       : 1,
    "name"     : "Ramu",
    "articles" : [
        {
            "title"  : "Hello",
            "rating" : 5,
            "category" : { "name" : "general", 
                           "desc" : "..." }
        },
        {
            "title"  : "World",
            "rating" : 4.1,
            "category" : { "name" : "programming", 
                           "desc" : "..." }
        }
    ]
}]

3. JOINs with WHEREs

Denormalization via the notion of relationships in our data, thus mostly mitigate the need for explicit joins.
Let us push this idea of relationships even further.

3.1. WHERE inside the nested data

What if we allowed a where clause, inside the nested column specification?

{
    "columns"  : [
        "id",
        "name",
        { "name"    : "articles",
          "columns" : ["title", "rating"],
          "where"   : { "rating" : { "$gt" : 4.5 } }
        }
    ],
    "from"     : "author"
}

We're not fetching authors whose articles have a rating > 4.5, we're fetching the articles of the authors that have a rating > 4.5 for every author.

We're filtering the nested array on properties of objects within the array. Going all the way, and adding order-ing, limit-ing we can do some pretty cool stuff.

This functionality is similar to the recently introduced inner_hits in elasticsearch.

3.3. Top level WHERE clause using nested object properties

We can flip the idea above, and use the property of a nested object to filter our top-level result.

Articles whose author relationship holds this property { "name" : "vamshi" } All authors who have written at least one article with rating greater than 4
{
    "columns" : ["title", "rating"]
    "from"    : "article"
    "where"   : {
        "author" : { "name" : "vamshi" }
    }
}
{
    "columns" : ["id", "name"]
    "from"    : "author"
    "where"   : {
        "articles" : { "rating" : { "$gt" : 4 } }
    }
}

Comparision with GraphQL

While there is a fair amount of overlap between this kind of a JSON query syntax and GraphQL, both aim to solve slightly different fundamental problems.

JSON queries to SQL GraphQL
Designed to be used with an RDBMS Backend agnostic
Using schema information, compiler should be able to execute all JSON queries within this syntax Write server side code to fetch data for predefined query 'templates'
A query is just JSON data GraphQL data format

One subtle yet extremely important difference is that GraphQL is not really a query language in the traditional sense. It is as much of a query language as you want it to be (by writing code on a GraphQL server).

Conclusion

The next parts in this series will address Row Level Security or "permissions", without which none of this would make sense for direct client access. Subsequently, we'll tackle schema-altering JSON queries too.

HasuraDB

HasuraDB is an implementation of this JSON query compiler on top of Postgres, and applies application-user permissions to ensure safe data access.

Take this for a spin on the demo account and let us know what you think at @HasuraHQ!

Vamshi Surabhi

After working on JVM internals for his master's work, Vamshi continued writing compilers with HasuraDB. He is the main architect and engineer behind the JSON to SQL compiler that powers HasuraDB.

Chennai
comments powered by Disqus