
Implementing Aggregates

Let's implement aggregates in our SQLite connector.

As before, we won't implement aggregates in their full generality. Instead, we'll implement two types of aggregates: star_count and column_count. Other aggregates you might know from Postgres, such as SUM and MAX, fall under the umbrella of custom aggregate functions, and we'll cover those separately in another tutorial.

Let's start by adding the aggregates capability to our capabilities response:

return {
  version: "0.1.2",
  capabilities: {
    query: { aggregates: {} },
    mutation: {},
    relationships: {},
  },
};

Aggregate queries are indicated by the presence of the aggregates field in the query request body. Just like the fields property that we handled previously, each aggregate is named with a key and has a type, such as star_count. So we're going to handle aggregates very similarly to fields, by building up a SQL target list from these aggregates.
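
For example, the aggregates portion of a query request body might look something like this (the keys how_many_albums and distinct_artists are arbitrary names chosen by the client, and the column name here is purely illustrative):

{
  "aggregates": {
    "how_many_albums": {
      "type": "star_count"
    },
    "distinct_artists": {
      "type": "column_count",
      "column": "artist_id",
      "distinct": true
    }
  }
}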

The NDC spec says that each aggregate should act over the same set of rows that we consider when returning rows. That is, we should apply any predicates, sorting, pagination, and so on, and then apply the aggregate functions over the resulting set of rows.

So assuming we have a function called fetch_aggregates which builds the SQL in this way, we can fill in the aggregates in the response:

const aggregates = request.query.aggregates && (await fetch_aggregates(state, request));
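
For reference, here is a sketch of how this can slot into the query response, assuming the rows are still produced by the fetch_rows helper from the earlier steps (your query function may be structured slightly differently):

const rows = request.query.fields && (await fetch_rows(state, request));
const aggregates = request.query.aggregates && (await fetch_aggregates(state, request));

return [
  {
    rows,
    aggregates,
  },
];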

Now let's start to fill in a fetch_aggregates function.

We'll actually copy/paste the fetch_rows function and create a new function for handling aggregates. It'd be possible to extract that commonality into a shared function, but arguably not worth it, since so much is already extracted out into small helper functions anyway.

async function fetch_aggregates(
  state: State,
  request: QueryRequest,
): Promise<{
  [k: string]: unknown;
}> {}

The first difference is the return type. Instead of RowFieldValue, we're going to return a value directly from the database, so let's change that to unknown.
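
For comparison, the fetch_rows function we copied from has a signature along these lines, returning a list of rows whose values are RowFieldValue (treat this as a rough sketch of the earlier code rather than an exact copy):

async function fetch_rows(
  state: State,
  request: QueryRequest,
): Promise<
  {
    [k: string]: RowFieldValue;
  }[]
> {
  // ...
}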

Next, we want to generate the target list using the requested aggregates, so let's change that.

const target_list = [];

for (const aggregateName in request.query.aggregates) {
  if (Object.prototype.hasOwnProperty.call(request.query.aggregates, aggregateName)) {
    const aggregate = request.query.aggregates[aggregateName];
    switch (aggregate.type) {
      case "star_count":
        // TODO
      case "column_count":
        // TODO
      case "single_column":
        // TODO
    }
  }
}

For now, we'll handle the first two cases here, and save the last for when we talk about custom aggregates.

In the first case, we want to generate a target list element which uses the COUNT SQL aggregate function.

case 'star_count':
  target_list.push(`COUNT(1) AS ${aggregateName}`);
  break;

In the second case, we'll also use the COUNT function, but this time, we're counting non-null values in a single column:

case 'column_count':
  target_list.push(`COUNT(${aggregate.column}) AS ${aggregateName}`);
  break;

We also need to interpret the distinct property of the aggregate object, and insert the DISTINCT keyword if needed:

case 'column_count':
  target_list.push(`COUNT(${aggregate.distinct ? 'DISTINCT ' : ''}${aggregate.column}) AS ${aggregateName}`);
  break;
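
Putting those pieces together, the aggregate loop in fetch_aggregates now looks like this (single_column is left as a TODO until the custom aggregates tutorial):

const target_list = [];

for (const aggregateName in request.query.aggregates) {
  if (Object.prototype.hasOwnProperty.call(request.query.aggregates, aggregateName)) {
    const aggregate = request.query.aggregates[aggregateName];
    switch (aggregate.type) {
      case "star_count":
        target_list.push(`COUNT(1) AS ${aggregateName}`);
        break;
      case "column_count":
        target_list.push(`COUNT(${aggregate.distinct ? "DISTINCT " : ""}${aggregate.column}) AS ${aggregateName}`);
        break;
      case "single_column":
        // TODO: covered in the custom aggregates tutorial
        break;
    }
  }
}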

Now let's update our generated SQL to use the generated target list:

const sql = `SELECT ${target_list.length ? target_list.join(", ") : "1 AS __empty"} FROM (
  SELECT * FROM ${request.collection} ${where_clause} ${order_by_clause} ${limit_clause} ${offset_clause}
)`;

Note that we form the set of rows to be aggregated first, inside the subquery, so that the limit and offset clauses are applied correctly: for example, COUNT(1) over a subquery with LIMIT 10 counts at most ten rows, whereas a LIMIT 10 on the outer aggregate query would not restrict the rows being counted at all.

And instead of returning all rows, we're going to assume that the database gives us back exactly one row, so we can fetch that single row and return it as the object of aggregates. We also delete the placeholder __empty column that we added to keep the SQL valid when the target list is empty:

const result = await state.db.get(sql, ...parameters);

if (result === undefined) {
  throw new InternalServerError("Unable to fetch aggregates");
}

delete result.__empty;

return result;

That's it, so let's test our connector one more time, and hopefully see some passing tests this time.

ndc-test test --endpoint http://localhost:8080 --snapshots-dir snapshots

...
├ Query ...
│ ├ albums ...
│ │ ├ Simple queries ...
│ │ │ ├ Select top N ... OK
│ │ │ ├ Predicates ... OK
│ │ │ ├ Sorting ... OK
│ │ ├ Relationship queries ...
│ │ ├ Aggregate queries ...
│ │ │ ├ star_count ... OK
│ │ │ ├ column_count ... OK
│ │ │ ├ single_column ... OK
...

Note that ndc-test is now testing aggregates automatically, since we turned on the aggregates capability.

And let's check that we're generating the right SQL. Picking a random example from the logs, we can see that we are indeed generating well-formed SQL:

SELECT
  COUNT(id) AS id_count,
  COUNT(DISTINCT id) AS id_distinct_count,
  COUNT(name) AS name_count,
  COUNT(DISTINCT name) AS name_distinct_count
FROM (
  SELECT * FROM artists LIMIT 10
)