With Bulk API Data
In this tutorial we'll see how to connect to an API source that has some bulk data we want to bring into PromptQL.
This is what we'll do:
- We will set up a connector that has a DuckDB source
- We will set up a job to load data from our API source
We're loading data into DuckDB for this example, but you could load data into any database that has a supported connector (e.g., PostgreSQL, MongoDB, ClickHouse). We're going to use TypeScript to write a loading script, but how you choose to load data is completely up to you.
Prerequisites
Install the DDN CLI
To use this guide, ensure you've installed or updated your CLI to at least v2.28.0.
macOS and Linux
Simply run the installer script in your terminal:
curl -L https://graphql-engine-cdn.hasura.io/ddn/cli/v4/get.sh | bash
Currently, the CLI does not support installation on ARM-based Linux systems.
Windows
- Download the latest DDN CLI installer for Windows.
- Run the DDN_CLI_Setup.exe installer file and follow the instructions. This will only take a minute.
- By default, the DDN CLI is installed under C:\Users\{Username}\AppData\Local\Programs\DDN_CLI.
- The DDN CLI is added to your %PATH% environment variable so that you can use the ddn command from your terminal.
Install Docker
The Docker-based workflow helps you iterate and develop locally without deploying any changes to Hasura DDN, making the development experience faster and your feedback loops shorter. You'll need Docker Compose v2.20 or later.
Validate the installation
You can verify that the DDN CLI is installed correctly by running:
ddn doctor
Tutorial
Step 1. Scaffold out a new local project
ddn supergraph init bulk-data --with-promptql && cd bulk-data
Step 2. Add the hasura/duckduckapi connector
The -i flag runs the connector initialization interactively; when prompted, select the hasura/duckduckapi connector.
ddn connector init github -i
cd app/connector/github && npm i
Step 3. Initialize a table and sample data
Open the file app/connector/github/index.ts and define your DuckDB schema there:
// ...
const connectorConfig: duckduckapi = {
  dbSchema: `
    -- Create repositories table with commonly needed fields
    DROP TABLE IF EXISTS repositories;
    CREATE TABLE repositories (
      id INTEGER PRIMARY KEY,
      name VARCHAR NOT NULL,
      description TEXT
    );

    -- Sample data
    INSERT INTO repositories (id, name, description)
    VALUES (1, 'my-project', 'A sample repository');
  `,
  functionsFilePath: path.resolve(__dirname, './functions.ts'),
};
// ...
The hasura/duckduckapi connector uses a convention wherein you'll create a schema in an underlying DuckDB database that represents your API's bulk data. Later, we'll create a loader function that uses this schema to persist bulk data from your API in this DuckDB instance.
Step 4. Add the metadata
Once we create new entities in our sources, we need to get them into our project's metadata. This allows the AI assistant to access that data via PromptQL.
# Grab the model definitions
ddn connector introspect github
# Check out what models are available to track. You'll see some sample ones which you can ignore for now.
ddn model list github
# Add the repositories model
ddn model add github repositories
Step 5. Create and run a new build
ddn supergraph build local
ddn run docker-start
ddn console --local
Head over to the PromptQL Playground and see if the AI assistant is able to access your repositories.
What repositories do I have?
PromptQL will return something like:
You have one repository in your account. This repository is named "my-project" and is described as "A sample repository".
Step 6. Set up a job to continuously load data (optional)
You can add a job to load data by kicking off an async task from the DuckDuckAPI connector.
Head over to app/connector/github/index.ts and add the following code right after the connector starts:
// import statements...
// schema initialization...

async function insertData() {
  const db = await getDB();

  setInterval(async () => {
    try {
      const timestamp = new Date().toISOString();
      await db.all(`
        INSERT INTO repositories (id, name, description)
        VALUES (
          (SELECT COALESCE(MAX(id), 0) + 1 FROM repositories),
          'project-${timestamp}',
          'Automatically inserted at ${timestamp}'
        )
      `);
      console.log(`Inserted new repository at ${timestamp}`);
    } catch (err) {
      console.error('Error inserting data:', err);
    }
  }, 1000);
}

(async () => {
  const connector = await makeConnector(connectorConfig);
  start(connector);

  // Kick off an insert data job
  insertData();
})();
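Restart your local services and ask the playground the same question again: the job inserts a new row every second, so you'll see the list of repositories grow between runs.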
A real-world example
The steps above get you started: you've seen how to set up DuckDB, how to get a connection to it, and how to start inserting data into it from another source.
In a production-ready example, you'll also need to (a minimal sketch follows this list):
- Connect to another API securely
- Incrementally pull in updates after the initial sync is done
- Handle API rate limits
- Persist data incrementally
- Recover from failures and process restarts
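To make these requirements concrete, here's a minimal sketch of what an incremental, restart-safe loader might look like. The API endpoint, response shape, and sync_state cursor table are all hypothetical, and the getDB import path is an assumption based on the connector SDK; the real-world example linked below handles the same concerns in full.

// A minimal sketch of an incremental, restart-safe loader. The endpoint,
// response shape, and `sync_state` table are hypothetical; the import path
// for getDB is assumed from the connector SDK.
import { getDB } from "@hasura/ndc-duckduckapi";

const PAGE_SIZE = 100;

export async function syncRepositories(): Promise<void> {
  const db = await getDB();

  // Persist the sync cursor in DuckDB so a process restart resumes the sync
  // instead of starting over.
  await db.all(
    `CREATE TABLE IF NOT EXISTS sync_state (key VARCHAR PRIMARY KEY, value VARCHAR)`
  );
  const rows = await db.all(
    `SELECT value FROM sync_state WHERE key = 'repos_updated_since'`
  );
  let since: string = rows.length > 0 ? rows[0].value : "1970-01-01T00:00:00Z";

  while (true) {
    // Hypothetical endpoint: repositories updated after the cursor, oldest first.
    const res = await fetch(
      `https://api.example.com/repos?since=${encodeURIComponent(since)}&per_page=${PAGE_SIZE}`
    );

    // Respect rate limits: back off for the suggested interval, then retry.
    if (res.status === 429) {
      const waitMs = Number(res.headers.get("retry-after") ?? "60") * 1000;
      await new Promise((resolve) => setTimeout(resolve, waitMs));
      continue;
    }
    if (!res.ok) throw new Error(`API error: ${res.status}`);

    const page: Array<{
      id: number;
      name: string;
      description: string | null;
      updated_at: string;
    }> = await res.json();
    if (page.length === 0) break; // fully caught up; schedule the next run

    for (const repo of page) {
      // Upsert so replaying a page after a crash is harmless (idempotent writes).
      await db.all(
        `INSERT OR REPLACE INTO repositories (id, name, description) VALUES (?, ?, ?)`,
        repo.id,
        repo.name,
        repo.description ?? ""
      );
      since = repo.updated_at;
    }

    // Advance the cursor only after the whole page has been persisted.
    await db.all(
      `INSERT OR REPLACE INTO sync_state (key, value) VALUES ('repos_updated_since', ?)`,
      since
    );
  }
}

Persisting the cursor inside DuckDB itself keeps the loader's state and its data in one place, so a crash can never leave the two out of sync.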
Check out the PromptQL GitHub example and start reading through the code at app/connector/github/index.ts to see how to put together a real-world bulk-data API connector!