With Bulk API Data
In this tutorial we'll see how to connect to an API source that has some bulk data we want to bring into PromptQL.
This is what we'll do:
- We will set up a connector that has a DuckDB source
- We will set up a job to load data from our API source
We're loading data into DuckDB for this example, but you could load data into any database that has a supported connector (e.g., PostgreSQL, MongoDB, ClickHouse). We're going to use TypeScript to write a loading script, but how you choose to load data is completely up to you.
Prerequisites
Install the DDN CLI
To use this guide, ensure you've installed or updated your CLI to at least v2.28.0.
macOS and Linux
Simply run the installer script in your terminal:
curl -L https://graphql-engine-cdn.hasura.io/ddn/cli/v4/get.sh | bash
Currently, the CLI does not support installation on ARM-based Linux systems.
Windows
- Download the latest DDN CLI installer for Windows.
- Run the DDN_CLI_Setup.exe installer file and follow the instructions. This will only take a minute.
- By default, the DDN CLI is installed under C:\Users\{Username}\AppData\Local\Programs\DDN_CLI.
- The DDN CLI is added to your %PATH% environment variable so that you can use the ddn command from your terminal.
Install Docker
The Docker-based workflow helps you iterate and develop locally without deploying any changes to Hasura DDN, making the development experience faster and your feedback loops shorter. You'll need Docker Compose v2.20 or later.
Validate the installation
You can verify that the DDN CLI is installed correctly by running:
ddn doctor
Tutorial
Step 1. Scaffold out a new local project
ddn supergraph init bulk-data --with-promptql && cd bulk-data
Step 2. Add the hasura/duckduckapi connector
The -i flag runs the connector initialization interactively; when prompted, select the hasura/duckduckapi connector.
ddn connector init github -i
cd app/connector/github && npm i
Step 3. Initialize a table and sample data
Open the file app/connector/github/index.ts and define your DuckDB schema there:
// ...
const connectorConfig: duckduckapi = {
  dbSchema: `
    -- Create repositories table with commonly needed fields
    DROP TABLE IF EXISTS repositories;
    CREATE TABLE repositories (
      id INTEGER PRIMARY KEY,
      name VARCHAR NOT NULL,
      description TEXT
    );

    -- Sample data
    INSERT INTO repositories (id, name, description)
    VALUES (1, 'my-project', 'A sample repository');
  `,
  functionsFilePath: path.resolve(__dirname, './functions.ts'),
};
// ...
The hasura/duckduckapi connector uses a convention wherein you'll create a schema in an underlying DuckDB database that represents your API's bulk data. Later, we'll create a loader function that uses this schema to persist bulk data from your API in this DuckDB instance.
Step 4. Add the metadata
Once we create new entities in our sources, we need to get them into our project's metadata. This allows the AI assistant to access that data via PromptQL.
# Grab the model definitions
ddn connector introspect github
# Check out what models are available to track. You'll see some sample ones which you can ignore for now.
ddn model list github
# Add the repositories model
ddn model add github repositories
Step 5. Create and run a new build
ddn supergraph build local
ddn run docker-start
ddn console --local
Head over to the PromptQL Playground and see if the AI assistant is able to access your repositories.
What repositories do I have?
PromptQL will return something like:
You have one repository in your account. This repository is named "my-project" and is described as "A sample repository".
Step 6. Set up a job to continuously load data (optional)
You can add a job to load data by kicking off an async task from the DuckDuckAPI connector.
Head over to app/connector/github/index.ts and add the following code right after the connector starts:
// import statements...
// schema initialization...

async function insertData() {
  const db = await getDB();

  setInterval(async () => {
    try {
      const timestamp = new Date().toISOString();
      await db.all(`
        INSERT INTO repositories (id, name, description)
        VALUES (
          (SELECT COALESCE(MAX(id), 0) + 1 FROM repositories),
          'project-${timestamp}',
          'Automatically inserted at ${timestamp}'
        )
      `);
      console.log(`Inserted new repository at ${timestamp}`);
    } catch (err) {
      console.error('Error inserting data:', err);
    }
  }, 1000);
}

(async () => {
  const connector = await makeConnector(connectorConfig);
  start(connector);

  // Kick off an insert data job
  insertData();
})();
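Restart your local services and ask the playground the same question again: the job inserts a new row every second, so you'll see the list of repositories grow between runs.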
A real-world example
The steps above get you started: you've seen how to set up DuckDB, how to get a connection to it, and how to start inserting data into it from another source.
In a production-ready example, you'll also need to (a minimal sketch follows this list):
- Connect to another API securely
- Incrementally pull in updates after the initial sync is done
- Handle API rate limits
- Persist data incrementally
- Recover from failures and process restarts
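To make these requirements concrete, here's a minimal sketch of what an incremental, restart-safe loader might look like. The API endpoint, response shape, and sync_state cursor table are all hypothetical, and the getDB import path is an assumption based on the connector SDK; the real-world example linked below handles the same concerns in full.

// A minimal sketch of an incremental, restart-safe loader. The endpoint,
// response shape, and `sync_state` table are hypothetical; the import path
// for getDB is assumed from the connector SDK.
import { getDB } from "@hasura/ndc-duckduckapi";

const PAGE_SIZE = 100;

export async function syncRepositories(): Promise<void> {
  const db = await getDB();

  // Persist the sync cursor in DuckDB so a process restart resumes the sync
  // instead of starting over.
  await db.all(
    `CREATE TABLE IF NOT EXISTS sync_state (key VARCHAR PRIMARY KEY, value VARCHAR)`
  );
  const rows = await db.all(
    `SELECT value FROM sync_state WHERE key = 'repos_updated_since'`
  );
  let since: string = rows.length > 0 ? rows[0].value : "1970-01-01T00:00:00Z";

  while (true) {
    // Hypothetical endpoint: repositories updated after the cursor, oldest first.
    const res = await fetch(
      `https://api.example.com/repos?since=${encodeURIComponent(since)}&per_page=${PAGE_SIZE}`
    );

    // Respect rate limits: back off for the suggested interval, then retry.
    if (res.status === 429) {
      const waitMs = Number(res.headers.get("retry-after") ?? "60") * 1000;
      await new Promise((resolve) => setTimeout(resolve, waitMs));
      continue;
    }
    if (!res.ok) throw new Error(`API error: ${res.status}`);

    const page: Array<{
      id: number;
      name: string;
      description: string | null;
      updated_at: string;
    }> = await res.json();
    if (page.length === 0) break; // fully caught up; schedule the next run

    for (const repo of page) {
      // Upsert so replaying a page after a crash is harmless (idempotent writes).
      await db.all(
        `INSERT OR REPLACE INTO repositories (id, name, description) VALUES (?, ?, ?)`,
        repo.id,
        repo.name,
        repo.description ?? ""
      );
      since = repo.updated_at;
    }

    // Advance the cursor only after the whole page has been persisted.
    await db.all(
      `INSERT OR REPLACE INTO sync_state (key, value) VALUES ('repos_updated_since', ?)`,
      since
    );
  }
}

Persisting the cursor inside DuckDB itself keeps the loader's state and its data in one place, so a crash can never leave the two out of sync.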
Check out the PromptQL GitHub example and start reading through the code at app/connector/github/index.ts to see how to put together a real-world bulk-data API connector!