Community Story

Genomic ETL and digital custody with General Bioinformatics

Bioinformatics is the discipline of combining scientific and computing knowledge so that scientists can gain greater insight into the problems they are trying to solve. It involves working with genomic datasets, standard deviations within those datasets, and countless supporting data models.

Interview with Dr. Liz Reynolds, Dan Stein, and Eleanor Poulter

General Bioinformatics is a company that collects, aggregates, and sanitizes this data for the broader scientific community by focusing on the requirements needed for a research project. They act as the data lake and the chain of custody for genomic information. With their domain expertise, they are able to identify which data abnormalities are noisy versus meaningful.

[On the business impact of Hasura] So now that long-tail where the 20% use case is, is open to us. And what’s really exciting about that from business terms is that it's open to us, and it never used to be open to us, but it’s where the scientists really value you – because they haven’t got a different way of doing it. Everyone can do the 80% use cases, hitting that long-tail is really hard.

Dr. Liz Reynolds, CEO of General Bioinformatics

What is General Bioinformatics using Hasura for?

General Bioinformatics uses Hasura to join a wide array of different datasets into a unified API layer. Genomic information is very graph-like and lends itself to being queried as a graph, yet the majority of it gets collected into relational databases. Instead of parsing, transforming, and loading the information into a bespoke industry utility like SPARQL, or some RDF data format, General Bioinformatics leverages the auto-generated GraphQL API to get graph relationships from these relational data structures.
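
To make the idea of reading graph relationships out of relational tables concrete, here is a minimal sketch of a nested GraphQL query. The table and relationship names (`genes`, `variants`, `sample`) and their columns are assumptions made up for illustration, not General Bioinformatics' actual schema. The code only builds the request body and does not call any server.

```python
import json

# Hypothetical schema: a `genes` table with an array relationship `variants`,
# which in turn has an object relationship `sample`. These names are
# illustrative assumptions, not taken from the real data model.
GENE_QUERY = """
query GeneWithVariants($symbol: String!) {
  genes(where: {symbol: {_eq: $symbol}}) {
    symbol
    variants {
      position
      sample {
        accession
      }
    }
  }
}
"""


def build_graphql_request(symbol: str) -> dict:
    """Return the JSON body for a GraphQL POST request."""
    return {"query": GENE_QUERY, "variables": {"symbol": symbol}}


if __name__ == "__main__":
    # This body would be POSTed to the GraphQL endpoint
    # (served at /v1/graphql by Hasura).
    print(json.dumps(build_graphql_request("BRCA1"), indent=2))
```

Each level of nesting in the query follows a foreign-key relationship between tables, which is what allows relational data to be traversed like a graph without first loading it into a separate graph store.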

After parsing, annotating, and cleaning the data, they expose an in-house, self-serve tool called Deep Space that allows scientists to build their own queries for slices of this data.

How has Hasura helped?

Through this process of running a type of genomic ETL, the team discovered that Hasura allows them to create operational data engineers from scientists. Staffing for such a narrow band of skillsets would normally prove quite challenging. Using Hasura as a low-ops data platform lets the team focus on finding scientists who understand the data and then turn them into data engineers. Modeling the data requires domain expertise, while Hasura autogenerates the API layer used to access the models.

Feature Highlights

Generated GraphQL API

Having access to a generated API lets the team take on projects so small in scale that they would otherwise not be commercially viable but that, once an API is available, become exceptionally valuable.

Remote Schemas

Working with both an internal bespoke GraphQL API as well as Hasura’s auto-generated APIs, remote schemas allow the team to stitch this