GraphQL Resolvers with Apollo

In this article, we will cover two topics that concern resolvers and performance. So far, and by default, every time a resolver is invoked it executes its action, which is usually to fetch data. The problem is that a graph can contain redundant information that gets fetched several times, causing performance issues.

For example, a query on an object with cyclic references will cause duplicate calls. Imagine querying for obj1->obj2->obj1->obj2.

The problem becomes gargantuan with an array of objects. If a single request is made for each element of a big array, you end up performing hundreds or thousands of requests, while in practice you would probably have used a dedicated endpoint that returns all the information in one batch.

The good news is that GraphQL lets you resolve at many levels. It is possible to resolve at the root level, which means directly at the query level, but also at any edge, which is great for an edge into an array of objects or a heavy object that requires special handling. Finally, it is possible to resolve at the field level, which is interesting when a particular field needs to be handled differently from the rest of its type.

Three different resolvers: Query, Edges and Fields

The two concepts we will investigate are “look-ahead” and “batching”. Look-ahead is the idea of inspecting the query and performing a surgical analysis of what is requested. Batching is the act of collecting all the data we want to fetch and fetching it once we are done traversing the tree. It means that if the same entity appears several times in the graph, we only query it once, at the end. From these two summaries, it is clear that one optimizes the query by figuring out the best way to fetch, while the second avoids redundant calls. The former helps avoid calling several endpoints by redirecting the logic to a single one, while the latter removes duplicate queries for the same element.

Look-ahead

A parent with children is the most common scenario. Imagine a parent who has many children. By default, GraphQL will call the resolver for the parent and then call a single resolver per child. If the parent's resolver fetches the parent data (1 HTTP request) and then each child triggers its own fetch (1 HTTP request multiplied by the number of children), performance suffers. Even if GraphQL is connected directly to a database, it would not perform well on a big list of children. The database scenario is often easier to visualize: instead of making several SELECT statements with a WHERE clause that specifies a single child ID, we would run one SELECT statement with an IN clause that specifies the array of IDs. That way, a single query returns many rows. In REST, if you have an endpoint that allows the parent to expand its children, you can use that endpoint instead of the one that only returns the parent's immediate attributes.
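To make the contrast concrete, here is a minimal sketch of the two database strategies, assuming a hypothetical db.query helper and table; only the shape of the calls matters.

interface ChildRow {
    id: number;
    name: string;
}

interface Db {
    query(sql: string, params: unknown[]): Promise<ChildRow[]>;
}

// N round trips: one SELECT per child id.
async function fetchChildrenOneByOne(db: Db, childIds: number[]): Promise<ChildRow[]> {
    const children: ChildRow[] = [];
    for (const id of childIds) {
        const rows = await db.query("SELECT id, name FROM children WHERE id = ?", [id]);
        children.push(...rows);
    }
    return children;
}

// Single round trip: one SELECT with an IN clause covering every id.
async function fetchChildrenInBatch(db: Db, childIds: number[]): Promise<ChildRow[]> {
    const placeholders = childIds.map(() => "?").join(", ");
    return db.query(`SELECT id, name FROM children WHERE id IN (${placeholders})`, childIds);
}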

In GraphQL, you can peek at what is being queried. The look-ahead notion is the exploration of what the user specified in the query. The information is available in the fourth parameter of the resolver, whose type is “GraphQLResolveInfo”. You can use the NPM package “graphql-fields”, which gives you an easy way to access that information.

import graphqlFields from "graphql-fields";

const fields = graphqlFields(graphQLResolveInfo);

Once you have extracted all the fields, you can check whether the children node is being requested. If not, you can fetch the parent information without the additional payload (SELECT the parent directly, without further data from the children).

if (fields.children !== undefined) {
    // Perform a more exhaustive query that will save us many small requests
}
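Putting the two snippets together, a resolver could look roughly like the following sketch; the resolver name, the children field and the two fetch helpers are hypothetical and stand in for your own data-access code.

import graphqlFields from "graphql-fields";
import { GraphQLResolveInfo } from "graphql";

// Hypothetical data-access helpers; replace with your own REST or SQL calls.
declare function fetchParentOnly(parentId: number): Promise<unknown>;
declare function fetchParentWithChildren(parentId: number): Promise<unknown>;

export const parentLookAheadResolvers = {
    Query: {
        parent: async (source: null, args: { parentId: number }, context: unknown, graphQLResolveInfo: GraphQLResolveInfo) => {
            const fields = graphqlFields(graphQLResolveInfo);
            if (fields.children !== undefined) {
                // The children are requested: one expanded fetch saves many small requests
                return fetchParentWithChildren(args.parentId);
            }
            // Only parent attributes are requested: fetch the lean payload
            return fetchParentOnly(args.parentId);
        }
    }
};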

There is still one issue with the look-ahead: the children resolver is still called and will still perform its request. How can we cleanly notify the children that we already have everything we need? This is where batching comes in.

Batching

Batching does two things: it caches and it batches many ids. The whole idea is that instead of calling your SQL or REST endpoints directly, you call the DataLoader. It is a layer of abstraction that checks whether a promise already exists for the requested key. If so, it returns the existing promise, which may already be resolved and therefore very fast. The DataLoader library is an NPM package that ships with its own TypeScript definition file, which is convenient if you are writing your code in TypeScript.

Naturally, the DataLoader takes an array of keys. Even if you want to request a single element, the DataLoader assumes you query for a collection. I will not go into patterns you can use in this article, other than mentioning that you could look at the number of ids passed to the DataLoader and make a smart decision about how to fetch the data. Worth mentioning: the load function of the DataLoader, which is needed to get the information from the cache or from the code inside the data loader (the fetch), can be invoked multiple times. The DataLoader will coalesce all individual loads that occur within a single tick and then call your batch loading function.
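As a small illustration of that coalescing behaviour, here is a standalone sketch (the keys and values are made up): loads issued in the same tick result in a single call to the batch function, and the repeated key is served from the cache.

import DataLoader from "dataloader";

// The batch function receives every key requested during the current tick.
const parentLoader = new DataLoader<number, string>(async ids => {
    console.log("batch called with", ids); // e.g. [ 1, 2 ]
    return ids.map(id => `parent-${id}`); // one value per key, in the same order
});

async function demo(): Promise<void> {
    const [a, b, aAgain] = await Promise.all([
        parentLoader.load(1),
        parentLoader.load(2),
        parentLoader.load(1) // cached: the batch function only ever sees key 1 once
    ]);
    console.log(a, b, aAgain); // parent-1 parent-2 parent-1
}

demo();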

An effective way to work with DataLoader is to have one DataLoader per way of querying the information. For example, if you query a “parent” entity by id, you would have a DataLoader for “parent” by “id”. You would have another for “parent” by “name”, one for “child” by “id”, and so on. The separation might sound redundant, but a single GraphQL query rarely asks for the same entity in many different ways, hence there is not much duplication.

A good way to keep everything tidy is to define a class into which we can inject the current user's request. It carries all the security information, such as an authentication bearer token, that the fetching code might need. The class trickles the context information (the user's HTTP request) down to the services that fetch the data by taking the request as a constructor parameter. The following code shows the pattern.

import DataLoader from "dataloader";

export class DataLoaders {
    private dataSources: GraphQLCustomDataSources;
    public getParentByParentId: DataLoader<number, Parent>;
    public getChildByChildId: DataLoader<number, Child>;
    public getChildrenByParentId: DataLoader<number, Child[]>;

    public constructor(requestTyped: IUserRequest) {
        this.dataSources = {
            sourceApi1: new Api1HttpService(requestTyped),
            sourceApi2: new Api2HttpService(requestTyped)
        };

        this.getParentByParentId = new DataLoader<number, Parent>(parentIds => {
            // One promise per key, returned in the same order as the keys received
            const proms: Promise<Parent>[] = [];
            for (let i = 0; i < parentIds.length; i++) {
                proms.push(this.dataSources.sourceApi1.getParent(parentIds[i]));
            }
            return Promise.all(proms);
        });

        // And so on for each DataLoader...

    }
}

The code above is a shortened version of what it can be, with two entities: Parent and Child. In reality, you would have many more DataLoaders and might want to break each one into a separate file and use the DataLoaders class as a facade over all the logic. The goal here is to have a single point of initialization so the HTTP request gets passed down to the implementation of the data source.

Still, there is an issue. We are caching in the DataLoader of the Parent entity, not the Child entity. It means that when GraphQL traverses and invokes the children resolver, that resolver will call the DataLoader that requests child information by child id, not by parent id. There are many patterns to handle this. You could invoke the parent DataLoader and check if the data is already present. You can also have the parent DataLoader prime the child DataLoader. Priming means setting the data into another loader's cache. The following code can be added to the DataLoader built previously. Now, GraphQL invokes the parent's DataLoader, gets the data and populates the parent's cache. Because it has the information about the children, it loops over the collection and primes the child's DataLoader as well. The traversal continues and the child's resolver calls the child's DataLoader, which already has a resolved promise with the child data.

children.forEach(c => {
    this.getChildByChildId.prime(c.id, c);
});
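Put in context, a children-by-parent-id loader could prime the child loader while it resolves. This is a sketch meant to live inside the DataLoaders constructor shown earlier; the getChildrenOfParent method on the data source is an assumption.

// Inside the DataLoaders constructor, next to the other loaders
this.getChildrenByParentId = new DataLoader<number, Child[]>(async parentIds => {
    const childrenPerParent = await Promise.all(
        parentIds.map(parentId => this.dataSources.sourceApi1.getChildrenOfParent(parentId)) // hypothetical method
    );
    // Prime the child-by-id loader so the child resolver finds an already resolved promise
    childrenPerParent.forEach(children => {
        children.forEach(c => {
            this.getChildByChildId.prime(c.id, c);
        });
    });
    return childrenPerParent;
});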

From there, you instantiate the class once in the Apollo server's configuration. The instantiation occurs at every request, hence no data is mixed between users.

async function apolloServerConfig() {
    const serverConfig: ApolloServerConfig = {
        schema: schemas,
        context: (context: GraphQLCustomResolversContext) => {
            // requestTyped is built from the incoming HTTP request of the current user
            const newContext: GraphQLCustomContext = {
                loaders: new DataLoaders(requestTyped)
            };
            return newContext;
        },
        // ...
    };
}
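With the loaders on the context, a resolver can then delegate its fetch to them. The following sketch reuses the DataLoaders class and the Parent/Child entities from the earlier snippets; the resolver and field names are assumptions.

export const parentResolversMap = {
    Query: {
        parent: async (source: null, args: { parentId: number }, context: { loaders: DataLoaders }) => {
            // The batching and caching logic lives in the loader, not in the resolver
            return context.loaders.getParentByParentId.load(args.parentId);
        }
    },
    Parent: {
        children: async (parent: Parent, args: {}, context: { loaders: DataLoaders }) => {
            // Served from the primed cache when the parent loader already fetched the children
            return context.loaders.getChildrenByParentId.load(parent.id);
        }
    }
};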

Summary

The DataLoader library is useful to cache data during a single request while GraphQL traverses the tree. A parent node can look ahead and load information in batch, reducing the number of future requests. The DataLoader library caches the result of each DataLoader. In the code presented, filling up the parent loader might not be that useful on its own, but priming the child's DataLoader jettisoned all the costly subsequent calls in the child's resolver.

Related GraphQL Articles

How to set up a TypeScript, NodeJS, Express Apollo Server for easy debugging with VsCode

There are a lot of keywords in that title, but this is not clickbait: we will set up, without too much burden, a simple configuration that allows Visual Studio Code (VsCode) to hook into a GraphQL installation. The idea is that every time a TypeScript file is saved, the file is automatically transpiled into JavaScript and Node reboots. The solution I propose does the transpilation, copies the schemas and restarts NodeJS in under 2 seconds.

NPM

The first step is to get some NPM packages. The first one is named concurrently, which allows a single NPM command to execute multiple commands. This is required to have TypeScript in watch mode, a file watcher for the GraphQL schemas, and a Node restart if either of the previous two changes. The second is the package cpx, which can watch files and copy them when something changes. The third is TypeScript, which watches all TypeScript files for changes and builds into the output folder. The fourth package is nodemon, which monitors file changes; if a file changes, it restarts Node.

"concurrently": "^4.1.0",
"cpx": "^1.5.0",
"typescript": "^3.2.2",
"nodemon": "^1.18.8"

Then a few NPM scripts are required.

"dev": "concurrently \"tsc -w\" \"npm run watchgraphql\" \"nodemon build/dist/index.js\"",
"debug": "concurrently \"tsc -w\" \"npm run watchgraphql\" \"nodemon --inspect build/dist/index.js\"",
"watchgraphql": "cpx 'src/graphql/schemas/**/*.graphql' build/dist/graphql/schemas/ -w -v"

There are two main scripts: dev and debug. I mostly run the second one because it does the same as the first with the addition of opening a port for VsCode to connect to and debug the NodeJS (Express) server. What it does is start, concurrently, TypeScript in watch mode (-w), the watchgraphql script and nodemon, which watches every file (the produced JavaScript and the GraphQL schema files). The GraphQL schemas have their own extension, “.graphql”, and are not moved like TypeScript files during transpilation. Hence, a separate process is required to copy them when they are edited.

Visual Studio Code

Finally, within Visual Studio Code you need to create a debug launch configuration. The creation occurs under the fourth button of the menu, the one with a bug. Select “Add Configuration” in the dropdown to create a new debugging configuration. Here is the one I am using:

{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Attach to node",
            "type": "node",
            "request": "attach",
            "restart": true,
            "port": 9229
        }
    ]
}

It attaches to an existing instance, which means that debugging the startup of the NodeJS server does not work with it. You would need to change the setup to avoid nodemon and have the VsCode debugger start the NodeJS server itself. It is not a use case I need, hence I do not have it configured.

Once you have these scripts in place and the VsCode configuration saved, you can click the play button or press F5 to start debugging. It takes about 1-2 seconds to hook into the process. Any breakpoint you have set will stop the process and give you time to explore the variables. If you make a change during the debug session, NodeJS will restart and the debugging will stop and restart as well.

HTTP Request Debugging

I am using Axios, but other libraries also allow shimming a proxy to inspect HTTP requests and responses. It is very valuable when debugging an Apollo GraphQL server because, unlike with a web application, you cannot use Chrome's network tab. A trick with Axios is to configure the AxiosRequestConfig with a proxy that points to your own computer.

config.proxy = {
    host: "127.0.0.1",
    port: 5555
};

Then, having a tool like Postman listening on the specified port is enough to receive every request with the proper HTTP headers in place.
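One way to wire the proxy only while developing is to guard it with an environment variable; the flag name and endpoint below are assumptions.

import axios, { AxiosRequestConfig } from "axios";

const config: AxiosRequestConfig = {
    url: "https://example.com/api/books", // hypothetical endpoint
    method: "get"
};

// Route the request through the local proxy (e.g. Postman's built-in proxy) only in debug mode
if (process.env.DEBUG_HTTP_PROXY === "true") {
    config.proxy = {
        host: "127.0.0.1",
        port: 5555
    };
}

axios(config).then(response => console.log(response.status));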

Summary

I am far from being an expert in developing server-side applications with NodeJS, and I was impressed by how quickly I got results. Within an hour, I had a development environment that was efficient for developing and debugging. The experience of debugging directly in TypeScript is awesome, and seeing every request and response swim through the proxy tool is priceless when it is time to understand what is going in and out of the GraphQL resolvers.

Some other GraphQL articles

Apollo GraphQL Resolvers and Data Source separation

Resolvers can perform asynchronous calls to fetch data. Using a specific implementation to perform the fetch inside the resolver not only couples the resolver tightly to the data-access technology but also increases the resolver's complexity by bloating each function with request details. Apollo allows passing, from the Apollo server's configuration, a set of data sources that can be used down the line by every resolver.

Resolvers and DataSource Relationship

Similar to resolvers, dataSources is a member defined in the ApolloServerConfig. The member is a function that returns a map (dictionary) of data sources. The data sources are strongly typed as “DataSources<T>” where T is your custom context.

dataSources: () => {
    const dataSources: DataSources<GraphQLCustomContext> = {
        dataSourceName1: new MyDataSource1()
    };
    return dataSources;
},

I create one data source per domain because all the REST and gRPC services are already divided by business domain. However, there is no prescribed way to split the data sources. Each data source is a class that eventually must extend DataSource<T>, where T is the custom context you have defined for your GraphQL server.

In the Netflix Open Connect GraphQL, I decided to have all my REST data sources inherit from a generic base class that handles all the Axios code. That REST base class inherits the DataSource from the “apollo-datasource” package. It encapsulates the Axios package, which can later be changed for something else without having to touch the resolvers.
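A minimal sketch of that idea could look like the following, assuming the DataSource class from apollo-datasource and Axios; the derived class, endpoint and Book shape are hypothetical.

import { DataSource } from "apollo-datasource";
import axios, { AxiosRequestConfig } from "axios";

// Generic base class: every REST data source inherits the Axios plumbing from here,
// so swapping the HTTP library later only touches this one file.
export abstract class RestDataSource<TContext> extends DataSource<TContext> {
    protected async getRequest<TResult>(url: string): Promise<TResult> {
        const requestConfig: AxiosRequestConfig = { url, method: "get" };
        const response = await axios(requestConfig);
        return response.data as TResult;
    }
}

export interface Book {
    id: number;
    title: string;
}

// Hypothetical concrete data source for one business domain
export class BookHttpDataSource<TContext> extends RestDataSource<TContext> {
    public getBook(bookId: number): Promise<Book> {
        return this.getRequest<Book>(`https://example.com/api/books/${bookId}`);
    }
}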

Data Sources and Services

It is worth pointing out that a single data source can use more than a single service. A data source class has many functions for different requests, and each function can call one or many services.
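For instance, a single data source function could aggregate two services; the service shapes and the rating example below are made up for illustration.

// Hypothetical data source whose single function calls two different services
export class BookAggregateDataSource {
    public constructor(
        private catalogService: { getBook(id: number): Promise<{ id: number; title: string }> },
        private ratingService: { getRating(id: number): Promise<number> }
    ) {}

    public async getBookWithRating(bookId: number): Promise<{ id: number; title: string; rating: number }> {
        // One data source request, two service calls behind it
        const [book, rating] = await Promise.all([
            this.catalogService.getBook(bookId),
            this.ratingService.getRating(bookId)
        ]);
        return { ...book, rating };
    }
}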

Resolvers now get access to the data sources by reaching into the context, which is the third parameter.

export const bookResolversMap = {
    Query: {
        book: async (source: null, args: { bookId: number }, context: GraphQLCustomResolversContext, graphQLResolveInfo: GraphQLResolveInfo) => {
            try {
                return await context.dataSources.bookService.getBook(args.bookId);
            } catch (e) {
                console.error(e);
            }
        }
    }
};

One question might arise: how does the context get the data sources? A keen observer might have spotted that the resolvers are now using the type GraphQLCustomResolversContext for the context instead of the GraphQLCustomContext used in all the previous articles of this series. The original context type remains in use everywhere in the application except in the resolvers. The data sources are injected into the context by Apollo. The TypeScript types for the GraphQL context are now split into three interfaces: a first one that contains all the data sources, one that contains the actual context which we previously defined to hold the user's information, and finally a third one that has the data sources and extends the custom context.

export interface GraphQLCustomDataSources {
    dataSource1: MyDataSource1;
}

export interface GraphQLCustomContext {
    userRequest: IUserRequest;
}

export interface GraphQLCustomResolversContext extends GraphQLCustomContext {
    dataSources: GraphQLCustomDataSources;
}

Summary

In this article, we saw how to move pieces around to get a cleaner architecture. The separation of concerns allows changing a piece without potentially breaking other pieces of code. The division of the work makes it easier to create unit tests. The small job that each part must perform simplifies understanding the code and increases reusability. In the next article, I'll present how to debug the NodeJS server, Apollo and all the parts we have already set up, which will simplify diagnostics when something goes south.

My Other GraphQL Posts

GraphQL Query with Argument

A part of me wishes that the flexibility of GraphQL extended further in terms of the parameters of a query. The reality is that the maintainer of a GraphQL service must provide a set of acceptable inputs. The rationale is that GraphQL needs to query a service, or any source of data, in a known way. For example, if the REST endpoint takes an ID to fetch a specific entity, it would not make sense to take the NAME of the entity as input, because the service does not support it. The explicit behavior of GraphQL is a major difference compared to SQL (Structured Query Language). With the Graph Query Language, the requested fields inside the graph are flexible, but the inputs are not.

In this article, we will see how to create a query that takes an input which can be used in an HTTP request against a REST web service. The first thing is to define a GraphQL type definition, inside the query object, that accepts the input. The following code adds a second function to GraphQL to request a particular book by id.

type Query {
    books: [Book]
    book(bookId: Int): Book
}

The next step is to define the resolver. For the Book, in the previous article, I was using hardcoded values in a variable, which makes the query quite simple:

export const bookResolversMap = {
    Query: {
        books: (obj: null, args: {}, context: GraphQLCustomContext) => {
            return books;
        },
        book: (obj: null, args: { bookId: number }, context: GraphQLCustomContext) => {
            return books[args.bookId];
        }
    }
};

Instead of accessing a variable, the resolver can perform an AJAX call. The first “A” of AJAX stands for “asynchronous”, hence we need to make the function async. Then it is a matter of calling the endpoint that has the values and returning the result. Here is the async resolver that performs an HTTP request.

export const bookResolversMap = {
    Query: {
        books: (obj: null, args: {}, context: GraphQLCustomContext) => {
            return books;
        },
        book: async (source: null, args: { bookId: number }, context: GraphQLCustomResolversContext, graphQLResolveInfo: GraphQLResolveInfo) => {
            try {
                const axiosResponse = await axios(requestHere /*uses book id from args*/);
                return axiosResponse.data; // return the payload, not the whole Axios response
            } catch (e) {
                console.error(e);
            }
        }
    }
};

The resolver works as expected. One architectural issue is the high coupling between the resolver and the Axios library. Axios is a library that performs Ajax calls. It would be better not to have a strong dependency between the resolver and Axios, mostly because in the future the book data source might change to Redis, a SQL database or a gRPC client. The resolver's sole tasks should be to decide which endpoint to call and to ensure the data is cached and structured as expected, not to handle how to build the Ajax request (or SQL query, or gRPC request). In the next article, I'll modify the resolver to be agnostic of the fetching technology.

GraphQL Interesting Articles