How to Collect HTTP Request Telemetry with GraphQL and RESTDataSource?

I have been working on a GraphQL solution at Netflix for a few months now, and I am always keen to have insight into what is going on. Since I built the GraphQL server from the ground up with NodeJS, I wanted a way to give myself, the software engineer, some insight by providing direct feedback in the bash console running the server, but also by sending the data to an ElasticSearch cluster. Before rendering or sending the telemetry, we need to collect it. In this article, I describe how I proceeded to get information on the individual HTTP calls that a GraphQL query invokes to retrieve data while resolving a graph of objects.

The solution is oriented toward the RESTDataSource library, but the principle is exactly the same with Axios or any other HTTP request library. The idea is to hook into a global point that is invoked when a request starts and when a response comes back. By having a hook at the beginning and at the end, it is possible to collect the elapsed time without having to write code for every request.

RESTDataSource Override

In the case of RESTDataSource, it is a matter of overriding the willSendRequest function, which takes a request parameter. We will use the request to add a unique identifier in the HTTP header that gives the response function a reference back to the originating request. The second function to override is didReceiveResponse, which has two parameters: the request and the response.

The willSendRequest function performs three actions. First, it generates a GUID that serves as a unique identifier and adds it to the HTTP header. Second, it records the starting time of the request. Third, it adds an entry to a collection of HTTP requests stored in the GraphQL context. I created a type that keeps track of the elapsed time, the total bytes received, the URL, the query string, and the unique request identifier (GUID); the starting time travels in the HTTP header. The unique identifier is needed by the second function.

export interface HttpStatsEndpoints {
    requestUuid: string;
    url: string;
    urlSearchParams: URLSearchParams;
    elapsedTimeMs: number;
    totalBytes: number;
}
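
The willSendRequest override itself is not shown in the original excerpt. Here is a minimal sketch of what it could look like, assuming the HEADER_REQUEST_UUID and HEADER_START_TIME constants and the convertNodeHrTimeToMs helper used below, a stats collection already initialized on the GraphQL context, and the uuid package for generating the GUID:

// Sketch only: this method lives in the same RESTDataSource subclass as didReceiveResponse below.
// HEADER_REQUEST_UUID, HEADER_START_TIME, convertNodeHrTimeToMs and context.stats come from this
// article's own code; uuidv4 comes from the "uuid" npm package.
protected willSendRequest(request: RequestOptions): void {
    const requestUuid = uuidv4(); // 1. Unique identifier for this HTTP call
    request.headers.set(HEADER_REQUEST_UUID, requestUuid);
    // 2. Starting time, carried in the header so didReceiveResponse can compute the elapsed time
    request.headers.set(HEADER_START_TIME, String(this.convertNodeHrTimeToMs(process.hrtime())));
    // 3. Register the call in the GraphQL context; elapsedTimeMs and totalBytes are filled later
    this.context.stats.httpRequests.push({
        requestUuid,
        url: `${this.baseURL ?? ""}${request.path}`,
        urlSearchParams: request.params,
        elapsedTimeMs: 0,
        totalBytes: 0
    });
}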

The didReceiveResponse function receives the response, but also the request object. Because we have the request, we can peek at the GUID, retrieve the matching entry from the GraphQL context, and subtract the start time saved when the request began from the current time. The number of bytes and the elapsed time are saved in the context until they are read by the GraphQL Extension.

public didReceiveResponse(response: Response, request: Request): Promise<any> {
    return super.didReceiveResponse(response, request).then(result => {
        // Retrieve the identifiers injected by willSendRequest
        const requestId = request.headers.get(HEADER_REQUEST_UUID);
        const startTime = request.headers.get(HEADER_START_TIME);
        const httpRequest = this.context.stats.httpRequests.find(d => d.requestUuid === requestId);
        if (httpRequest !== undefined && startTime !== null) {
            // Compute the elapsed time and an approximation of the payload size
            const totalNanoSecondsElapsed = process.hrtime();
            const totalMilliSecondsElapsed = this.convertNodeHrTimeToMs(totalNanoSecondsElapsed);
            httpRequest.elapsedTimeMs = totalMilliSecondsElapsed - Number(startTime);
            httpRequest.totalBytes = JSON.stringify(result).length;
        }
        return result;
    });
}

GraphQL Extension

At this point, when all requests are completed and GraphQL is ready to send a response back, a custom extension can come into play. I covered the details of a custom GraphQL Extension in a previous post about telemetry and how to display it on the console. The idea is the same: this time we read the GraphQL context and, while looping through the telemetry, display the bytes and the time taken for each request.
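
As an illustration, the core of that reporting could be as simple as the following helper, which the custom extension would call once the response is ready (the context shape is the stats collection assumed earlier in this article):

// Sketch only: loops through the telemetry collected by the data sources and prints it.
function logHttpTelemetry(stats: { httpRequests: HttpStatsEndpoints[] }): void {
    for (const httpRequest of stats.httpRequests) {
        const query = httpRequest.urlSearchParams.toString();
        console.log(
            `${httpRequest.url}${query === "" ? "" : "?" + query} -> ` +
                `${httpRequest.totalBytes} bytes in ${httpRequest.elapsedTimeMs.toFixed(1)} ms`
        );
    }
}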

Validating Consumer of GraphQL with Jenkins

I will demonstrate a way to configure a continuous integration environment that gives insight into whether a change on the GraphQL server can negatively impact a consumer. The solution assumes that both code bases are under the same organization (not under the same repository) and that Jenkins is used.

I like to start with an image of what I am describing. So here is the solution.

The diagram shows, on the top-right, the GraphQL engineer who pushes a change to the repository. The normal continuous integration (CI) kicks off a build when it receives new code. So far, nothing changes from a typical workflow. However, the third step is where it differs. The idea is to create a new Jenkins job that is independent of the GraphQL server and independent of the consumer application. The independence of this job keeps both existing builds untouched and the whole ecosystem tidy. The Jenkins job waits for the GraphQL job to complete; it is possible to configure this option under Jenkins Build Triggers.

Jenkins’ Trigger

When the GraphQL server changes, the Jenkins job fetches the consumer code and the GraphQL code. Do not forget that we assumed we have access to both code bases. However, if the GraphQL server is not internal, you can always download the GraphQL unified schema and accomplish the same end result. In fact, the next step is to run the GraphQL server script that builds the latest schema (stitches the schemas). Then, it runs a tool that validates the consumer code. The tool looks for gql tags inside TSX and TS files (TypeScript) and analyzes all queries against the schema. If a breaking change occurs, the build fails and an email is sent to the engineers so they can act before it reaches deployment.

The Jenkins “Execute shell” step installs the NPM packages because it must run the tool, which comes from an NPM library. It clones the server repository because we do not have a unified (stitched) graph schema at hand. Then, it runs the graphql-inspector tool.

# Install the consumer's dependencies (includes graphql-inspector)
npm install
# Clone the GraphQL server and generate its unified (stitched) schema
git clone ssh://git@yourServer/yourGraphQLServer.git graphql_repo
cd graphql_repo
npm install
npm run generateunifiedschema
cd ..
# Validate every consumer .graphql document against the freshly generated schema
./node_modules/.bin/graphql-inspector validate ./src/**/*.graphql ./graphql/unifiedSchema.graphql

A special mention: graphql-inspector did not work well (in my experience) at analyzing .ts and .tsx files. However, it worked perfectly when I moved the GraphQL queries into .graphql files.
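
For instance, a consumer query extracted from a gql tag into its own file might look like the following (the file name and fields are purely illustrative):

# src/queries/getUser.graphql (hypothetical consumer document)
query GetUser($id: ID!) {
    user(id: $id) {
        id
        name
    }
}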

Most of the time, breaking changes should not occur. GraphQL favors additive changes and slowly deprecating fields instead of maintaining several versions. Nonetheless, the additional awareness is beneficial: by configuring the tooling to automatically verify potential breakage, we reduce the stress of introducing consequences in production.

A Single GraphQL’s Schema for Internal and External Users

Imagine a scenario where you have a GraphQL schema that is consumed only by engineers of your organization. However, some engineers work on external-facing applications while others work on internal applications. The difference is subtle, but some information must remain confidential. While this is more restrictive than most GraphQL schemas that are fully public, there is the benefit that the schema is private, and we can leverage that detail.

Internal vs External

In the scenario described, the GraphQL server is deployed into an Amazon VPC (an Amazon Virtual Private Cloud). The environment is secured, and authorization is granted only to specific security groups.

Internal vs External Environment

The challenge is not to split the schema definition into an internal and an external one, but to prevent external users from crafting a request that accesses unexpected information. Because the GraphQL server is behind the VPC, it is not directly accessible by the user. The external application can communicate with the GraphQL server, and every front-end request to fetch data is proxied by the web server application. The proxy is important: it means that the browser of an external user is never directly connected to the GraphQL server. Instead, the browser performs an AJAX call to its web server, which then, on behalf of the user, performs the GraphQL query. The proxying is done with Apache's ProxyPass instruction.
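
As an illustration, the Apache side of that proxy can be as small as the following (the host name and port are placeholders, not the actual configuration):

# Hypothetical Apache configuration: forward /graphql from the web server to the private GraphQL server
ProxyPass "/graphql" "http://graphql.internal.example.com:4000/graphql"
ProxyPassReverse "/graphql" "http://graphql.internal.example.com:4000/graphql"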

Internal applications do not have many constraints, but keeping the same proxying pattern is a good habit. It simplifies CORS because the browser performs HTTP requests against the same server, which only underneath communicates with other servers. It also simplifies security by having a central point of communication (the web server) talking to the secured backend services.

GraphQL Single Schema

An obvious solution is to have two schemas: one internal and one external. That solution is the only choice if you need to expose the GraphQL schema externally without exposing the definition of internal fields and entities. However, because I had the constraint of not exposing GraphQL outside, I could simplify maintainability by having a single schema. The problem with multiple schemas is that they do not scale very well. First, when adding a field that is allowed externally, you need to add it twice. Then, when it is time to modify or remove it, you need to keep the different schemas in synchronization. Any little boilerplate makes the engineering experience a burden and is also prone to errors.

In an ideal world, a single schema exists and we flag the fields or entities that are only available internally. That world can exist with the power of a GraphQL directive.

The GraphQL Directive Idea

GraphQL allows enhancing the graph's schema with annotations. Let's start with the end result, which says more than any explanation.

type MyTypeA {    
    fieldA: EntityTypeA
    fieldB: EntityTypeA @internalData
}

The idea is to add “@internalData” to every field that must only be visible to internal usage. The annotation marks a field, but it can also mark a whole type.

type MyTypeA @internalData {    
    fieldA: EntityTypeA
    fieldB: EntityTypeA
}

The idea is to have a single schema with an indication that adding this field to a request will have a consequence. Because it is a single graph, the field appears in the interactive GraphQL Playground and is a valid field to request, even externally. However, when invoked, GraphQL at runtime is able to read the directive and perform some logic. In our case, the logic verifies the source of the request and figures out whether the request is internal or not. For an internal request, the data will be part of the response. If the source is external, an exception occurs and the field will be undefined.

How to build a GraphQL Directive?

The directive has two parts: one is in the GraphQL language (design time) and one is the logic to perform at runtime.

In any .graphql file, you need to declare the directive to let GraphQL know about its existence. I created a file with the name of the directive and added this single line. The directive indicates that it can be applied to a type (OBJECT) or to a field (FIELD_DEFINITION). The directive could also have arguments; for example, a more advanced need could be to specify which role can access which field (see the hypothetical variant after the declaration below).

directive @internalData on OBJECT | FIELD_DEFINITION
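
As an aside, a variant with an argument (not used in this article, and purely hypothetical) would be declared and used like this:

directive @internalData(roles: [String!]) on OBJECT | FIELD_DEFINITION

type MyTypeA {
    fieldB: EntityTypeA @internalData(roles: ["admin"])
}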

The second part is to handle the directive. When merging all the resolvers and type definitions, you can also specify the collection of directives. What you need to pass is a key-value pair with the directive name and the class of the directive (not an object). It means that you do not instantiate (new) the class, but only give a reference to the class.

const schemas = makeExecutableSchema({
    typeDefs: allSchemas,
    resolvers: allResolvers,
    schemaDirectives: {
        internalData: InternalDataDirective
    }
});

The class must inherit from SchemaDirectiveVisitor. Then, because we specified that the directive can be applied to a field and to a type, we need to override two functions: visitFieldDefinition and visitObject.

// Imports needed by the directive; GraphQLCustomResolversContext is this article's own context type.
import { defaultFieldResolver, GraphQLField, GraphQLInterfaceType, GraphQLObjectType, GraphQLResolveInfo } from "graphql";
import { SchemaDirectiveVisitor } from "graphql-tools";

export class InternalDataDirective extends SchemaDirectiveVisitor {
    private static readonly INTERNAL_APP = ["app1", "app2", "app3"];

    public visitObject(object: GraphQLObjectType): GraphQLObjectType | void | null {
        this.ensureFieldsWrapped(object);
    }

    public visitFieldDefinition(
        field: GraphQLField<any, any>,
        details: {
            objectType: GraphQLObjectType | GraphQLInterfaceType;
        }
    ): GraphQLField<any, any> | void | null {
        this.checkField(field);
    }

    private ensureFieldsWrapped(objectType: GraphQLObjectType | GraphQLInterfaceType) {
        if ((objectType as any).__scopeWrapped) {
            return;
        } else {
            (objectType as any).__scopeWrapped = true;
        }
        const fields = objectType.getFields();

        Object.keys(fields).forEach(fieldName => {
            const field = fields[fieldName];
            this.checkField(field);
        });
    }

    private checkField(field: GraphQLField<any, any>): void {
        const { resolve = defaultFieldResolver } = field;
        field.description = `🔐 Internal Field. Only available for: ${InternalDataDirective.INTERNAL_APP.join(", ")}.`;
        field.resolve = async function(
            source: any,
            args: any,
            context: GraphQLCustomResolversContext,
            graphQLResolveInfo: GraphQLResolveInfo
        ) {
            if (
                context.req.appOrigin === undefined ||
                !InternalDataDirective.INTERNAL_APP.includes(context.req.appOrigin)
            ) {
                throw new Error(
                    `The field [${field.name}] has an internal scope and does not allow access for the application [${
                        context.req.appOrigin
                    }]`
                );
            }

            const result = await resolve.apply(this, [source, args, context, graphQLResolveInfo]);
            return result;
        };
    }
}

The directive converges the two entry points (field and object) into a single function. The two visit functions are called only once, when the class is instantiated by the GraphQL code at the startup of the server. It means that you cannot have per-request logic directly in the visit functions. The dynamic aspect happens because we wrap the resolve function of the field: the actual resolution is still executed, but the code specified in “checkField” also runs at runtime. In the code excerpt, we see that it checks a list of accepted internal applications. If the field has the directive, execution goes into the directive's resolver and checks whether the origin is in the list of accepted internal applications. If not, it throws an error.

A little detail: it is possible to inject a description from the directive, set when this one is initialized. In my case, I specify that the field is private and mention which applications can access it. If a software engineer needs an application to be on the list, it requires a code change. This is not something that happens often, and because a code change is required, it involves a pull request where many people will have a look.

Example of how it looks in the interactive GraphQL Playground: the engineer who builds the query knows that it is an internal field, as well as which applications will get a value back in the response.

Conclusion

The more I work with different organizations, code bases, and technologies, the more I lean toward simplicity. There are so many changes, so many ways to get very deep into subjects, and so little time. Complex solutions often make maintainability a nightmare or make some people very depended upon. The directive solution in GraphQL took less than 150 lines of code and scales across the entire graph of objects without depending on a system to manage many schemas. The security of the information is preserved; the engineers consuming the graph are aware both when building the query (description) and when executing it (error); and the engineers building the graph can add the tag to fields or types in a few seconds, without having to worry about the details of the implementation.

How to Pass Value to useCallback in React Hooks

useCallback provides the same reference for a callback function, which has the benefit of only changing when one of its dependencies changes. This is essential when callbacks are passed down the React component chain, to avoid components being re-rendered without an actual reason and causing performance issues. The following snippet of code shows a simple button that, when clicked, invokes an action that sets the name to “test”. You can imagine that in a real scenario the string would come from a real source instead of being hardcoded.

<button
  onClick={() => {
    dispatch(AppActions.setName("test"));
  }}
>
  Change name
</button>

The action can often be handled without passing data, or by passing a React property that the handler can access directly. However, in some cases, where the value is not accessible from the outer scope of the handler function, we need to pass the value by parameter. I am reusing a Code Sandbox, slightly modified to have the useCallback with a value passed down. Using useCallback, or simply refactoring the above snippet into a function that is not directly bound to the onClick, is similar: we are moving the accessible scope. When the function is inline, it can access anything in the scope that defines the button. It can be the React properties, or the “map” index if it was inside a loop, or anything else. However, extracting the function out requires some minor changes to still have access to the value.

 const setName = useCallback(() => {
    dispatch(AppActions.setName("test"));
 }, []);

A quick change with React Hooks to produce the desired scenario is to use useCallback at the top of the component and access it directly in the onClick function callback.

<button onClick={setName}>Change name</button>

At the moment, it works. However, we are not passing any information. Let's imagine that we cannot access the data directly from the useCallback: how can we still invoke it?

const setName = useCallback((event: React.MouseEvent<HTMLButtonElement, MouseEvent>) => {
    dispatch(AppActions.setName("test"));
  }, []);

The idea is to have a callback that returns a function which, in its turn, receives the input event.

<button onClick={setName("Testing Name")}>Change name</button>

The invocation code changes by passing the data. In this example it is a string, but you can imagine passing the index of the map function or data coming from a source that is inaccessible from the callback function.

  const setName = useCallback(
    (name: string) => (
      event: React.MouseEvent<HTMLButtonElement, MouseEvent>
    ) => {
      dispatch(AppActions.setName(name));
    },
    []
  );

My rule of thumb is that I do not need this convoluted definition if I am directly accessing properties of the component. Otherwise, I pass the data needed. I always define the types, which gives me a quick view of what is passed (the name is a string and the event is a mouse event) without having to rummage through the code. Here is the code sandbox to play with the code of this article.

TypeScript Exhaustive Check your Reducer

A few weeks ago, I wrote about how to use React Hooks' useReducer with TypeScript. The natural follow-up for many is to ensure that the whole set of allowed actions is served by the reducer. Not only does it help to tidy up the actions accepted by reducers when building the reducer, it also helps ensure, during the lifetime of the reducer, that the list of actions remains up-to-date.

If we recall, the reducer takes the state and the action. The action was typed to be part of the AppActions set of functions. A utility type was used that allows unioning many sets of actions, although we only needed a single set. Nonetheless, everything was in place to ensure a flexible configuration of actions (a possible shape of that utility type is sketched below).
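
The original post defines ActionsUnion; a common shape for such a utility type, reconstructed here as an assumption rather than the article's exact code, is:

// Hypothetical reconstruction: the union of the return types of every action creator in a map
type ActionCreatorsMap = { [key: string]: (...args: any[]) => any };
export type ActionsUnion<A extends ActionCreatorsMap> = ReturnType<A[keyof A]>;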

export type AcceptedActions = ActionsUnion<typeof AppActions>;
export function appReducer(
  state: AppReducerState,
  action: AcceptedActions
): AppReducerState {
  switch (action.type) {
    case ACTION_INCREASE_COUNT:
      return {
        ...state,
        clickCount: state.clickCount + 1
      };
    case ACTION_SET_NAME:
      return {
        ...state,
        activeEntity: { ...state.activeEntity, ...{ name: action.payload } }
      };
  }
  return state;
}

While we cannot add a case for an action that is not defined in the AcceptedActions type, the weakness of the code is that we can remove one of the two cases without it being noticed. Ideally, we would want to ensure that all actions are handled and, if an action is no longer required, that we remove it from the list of actions.

The solution requires only a few lines. First, you may already have the core of the needed logic: an exhaustive-check function. I covered the idea of an exhaustive check in an article many months ago. In short, it is a function that should never be reached; if TypeScript finds a logical path that can reach it, the code will not compile.

export function exhaustiveCheck(check: never, throwError: boolean = false): never {
    if (throwError) {
        throw new Error(`ERROR! The value ${JSON.stringify(check)} should be of type never.`);
    }
    return check;
}

Using the reducer with TypeScript's exhaustive-check pattern is similar to what we would do to verify that all values of an enum are covered. The code needs a default case that we never expect the switch to fall through to.

The two new lines:

    default:
      exhaustiveCheck(action);

Removing a required action causes TypeScript to fall into the exhaustive check, and since the function is marked to accept a never argument, the code does not compile.

TypeScript catching the missing action

I have updated the original code sandbox. Click on reducer.ts and try to remove one of the actions.

In conclusion, the solution might not be ideal for you if you have all your actions in one huge function, and if you do not group your actions at all, it might not even be possible. However, grouping actions tidies up your code by giving a better idea of which actions are expected in the different business domains your application handles. It is not much more work, and it self-documents the code. The exhaustive check is an additional step to maintain order.