The authorization and authentication in GraphQL

Implementing a service that covers many systems might be frightful in term of data exposition. While wandering the Internet for resources about GraphQL and security, we often see cases where the security is not paramount — it is easier to ignore security. The reality is that most corporation that desire to bring GraphQL will also need to secure the information. This article sheds light on how I approached the security in my implementation of GraphQL at Netflix for Open Connect.

Prelude

In an ideal world, we have the GraphQL server getting the authorization from the corporate security infrastructure and the GraphQL delegate downstream to the actual data source the responsibility of returning an error. For example, if the data source is a REST API, the token of the user is used for the call, hence can return an HTTP code 401 and GraphQL will handle the error. However, in maybe the GraphQL exposes some internal services that were secured only by the premise that it was under a VPS (virtual private server). Meaning that not validation is actually performed. In that case, some custom code is required by the executioner of the service: GraphQL. Another case could be that the security is by a specific group of entity (organization, user, etc) meaning that you do not want user A to access user B information. Again, a strong security model would perform the validation at the source (Rest, SQL views, etc) but in the real world, it is not always the case. To mitigate the possibility of security issues among the sea of services that was cover in my scenario, the security was placed in GraphQL. Meanwhile, further modification in the data sources could be planned and put in place without compromising the delivery of the GraphQL server.

Exploring the possibilities

One strength of GraphQL is the flexibility. The flexibility nature remains true for security and it opens many doors to where to secure the service. As mentioned earlier, the NodeJS server that host Apollo is behind Apache. The reason is that at Netflix, we have many tools integrated within Apache to secure the user like single-sign-on, OpenID Connect and OAuth 2.0. The Apache module is great for authentication but not for authorizing. It does check if the user is one that can access the GraphQL but does not discriminate on which information the user can consult.

Flow of the request from user to services that contain the information

Apache gives information about the user and provides the additional HTTP headers to NodeJS. The first stop is a custom NodeJS Express middleware. The middleware is a piece of code executed in each request. The middleware check if the user is a Netflix employee with a specific grant right. If that is the case, it marks a particular field in an enhanced request object to signal the user as “authorized for full access.” The idea is to avoid every future validation that can be performance costly. This shortcut works well because the Apache module can be trusted with its information. It might not work well in your organization, thus do your homework. The next stop is at the GraphQL context. In my case, I have no validation to do at that level because I did the check in the NodeJS Express middleware. However, if you are not using NodeJS, it would be the place to do HTTP request checks. However, I instantiate a secure object at that level that contains functions that check particular lists of objects. The lists of objects are specific ids of secured entities that the user has access. Then, the context performs a query on specific backend services to fetch what objects ids the users can access. The list goes in the request. The idea is to have before the user reaches the resolver a well-defined list of authorized entities and ids.

It is possible to perform at the resolver checks, but the problem is that if you are not querying for the field that contains the ids that the user can access that you will not have the value available. For example, if a user can only access the data of the organization that he/she belongs and that the user requests for the organization by id for its name then you could block. But, if the user request for a sub-entity, for example a contact and then in the query’s tree the name of the organization, without the organization id, then the resolver cannot check if the organization’s data belong or not to the authorized ids.

Finally, the place I found the best to handle authorization is at the data loaders level where every HTTP requests to service are performed. Upon reception of a query, the data is examined to check if the payload contains information from the entities we want secure. If the response contains an entity that does not belong to the user, an exception is thrown and GraphQL bubble up the exception to the resolver who initiated the request. GraphQL handles the exception properly and your message is given to the user.

Conclusion

The solution is not ideal because it requires the user to have an additional call, per service, to have a list of entities and ids. I opted to have the GraphQL cache all the entities-ids per user in a server cache (not a request cache) for few minutes. The solution has a flaw that the request is still performed to the service. The reason is the lack of transparency from an entity B on entity A before getting the data. Nonetheless, it secures because the response does not go beyond NodeJS, it is not cached either. These loops are required because of weakness at the leaf of the architecture: the service that has access to the data. As a reminder, even if you are building an internal service that is secured by a network, it is always better to not rely on that infrastructure and to perform database checks. The future is never known, infrastructure change, security change, a potential consumer of the information evolve, and we never know when something will be exposed. For future resiliency and for an optimal defense: always authorize at the source.

My Other GraphQL Articles

How to consume GraphQL in TypeScript and React

In this article I’ll cover two different approach concerning rendering information coming from a GraphQL server in React while being strongly typed.

Approach 1: The normal fetch

The first approach is the one I prefer because it is not intrusive. It is very similar of to do a normal REST Ajax call. It can be swapped from an existing fetch command that is using not GraphQL without having to change much. The approach is frontend framework agnostic, meaning that you can use it outside React. Furthermore, this approach works well in a middleware if you are using Redux, but can be directly used in you custom “service” layer or when a component is mounted — you choose.

The first step is to get some NPM packages. In this approach, I am using two packages related to GraphQL: the Apollo-boost and the graphql-tag.

import ApolloClient, { ApolloQueryResult } from "apollo-boost";
import { Proto } from "autogenerated/octoolstypes";
import gql from "graphql-tag";
import React from "react";

You then need to create an ApolloClient which is the library that will perform the query. See the ApolloClient as an HTTP request library, similar to Axios. It lets you specify the URL to perform the POST HTTP request, but also custom headers. For example, I am specifying a bearer token to have the request authenticated.

this.client = new ApolloClient({
    uri: "/api/graphql",
    headers: {
        authorization: `Bearer ${this.props.token}`
    }
});

The next step is to have the query executed. This is where you can swap your existing fetching code with the ApolloClient.

(async () => {
    try {
        const result: ApolloQueryResult<Proto.Query> = await this.client.query<Proto.Query>({
            query: gql`
                query Proto {
                    org(orgId:orgIdParam) {
                        name
                    }
                }
            `
        });
        if (result.data.org &amp;&amp; result.data.org.name) {
             // Do something with result.data.org.name
        }
    } catch (error) {
        console.error(error.networkError.bodyText);
    }
})();

That’s it. Very simple, easy to understand because it is similar to most fetching pattern. The ApolloQueryResult takes a generic type which is the format of the gql. It means that when building for the first time the query that you might not want to explicitly specify the type because it does not exist. As discussed in a previous article, there is tool that will scan the gql query and generate the type for you.

Approach 2: Strongly bound to React

The second approach is my less favorite. I favor the first one because I like separating my view from my logic. It is easier to unit test, and the separation of concern make the code easier to understand and refactor. Nonetheless, the option is available and worth exploring.

The second approach use an Apollo component that will wrap the application. It is similar to the React-Redux provider. It takes a single property which is the client, that is the same as provided in the first approach.

public render(): JSX.Element {
    return (
        <ApolloProvider client={this.client}>
            <ConnectedRouter history={history}>
                <AppRouted />
            </ConnectedRouter>
        </ApolloProvider>
    );
}

The difference is that because we have the client connected in the ApolloProvider that any React component underneath has access to the possibility of querying by creating a Query component.

<Query<Proto2.Query>
    query={gql`
        query Proto2 {
            org(orgId: orgIdParam {
                name
            }
        }
    `}
>
    {queryResult => {
        if (queryResult.loading) {
            return "Loading";
        }
        if (queryResult.error) {
            return "Error";
        }
        if (queryResult.data === undefined || queryResult.data.org === null) {
            return <p>None</p>;
        }

        return (
            <div>
                <h5>Name:</h5>
                <p>{queryResult.data.org.name}</p>
            </div>
        );
    }}
</Query>

Once again, the Query is strongly typed by using the gql defined in the query parameter of the query component. Until the type is generated, which mean that the gql query has been picked up by the script that build the TypeScript file, the query cannot be explicitly typed. However, once it done, any future change will catch on. This approach has the query executed every time the container of the Query component is rendered which mean that future optimization, like having a caching policy specified in the ApolloClient might be a good thing.

Wrap-up

There are more than two ways to bind GraphQL data into your React application while having the benefice of being strongly Typed. The first approach is more agnostic, the second embrace React with a component to handle a query. As long as you are comfortable, either does the job. I recommend using the former approach because it is flexible without having your fetching code dependent on React but otherwise, the end result of a strongly typed code is the same.

GraphQL Posts

GraphQL Extension to Collect HTTP and Resolvers Telemetry

The more my graph was increasing, the more resolvers I built. The more resolvers I had, the more data loaders I created, the more HTTP endpoints I had to call and so on. The complexity of the code was rising as well as the possibility to have a catastrophic amount of HTTP requests to be performed by GraphQL. The amount of log by using the console was not only polluting the source code by having a line in each resolver and HTTP method but also couldn’t make it easy to collect the data for future analysis or to emergency cancellation of a GraphQL query if the amount of HTTP rise beyond a reasonable threshold.

The idea I had was to see if there is a way to hook into GraphQL to collect the data at a higher level instead of being at each function. I am using the Apollo library and the answer is yes! There is an experimental feature named “extension”. The documentation is dry and does not explain much but with some breakpoint and Visual Studio Code, I was able to build a simple extension in a few hours. Here is the result.

A GraphQL Request with its HTTP requests

The first requirement is to create the extension. It must inherit the GraphQLExtension class which is generic. The generic type needs to be your context type. Having access to the context is primordiale because it gives access to your user information. I assume that once authenticated the user is set in the context. Also, it gives you the possibility to inject statistic into the context throughout the exploration of the graph. In my case, I am using the context to collect information at my data service layer where all the HTTP requests are performed.

export class DevConsole<TContext extends ApolloConfigurationContext> implements GraphQLExtension<TContext>

There is two importants function that you can override. The first one is “requestDidStart” which is invoked when a GraphQL request is made. It means that when a context is created and it that case a “request” is the GraphQL request and not all HTTP request performed by GraphQL during that GraphQL’s request. The function return a function. The return function is invoked when the request is completed. It is the place to output into the console all your statistic or to send the information to your telemetry system.

public requestDidStart(o: {
    request: Pick<Request, "url" | "method" | "headers">;
    queryString?: string;
    parsedQuery?: DocumentNode;
    operationName?: string;
    variables?: { [key: string]: any };
    persistedQueryHit?: boolean;
    persistedQueryRegister?: boolean;
    context: TContext;
    requestContext: GraphQLRequestContext<TContext>;
}) {
    console.log("New GraphQL requested started");
    return () => {
        const yourContextData = o.context
        console.log("Your data can be sent here");
    };
}

The second function is named “willResolveField” and is executed at every node of your graph. You can know the parent field and your field.

willResolveField(source: any, args: { [argName: string]: any }, context: TContext, info: GraphQLResolveInfo) {
    // Info has the field name and parentType.name
    return (error: Error | null, result: any) => {
        // ...
    };
}

The next step is to plug the extension into Apollo GraphQL. The property “extensions” of ApolloServerConfig takes an array of extension.

const serverConfig: ApolloServerConfig = {
    // ...,
    extensions: [() => new DevConsole()]
};

I am using the RESTDataSource of Apollo as well to perform all HTTP request. The abstract class has the GraphQL’s context. If you override the function willSendRequest, you can then have a global hook into all HTTP request. In that function, I am populating a property of my context that I named “stats” where I am storing each URL and parameters.

public willSendRequest(request: RequestOptions) {
    const stats = this.context.stats;
    // push data in the stats object
}

The documentation could be better in term of defining all the possibility. Nevertheless, with some breakpoints and a little bit of time I was quite satisfied to see what can be done. You can do much more than what proposed, for instance, you can calculate the payload size and get more insight into what is not efficient. For example, I realized in the process of creating another exception that was throwing an exception when a maximum threshold of HTTP request was reached was still letting GraphQL continue its traversal causing the exception to occurs on every future node. The solution was to clean up the response of redundant error using the overload function willSendResponse.

Bonus: Getting the size of the payload

There is a lot of possibility to get a clear view of what is happening behind the scene for maintainers and developers of the GraphQL service. The detail does not matter directly for the consumer who care about getting the data fast and accurately in the format requested and not about the complexity. However, the complexity if not well exposed to the team who own the service can be costly in term of resources. It is very easy to fall into the trap of assuming that everything goes as expected in a world where every GraphQL request is unique and can surface a scenario not well covered.

Previous GraphQL Posts

How to automatically generate TypeScript for consumer of your GraphQL

One strength of GraphQL is that it has types. In an ideal world, the consumer of the GraphQL would receive the payload strongly typed. The reason that it is not the case is natural. The data that is moving from the server to the client who invoked the query receives a JSON format that does not have any notion of type. However, the client who call the GraphQL know exactly which fields he wants to consume and GraphQL maps each queried fields to a type.

Objectives

The idea of this article is to let the user who call the GraphQL server to get a tailored object that is only for the field requested but also strongly typed for the field needed. It means that if the user request field A, B, and C of an object that has A, B,C, D, E, that the type that will be automatically generated will be only with A, B, C. While it might look that the generated type is not representing the real object, it reflects exactly the data that can be used. There is no need to have a generated object that you could not use.

Constraints

Many articles discuss about introspection and how types can be generated by having the consumer connect directly to GraphQL server to fetch the schema. I had the constraint that I could not do it for security reason, hence I had to expose the schema by the server into another way and have the consumer uses the generated merged schemas.

Server Steps

On the server side, one step is required: exposing the schemas into a unified schema (file) that can be shared. By generating the type every time the server is compiled, the schema remains in synchronization. Hence, we do not lose the freshness of being connected directly to GraphQL.

I created a script in the package.json that execute a TypeScript script.

"generateunifiedschema": "ts-node src/scripts/mergeAllGraphqlSchemas.ts",

The code in the TypeScript file is simple — really simple. Here is the whole code:

import fs from "fs";
import { fileLoader, mergeTypes } from "merge-graphql-schemas";
import path from "path";
const documentTypeDefs = fileLoader("src/**/*.graphql");
const mergedSchema = mergeTypes(documentTypeDefs);
const typesPath = "output_types";
const schemaPath = path.join(typesPath, "allSchema.graphql");
fs.writeFileSync(schemaPath, mergedSchema);

It uses the library merge-graphql-schemas and search for all individual schema that is spread inside the project. The result goes outside in a folder. That’s it. In my case, that generated file is available internal at Netflix but not exposed outside. It means that every developers that consume the GraphQL service can download the schema, create query in their TypeScript code and use the unified schema to generate their code.

Client Steps

Each consumer will have to do few steps. First, they need to download the unified schemas. Second, they need to write their GraphQL query and use gql. The gql allows to have a library that will scan the TypeScript file for gql (GraphQL Query) and will generate the appropriate TypeScript definition for the response data.

Download Unified Schema

The download step is a matter of using curl. I create a simple bash file:

curl -X GET \
  http://retracted/allSchema.graphql \
  -H 'accept: text/plain, */*' \
  --silent \
  --output allSchema.graphql

Query the GraphQL

The next step is to write the client code to query the data. There is many ways to do it, I’ll demonstrate one but will not go in detail in this article.

async () => {
            try {
                const result: ApolloQueryResult<Proto.Query> = await this.client.query<Proto.Query>({
                    query: gql`
                        query Proto {
                            org(orgId: 0) {
                                name
                            }
                        }
                    `
                });
                if (result.data.org &amp;&amp; result.data.org.name) {
                    this.setState({
                        longName: result.data.org.name
                    });
                }
            } catch (error) {
                console.error(error.networkError.bodyText);
            }
        }

At that point, your may wonder what is “Proto.Query” type and how come the result is strongly typed. The “Proto.Query” is the type that will be generated from the query specified in the client.query. The gql contains a query name which is translated into a TypeScript’s namespace. It is important to name each query differently because of potential collisions. In my case, the org entity has only a single field, but later I might request more which will generate another org. A good way is to not call it “Proto” but something more relevant. For example, “Financial” or “Inventory” depending on the usage of the entity. Still, how do we generate the object.

Generate the TypeScript type

The generation is done by using a library named graphql-code-generator which scan the TypeScript file for gql and uses the unified schema (or could work by directly connecting to the GraphQL) and output in a specified folder the type. It means that the first time you are writing the gql, you should not strongly type the ApolloQueryResult, then will have access to the type. Then, every change in the type will mend the existing type which is a great experience. For example, removing in the gql a field that is being used in TypeScript will make the code not compile. The “graphql-code-generator” library has a bin that you can use in your package.json to read the codegen.yml file with your custom configuration. I added two scripts. One that analyzes the TypeScript files and generates and another one that I use while developing who constantly check in the background to generate types.

"generatetypes": "gql-gen --config graphql/codegen.yml",
"generatetypes:watch": "gql-gen --config graphql/codegen.yml --watch"

The codegen.yml has the schema and document configuration. At first, I was confusing about these two terms. The documentation is dry on the differentiation. The schema configuration is about where to get the GraphQL schema, this is the downloaded unified GraphQL schema from the server. The document is which TypeScript file to analyze (could be JavaScript file as well).

schema: "./graphql/**/*.graphql"
documents: "./src/**/*.+(tsx|ts)"
overwrite: true
generates:
  ./src/autogenerated/octoolstypes.ts:
    config: {}
    plugins:
      - "typescript-common"
      - "typescript-client"
require: ['ts-node/register']

It uses few plugins, hence you need to get the NPM packages accordingly.

"graphql-code-generator": "0.16.0",
"graphql-codegen-add": "0.16.0",
"graphql-codegen-typescript-client": "0.16.0",
"graphql-codegen-typescript-common": "0.16.0",
"graphql-codegen-typescript-server": "0.16.0"

The graphqlcodegen will check and generate the code. Unfortunately, it is slow. Even on a small code base, the process can take 10-15 seconds. Nonetheless, you have your TypeScript type generated for you and it will adapt with the backend (GraphQL server) changes.

Conclusion

Having the type generating automatically from the backend is fascinating and is efficient. If your business domain is rich in quantity of entities (interfaces) and if many of them have several fields and complex interconnection than having the type generated instead of typing it manually is a relief.

Related GraphQL Articles