Resizing an Image with NodeJs

This is the second post about my project of creating a search tool for local pictures. As mentioned in the first post, this tool needs to use a web service to get information about each picture. This means we need to upload the image that the Microsoft Cognitive Vision/Face service will analyze; it returns a JSON object with information about the picture. Like most services, it has constraints on the minimum and maximum size of what you can upload. Also, we do not want to send a 25 MB picture when it is not necessary. This article discusses how to resize pictures before sending the request to the web service. This not only keeps us within the range of acceptable values, but also speeds up the upload.

I decided to take the arbitrary value of 640 px for the widest side of the picture. This produces on average a 30 KB file, which is tiny but still enough for the cognitive service to give very good results. This value may not be good if you are building something similar where people are far away or if you are not using portrait pictures. In my case, the main subjects are always at close range, hence it is easy to get details at that small resolution.

Resizing a file requires a third-party library. This is something easy to find in the JavaScript ecosystem, and NPM has a library named “Sharp” that does it perfectly. The TypeScript definition file is also available, so we are in business!

npm install --save sharp
npm install --save-dev @types/sharp

Before anything, even if this project is just for myself, I defined some configuration variables. Some rigor is required when it’s cheap to do! The first constant is the maximum size we want for the output image: I chose 640 pixels. The directory name is the constant for the folder where we will save the image we send and where we will later save the JSON file with the analyzed data. We save the resized image because the website will later use this small image instead of the full-resolution one. The website will be snappy, and since we have the file, why not use this optimization for free? At 30 KB for 2,000 images, we only use about 58 MB. The last constant is the glob pattern to get all underscore JPEG pictures. We will talk about glob very soon.

const maxSize = 640;
const directoryName = "metainfo";
const pathImagesDirectory = path.join(imagesDirectory, "**/_*.+(jpg|JPG)");

The second pre-task is to find the images to resize. Again, this will require a third-party library to simplify our life. We could recursively navigate the folders, but it is nicer to have a single glob pattern that handles it.

npm install --save glob
npm install --save-dev @types/glob

From there, we need to import the modules. We also bring in the path and fs modules of NodeJs to build proper paths and to save files on disk.

import * as g from "glob";
import * as path from "path";
import * as sharp from "sharp";
import * as fs from "fs";

The first function we need to create is the one that returns a list of strings representing the files to resize. These will be all our underscore, aka best, pictures. We want to be sure that we can re-run this function multiple times, thus we need to ignore the output folder where we save the resized images. This function returns the list in a promise fashion because the glob library is asynchronous. Here is the first version, which calls the module’s “Glob” function and adds everything into an array while writing each file to the console for debugging purposes.

function getImageToAnalyze(): Promise<string[]> {
    const fullPathFiles: string[] = [];
    const promise = new Promise<string[]>((resolve, reject) => {
        const glob = new g.Glob(pathImagesDirectory, { ignore: "**/" + directoryName + "/**" } as g.IOptions, (err: Error, matches: string[]) => {
            matches.forEach((file: string) => {
                console.log(file);
                fullPathFiles.push(file);
            });
            resolve(fullPathFiles);
        });
    });
    return promise;
}

This can be simplified by resolving with the matches array directly and returning the promise instead of using a variable. In the end, if you are not debugging, you can use:

function getImageToAnalyze(): Promise<string[]> {
    return new Promise<string[]>((resolve, reject) => {
        const glob = new g.Glob(pathImagesDirectory, { ignore: "**/" + directoryName + "/**" } as g.IOptions, (err: Error, matches: string[]) => {
            resolve(matches);
        });
    });
}

As mentioned, the quality of this code is average. In reality, some love is missing around the error scenario: right now, the error argument passed to the callback is simply ignored instead of being turned into a rejected promise.
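A minimal variant along the same lines could check that argument and reject; this is just a sketch built on the code above:

function getImageToAnalyze(): Promise<string[]> {
    return new Promise<string[]>((resolve, reject) => {
        new g.Glob(pathImagesDirectory, { ignore: "**/" + directoryName + "/**" } as g.IOptions, (err: Error, matches: string[]) => {
            if (err) {
                reject(err); // surface the glob failure to the caller
            } else {
                resolve(matches);
            }
        });
    });
}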

At this point, we can call the method with:

console.log("Step 1 : Getting images to analyze " + pathImagesDirectory);
getImageToAnalyze()
    .then((fullPathFiles: string[]) => {
        console.log("Step 2 : Resize " + fullPathFiles.length + " files");
        return resize(fullPathFiles);
    })

The code inside the “then” is executed when the promise resolves successfully. It starts resizing the list of pictures by passing the list into the function that we will create in an instant.

The resize function is not the one that actually does the resizing. It calls the function that does the resizing only if the picture has not yet been resized. This is great if something fails and you need to re-run. The resize function checks the “metainfo” folder, where we output the resized pictures, and only resizes an image if it is not already present. In both cases, this function returns a promise. The type of the promise is a list of IImage.

export interface IImage {
    thumbnailPath: string;
    originalFullPathImage: string;
}

This type holds the full path of the resized thumbnail and of the original picture. When the image has already been resized, we just create an instance; when it has not, we resize it and then return a new instance. This method waits for all resizes to complete before resolving, which is the reason for Promise.all. We do so just to have a clear cut before moving to the next step: since we are launching multiple resizes in parallel, we wait for them all to be done before analyzing.

function resize(fullPathFiles: string[]): Promise<IImage[]> {
    const listPromises: Array<Promise<IImage>> = [];
    const promise = new Promise<IImage[]>((resolve, reject) => {
        for (const imagePathFile of fullPathFiles) {
            const thumb = getThumbnailPathAndFileName(imagePathFile);
            if (fs.existsSync(thumb)) {
                listPromises.push(Promise.resolve({ thumbnailPath: thumb, originalFullPathImage: imagePathFile } as IImage));
            } else {
                listPromises.push(resizeImage(imagePathFile));
            }
        }
        Promise.all(listPromises)
            .then((value: IImage[]) => resolve(value));
    });
    return promise;
}

This function uses a helper to get the thumbnail path and look up whether it has already been created. That helper calls another one, and both of these methods have the same goal of providing a path. The first one, getThumbnailPathAndFileName, takes the original full-quality picture path and returns the full path where the resized thumbnail is stored. The second one is a function that will be reused on a few occasions; it gives the metainfo directory. This is where the resized pictures are stored, and also where the JSON files with the analytic data are saved.

function getThumbnailPathAndFileName(imageFullPath: string): string {
    const dir = getMetainfoDirectoryPath(imageFullPath);
    const imageFilename = path.parse(imageFullPath);
    const thumbnail = path.join(dir, imageFilename.base);
    return thumbnail;
}

function getMetainfoDirectoryPath(imageFullPath: string): string {
    const onlyPath = path.dirname(imageFullPath);
    const thumbnail = path.join(onlyPath, "/" + directoryName + "/");
    return thumbnail;
}

The last method holds the actual resize logic. The first line of the method creates a “sharp” object for the desired picture. Then we invoke the “metadata” method, which gives us access to the image information. We need this to get the actual width and height, find the wider side, and compute the resize ratio. Once we know the height and width of the thumbnail, we need to create the destination folder before saving. Finally, we call the “resize” method with the calculated width and height. The “webp” method is the one that generates the image. From there, we could produce a buffered image and use a stream to handle it in memory, or store it on disk as we do here with the “toFile” method. This returns a promise that we use to build and return the IImage.

function resizeImage(imageToProceed: string): Promise<IImage> {
    const sharpFile = sharp(imageToProceed);
    return sharpFile.metadata()
        .then((metadata: sharp.Metadata) => {
            const actualWidth = metadata.width;
            const actualHeight = metadata.height;
            let ratio = 1;
            if (actualWidth > actualHeight) {
                ratio = actualWidth / maxSize;
            } else {
                ratio = actualHeight / maxSize;
            }
            const newHeight = Math.round(actualHeight / ratio);
            const newWidth = Math.round(actualWidth / ratio);
            const thumbnailPath = getThumbnailPathAndFileName(imageToProceed);
            // Create directory thumbnail first
            const dir = getMetainfoDirectoryPath(imageToProceed);
            if (!fs.existsSync(dir)) {
                fs.mkdirSync(dir);
            }

            return sharpFile
                .resize(newWidth, newHeight)
                .webp()
                .toFile(thumbnailPath)
                .then((image: sharp.OutputInfo) => {
                    return { thumbnailPath: thumbnailPath, originalFullPathImage: imageToProceed } as IImage;
                });
        }, (reason: any) => {
            console.error(reason);
            // Re-throw so the returned promise rejects instead of resolving with an undefined IImage.
            throw reason;
        });
}

This concludes the resize part of the project. It’s not as straightforward as it may seem, but nothing is rocket science either. This code could be optimized to start resizing without first checking whether all the images are present. Some refactoring could be done around the ratio logic within the promise callback of sharp’s metadata method. We could also keep the output in memory and thus avoid reloading the thumbnail from disk, working on the memory buffer instead. That last optimization wasn’t done because I wanted every step to be re-executable whatever state it was stopped in, and I didn’t want to bring in more logic to reload the image into memory if it was already generated. That said, it could be done. The full project is available on GitHub: https://github.com/MrDesjardins/CognitiveImagesCollection
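Coming back to the in-memory idea: a rough sketch, assuming the rest of the pipeline is adapted to consume a Buffer (the function name and parameters below are hypothetical), could look like this:

// Hypothetical variant of resizeImage: keep the resized bytes in memory instead of writing them to disk.
function resizeImageToBuffer(imageToProceed: string, newWidth: number, newHeight: number): Promise<Buffer> {
    return sharp(imageToProceed)
        .resize(newWidth, newHeight)
        .webp()
        .toBuffer(); // resolves with the compressed image bytes, ready to be uploaded
}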

Create a Local Search Tool for Pictures in NodeJs

I recently had trouble searching for a specific picture of my daughter on my local drive. First, I take a lot of pictures, so it was hard to find. Second, I have some good pictures and some average ones, but I keep them all, hence I have thousands and thousands of pictures that are not easy to search. However, since 2003 I have had a systematic way to store my pictures: a main folder that contains one folder per year, and every year has many folders, one per event. The event folders always have the format “yyyy-mm-dd-EventDescriptionIn2words”. I also have the habit of prefixing the best pictures with an underscore inside these folders. Still, the picture names are always the sequential numbers of my camera and are not consecutive in time. There is no way I can search for “Alicia happy in red dress during summer 2015”, for example.

Here comes the idea that I started a few weeks ago: have a training set of pictures that serves as a base for the system to figure out who is in my pictures, and a service that analyzes what is in each picture. On top of the data, a simple website lets me query the database of pictures and returns the best matches with a link to the actual full-quality picture. Before going any further, a word of caution: the idea of this project is not to develop something that will scale, or stellar code, hence the quality of the code is very average but a workable solution. Everything is developed with NodeJs, TypeScript, Microsoft Cognitive Api and MongoDb, and doesn’t have any unit tests. I may refactor this project someday, but for the moment, let’s just get our head around how to do it.

I’ll write several posts about this project. In fact, at the moment I am writing this article, I am only halfway through the first phase, which is analyzing a small subset of my pictures. This article serves more as a description of what will be built.

The first thing we need to do is read a sample of all the images. Instead of scanning and analyzing my whole hard drive for pictures, I will analyze only pictures within a specific range of dates. As of today, I have 34,000 pictures taken since 2009 (since I met my wife), and in this population 2,000 have been identified with an underscore, which means I really like them. For the purpose of having a smaller search set and not having to analyze for too long, I will only use pictures with an underscore. Second, in these pictures, I can say that roughly 75% of the people are my wife, my daughter or me. Hence, I will only try to identify these people and mark others as “unknown”. Third, I want to know the emotions and what is going on in the picture. This requires a third-party service, and I will use the Microsoft Azure Cognitive API. I’ll go into more detail about the API in a later article.

Once the pictures are analyzed, the data will be stored in MongoDB, which is JSON-based storage. This is great because the result of all the analysis will be in JSON format. It will allow us to query the content to get results to display on the website. To simplify this project, the first milestone is scanning the pictures and creating one JSON file per underscore file inside a “metainfo” folder. The second milestone will be to hydrate MongoDB, and the third one to create a simple web application that will query MongoDB and display the results.

I’ll stop here for the moment. You can find the source code and the progress of this project in this GitHub repository: https://github.com/MrDesjardins/CognitiveImagesCollection

JavaScript Navigation Performance Understanding with Application Insights

Application Insights has a table with performance details. It’s called “browserTimings”. You can get a glimpse of what it contains by executing:

browserTimings
| where timestamp >= ago(1d)
| where totalDuration  > 10000
| order by totalDuration desc nulls last 

The most interesting columns are the following four:

  • networkDuration
  • sendDuration
  • receiveDuration
  • processingDuration

To understand what these mean, you can look at the Azure Application Insights documentation. It has a good image (https://docs.microsoft.com/en-us/azure/application-insights/media/app-insights-javascript/08-client-split.png). In short, Application Insights categorizes the official processing model of the navigation timing. Nine steps is heavy, and some steps are not directly caused by your code. The segregation into 4 categories helps you focus on where you should spend your time to fix your performance issues.

Reading the navigation timing can be confusing in terms of what needs to be improved. Most of the time, you do not need to understand every step to improve the overall performance.

The first interesting column of the browserTimings table is networkDuration. This column covers 4 of the navigation timing processing steps, mostly the network calls. It includes redirection, if the fetched resource is an HTTP redirect (3xx). It also has the DNS and TCP delays. What this means is that it contains all the time spent before users reach the Asp.Net code: the time to translate the domain into an IP address, the time across the different hops that separate the user from the machine that hosts the HTTP server, and the time the HTTP request takes to travel from the user’s machine to the HTTP server (IIS, Apache, etc.). This time tends to be huge if the HTTP server is sleeping. For example, if you release a new version on an Azure Website and do not warm up the server, the first hit will be slow. This translates in Application Insights into a networkDuration higher than usual. That is why it’s always good to remove very high times from the statistics, let’s say above 1 minute. That said, I currently experience very long requests, above 5 minutes, from GoogleBot that need to be investigated.

browserTimings
| where timestamp >= ago(12d)
| where totalDuration  < 300000
| where networkDuration  < 10000
| order by totalDuration desc nulls last 

The second column is sendDuration. The starting point is when the browser starts sending the first byte of the request; the ending point is when the server sends the first byte of the response back to the browser. In other words, it’s essentially the time spent in your Asp.Net MVC controller. If you want to isolate long requests to identify a problematic Asp.Net MVC controller’s action, you can change the Application Insights query to order by sendDuration and find every duration above 2 seconds (or whatever your desired maximum time on the server is).

browserTimings
| where timestamp >= ago(12d)
| where totalDuration  < 300000
| where networkDuration  < 10000
| where sendDuration > 2000
| where url !startswith "http://localhost"
| where url !contains "azurewebsites.net"  
| order by sendDuration desc nulls last 
| project timestamp , url, sendDuration

While exploring the data, I realized that I was sending development localhost data into Application Insights. This happened only on the client side, since the JavaScript code was injected and didn’t take into consideration the C# flag “TelemetryConfiguration.Active.DisableTelemetry”. That is why the query has new clauses that get rid of any development requests. Also, I am using multiple Azure slots, which means I want to remove the experimental slot too by filtering out “azurewebsites.net” from the data.

browserTimings
| where timestamp >= ago(12d)
| where totalDuration  < 300000
| where networkDuration  < 10000
| where sendDuration > 2000
| where url !startswith "http://localhost"
| where url !contains "azurewebsites.net"  
| order by sendDuration desc nulls last 
| extend urlWithoutNumber =  replace(@"([^?]+).*", @"\1", replace(@"([a-z0-9]{8})\-([a-z0-9]{4})\-([a-z0-9]{4})\-([a-z0-9]{4})\-([a-z0-9]){12}", @"x", replace(@"(\d)", @"x", url)))
| project timestamp , urlWithoutNumber, sendDuration

The third column is receiveDuration, which is the time it takes to download the data from the server. This can be long if you send back big HTML, for example. You can lower this metric by having a single-page application where most requests download only the data and not UI details. This metric is important to keep low, especially on mobile where the connection is slow and users have a limited data plan.
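For example, a query in the same spirit as the sendDuration one can surface the pages with the heaviest responses; the 2-second threshold is arbitrary:

browserTimings
| where timestamp >= ago(12d)
| where totalDuration < 300000
| where networkDuration < 10000
| where receiveDuration > 2000
| where url !startswith "http://localhost"
| where url !contains "azurewebsites.net"
| order by receiveDuration desc nulls last 
| project timestamp, url, receiveDuration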

The last and fourth column is processingDuration. This is the time it takes for the browser to render the received data. It spans several JavaScript events: domLoading, domInteractive, domContentLoaded (JQuery DocumentReady), domComplete, loadEventStart and loadEventEnd. A quick recap of these rendering events:

  • domLoading: The document has been downloaded from the server, the browser is ready to work on it.
  • domInteractive: The browser parsed the HTML and built the DOM.
  • domContentLoaded: The CSSOM is built (the browser has analyzed the CSS) and the browser is no longer blocked by any JavaScript.
  • domComplete: The browser doesn’t have any more images or other resources to download.
  • loadEventStart and loadEventEnd: The browser has rendered the DOM to the user.

It means that you can reduce the processingDuration by having simpler CSS and faster JavaScript code. You can get this information in Application Insights by showing the percentiles of the processingDuration. I added some filtering to reduce the amount of results.

browserTimings
| where timestamp >= ago(7d)
| where networkDuration < 5000
| where totalDuration > 5000
| where url !startswith("http://localhost")
| where url !contains("azurewebsites.net")
| extend urlClean = replace(@"([^?]+).*", @"\1", replace(@"([a-z0-9]{8})\-([a-z0-9]{4})\-([a-z0-9]{4})\-([a-z0-9]{4})\-([a-z0-9]){12}", @"x", replace(@"(\d)", @"x", url)))
| summarize 
      percentiles(processingDuration, 50, 95) 
    , percentiles(totalDuration, 50, 95) 
    by urlClean
| where percentile_processingDuration_50 > 2000
| order by percentile_processingDuration_50 desc nulls last 

It’s good to note that some external libraries can increase the processingDuration. This is especially true with Google Adsense or third parties that download CSS/fonts/scripts and execute them on your page.

Application Insights browserTimings is very useful for getting insight into what is going on on your webpage in terms of performance and figuring out where to optimize your code. To conclude, here is a recap of the 4 main properties of the Application Insights browserTimings.

  • networkDuration = Contact the server, can be slow if you just deployed (on the first hit).
  • sendDuration = Time on the server (Asp.Net Controller code).
  • receiveDuration = Time for the browser to download the data from the server.
  • processingDuration = Time for the browser to draw the downloaded data to the UI for the user to see it.

Service Worker, Push Notification and Asp.Net MVC – Part 2 of 3 Server Side

In part one, we saw how to register a service worker and how to handle incoming messages if the user is actively on the website. However, we didn’t touch on how to send a message through Google Firebase. In this article, I’ll show how to send a message from an Azure Webjob, written in C#. This is a common scenario where you have a backend job running and executing some logic that needs the user to take an action. Since the user may or may not be on the website (or may be on the wrong page), a push notification is great to indicate that something must be done. The other big advantage is that push notification with Google Firebase offers an almost instant messaging service. Within a few milliseconds, the message goes from the server to Google’s Firebase server to the service worker, which uses the push notification API of the browser to display the message.

The first thing is to define a generic contract with an interface. I decided to create a simple one that returns a boolean to indicate whether the message was sent successfully. The method signature allows passing the “to” token, which is the unique identifier of the user for Firebase (the token saved from the Ajax call in part 1). The remaining parameters are self-explanatory: the title, the message and the url opened when the user clicks the notification.

public interface IPushNotification
{
    bool QueueMessage(string to, string title, string message, string urlNotificationClick);
}

The implementation is also very simple. It relies on the REST endpoint of Google Firebase.

public class GoogleFirebaseNotification:IPushNotification
{
    public bool QueueMessage(string to, string title, string message, string urlNotificationClick)
    {
        if (string.IsNullOrEmpty(to))
        {
            return false;
        }
        var serverApiKey = "SuperLongKeyHere";
        var firebaseGoogleUrl = "https://fcm.googleapis.com/fcm/send";

        var httpClient = new WebClient();
        httpClient.Headers.Add("Content-Type", "application/json");
        httpClient.Headers.Add(HttpRequestHeader.Authorization, "key=" + serverApiKey);
        var timeToLiveInSecond = 24 * 60 * 60; // 1 day
        var data = new
        {
            to = to,
            data = new
            {
                notification = new
                {
                    body = message,
                    title = title,
                    icon = "/Content/Images/Logos/BourseVirtuelle.png",
                    url = urlNotificationClick
                }
            },
            time_to_live = timeToLiveInSecond
        };

        var json = JsonConvert.SerializeObject(data);
        Byte[] byteArray = Encoding.UTF8.GetBytes(json);
        var responsebytes = httpClient.UploadData(firebaseGoogleUrl, "POST", byteArray);
        string responsebody = Encoding.UTF8.GetString(responsebytes);
        dynamic responseObject = JsonConvert.DeserializeObject(responsebody);

        return responseObject.success == "1";
    }
}

The first piece of the puzzle is to use the right server API key. It’s in the Firebase console, under the settings cog, in the Cloud Messaging tab.

The rest of the code configures the WebClient. You must set the content-type header to JSON. The second header that must be defined is the authorization key; this is where you set the cloud messaging server key. Finally, we set up the data from the method’s parameters. Some information is hardcoded, like the icon to display, as well as the time we want Firebase to hold the message if the user doesn’t have a browser open to collect it. The last step is to retrieve the response and check whether the message was delivered successfully to Firebase’s server.

When using it from a webjob, you just need to use this implementation and pass the desired parameters. You can get the token from a new column created in the AspNetUsers table and define a specific title and description depending on what the user must do.
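For illustration only, a call from a webjob might look like the following; the way the token is loaded and the values passed are placeholders, not code from the project:

// Hypothetical webjob usage: the token would come from the extra column added to AspNetUsers.
IPushNotification pushNotification = new GoogleFirebaseNotification();
string userToken = LoadFirebaseTokenForUser(userId); // placeholder for your own data access
bool queued = pushNotification.QueueMessage(
    to: userToken,
    title: "Action required",
    message: "Something needs your confirmation.",
    urlNotificationClick: "/Home/Index");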

Application Insights Build Version on all telemetry

Something very interesting is to know which version produced a given telemetry item. This is good for custom events and traces, and very interesting for exceptions. However, adding this information on every call is redundant and not clean. That is why Application Insights allows you to add a telemetry initializer.

A telemetry initializer is a piece of code that is executed when a telemetry item is created. There are two steps to make it work. First, create a class that implements ITelemetryInitializer. Second, register the class with Application Insights.

To accomplish the goal of having the system version in every telemetry item, let’s create a class that adds a property named BuildVersion to the Application Insights context. I place this class in my website project, which allows me to grab the assembly version. Indeed, you need to keep the AssemblyInfo.cs file and its version up to date on every release for this method to work.

    public class AssemblyVersionInitializer : ITelemetryInitializer
    {
        public void Initialize(Microsoft.ApplicationInsights.Channel.ITelemetry telemetry)
        {
            telemetry.Context.Properties["BuildVersion"] = this.GetType().Assembly.GetName().Version.ToString();
        }
    }
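For reference, the version returned by GetName().Version is the one declared in AssemblyInfo.cs; the numbers below are only an example:

// Properties/AssemblyInfo.cs -- bump this version on every release
[assembly: AssemblyVersion("1.2.3.0")]
[assembly: AssemblyFileVersion("1.2.3.0")]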

The next and final step is to register the initializer when the application starts.

public class MvcApplication : System.Web.HttpApplication
{
   protected void Application_Start()
   {
       TelemetryConfiguration.Active.TelemetryInitializers.Add(new AssemblyVersionInitializer());
   }
}

From there, whether or not you add custom properties, every telemetry item will also have the BuildVersion one. The goal of having this BuildVersion is primarily to see differences in your telemetry between versions. You can clearly identify whether a problem is resolved or introduced between versions. You can also see if the performance gets worse. However, this only works well if you release often, since Application Insights retention is very limited, with most of the data restricted to 7 days (or 14 days).

Application Insights How to Handle undefined Custom Dimension property

Application Insights is awesome. It allows you to query your system for events that you define. For example, when your users log in, you could create a new event to send data to Application Insights and then query it to know how many of your users did a valid login and how many failed.

In C#, you can have something that records whether the login was successful and, if not, gives a reason why. This could be “wrong login”, “account not validated”, “too many attempts”, etc. However, when the login is successful we do not need a reason.

public void SendLogin(bool isValidLogin, string reason = "")
{
    var properties = new Dictionary<string, string>
    {
        {"IsValidLogin", isValidLogin.ToString()},
        {"LoginDetail", reason}
    };
    this.telemetry.TrackEvent("LoginRequestSuccess", properties);
}

The desired end result is a graph that gives the number of successful logins and the number of failed attempts by reason. The challenge is that the customDimensions property of a success will not have any reason. A solution is to check whether the custom dimension is undefined, which means null, and assign a temporary string for the detail. By assigning a string, we can group by this detailed reason and then by time to spread the results over a time x-axis.

 customEvents
| where name == "LoginRequestSuccess"
| where timestamp >= ago(14d)
| extend d=parsejson(customDimensions)
| extend isValidLogin = d.IsValidLogin
| extend detail = iff(isnull(d.LoginDetail), "Okay", tostring(d.LoginDetail))
| project detail, timestamp 
| summarize count(detail) by detail, bin(timestamp, 1d)
| order by timestamp asc 

The important line is the one that extends detail. This on-the-fly column gets the login detail, which is provided when the login fails. Since it’s not provided on success, we do a check with isnull. If it is null, we set a temporary string; otherwise, we cast the provided login detail. The cast is required because the custom dimension is a dynamic type, not a string. The iff must return the same type for each condition: the first one is a hard-coded string, thus the second must be a string too.

How my new users couldn’t create new account

In the past, when you had your own VPS or server and needed to send email, you had two main options. If your website was small with few requests, you could have the web server send the email directly with SMTP, and if you had something with more volume, you could queue the requests and have a job send them. You were mostly independent and could decide when to send the email.

These days, with cloud infrastructure, your web server does a single task, and sending email is not done through basic SMTP but through an email service. A very popular one is SendGrid. I am mentioning this one because if you are using Microsoft Azure, you get 25k free emails per month. And to be honest, their integration is very simple and efficient, with a nice C# API. If you are used to just creating a new email account in your cPanel and using SMTP, you won’t find anything similar in Azure: you must use a third-party tool to send email (or create your own email server in a VM). That said, the point is that you depend on an external service to send email, like many other services you depend on. Your business code develops dependencies.

A while ago, we wanted to send an email to the last 15k active users of the old system to welcome them to version 2 of the system. A webjob queried the last 15k users by login date to get the emails and sent them with SendGrid. Within a few minutes, SendGrid suspended my account. While this was unfortunate because the emails with this good news couldn’t be sent, what was drastically worse was that new accounts couldn’t be validated: the validation link is sent by email. For more than 24 hours, no new customer could reach the system.

The reason the account got suspended is that emails were bouncing. That means emails couldn’t reach the users’ inboxes. It can be because the email address is invalid, the mailbox is full, the address doesn’t exist, the server is down, the message is too big, etc. It’s hard to have a valid list of emails if the system initially allowed account creation without email validation. It’s also hard if, once validated, you allow the user to change the address without validating again.

SendGrid creates a list of all bounced emails. I plan to be proactive and do a database cleanup of those accounts. A good approach is to force users to modify their email on their next login, with a validation email being sent. After x number of weeks, if the user didn’t change it, we can delete the account. As for new users, their account is validated on creation, as well as when they modify the email in their profile. We also allow them to simply be removed from the mailing list, which makes them more comfortable keeping their real address in their account.

What could have been done to avoid that situation? This is very hard because there is no way to know whether an email address is real without sending to it at least once. I think a better approach could have been to send batches of 20 emails every 10 minutes for a few days instead of sending 15k emails to SendGrid at once. I took for granted that the service was dispatching emails and throttling them if the request was too big.
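As a very rough sketch of that batching idea (the recipients list, the batch size, the delay and the SendWelcomeEmail helper are all made up for illustration):

// Hypothetical throttled send from a webjob: small batches with a pause between them.
const int batchSize = 20;
var delayBetweenBatches = TimeSpan.FromMinutes(10);
for (int i = 0; i < recipients.Count; i += batchSize)
{
    foreach (var recipient in recipients.Skip(i).Take(batchSize))
    {
        SendWelcomeEmail(recipient); // placeholder for the actual SendGrid call
    }
    Thread.Sleep(delayBetweenBatches);
}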

Why Microsoft needs to do something about Azure Website and Https certificate

Websites are adopting Https as the normal protocol now. It’s secure and faster than it was. Even Google gives a push in its ranking algorithm to websites that are https. With Azure, the free and easy solution is to use LetsEncrypt. It’s an organization that gives out free certificates with a 3-month life before they expire. Afterward, they can be renewed. However, it’s a cumbersome task. Microsoft Azure had a top-rated UserVoice request asking to bring Https to Azure, and since LetsEncrypt provided something convenient and someone on the web created an Azure extension to set it up, Microsoft closed this request as “done”. It’s not a surprise to see that now the top-ranked request on Azure UserVoice is to have something that doesn’t rely on a third party.

While LetsEncrypt is doing a great job, the problem is the free Azure extension. It’s maintained by someone outside Microsoft, and it’s not as polished, actively followed, or compatible with other Azure features. While the best solution would be to just have a checkbox to activate Https, at the very least a solution that is not broken should exist.

Some may have problems during setup; I did not. It went smoothly and took less than 1 hour. The problems started a few weeks after the install. First, it doesn’t support Azure slots. It means that every time you swap between slots, something breaks. For example, swapping slots breaks the webjobs that are responsible for renewing the certificate after 3 months. I learned that the hard way by getting an email from LetsEncrypt a few days before the certificate expired.

Second, how to debug what this extension is doing is not clear or documented. It’s pretty black-boxed. You can end up having multiple certificates installed for the same DNS. That is what happened to me when I reinstalled the third-party Azure extension. And even with many new certificates installed, I was still receiving notices from LetsEncrypt that my certificate was about to expire.

In the end, if you are using Azure deployment slots and the Azure extension for LetsEncrypt, you will have to manually install the certificate again. It’s not a big deal, but it’s not working “as advertised”. I do not understand how Microsoft could simply close this first UserVoice request without evaluating the implications. The currently proposed solution is broken with respect to other Azure features and not actively maintained (nothing has changed in the last 3 months).

Application Insight to get Popular Pages even if Multiple ids in URL

Application Insights is a very powerful system that lets you collect telemetry on your website. It’s a service on Azure, which is currently free with the limitation of keeping data for 7 days and aggregated data for 90 days.

One interesting thing you can do, without having to configure anything beyond using Application Insights with your website, is to query the pageViews table. It allows you to get information about each page requested by users. However, if you are using an application, you may have urls with IDs, for example http://yourwebsite.com/user/12345/data/group/567/. The problem is that you will get noisy results: the same page has variables in the url. A simple fix is to replace any integer. Here is a query you can use on your own data, which replaces integers with an ‘x’ character.

 pageViews
 | extend replaced_operation_ParentName = replace(@'(\d+)', @'x', operation_Name)
 | where timestamp >= ago(7d) 
 | summarize c = count(replaced_operation_ParentName) by replaced_operation_ParentName
 | where c > 1
 | order by c desc 

The result looks like this:

[Screenshot: pageViews results grouped by url with the integers replaced by ‘x’]

This is pretty fast; in less than 1 second I got all the results. If you are not using integers in your urls but Guids, you can change the regex to replace the pattern of a Guid, as shown below.
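For example, reusing the Guid pattern from the browserTimings queries earlier, the replace could look like this:

pageViews
| extend replaced_operation_ParentName = replace(@'([a-z0-9]{8})\-([a-z0-9]{4})\-([a-z0-9]{4})\-([a-z0-9]{4})\-([a-z0-9]){12}', @'x', operation_Name)
| where timestamp >= ago(7d)
| summarize c = count(replaced_operation_ParentName) by replaced_operation_ParentName
| where c > 1
| order by c desc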

Improving your Azure Redis performance and size with Lz4net

The title is a little misleading. I could rewrite it as improving Redis performance by compressing the objects you serialize. It’s not related to Azure Redis particularly, nor to Lz4net, which is just one way to compress. However, I recently learned that compression improves Redis on Azure. It helps in two different ways. First, the website/webjobs need to send the information to the Redis server, and having a smaller number of bytes to send is always faster. Second, you have a size limit depending on the Redis tier you chose on Azure, and compressing can save you some space. The saving varies depending on what you cache. From my personal experience, any serialized object that takes more than 2 KB gains from compression. I did some logging and I measured a reduction between 6% and 64%, which is significant if you have objects to cache that are around 100-200 KB. Of course, this has a CPU cost, but depending on the algorithm you use, you may not feel the penalty. I chose Lz4net, which is a lossless, very fast compression library. It’s open source and also available on Nuget.

Doing it is also simple, but the documentation around Lz4net is practically non-existent, and StackExchange.Redis doesn’t provide details about how to handle compressed data. The problem with the StackExchange library is that it doesn’t let you use byte[] directly. Underneath, it converts the byte[] into a RedisValue. It works well for storing; however, when getting, converting the RedisValue back to byte[] returns null. Since the compressed data is an array of bytes, this causes a problem. The trick is to encapsulate the data in a temporary object. You can read more from Marc Gravell on StackOverflow.

private class CompressedData
{
	public CompressedData()
	{
		
	}
	public CompressedData(byte[] data)
	{
		this.Data = data;
	}
	public byte[] Data
	{
		get; private set;
	}
}

This object can be serialized and used with StackExchange.Redis. It can also be restored from Redis, uncompressed, deserialized and used as an object. Inside my Set method, the code looks like this:

var compressed = LZ4Codec.Wrap(Encoding.UTF8.GetBytes(serializedObjectToCache));
var compressedObject = new CompressedData(compressed);
string serializedCompressedObject = Serialization.Serialize(compressedObject);
//Set serializedCompressedObject with StackExchange Redis library

The Get method does the reverse:

string stringObject = //From StackExchange Redis library
var compressedObject = Serialization.Deserialize<CompressedData>(stringObject);
var uncompressedData = LZ4Codec.Unwrap(compressedObject.Data);
string unCompressed = Encoding.UTF8.GetString(uncompressedData);
T obj = Serialization.Deserialize<T>(unCompressed);

The result is really stunning. If you look at my personal numbers from a project where I applied this compression, you can see that even for 5 KB objects we have a gain.

[Chart: Redis key sizes by percentile, before and after compression]

For example, at the 50th percentile a key is 23 KB, and this goes down by more than half when compressed. If we look at the 95th percentile, we realize the gain is even bigger, touching a 90% reduction by going from 478 KB to 44 KB. Compression is often criticized as being bad for smaller objects. However, I found that even an object as small as 6 KB gained by being reduced to 3 KB, 35 KB to 8 KB, and so on. Since the compression algorithm used is very fast, the experience was far more positive than any negative performance impact.