Create a Local Search Tool for Pictures in NodeJs

I recently searched my local drive for a specific picture of my daughter and ran into a few issues. First, I take a lot of pictures, so finding one is hard. Second, I keep the good pictures along with the average ones, which means I have thousands and thousands of pictures that are not easy to search. That said, since 2003 I have had a systematic way to store my pictures: a main folder that contains one folder per year, and every year has many folders, one per event. The event folders always have the format “yyyy-mm-dd-EventDescriptionIn2words”. I also have the habit of prefixing the best pictures with an underscore inside these folders. Still, the picture names are just the sequential numbers from my camera, and they are not consistent in time. There is no way I can search for “Alicia happy in a red dress during summer 2015”, for example.

Here comes the idea I started a few weeks ago: having a training set of pictures that will serve as a base for the system to figure out who is in my pictures, and having a service that analyzes what is in each picture. On top of the data, a simple website lets me query the database of pictures and returns the best matches with a link to the actual full-quality picture. Before going any further, a word of caution: the goal of this project is not to develop something that will scale, nor stellar code, hence the quality of the code is average, but it is a workable solution. Everything is developed with NodeJs, TypeScript, the Microsoft Cognitive API and MongoDb, and doesn’t have any unit tests. I may refactor this project someday, but for the moment, let’s just get our head around how to do it.

I’ll write several posts around this project. In fact, at the moment I am writing this article, I am only halfway through the first phase, which is analyzing a small subset of my pictures. This article will serve more as a description of what will be built.

The first thing we need to do is read a sample of all the images. Instead of scanning and analyzing my whole hard drive for pictures, I will analyze only pictures within a specific range of dates. At this date, I have 34,000 pictures taken since 2009 (since I met my wife), and in this population 2,000 have been identified with an underscore, which means I really like them. For the purpose of having a smaller search set and not having to analyze for too long, I will only use pictures with an underscore. Second, in these pictures, I can say that roughly 75% of the people are my wife, my daughter or me. Hence, I will only try to identify these three people and mark everyone else as “unknown”. Third, I want to know the emotion and what is going on in the picture. This will require a third-party service, and I will use the Microsoft Azure Cognitive API. I’ll get into more detail about the API in a later article.

Once the pictures are analyzed, the data will be stored in MongoDB, which is a JSON-based storage. This is great because the result of all the analysis will be in JSON format, and it will allow us to query the content to get results to display in the website. To simplify this project, the first milestone will be scanning the pictures and creating one JSON file per underscore-prefixed file inside a “metainfo” folder. The second milestone will be to hydrate MongoDB, and the third one to create a simple web application that will query and display the results from MongoDB.
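
To make the first milestone concrete, here is a minimal sketch of what the scanning step could look like. The root folder, the "metainfo" destination and the shape of the JSON stub are assumptions for illustration only; the real metadata will come from the Cognitive API analysis later.

// Minimal sketch of milestone 1 (assumed paths and JSON shape, not the final format):
// walk the picture folders, keep only underscore-prefixed images and write one JSON
// stub per image in a "metainfo" folder.
const fs = require("fs");
const path = require("path");

const picturesRoot = "c:/pictures";                       // assumed root: one folder per year
const metaInfoFolder = path.join(picturesRoot, "metainfo");

function findUnderscorePictures(folder, found) {
    for (const entry of fs.readdirSync(folder)) {
        const fullPath = path.join(folder, entry);
        if (fs.statSync(fullPath).isDirectory()) {
            findUnderscorePictures(fullPath, found);      // recurse into year/event folders
        } else if (entry.startsWith("_") && /\.jpe?g$/i.test(entry)) {
            found.push(fullPath);                         // underscore prefix = picture I really like
        }
    }
    return found;
}

if (!fs.existsSync(metaInfoFolder)) {
    fs.mkdirSync(metaInfoFolder);
}

for (const picture of findUnderscorePictures(picturesRoot, [])) {
    // Placeholder metadata; the Cognitive API results will be added later.
    const meta = { originalPath: picture, faces: [], tags: [] };
    const jsonName = path.basename(picture) + ".json";
    fs.writeFileSync(path.join(metaInfoFolder, jsonName), JSON.stringify(meta, null, 2));
}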

I’ll stop here for the moment. You can find the source code and the progress of this project in this GitHub repository: https://github.com/MrDesjardins/CognitiveImagesCollection

Using TypeScript, React and WebPack

I created an open source project to bootstrap TypeScript and React a few months ago. You can read the first article about TypeScript/React/Gulp before this one. It wasn’t bundling the code and was using Gulp, which in mid-2017 is no longer the preferred tool to package JavaScript code. At this moment, Webpack is the most popular tool; it does everything Gulp or Grunt was doing while avoiding the middleman of a Gulp (or Grunt) package to invoke the actual library. Webpack is also very smart in terms of exploring the code and figuring out dependencies. This article focuses on migrating a TypeScript and React project from Gulp to Webpack.

First of all, we need to change index.html. The file was using RequireJs and was not referring to any bundle. The change is twofold: we need to remove RequireJs, since Webpack will handle loading dependencies between modules, and we need to refer to the bundles.

Before:

<html>
    <head>
        <title>TS + React Boilerplate v1.01</title>
    </head>

    <body>
        <div id="main"></div>
    </body>
    <script src="vendors/requirejs/require.js"></script>
    <script src="vendors/jquery/jquery.js"></script>
    <script>
        requirejs.config({
            //Every script without folder specified before name will be looked in output folder
            baseUrl: 'output/',
            paths: {
                //Every script paths that start with "vendors/" will get loaded from the folder in string
                vendors: 'vendors',
                jquery: '../vendors/jquery/jquery',
                react: "../vendors/react/dist/react",
                "react-dom": "../vendors/react-dom/dist/react-dom"
            }
        });
        //Startup file
        requirejs(['file1']);
    </script>

</html>

After:

<html>
    <head>
        <title>TS + React Boilerplate v1.01</title>
    </head>
    <body>
        <div id="main"></div>
    </body>
    <script src="vendorbundle.js"></script>
    <script src="appbundle.js"></script>
</html>

RequireJs’ configuration and the startup file are gone. The complexity moves into Webpack’s configuration file, which we will see soon. So, at this point, you can see that we won’t be using AMD anymore. This means we need to build our TypeScript with a different module format. We will use CommonJS.

{
  "compilerOptions": {
    "sourceMap": true,
    "target": "es6",
    "module": "commonjs",
    "outDir": "./deploy/output",
    "jsx": "react",
    "noImplicitAny": true
  },
  "exclude": [
    "node_modules",
    "**/*.spec.ts"
  ]
}

The TypeScript compiler will transpile each file of each module into the CommonJS format. The JavaScript produced is read by Webpack, which brings all the files into a single one (a bundle). This removes the need for asynchronous loading the way AMD was doing it.

Changing to CommonJS requires a change in the code. If you want to import a module from a path relative to the importing file, the path needs to start with “./” instead of just the name. For example, you won’t be able to write:

import { Component } from "component1";

but

import { Component } from "./component1";

The next change was around jQuery. The file that was using jQuery didn’t have any explicit reference to it before, but now we mention the library explicitly.

import * as $ from "jquery";

Before going into the Webpack configuration: with AMD, we had a lazy-loading example that loaded and used a specific file after 2 seconds. This simulates the “load on demand” files that we may not want to load initially. The scenarios are multiple: it can be justified because the user rarely uses the feature, hence there is no need to load it up front, or because the user doesn’t have the authorization for that kind of action, thus no need to load code that won’t be used.

Here is the AMD solution we had before:

import foo = require("folder1/fileToLazyLoad");
export class ClassB {
    public method1(): void {
        console.log("ClassB->method1");
        setTimeout(() => {
            requirejs(["folder1/fileToLazyLoad"], (c: typeof foo) => {
                const co = new c.ClassC();
                co.method1();
            });
        }, 2000);
    }
}

The first line tells TypeScript the type of the module we want to lazy load. It uses “require”, which we still use, but this time with a relative path. So far, not much has changed. However, we were previously calling requirejs directly inside the timer; this time we use System.import to load the module on demand. It’s almost the same thing, just with a different loader.

import foo = require("./fileToLazyLoad");
export class ClassB {
    public method1(): void {
        console.log("ClassB->method1");
        setTimeout(() => {
            System.import("./fileToLazyLoad").then((c: typeof foo) => {
                 const co = new c.ClassC();
                 co.method1();
            });
        }, 2000);
    }
}

We are at the point where the bigger changes occur. They start with the NPM modules we need. As we saw, we can remove RequireJs from the dependencies list. We also need to bring in several libraries for Webpack, its loaders, and utility libraries to clean and move files. Here is the complete list of dependencies:

"devDependencies": {
    "@types/express": "^4.0.35",
    "@types/jquery": "^2.0.46",
    "@types/react": "^15.0.26",
    "@types/react-dom": "^15.5.0",
    "@types/systemjs": "^0.20.2",
    "awesome-typescript-loader": "^3.1.3",
    "copyfiles": "^1.2.0",
    "del-cli": "^1.0.0",
    "express": "^4.15.3",
    "file-loader": "^0.11.2",
    "html-webpack-plugin": "^2.28.0",
    "source-map-loader": "^0.2.1",
    "typescript": "^2.3.4",
    "webpack": "^2.6.1",
    "tslint": "^5.4.2"
  },
  "dependencies": {
    "jquery": "^3.2.1",
    "react": "^15.5.4",
    "react-dom": "^15.5.4"
  }

The next step is that we won’t use Gulp to invoke actions anymore. If we want to delete previously generated deploy files, build the TypeScript or run the server, we need to use the CLI of each tool we are using. We could use the TypeScript CLI, named tsc, directly, use xcopy to move files, and so on. The problem is that this is not easy to remember. NPM allows custom scripts which can be invoked with “npm run ABC” where “ABC” is the name of your script. Here are the scripts we need to add to replace the Gulp tasks we had.

  "scripts": {
    "clean": "del-cli deploy/**",
    "package": "./node_modules/.bin/webpack --config webpack.config.js --display-error-details",
    "copy": "copyfiles -u 1 ./app/index.html ./deploy",
    "build": "del-cli deploy/** &amp; SET NODE_ENV=development &amp; webpack --config webpack.config.js --display-error-details &amp; copyfiles -u 1 ./app/index.html ./deploy/",
    "server": "node bin/www.js",
  },

The “clean” script uses “del-cli” to delete the deployment folder. A native Windows or Linux command could do it, but using this tool keeps the script cross-platform. The “package” script runs webpack with a specific configuration file and, for debugging purposes, displays verbose details. The principle is the same for all the other scripts. The “build” script chains clean + package + copy. So, in practice, you should only use “npm run build” and “npm run server”.

The final step is to configure Webpack. This is done in the file webpack.config.js. You can rename it the way you want; you just need to specify it to the webpack command after --config. Webpack can use many NPM packages to accomplish its job. For sure, you need at least the “webpack” package.

var path = require('path');
var webpack = require('webpack');

Webpack needs an entry point. In the example we are working on, the main file is the one that was used in index.html with “requirejs(['file1']);”. This time, we do not have any indication in the HTML file. However, Webpack needs one, or many, entry points and will navigate through all the dependencies to build the main bundle.

module.exports = {
    entry: {
        app: "./app/scripts/file1.tsx"
    },

Entry is the entry point; output is where the bundles go. The filename uses square brackets, which are replaced by the key provided for each entry point. In our example, we have “app” in the entry, so the file produced in the output will be named “appbundle.js”. The directory is provided in the “path” property, which in this case is “deploy”.

    output: {
        path: path.resolve(__dirname, 'deploy'),
        filename: '[name]bundle.js'
    },

The next configuration tells Webpack which extensions it should care about. For us, it’s TypeScript:

    resolve: {
        extensions: ['.ts', '.tsx', '.js', '.jsx']
    },

This property enables source maps, which make it possible to debug the TypeScript in Chrome on the real individual files even though they are bundled.

    devtool: "source-map",

Webpack works with rules. Every “test” evaluates a condition to decide whether to run a loader. We use two different ones: one to call the TypeScript loader that transpiles TS and TSX files, and a second one to generate source maps. Even though devtool already produces source maps, we need this loader to handle the source maps of third-party JS libraries too.

    module: {
        rules: [
            { test: /\.tsx?$/, loader: "awesome-typescript-loader" },
            { enforce: "pre", test: /\.js$/, loader: "source-map-loader" },
        ]
    },

The last piece is a plugin called CommonsChunkPlugin. It comes with Webpack and its role is to create additional bundles. In this case, we create a bundle named “vendorbundle.js”. minChunks can be a number: the minimum number of chunks that must reference a module before it is moved into this common bundle. For example, if a module is used often and we want it in a common bundle, we can say 5, and any module referenced more than 5 times would go into this bundle instead of the “app” one. For us, we want all the vendor modules, which means those coming from the node_modules directory. To do so, minChunks also accepts a function: when the module’s path contains node_modules, it goes into this bundle instead of the main one (app).

    plugins: [
        new webpack.optimize.CommonsChunkPlugin({
            name: "vendor",
            filename: "vendorbundle.js",
            minChunks: function(module) {
                return module.context && module.context.indexOf('node_modules') !== -1;
            }
        })
    ]
};
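
For comparison, the numeric form of minChunks mentioned above could look like the following sketch, where 3 is an arbitrary threshold chosen only for illustration and not something this project uses:

    plugins: [
        new webpack.optimize.CommonsChunkPlugin({
            name: "common",
            filename: "commonbundle.js",
            // A module referenced by at least 3 chunks would be moved into commonbundle.js
            // instead of being duplicated in each of them.
            minChunks: 3
        })
    ]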

You can find the exact code of this commit on GitHub. In this article, we saw how to replace Gulp with Webpack. Not only does it remove the dependencies on Gulp and Gulp’s packages, it also brings a powerful tool that bundles smartly. The next step will be to add auto-reload when the code changes, so the TypeScript compiles automatically and the JS gets deployed.

JavaScript Depth First Traversal

Traversing a tree by going as deep as possible down one path before backtracking can be done recursively or iteratively. Let’s first create a node function that will hold the information of each node.

var node = function(v, c) {
  return {
    value : v,
    children : c
  };
};

var node5 = node("node5");
var node4 = node("node4");
var node3 = node("node3");
var node2 = node("node2", [node5]);
var node1 = node("node1", [node3, node4]);
var graph = node("root", [node1, node2]);

The graph is simple and is the same one we use in the breadth-first approach. The recursive algorithm to traverse the tree doesn’t need any extra structure since we recursively call every child as we find it. This has the advantage of not carrying a stack around with us, but the disadvantage that we may need an additional variable if we want to avoid going too deep in one direction.

function depthFirstRecursive(node, callback){
  callback(node);
  if(node.children){
    for(var i = 0 ; i < node.children.length; i++){
      var child = node.children[i];
      if(!child.hasBeenVisited){
        child.hasBeenVisited = true; // mark the child so a shared node is not visited twice
        depthFirstRecursive(child, callback);
      }
    }
  }
}

depthFirstRecursive(graph, function(node){
  console.log(node.value);
});

This outputs: root, node1, node3, node4, node2, node5

The iterative approach is similar. However, we loop from the end of the children array down to zero. We do this to get the exact same output as the recursive version: without looping in reverse, we would still do a depth-first traversal, but starting from the right side of the tree instead of the left.

function depthFirstIterative(root, callback){
  var stack = [];
  stack.unshift(root);
  while(stack.length > 0)
  {
    var node = stack.shift();
    callback(node);
    if(node.children){
      for(var i = node.children.length - 1; i >= 0; i--){
        var child = node.children[i];
        if(!child.discovered){
          child.discovered = true;
          stack.unshift(child);
        }
      }
    }
  }
}

depthFirstIterative(graph, function(node){
  console.log(node.value);
});

We use unshift to push any new child at the beginning of the array and shift to take the next node from the beginning of the array too, so the array behaves as a stack (last in, first out).
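
A tiny illustration of that last-in, first-out behavior:

var stack = [];
stack.unshift("first");
stack.unshift("second");    // stack is now ["second", "first"]
console.log(stack.shift()); // prints "second": the last element added is the first one out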

JavaScript Breadth First Traversal

Traversing a tree is not a task you have to do often, but it’s still a good exercise for practicing moving around a tree. Breadth-first traversal consists of navigating one level at a time, visiting all siblings before moving to their children.

Breadth-first requires the use of a queue for both its iterative and its recursive approach. This is a big difference from its opposite, depth-first traversal, which uses only a stack for the iterative version and doesn’t need to hold any values for its recursive algorithm.

Let’s start by defining a structure that represents a node.

var node = function(v, c) {
  return {
    value : v,
    children : c
  };
};

As you can see, each node has a value and a children collection. The names could be more descriptive, but let’s put that aside and create a tree.

var node5 = node("node5");
var node4 = node("node4");
var node3 = node("node3");
var node2 = node("node2", [node5]);
var node1 = node("node1", [node3, node4]);
var graph = node("root", [node1, node2]);

The tree is simple. It has a single root element with two children: node 1 and node 2. Node 1 has two children: node 3 and node 4. If we move back to the root, its second child, node 2, has only a single child: node 5.

function breadthFirstRecursive(queue, callback){
  if (queue.length === 0){
    return;
  }
  var node = queue.shift();
  callback(node);
  if(node.children){
    for(var i = 0 ; i < node.children.length; i++){
      var child = node.children[i];
      if(!child.hasBeenVisited){
        child.hasBeenVisited = true;
        queue.push(child);
      }
    }
  }
  breadthFirstRecursive(queue, callback);
}

The algorithm starts by looking at the queue to be sure that we still have something to process. If that’s the case, we take the first element from the left. This is important because children are pushed at the end of the queue, and we do not want to visit them until we are done with all the siblings of the current level. The last piece of code is the invocation, where we put the root in the queue to be processed.

var queue = [];
queue.push(graph);
breadthFirstRecursive(queue, function(node){
  console.log(node.value);
});

This outputs: root, node1, node2, node3, node4, node5

The iterative approach is similar, with the same output if we provide the same input.

function breadthFirstIterative(root, callback){
  var queue = [];
  queue.push(root);
  while(queue.length > 0)
  {
    var node = queue.shift();
    callback(node);
    if(node.children){
      for(var i = 0; i < node.children.length; i++){
        var child = node.children[i];
        if(!child.discovered){
          child.discovered = true;
          queue.push(child);
        }
      }
    }
  }
}

breadthFirstIterative(graph, function(node){
  console.log(node.value);
});

The recursive call is replaced by a loop that processes the queue until it is empty.

JavaScript Insertion Sort

Insertion sort’s name is confusing. It won’t swap in the same manner that bubble sort or selection sort does; however, it won’t insert new elements into the array either. It saves the value to place in a variable and walks backward down the array to find the proper spot, shifting the larger previous values up as it goes.

The idea is: walk backward from the current position, moving the larger previous values up, and finally drop the saved value once the correct spot is reached.

function insertionSort(arrayToSort){
  var key;
  var length = arrayToSort.length;
  for (var i = 1; i < length; i++)
  {
    key = arrayToSort[i];
    var j = i-1;
    while (j >= 0 && arrayToSort[j] > key)
    {
      arrayToSort[j+1] = arrayToSort[j];
      j = j-1;
    }
    arrayToSort[j+1] = key;
  }
}
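
A quick usage check with an arbitrary array (the values are only for illustration):

var numbers = [5, 2, 4, 6, 1, 3];
insertionSort(numbers);
console.log(numbers); // [ 1, 2, 3, 4, 5, 6 ] -- the array is sorted in place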

Insertion sort is O(n^2) with its two nested loops.

JavaScript Quick Sort

Quick sort divides the array into multiple partitions, hence it is a very good candidate for recursion. The asymptotic analysis gives O(n log(n)) on average and a worst case of O(n^2). The algorithm can be implemented in different flavors depending on how you choose the initial pivot. There is also a second way to customize quick sort: by defining how we partition.

function swap(array, left, right)
{
    var temp = array[left];
    array[left] = array[right];
    array[right] = temp;
}

function partition(arrayToSort, left, right){
  var i = left;
  var j = right;
  var pivot = arrayToSort[Math.floor((left + right) / 2)]; // Middle

  while (i <= j) {
    while (arrayToSort[i] < pivot){
      i++;
    }
    while (arrayToSort[j] > pivot){
      j--;
    }
    if (i <= j) {
      swap(arrayToSort, i, j);
      i++;
      j--;
    }
  }

  return i;
}

function quickSort(arrayToSort, left, right){
  var index = partition(arrayToSort, left, right);
  if (left < index - 1){
    quickSort(arrayToSort, left, index - 1);
  }
  if (right > index){
    quickSort(arrayToSort, index, right);
  }
}

The algorithm divides the array in two around a pivot and moves inward from each extremity of the partition, which spans from left to right. Most of the heavy lifting is done by the partition function, which swaps elements when it finds a value on the left side that is bigger than the pivot and a value on the right side that is smaller. It keeps swapping until the two indexes cross, and quickSort is then called on each side of the partition. Every call to the quickSort function creates new partitions, dividing the array into smaller and smaller pieces to sort.

The choice of the pivot changes the way the algorithm performs. The closer the pivot’s value is to the median, the better. This is something we cannot know in advance without traversing the whole array, which is why we can randomly choose a value or, as in this case, take the element in the middle.

The partition’s goal is to ensure that everything before the pivot is smaller than the pivot and everything after it is bigger. By going recursively, at some point everything is sorted.
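
Since quickSort takes the boundaries explicitly, the initial call covers the whole array. A minimal usage sketch with arbitrary values:

var numbers = [5, 2, 4, 6, 1, 3];
quickSort(numbers, 0, numbers.length - 1);
console.log(numbers); // [ 1, 2, 3, 4, 5, 6 ] -- the array is sorted in place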

My First Two TypeScript 2.x Video Courses are Available

Early this year, I was contacted by Packt to record two courses. The two courses are built around a simple TypeScript web project. The first course is entitled “Rapid Web Application Development with TypeScript 2.0” and the second course is entitled “Building Pro Web Apps with TypeScript 2.x”. This was the first time I had to record myself while coding, and it was quite a challenge!


The first course shows how to set up your development environment to use TypeScript as well as how to organize your code and create a basic web application with TypeScript. I touch briefly on Gulp, NodeJs and NPM, just enough to be ready to focus on TypeScript at full speed. I revisit many object-oriented concepts that TypeScript supports, without explaining the very basic stuff.


The second course goes much deeper into the new TypeScript features that came with version 2 and expands the capabilities of the web application built in the first course. We explore types, mixins, decorators, new async features and much more.

These two courses represent a final product of about 3 hours of high-quality content. The amount of work behind these 3 hours is 4 months, where more than half of my evenings were dedicated to the courses. Between organizing the content, preparing the code, building the slides, recording the audio and video, and applying the touch-ups from the technical reviewer and the editor, time flew by! I hope that everyone who takes the courses will enjoy the content and how it is presented.

Here is the link to course 1: Rapid Web Application with TypeScript 2.x
Here is the link to course 2: Building Pro Web Application with TypeScript 2.x

JavaScript Bubble Sort

Bubble sort is O(n^2) without optimization and in the worst scenario. It’s a simple algorithm that traverses the whole array on the first pass, and every subsequent pass traverses a little less of it.

On the first pass, it compares every adjacent pair of elements to bubble the biggest number up to the last position of the array. On the second pass, it bubbles the second biggest element up to the second-to-last position, and so on. Bubble sort relies on swapping values.

The idea is: find the biggest one, put it at the end.

function swap(array, left, right)
{
    var temp = array[left];
    array[left] = array[right];
    array[right] = temp;
}
 
function bubbleSort(arrayToSort){
  var length = arrayToSort.length;
  for (var i = 0; i < length-1; i++){
    for (var j = 0; j < length-i-1; j++){
      if (arrayToSort[j] > arrayToSort[j+1]){
         swap(arrayToSort, j, j+1);
      }
    }
  }
}

A potential optimization: if no swap is done in the inner loop, we can completely stop both loops. This can happen if the array becomes sorted before all elements have been moved. For example, imagine an array that is completely sorted except for the first element. The first pass bubbles that value into place; on the second pass, every value is already in its right position, hence no swap happens. It means that we do not need to loop anymore. Here is the optimization.

function bubbleSort(arrayToSort){
  var length = arrayToSort.length;
  for (var i = 0; i < length-1; i++){
    var swapped = false;
    for (var j = 0; j < length-i-1; j++){
      if (arrayToSort[j] > arrayToSort[j+1]){
         swap(arrayToSort, j, j+1);
         swapped = true;
      }       
    }
    if(swapped === false){
      break; // no swap in this pass: the array is already sorted
    }
  }
}
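
As with the other sorts, a quick usage check with arbitrary values:

var numbers = [5, 1, 4, 2, 8];
bubbleSort(numbers);
console.log(numbers); // [ 1, 2, 4, 5, 8 ] -- the array is sorted in place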