
How to Transfer Files Between Computers Using HDMI (Part 6: Instruction and Black and White)

Posted on: 2023-06-14

Last time, we determined that the original idea of relying on an insignificant character to fill the frame is not great. It causes non-text content to break as soon as the selected character can be found in the middle of the content. The solution is to reserve space at the beginning of the first frame (after the red frame) to inject the number of bytes to retrieve. Then, the rest of the frame can be filled with null characters to make a full video frame.

Instruction Header

The video frame structure now has a function to inject the instruction into the frame:

pub fn write_instruction(&mut self, instruction: &Instruction, size: u8) -> (u16, u16) {
    let mut instruction_index = 0;
    let mut x: u16 = 0;
    let mut y: u16 = 0;
    // Walk the frame one block at a time (each block is size x size pixels).
    'outer: for i in (0..self.frame_size.height as u16).step_by(size as usize) {
        for j in (0..self.frame_size.width as u16).step_by(size as usize) {
            if instruction_index < 64 {
                // Each of the 64 instruction bits becomes a black or white block.
                let (r, g, b) = get_rgb_for_bit(
                    instruction.relevant_byte_count_in_64bits[instruction_index],
                );
                self.write(r, g, b, j, i, size);
                x = j + size as u16;
                instruction_index += 1;
            } else {
                break 'outer;
            }
        }

        y = i + size as u16;
    }
    if x == self.frame_size.width as u16 {
        x = 0; // y is already increased
    }
    // The position right after the instruction, where the data starts.
    return (x, y);
}

The function honors the size parameter, meaning that if each data bit is set to be stored in a block bigger than 1 pixel, the instruction scales up as well. The instruction stores its data into the first 64 slots. If we use a size of 1, it takes the first 64 pixels of the first frame of the whole video. With that information on hand, we can store the remaining data right after. At extraction time, once the red frame is found, we can read the first 64 pixels (if size 1) and know how many bytes of data to extract.
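As a rough sketch of how the instruction itself could be modeled (the field name relevant_byte_count_in_64bits comes from the function above; the MSB-first bit order and the u64 byte count are my assumptions), the encoding and decoding could look like this:

// Hypothetical sketch: the field name matches the function above, but the
// MSB-first bit order and the u64 source are assumptions, not the actual code.
pub struct Instruction {
    pub relevant_byte_count_in_64bits: [bool; 64],
}

impl Instruction {
    // Encode the number of bytes to retrieve into 64 black/white bits.
    pub fn from_byte_count(count: u64) -> Self {
        let mut bits = [false; 64];
        for (i, bit) in bits.iter_mut().enumerate() {
            *bit = (count >> (63 - i)) & 1 == 1; // most significant bit first
        }
        Instruction {
            relevant_byte_count_in_64bits: bits,
        }
    }

    // Decode the 64 bits read back from the first frame into a byte count.
    pub fn to_byte_count(&self) -> u64 {
        self.relevant_byte_count_in_64bits
            .iter()
            .fold(0u64, |acc, &bit| (acc << 1) | bit as u64)
    }
}

With this round trip, Instruction::from_byte_count(n).to_byte_count() gives back n, which is exactly what the extraction side needs once the 64 pixels are converted back to booleans.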

The solution works well!

Lossless

Most video frames were written with OpenCV using VideoWriter::fourcc('p', 'n', 'g', ' '). Using this fourcc makes the code work well locally to inject and extract data. However, when it is time to transfer using the capture card and the HDMI cable, it does not work. Colors are shifted, with R, G, and B losing their original values. Also, the format does not let me open the file on the source computer with VLC. However, using the black and white algorithm with VideoWriter::fourcc('a', 'v', 'c', '1') worked well locally even if the format is not lossless, mostly because each color is not taken at face value: it is always rounded to the closest of black or white. Because the range is from 0 to 255, this leaves room for imperfections that we can easily correct.

// A pixel is read as white (true) if its channel sum is at least half of
// the maximum possible sum; otherwise it is read as black (false).
pub fn get_bit_from_rgb(rgb: &Vec<u8>) -> bool {
    let sum: u32 = rgb.iter().map(|x| *x as u32).sum();
    sum >= (255_u32 * (rgb.len() as u32) / 2)
}

The function takes a collection of color channels (u8 because each goes from 0 to 255), sums them, and compares the sum against the midpoint. So if the color is 10, 10, 40 instead of 0, 0, 0, the sum is 60. Divided by the number of channels (three in that case), it gives 20. Because 20 is less than half of 255 (about 127), the pixel is considered black.
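A quick check with the numbers from the paragraph above (the second triple is just another made-up noisy value):

fn main() {
    // 10 + 10 + 40 = 60, below the threshold of 382 (255 * 3 / 2): black.
    assert!(!get_bit_from_rgb(&vec![10, 10, 40]));
    // 245 + 250 + 230 = 725, above the threshold: white.
    assert!(get_bit_from_rgb(&vec![245, 250, 230]));
}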

Video Encoding

To be more faithful to the transmission format, the video encoding is now dynamically chosen depending on the algorithm.

let fourcc = if options.algo == options::AlgoFrame::RGB {
    VideoWriter::fourcc('p', 'n', 'g', ' ')
} else {
    VideoWriter::fourcc('a', 'v', 'c', '1')
};

That being said, the RGB solution is not, and never will be, suitable for transferring data without rethinking a mechanism that brings resiliency against data corruption, which happens massively with video compression.

Testing Locally

Injecting a zip file of 13.3 megs produces a video file of 47.1 megs in 5 seconds. The video length is 1 second.

cargo run --release -- -m inject -i outputs/test2.zip -o outputs/test2zip.mp4 --fps 30 --height 1080 --width 1920 --size 1 -p true -a bw

Extracting the zip file takes 15 seconds.

cargo run --release -- -m extract -i outputs/test2zip.mp4 -o outputs/test2_extract.zip --fps 30 --height 1080 --width 1920 --size 1 -p true -a bw

Testing with Capture Card and HDMI

The first test was a picture of 2 megs. I started using 15 fps and a pixel size of 5. The slower fps was to ensure more duplicates would be there to avoid missing frames. I would still read at 30 fps on the target computer. The size of 5 is also to ensure the video is well received. The source computer with VLC couldn't play the avc1 format and I had to use the mp4v fourcc. More investigation is needed to understand the reason. The produced video size is 104 megs and it is 13 seconds long.

However, when closing ffmpeg with q, I see this message in the terminal:

real-time buffer too full or near too full frame dropped!

Also, during the extraction process, the instruction gives zero bytes: the pixel colors were all black. Maybe the two are related? Adding -rtbufsize to ffmpeg resolved the error message, but the data still could not be extracted.

ffmpeg  -r 30 -f dshow -s 1920x1080 -vcodec mjpeg -rtbufsize 200M -i video="USB Video" -r 30 out1.mp4

The next step was to ensure we are not missing data on the frame. To verify this hypothesis, I started by creating a small video that draws squares of the configured pixel size in different colors. It looks like a rainbow from the outer edge to the center. Then, I passed the video through the capture card and compared the result to see if some colors were missing on any edge. I performed a few tests with different sizes without finding anything conclusive.
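A minimal sketch of how such a test pattern could be generated (the palette and the raw RGB buffer output are my choices; the actual code writes its frames through OpenCV):

// Hypothetical sketch: fill a frame with concentric colored rings ("rainbow"
// from the outer edge to the center) to spot missing pixels on any edge.
fn rainbow_frame(width: usize, height: usize, square_size: usize) -> Vec<u8> {
    // Small palette cycled from the border inward; the exact colors are arbitrary.
    const PALETTE: [(u8, u8, u8); 6] = [
        (255, 0, 0), (255, 165, 0), (255, 255, 0),
        (0, 255, 0), (0, 0, 255), (128, 0, 128),
    ];
    let mut buffer = vec![0u8; width * height * 3];
    for y in 0..height {
        for x in 0..width {
            // The distance (in pixels) to the closest edge determines the ring.
            let ring = x.min(y).min(width - 1 - x).min(height - 1 - y) / square_size;
            let (r, g, b) = PALETTE[ring % PALETTE.len()];
            let offset = (y * width + x) * 3;
            buffer[offset] = r;
            buffer[offset + 1] = g;
            buffer[offset + 2] = b;
        }
    }
    buffer
}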

Then, I decided to perform a second test. This time, drawing a diagonal of 10 pixels in each corner. By using a size bigger than 1 pixel for each square, I can count the squares. I also did it with a size of 1 pixel to ensure integrity, since a bigger square size is easier to count but might hide a corner missing a few pixels.
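A sketch of this second pattern, again under my assumptions about the buffer layout: a 10-pixel white diagonal starting at each of the four corners, on a black frame.

// Hypothetical sketch: draw a 10-pixel diagonal in each corner so that a
// truncated edge in the captured video becomes easy to count.
fn corner_diagonals_frame(width: usize, height: usize) -> Vec<u8> {
    let mut buffer = vec![0u8; width * height * 3]; // black frame
    for i in 0..10 {
        // The four diagonal points for step i, one per corner.
        let points = [
            (i, i),                          // top-left
            (width - 1 - i, i),              // top-right
            (i, height - 1 - i),             // bottom-left
            (width - 1 - i, height - 1 - i), // bottom-right
        ];
        for (x, y) in points {
            let offset = (y * width + x) * 3;
            buffer[offset..offset + 3].copy_from_slice(&[255, 255, 255]);
        }
    }
    buffer
}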

After several different configurations, I confirmed that what ffmpeg recorded was full frames. Thus, the last explanation was that reading the frames, especially the one containing the instruction, was faulty. Could the frame with the instruction be missing? Maybe not all frames are captured, producing a video stream with a few gaps that are not perceptible visually but are critical to form a comprehensible message back.

Conclusion

In this section, we saw that adding the instruction and using black and white frames works well locally, but once again, using an HDMI cable with a capture card causes some issues, like not being able to decode the size of the data. We validated that every pixel projected into the capture card shows up in the video produced on the computer that reads it. In the next post, we will see how to mitigate the issue by analyzing what is happening, creating an additional test, and changing the assumption that we capture every frame.