A simple, small-ish network for producing embeddings of black-and-white binary images. It takes a 32x32 drawing and produces a 64-dimensional embedding.
You can see this in action at https://huggingface.co/spaces/JosephCatrambone/tiny_doodle_embedding
Input Format:
The model expects a (b, 32, 32) float32 input, generally with 0.0 being "background" and 1.0 being "foreground", similar to MNIST.
The model was trained on QuickDraw data with images justified to the top-left corner (0, 0), so align your inputs to the top-left before running inference.
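The preprocessing used in training isn't reproduced here, but a minimal sketch of a top-left-justifying helper might look like the following. The name process_input matches the sample usage below; the 0.5 threshold and nearest-neighbor resize are assumptions, not the training code.

import numpy

def process_input(img, threshold=0.5):
    # Binarize a grayscale array in [0, 1]; the threshold is a guess.
    binary = (img > threshold).astype(numpy.float32)
    ys, xs = numpy.nonzero(binary)
    if len(ys) == 0:  # Blank image: return an empty canvas.
        return numpy.zeros((1, 32, 32), dtype=numpy.float32)
    # Crop to the bounding box of the foreground strokes.
    cropped = binary[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # Nearest-neighbor resize so the longer side becomes 32 pixels.
    h, w = cropped.shape
    scale = 32.0 / max(h, w)
    new_h = max(1, int(round(h * scale)))
    new_w = max(1, int(round(w * scale)))
    rows = numpy.minimum((numpy.arange(new_h) / scale).astype(int), h - 1)
    cols = numpy.minimum((numpy.arange(new_w) / scale).astype(int), w - 1)
    resized = cropped[rows[:, None], cols]
    # Paste into the top-left corner of a 32x32 canvas, matching training.
    canvas = numpy.zeros((32, 32), dtype=numpy.float32)
    canvas[:new_h, :new_w] = resized
    return canvas[None, :, :]  # Add a batch dimension: (1, 32, 32).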
Output:
Given a (b, 32, 32) batch, the model produces a (b, 64) float matrix whose rows are normalized embeddings.
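Because the rows are normalized, a dot product between two embeddings is their cosine similarity, and a single matrix product scores a query against a whole gallery. A small sketch (the function and variable names are illustrative):

import numpy

def rank_gallery(query_embedding, gallery_embeddings):
    # query_embedding: (1, 64); gallery_embeddings: (n, 64), rows unit-norm.
    sims = gallery_embeddings @ query_embedding.ravel()  # (n,) cosine scores.
    order = numpy.argsort(-sims)  # Best match first.
    return order, sims[order]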
Sample usage:
import onnxruntime as ort
import numpy

ort_sess = ort.InferenceSession('tiny_doodle_embedding.onnx')

def compare(input_img_a, input_img_b):
    # Crop, binarize, and resize each image into a (1, 32, 32) float array,
    # justified to the top-left as described above.
    img_a = process_input(input_img_a)
    img_b = process_input(input_img_b)
    a_embedding = ort_sess.run(None, {'input': img_a.astype(numpy.float32)})[0]
    b_embedding = ort_sess.run(None, {'input': img_b.astype(numpy.float32)})[0]
    # Embeddings are normalized, so the dot product is the cosine similarity.
    return numpy.dot(a_embedding, b_embedding.T)  # Or a_embedding @ b_embedding.T
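Since the input carries a batch dimension, several images can be embedded in one session run. A sketch, reusing the process_input helper sketched above (images is a hypothetical list of input arrays):

# Stack preprocessed (1, 32, 32) images into one (n, 32, 32) batch.
batch = numpy.concatenate([process_input(img) for img in images], axis=0)
embeddings = ort_sess.run(None, {'input': batch.astype(numpy.float32)})[0]  # (n, 64)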
Training Details:
This model was trained on images taken from the Google QuickDraw dataset, rasterized to 32x32 binary images. Augmentations were basic, consisting of noise and an occasional dilation.
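The exact augmentation code isn't reproduced here, but noise plus occasional dilation on a binary raster can be sketched along these lines (the probabilities are placeholders, not the training values):

import numpy

def augment(img, rng, noise_p=0.02, dilate_p=0.25):
    # Salt-and-pepper noise: flip a small fraction of pixels.
    flips = rng.random(img.shape) < noise_p
    out = numpy.where(flips, 1.0 - img, img)
    # Occasional dilation: grow strokes by one pixel (4-connected).
    if rng.random() < dilate_p:
        padded = numpy.pad(out, 1)
        out = numpy.maximum.reduce([
            out,
            padded[:-2, 1:-1], padded[2:, 1:-1],  # Up / down neighbors.
            padded[1:-1, :-2], padded[1:-1, 2:],  # Left / right neighbors.
        ])
    return out

rng = numpy.random.default_rng(0)
noisy = augment(doodle, rng)  # doodle: a (32, 32) binary float array.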
The model was trained for 100 epochs on a consumer-grade NVIDIA RTX 3090.
Details of the run are visible at https://wandb.ai/josephc/tiny_doodle_model/runs/7wqz4w7g?nw=nwuserjosephc
Power Use and Environmental Considerations:
Training the final version drew about 120 W for 570 minutes, roughly 1.14 kWh (0.12 kW x 9.5 h). Excess heat from the training process was used to heat the author's home in place of gas heating.