Building Learning Machines

05 Feb 2024

Know your goal

When starting to build an ML algorithm, you have to know the answer to "what does success look like?". There are likely constraints that shape your project, such as how compute-intensive training the network is, how compute-intensive running inference is, and what input and output data structures you expect the network to handle. Ideally, you also have "validation" and "test" sets that accurately reflect the inference tasks you expect to use the network for, and a reasonable idea of the accuracy threshold the algorithm must clear to be "useful." Together, these can constitute a strong baseline, likely built from how one would approach the problem with the currently available tools.
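One way to make "what is success?" concrete is a small evaluation harness that scores both a simple baseline and the candidate model on the same validation set. Everything here is an illustrative sketch: the toy task, the models, and the 0.85 threshold are assumptions, not part of the original post.

```python
# Sketch: compare a candidate model against a trivial baseline on a
# held-out validation set, and check it against a usefulness threshold.

def accuracy(predict, examples):
    """Fraction of (x, y) pairs the predictor gets right."""
    correct = sum(1 for x, y in examples if predict(x) == y)
    return correct / len(examples)

# Toy validation set (assumed): classify integers as even/odd.
validation_set = [(n, n % 2 == 0) for n in range(100)]

# Trivial baseline: always predict the majority class.
baseline = lambda x: True
# Stand-in for the trained model (here, the true rule).
model = lambda x: x % 2 == 0

THRESHOLD = 0.85  # assumed minimum accuracy for the model to be "useful"

baseline_acc = accuracy(baseline, validation_set)
model_acc = accuracy(model, validation_set)

# "Success" is then a checkable condition, not a vague goal:
useful = model_acc >= THRESHOLD and model_acc > baseline_acc
```

Writing the threshold down as code forces the comparison to happen on every evaluation run rather than informally.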

How to engineer learning

I think there are a few key practices we can borrow from software engineering when building learning systems.

Build your learning system

Your learning system probably has three moving parts: data, network architecture, and loss function.
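Keeping those three parts as separate, swappable pieces pays off when you want to vary one while holding the others fixed. Below is a minimal sketch of that separation on a toy linear task; the data, model, hyperparameters, and function names are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1. Data: a toy linear task, y = 3x + 1 (assumed for illustration).
X = rng.uniform(-1, 1, size=(64, 1))
y = 3.0 * X + 1.0

# 2. Architecture: a linear model with weight and bias.
def forward(params, X):
    w, b = params
    return X * w + b

# 3. Loss: mean squared error.
def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

# Training loop: plain gradient descent, gradients written by hand.
params = [0.0, 0.0]
lr = 0.5
for _ in range(200):
    pred = forward(params, X)
    grad_pred = 2 * (pred - y) / len(X)    # d(mse)/d(pred)
    grad_w = float(np.sum(grad_pred * X))  # chain rule through forward
    grad_b = float(np.sum(grad_pred))
    params[0] -= lr * grad_w
    params[1] -= lr * grad_b

final_loss = mse(forward(params, X), y)
```

Because the loop only touches the three pieces through `X`/`y`, `forward`, and `mse`, any one of them can be replaced without rewriting the others.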

The model is smarter than you, but you should be learning too

Ideally, in a good learning system both you and the network should be learning. The network will likely (hopefully?) outperform you at the task once fully trained, but you should learn:

1. Details about the data. How does the data affect how the network learns?
2. What the most important components of the network are. Where is most of the learning happening?
3. What the numerical or mathematical properties of the loss are that affect the network. Are there certain numerical instabilities in training, or does the loss function need to preserve certain symmetries for the network to learn?
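The third point can be made concrete with a small example of the kind of numerical instability worth knowing about: a naive binary cross-entropy blows up when a prediction saturates at exactly 0 or 1, while clamping predictions away from the endpoints keeps the loss finite. This is a generic illustration, not a claim about any particular codebase, and the `eps` value is an assumed choice.

```python
import math

def bce_naive(p, y):
    # Binary cross-entropy with no safeguards: log(0) at the endpoints.
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def bce_stable(p, y, eps=1e-7):
    # Clamp predictions away from 0 and 1 before taking logs.
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# A confidently wrong prediction: model says 1.0, true label is 0.
stable_loss = bce_stable(1.0, 0)  # finite, roughly -log(eps)
try:
    bce_naive(1.0, 0)             # math.log(0) raises an error
    naive_finite = True
except ValueError:
    naive_finite = False
```

Probing your loss on edge-case inputs like this, before training, is a cheap way to learn its numerical properties.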

Throughout this post, I have described how I think about organizing machine learning codebases to make improvement easier, but I am sure there is a lot that I am missing or have not learned yet! Feel free to reach out if you have any suggestions or want to talk about testing or building ML research codebases!