Don’t forget to understand the at-inference usage profile

If you are deploying your algorithm as a microservice endpoint, it's worth thinking about how often and when it will be called. For typical software applications you may well expect a steady request rate. For many machine learning applications, however, the endpoint is called as part of a large batch process, leading to bursty volumes: there may be no requests for five days and then a need to handle 5 million inferences at once. A nice thing about using a walking skeleton (Create a Walking Skeleton / Steel Thread) is that you get an early understanding of the demand profile and can set up load balancing and provision capacity appropriately.
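As a rough illustration of why the demand profile matters for provisioning, here is a back-of-the-envelope sketch in plain Python. The per-replica throughput and the deadline for draining the batch are illustrative assumptions, not figures from any particular system; only the 5-million-inference burst and the five quiet days come from the scenario above.

```python
import math

# Back-of-the-envelope capacity comparison: the same 5 million inferences
# arriving as a steady stream vs. all at once. PER_REPLICA_RPS and
# BATCH_DEADLINE_HOURS are assumed values for illustration only.

TOTAL_INFERENCES = 5_000_000
QUIET_DAYS = 5               # no traffic at all between batch runs
PER_REPLICA_RPS = 50         # assumed throughput of a single replica
BATCH_DEADLINE_HOURS = 2     # assumed SLA for draining the burst

# If the volume arrived as a steady stream over the whole window:
steady_rps = TOTAL_INFERENCES / (QUIET_DAYS * 24 * 3600)
steady_replicas = max(1, math.ceil(steady_rps / PER_REPLICA_RPS))

# If it all lands at once and must clear within the deadline:
burst_rps = TOTAL_INFERENCES / (BATCH_DEADLINE_HOURS * 3600)
burst_replicas = max(1, math.ceil(burst_rps / PER_REPLICA_RPS))

print(f"steady: {steady_rps:6.1f} req/s -> {steady_replicas} replica(s)")
print(f"burst:  {burst_rps:6.1f} req/s -> {burst_replicas} replica(s)")
```

Under these assumptions the steady profile needs only a single replica (about 11.6 req/s), while the burst needs roughly 14, an order of magnitude more. Surfacing that gap early is exactly what exercising the walking skeleton under a realistic demand profile buys you.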
