Why do machine learning models fail in production?

There are three important aspects to consider:

  1. Precaution at the time of model design
  2. Wrong way of measuring model results
  3. Genuine model issue

Let me elaborate on each of them.

  • Precaution at the time of model design

If the modeller has taken care to

  • Not select effect variables as cause variables (that is, not use as predictors variables that are really consequences of the outcome)
  • Not let the data be affected by seasonality (unless the model is also supposed to predict performance during a seasonal period, like toy sales at Christmas)
  • Not let the data be affected by extreme events (for example, data from a place and a month that was hit hard by a tornado, so that no facility was in place to collect or deposit money)
  • Remove multicollinearity
  • Select only significant variables
  • Select only those variables that behave consistently in out-of-time validation data

then there is very little chance of the machine learning model performing poorly in production.
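
As an illustration of the multicollinearity check, here is a minimal sketch that computes the variance inflation factor (VIF) for each variable using only NumPy. The data is synthetic and the variable names (income, age, salary) are hypothetical; the threshold of 10 mentioned in the comments is a common rule of thumb, not something prescribed above.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return one VIF per column of X (rows are observations)."""
    corr = np.corrcoef(X, rowvar=False)
    # The diagonal of the inverse correlation matrix gives the VIFs.
    return np.diag(np.linalg.inv(corr))

# Synthetic example data (hypothetical variables, for illustration only).
rng = np.random.default_rng(0)
n = 1000
income = rng.normal(size=n)
age = rng.normal(size=n)
# salary is almost a copy of income, which simulates collinearity.
salary = income + 0.05 * rng.normal(size=n)

X = np.column_stack([income, age, salary])
vifs = variance_inflation_factors(X)

for name, vif in zip(["income", "age", "salary"], vifs):
    # Rule of thumb (an assumption here): a VIF above 10 suggests a problem.
    flag = "check" if vif > 10 else "ok"
    print(f"{name}: VIF = {vif:.1f} ({flag})")
```

In this toy data, income and salary should be flagged while age is not, so one of the two collinear variables would be dropped before fitting the model.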

  • Wrong way of measuring model results

At times, people measure model impact incorrectly. For example, if you develop an acquisition model (say with a score range of 0 to 999), put it into production and apply a cut-off of 600, the model will not show the same KS that you got at the time of model development. In production you only observe outcomes for the applicants who scored above the cut-off, so the KS is being measured on a truncated population.

There are two ways of handling this.

  1. One method is to let 1 to 2% of the population through without applying the score cut-off and measure the impact on that data. This gives a correct measurement of KS, but it comes at a cost, because some bad prospects are not declined.
  2. The other method is to measure model performance (KS) above the cut-off in the development data and compare it against the model performance in the production data.
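
The second method can be sketched as follows. This is only an illustration under stated assumptions: the scores are simulated (goods and bads drawn from two normal distributions, 10% bad rate), `ks_statistic` and `simulate` are helper names made up for this example, and the cut-off of 600 is the one from the example above.

```python
import numpy as np

def ks_statistic(scores, is_bad):
    """Maximum gap between the cumulative score distributions of bads and goods."""
    scores = np.asarray(scores, dtype=float)
    is_bad = np.asarray(is_bad, dtype=bool)
    thresholds = np.unique(scores)  # sorted unique score values
    bad_sorted = np.sort(scores[is_bad])
    good_sorted = np.sort(scores[~is_bad])
    cdf_bad = np.searchsorted(bad_sorted, thresholds, side="right") / bad_sorted.size
    cdf_good = np.searchsorted(good_sorted, thresholds, side="right") / good_sorted.size
    return float(np.max(np.abs(cdf_bad - cdf_good)))

def simulate(rng, n, bad_rate=0.1):
    """Simulated scores on a 0-999 scale; bads score lower on average."""
    is_bad = rng.random(n) < bad_rate
    scores = np.where(is_bad, rng.normal(500, 120, n), rng.normal(650, 120, n))
    return np.clip(scores, 0, 999), is_bad

rng = np.random.default_rng(42)
dev_scores, dev_bad = simulate(rng, 50_000)    # development sample
prod_scores, prod_bad = simulate(rng, 50_000)  # production applicants

CUTOFF = 600
dev_above = dev_scores >= CUTOFF
approved = prod_scores >= CUTOFF  # only these have observed outcomes

ks_dev_full = ks_statistic(dev_scores, dev_bad)
ks_dev_above = ks_statistic(dev_scores[dev_above], dev_bad[dev_above])
ks_prod_above = ks_statistic(prod_scores[approved], prod_bad[approved])

print(f"KS, development, full range:   {ks_dev_full:.3f}")
print(f"KS, development, above cutoff: {ks_dev_above:.3f}")
print(f"KS, production, above cutoff:  {ks_prod_above:.3f}")
```

The full-range development KS is noticeably higher than either above-cut-off figure even though the simulated population has not changed at all. The fair comparison is between the last two numbers.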

  • Genuine model issue

If you have taken care of the above two steps and your model is still not performing, there are several possibilities:

  1. If the population has changed and is no longer the same population as before, you need to develop a new model.
  2. If there was an issue of the kind described under the two headings above, the choice is clear: rectify it.
  3. If the population is highly volatile and changes every month, drop the idea of having a machine learning model. It won't work.
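
One way to check whether the population has changed is the population stability index (PSI), which compares the score distribution in production against the one seen at development. PSI is not mentioned above; it is brought in here as one commonly used option. The sketch below uses simulated scores, and the thresholds in the comments (0.1 and 0.25) are commonly quoted rules of thumb, not hard limits.

```python
import numpy as np

def population_stability_index(expected, actual, n_bins=10):
    """PSI of `actual` against `expected`, using quantile bins of `expected`."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Interior cut points (deciles by default) from the development sample.
    cuts = np.quantile(expected, np.linspace(0, 1, n_bins + 1)[1:-1])
    exp_share = np.bincount(np.searchsorted(cuts, expected), minlength=n_bins) / expected.size
    act_share = np.bincount(np.searchsorted(cuts, actual), minlength=n_bins) / actual.size
    # Avoid log(0) when a bin is empty.
    eps = 1e-6
    exp_share = np.clip(exp_share, eps, None)
    act_share = np.clip(act_share, eps, None)
    return float(np.sum((act_share - exp_share) * np.log(act_share / exp_share)))

# Simulated scores (illustrative values only).
rng = np.random.default_rng(7)
dev_scores = rng.normal(600, 100, 10_000)      # development population
stable_month = rng.normal(600, 100, 10_000)    # same population
shifted_month = rng.normal(520, 100, 10_000)   # population has moved

psi_stable = population_stability_index(dev_scores, stable_month)
psi_shifted = population_stability_index(dev_scores, shifted_month)

# Rule of thumb (an assumption): below 0.1 is stable, above 0.25 is a major shift.
print(f"PSI, stable month:  {psi_stable:.3f}")
print(f"PSI, shifted month: {psi_shifted:.3f}")
```

Tracking this number month by month also helps tell a one-off shift, where a rebuilt model makes sense, from a population that keeps moving every month.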
