I can’t find a link to the paper, but I wanted to bookmark the idea for myself. Here’s a novel mitigation for class imbalance when training a classification model: at each training step, just before backpropagating, normalize the gradient values at each layer so their magnitudes end up closer together. Basically an inverted batchnorm. This should bias answers toward the minority classes; whatever local minimum the model finally settles into, it will be more likely to output one of the minority classes.
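
To make the idea concrete, here’s a minimal PyTorch sketch of what I have in mind, assuming “normalize the gradient values at each layer” means rescaling each parameter tensor’s gradient to unit L2 norm after `loss.backward()` and before `optimizer.step()`. The function names and the choice of L2 norm are mine, not from the paper.

```python
import torch.nn as nn
import torch.nn.functional as F


def normalize_gradients_per_layer(model: nn.Module, eps: float = 1e-8) -> None:
    """Rescale each parameter tensor's gradient to unit L2 norm,
    flattening the differences in gradient magnitude across layers."""
    for param in model.parameters():
        if param.grad is not None:
            param.grad.div_(param.grad.norm() + eps)


def train_step(model, optimizer, inputs, targets):
    """One training step with the proposed gradient normalization applied."""
    optimizer.zero_grad()
    logits = model(inputs)
    loss = F.cross_entropy(logits, targets)
    loss.backward()
    normalize_gradients_per_layer(model)  # the proposed mitigation
    optimizer.step()
    return loss.item()
```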

I can imagine a more sophisticated version of this that exposes parameters to tune how strongly each outcome gets biased, as in the sketch below.
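
A hedged sketch of that tunable variant, assuming the simplest reading: a single strength knob that interpolates between the raw and the normalized gradient. Truly per-outcome tuning would presumably need something class-aware (e.g. class-dependent weights in the loss), which this doesn’t attempt.

```python
import torch.nn as nn


def normalize_gradients_per_layer(model: nn.Module, strength: float = 1.0,
                                  eps: float = 1e-8) -> None:
    """Blend each gradient with its unit-norm version.

    strength=0.0 leaves gradients untouched; strength=1.0 fully normalizes.
    """
    for param in model.parameters():
        if param.grad is not None:
            normalized = param.grad / (param.grad.norm() + eps)
            param.grad.mul_(1.0 - strength).add_(normalized, alpha=strength)
```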

EDIT: Found the paper.