Focal Loss was introduced by Lin et al.

In this case, the activation function does not depend on the scores of classes in $$C$$ other than $$C_1 = C_i$$. So the gradient with respect to each score $$s_i$$ in $$s$$ will only depend on the loss given by its binary problem.

• Caffe: Sigmoid Cross-Entropy Loss Layer
• PyTorch: BCEWithLogitsLoss
• TensorFlow: sigmoid_cross_entropy.
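
As a rough illustration of that independence (not the code behind any of the layers listed), the pure-Python sketch below treats each class as its own binary problem and checks the gradient by finite differences. The function names are hypothetical, and the closed form `f(s_i) - t_i` for the gradient is the standard sigmoid cross-entropy result, taken here as an assumption because its derivation is not shown in this section.

```python
import math

def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def binary_cross_entropy(score, target):
    """Loss of one binary problem: -t*log(f(s)) - (1-t)*log(1-f(s))."""
    p = sigmoid(score)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))

def multilabel_loss(scores, targets):
    """Sum of the C independent binary losses."""
    return sum(binary_cross_entropy(s, t) for s, t in zip(scores, targets))

def numerical_gradient(scores, targets, i, eps=1e-6):
    """Central finite difference of the summed loss w.r.t. score i."""
    up, down = list(scores), list(scores)
    up[i] += eps
    down[i] -= eps
    return (multilabel_loss(up, targets) - multilabel_loss(down, targets)) / (2.0 * eps)

scores = [2.0, -1.0, 0.5]   # raw scores for C = 3 classes
targets = [1, 0, 1]         # multi-label groundtruth

# The gradient w.r.t. score 0 should match f(s_0) - t_0 ...
grad_0 = numerical_gradient(scores, targets, 0)
closed_form_0 = sigmoid(scores[0]) - targets[0]

# ... and should not move when the score of another class changes.
grad_0_other = numerical_gradient([2.0, 3.0, 0.5], targets, 0)

print(grad_0, closed_form_0, grad_0_other)
```

The three printed values should agree up to finite-difference error, which is what "the gradient only depends on its own binary problem" means in practice.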

## Focal Loss

This loss comes from Facebook, as described in this paper. Its authors claim to improve one-stage object detectors using Focal Loss to train a detector they name RetinaNet. Focal Loss is a Cross-Entropy Loss that weighs the contribution of each sample to the loss based on the classification error. The idea is that, if a sample is already classified correctly by the CNN, its contribution to the loss decreases. With this strategy, they claim to solve the problem of class imbalance by making the loss implicitly focus on the problematic classes. Moreover, they also weight the contribution of each class to the loss for a more explicit class balancing. They use Sigmoid activations, so Focal Loss could also be considered a Binary Cross-Entropy Loss. We define it for each binary problem as:
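
A form of the definition consistent with the modulating factor and the two-class setup used in this section, assuming $$s_i$$ denotes the sigmoid-activated score of class $$C_i$$ and $$t_i$$ its groundtruth label, is:

$$FL = -\sum_{i=1}^{C=2} (1 - s_i)^{\gamma} \, t_i \log(s_i)$$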

Where $$(1 - s_i)^\gamma$$, with the focusing parameter $$\gamma \geq 0$$, is a modulating factor to reduce the influence of correctly classified samples in the loss. With $$\gamma = 0$$, Focal Loss is equivalent to Binary Cross Entropy Loss.
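
The loss can also be written per case. Expanding the two-class sum of $$(1 - s_i)^\gamma t_i \log(s_i)$$ terms with $$s_2 = 1 - s_1$$ and $$t_2 = 1 - t_1$$ gives the following sketch, again assuming $$s_1$$ is the sigmoid-activated score:

$$FL = \begin{cases} -(1 - s_1)^{\gamma} \log(s_1) & \text{if } t_1 = 1 \\ -(1 - (1 - s_1))^{\gamma} \log(1 - s_1) = -s_1^{\gamma} \log(1 - s_1) & \text{if } t_1 = 0 \end{cases}$$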

Where we have separated the formulation for when the class $$C_i = C_1$$ is positive or negative (and therefore, the class $$C_2$$ is positive). As before, we have $$s_2 = 1 - s_1$$ and $$t_2 = 1 - t_1$$.
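
To make the effect of the modulating factor concrete, here is a minimal pure-Python sketch of the per-case loss for a single binary problem. The function names and the example scores are hypothetical, and the sketch assumes the raw score is passed through a sigmoid before the loss is computed.

```python
import math

def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def binary_cross_entropy(score, target):
    """Plain binary cross-entropy on a raw (pre-sigmoid) score."""
    p = sigmoid(score)
    return -math.log(p) if target == 1 else -math.log(1.0 - p)

def focal_loss(score, target, gamma=2.0):
    """Focal loss of one binary problem on a raw (pre-sigmoid) score."""
    p = sigmoid(score)
    if target == 1:
        # Positive class: the factor (1 - p)^gamma shrinks as p approaches 1.
        return -((1.0 - p) ** gamma) * math.log(p)
    # Negative class: the factor p^gamma shrinks as p approaches 0.
    return -(p ** gamma) * math.log(1.0 - p)

# With gamma = 0 the modulating factor is 1 and the loss is plain BCE.
same_as_bce = abs(focal_loss(1.5, 1, gamma=0.0) - binary_cross_entropy(1.5, 1))

# Fraction of the BCE loss that survives for a well-classified positive
# (score 4.0) versus a misclassified positive (score -2.0).
kept_easy = focal_loss(4.0, 1) / binary_cross_entropy(4.0, 1)
kept_hard = focal_loss(-2.0, 1) / binary_cross_entropy(-2.0, 1)

print(same_as_bce, kept_easy, kept_hard)
```

The well-classified sample keeps a much smaller fraction of its cross-entropy loss than the misclassified one, which is the down-weighting the text describes.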

The gradient gets a bit more complex due to the inclusion of the modulating factor $$(1 - s_i)^\gamma$$ in the loss formulation, but it can be deduced using the Binary Cross-Entropy gradient expression.
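
For a positive class ($$t_i = 1$$), applying the chain rule with $$\partial f(s_i) / \partial s_i = f(s_i)(1 - f(s_i))$$ to $$-(1 - f(s_i))^{\gamma} \log(f(s_i))$$ gives the expression below. It is a worked derivation under the sigmoid assumption, not a quotation from the paper:

$$\frac{\partial FL}{\partial s_i} = (1 - f(s_i))^{\gamma} \left( \gamma f(s_i) \log(f(s_i)) + f(s_i) - 1 \right)$$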

Where $$f()$$ is the sigmoid function. To get the gradient expression for a negative $$C_i$$ ($$t_i = 0$$), we need to replace $$f(s_i)$$ with $$(1 - f(s_i))$$ in the expression above and flip its sign, since $$\partial (1 - f(s_i)) / \partial s_i = -\partial f(s_i) / \partial s_i$$.

Notice that, if the focusing parameter $$\gamma = 0$$, the modulating factor equals 1, the loss is equivalent to the CE Loss, and we end up with the same gradient expression.
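
A quick way to sanity-check these gradient statements is to compare a closed form against finite differences. The sketch below is an assumption-laden illustration, not the layer's backward pass. The function names are hypothetical, and the negative branch both substitutes $$1 - f(s_i)$$ for $$f(s_i)$$ and negates the result, because the derivative of $$1 - f(s_i)$$ with respect to $$s_i$$ has the opposite sign. The numerical comparison is what justifies that sign.

```python
import math

def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def focal_loss(score, target, gamma):
    """Focal loss of one binary problem on a raw (pre-sigmoid) score."""
    p = sigmoid(score)
    if target == 1:
        return -((1.0 - p) ** gamma) * math.log(p)
    return -(p ** gamma) * math.log(1.0 - p)

def focal_gradient(score, target, gamma):
    """Closed-form gradient of the focal loss w.r.t. the raw score."""
    p = sigmoid(score)
    if target == 1:
        return (1.0 - p) ** gamma * (gamma * p * math.log(p) + p - 1.0)
    # Negative class: swap p for q = 1 - p, then negate the expression.
    q = 1.0 - p
    return -((1.0 - q) ** gamma) * (gamma * q * math.log(q) + q - 1.0)

def numerical_gradient(score, target, gamma, eps=1e-6):
    """Central finite difference of the focal loss w.r.t. the raw score."""
    hi = focal_loss(score + eps, target, gamma)
    lo = focal_loss(score - eps, target, gamma)
    return (hi - lo) / (2.0 * eps)

def max_gradient_error(gamma, scores=(-3.0, -0.5, 0.0, 1.2, 4.0)):
    """Largest gap between closed form and finite differences."""
    return max(
        abs(focal_gradient(s, t, gamma) - numerical_gradient(s, t, gamma))
        for s in scores
        for t in (0, 1)
    )

for gamma in (0.0, 0.5, 2.0, 5.0):
    print(gamma, max_gradient_error(gamma))

# With gamma = 0 the gradient reduces to the cross-entropy one, f(s) - t.
print(focal_gradient(1.2, 1, 0.0), sigmoid(1.2) - 1)
```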

## Forward pass: Loss computation

Where logprobs[r] stores, for each element of the batch, the sum of the binary cross entropy for each class. The focusing_parameter is $$\gamma$$, which by default is 2 and should be defined as a layer parameter in the net prototxt. The class_balances can be used to introduce different loss contributions per class, as they do in the Facebook paper.
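
The description above can be sketched as a standalone function. This is not the layer's actual code: it uses plain Python lists instead of Caffe blobs, the name `focal_forward` is hypothetical, `class_balances` is assumed to be a list with one weight per class, and averaging over the batch is an assumption about how the per-sample sums are reduced.

```python
import math

def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def focal_forward(scores, labels, focusing_parameter=2.0, class_balances=None):
    """Return (mean loss over the batch, per-sample logprobs).

    scores: list of rows of raw class scores, one row per batch element.
    labels: list of rows of 0/1 groundtruth labels, same shape as scores.
    """
    num_classes = len(scores[0])
    if class_balances is None:
        class_balances = [1.0] * num_classes  # assumed default: no rebalancing
    logprobs = [0.0] * len(scores)
    for r, (row_scores, row_labels) in enumerate(zip(scores, labels)):
        for c, (s, t) in enumerate(zip(row_scores, row_labels)):
            p = sigmoid(s)
            if t == 0:
                # Loss form for negative classes.
                term = -math.log(1.0 - p) * p ** focusing_parameter
            else:
                # Loss form for positive classes.
                term = -math.log(p) * (1.0 - p) ** focusing_parameter
            logprobs[r] += class_balances[c] * term
    data_loss = sum(logprobs) / len(scores)
    return data_loss, logprobs

scores = [[2.0, -1.0], [0.5, 3.0]]
labels = [[1, 0], [0, 1]]

loss_focal, per_sample = focal_forward(scores, labels)
loss_ce, _ = focal_forward(scores, labels, focusing_parameter=0.0)
loss_weighted, _ = focal_forward(scores, labels, class_balances=[2.0, 2.0])
print(loss_focal, loss_ce, loss_weighted, per_sample)
```

Setting `focusing_parameter=0.0` recovers the summed binary cross entropy, and scaling every entry of `class_balances` by the same constant scales the loss by that constant.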

In the specific (and usual) case of Multi-Class classification the labels are one-hot, so only the positive class $$C_p$$ keeps its term in the loss. There is only one element of the target vector $$t$$ which is not zero, $$t_i = t_p$$. So, discarding the elements of the summation which are zero due to the target labels, we can write:

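
Assuming the general form is a sum of $$t_i \log(f(s)_i)$$ terms, with $$f$$ the activation applied to the score vector $$s$$ (typically Softmax in the Multi-Class setting, which this passage does not state explicitly), the one-hot reduction reads:

$$CE = -\sum_{i}^{C} t_i \log(f(s)_i) = -\log(f(s)_p)$$
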
In the binary setting, this would be the pipeline for each one of the $$C$$ classes. We set $$C$$ independent binary classification problems $$(C' = 2)$$. Then we sum up the loss over the different binary problems: we sum up the gradients of every binary problem to backpropagate, and the losses to monitor the global loss. $$s_1$$ and $$t_1$$ are the score and the groundtruth label for the class $$C_1$$, which is also the class $$C_i$$ in $$C$$. $$s_2 = 1 - s_1$$ and $$t_2 = 1 - t_1$$ are the score and the groundtruth label of the class $$C_2$$, which is not a "class" in our original problem with $$C$$ classes, but a class we create to set up the binary problem with $$C_1 = C_i$$. We can understand it as a background class.
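
For one such binary problem, writing $$f(s_i)$$ for the sigmoid-activated scores so that $$f(s_2) = 1 - f(s_1)$$ (a notational assumption made for this sketch), the loss can be written as:

$$CE = -\sum_{i=1}^{C'=2} t_i \log(f(s_i)) = -t_1 \log(f(s_1)) - (1 - t_1) \log(1 - f(s_1))$$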