First, let's consider other cost functions, such as mean squared error. Suppose the actual output of a logistic unit is one billionth and the desired output is 1. The sigmoid is almost flat in that region, so there is almost no gradient for the logistic unit to fix up the error. Second, when we are dealing with mutually exclusive classes, the outputs of independent logistic units are not guaranteed to sum to 1.
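To make the first point concrete, here is a minimal sketch of the gradient a single logistic unit receives under squared error. The error is assumed to be E = 0.5 * (y - t)^2, and the function names (`sigmoid`, `mse_grad_wrt_logit`) are made up for illustration; they do not come from the referenced video.

```python
import math

def sigmoid(z):
    """Logistic function: squashes a logit z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def mse_grad_wrt_logit(z, target):
    """Gradient of E = 0.5 * (y - target)**2 with respect to the logit z."""
    y = sigmoid(z)
    # Chain rule: dE/dz = (y - target) * dy/dz, where dy/dz = y * (1 - y)
    return (y - target) * y * (1.0 - y)

# Pick the logit whose sigmoid output is one billionth.
z = math.log(1e-9 / (1.0 - 1e-9))
y = sigmoid(z)
grad = mse_grad_wrt_logit(z, target=1.0)

print("output:", y)
print("gradient wrt logit:", grad)
```

The error (y - t) is close to -1, but it is multiplied by y * (1 - y), which is about one billionth, so the gradient is tiny even though the unit is almost maximally wrong.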
In short, to force the probabilities to sum to 1, we use the softmax activation function.
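As a rough illustration, the sketch below compares independent logistic outputs with softmax outputs for the same logits. The logit values are arbitrary and chosen only for the example, and subtracting the maximum before exponentiating is a common numerical-stability trick, not something stated in the reference.

```python
import math

def sigmoid(z):
    """Logistic function applied to a single logit."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    """Turn a list of logits into probabilities that sum to 1."""
    m = max(logits)  # shift by the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # arbitrary example values

independent = [sigmoid(z) for z in logits]
probs = softmax(logits)

print("sum of independent sigmoids:", sum(independent))
print("sum of softmax outputs:", sum(probs))
```

With these logits the independent sigmoid outputs add up to more than 1, while the softmax outputs add up to 1 (up to floating-point rounding) and keep the same ordering as the logits.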
Reference: https://www.youtube.com/watch?v=PHP8beSz5o4