Updated: Jul 29

This post implements cross entropy in pure Python to better understand its inner workings, then compares the result against other popular implementations as a cross-check.

Cross entropy can be used to define a loss function in machine learning and is most commonly used when training classification models.
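To make the loss-function role concrete, here is a minimal sketch of cross-entropy loss for a small batch of classification predictions. The function name `cross_entropy_loss`, the example probabilities, and the choice of the natural logarithm are assumptions made for illustration, not taken from any particular library.

```python
import math

def cross_entropy_loss(predicted_probs, true_labels):
    """Mean cross-entropy loss (in nats) over a batch.

    predicted_probs: list of per-example probability lists.
    true_labels: list of integer class indices, one per example.
    Assumes the probability assigned to each true class is > 0.
    """
    total = 0.0
    for probs, label in zip(predicted_probs, true_labels):
        # With a one-hot target, only the true class's term survives.
        total += -math.log(probs[label])
    return total / len(true_labels)

# Hypothetical predictions for three examples over three classes.
predictions = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.3, 0.5],
]
labels = [0, 1, 2]

loss = cross_entropy_loss(predictions, labels)

# A more confident correct prediction yields a smaller loss.
confident_loss = cross_entropy_loss([[0.99, 0.005, 0.005]], [0])
print(loss, confident_loss)
```

The loss shrinks toward zero as the model puts more probability on the correct class, which is what makes it a useful training objective.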

In information theory, the cross entropy between two probability distributions p and q over the same underlying set of events measures the average number of bits needed to identify an event drawn from the set if a coding scheme used for the set is optimized for an estimated probability distribution q, rather than the true distribution p. (source)
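For discrete distributions, the definition above corresponds to H(p, q) = -Σ p(x) log2 q(x). The following is a small sketch of that sum in pure Python; the function name `cross_entropy_bits` and the two example distributions are assumptions chosen so the arithmetic is easy to follow by hand.

```python
import math

def cross_entropy_bits(p, q):
    """Cross entropy H(p, q) in bits for two discrete distributions.

    p and q are equal-length sequences of probabilities over the same events.
    Assumes q[i] > 0 wherever p[i] > 0; otherwise math.log2 raises ValueError.
    """
    # Events with p[i] == 0 contribute nothing to the sum, so skip them.
    return -sum(p_i * math.log2(q_i) for p_i, q_i in zip(p, q) if p_i > 0)

# Hypothetical true distribution p and estimated distribution q.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.5, 0.25]

bits_with_true_code = cross_entropy_bits(p, p)       # entropy of p
bits_with_estimated_code = cross_entropy_bits(p, q)  # cost of coding with q

print(bits_with_true_code)       # 1.5
print(bits_with_estimated_code)  # 1.75
```

Coding with the estimate q costs 1.75 bits per event on average instead of the 1.5 bits achievable with the true distribution p, so the mismatch wastes 0.25 bits per event.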