
Monday, June 18, 2018

Theoretical Understanding of Deep Learning using Information Theory


Information theory in communication
Information theory has proved useful in many applications, such as digital communication, computer networking, and multimedia compression. For example, the concept of entropy (the average information content of a source) determines the minimum number of bits per symbol needed to encode a given message before transmission, and the concept of mutual information determines the capacity of a given communication channel.
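To make those two ideas concrete, here is a minimal Python sketch (my own illustration, not taken from any of the sources below): the entropy of a message's empirical symbol distribution lower-bounds the average number of bits per symbol any lossless code can achieve, and the capacity of a binary symmetric channel with crossover probability p is C = 1 - H(p), the mutual information between input and output maximized over input distributions.

import math
from collections import Counter

def entropy(message: str) -> float:
    """Shannon entropy (bits/symbol) of the message's empirical symbol distribution."""
    counts = Counter(message)
    total = len(message)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def bsc_capacity(p: float) -> float:
    """Capacity (bits/channel use) of a binary symmetric channel with crossover p."""
    if p in (0.0, 1.0):
        return 1.0
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # binary entropy H(p)
    return 1.0 - h  # C = 1 - H(p)

msg = "abracadabra"
print(f"entropy: {entropy(msg):.3f} bits/symbol")      # lower bound on average code length
print(f"BSC capacity (p=0.1): {bsc_capacity(0.1):.3f} bits/use")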
Information bottleneck (analogous to optimal representations)

Despite the recent breakthroughs of deep learning, why these models work the way they do remains largely a mystery. Recently, I read about an interesting attempt [1] in which the researchers analyze the inner workings of deep neural networks using information theory. They use the concept of an "information bottleneck" to describe a representation that keeps only the features with sufficient discriminative ability to solve a given problem, excluding all the noise. In other words, the goal of a deep neural network is to optimize the information bottleneck trade-off: compressing its internal representation of the input as much as possible while preserving the information relevant for predicting the output.
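The paper tracks where each hidden layer T sits in the "information plane" by estimating I(X;T) and I(T;Y) from discretized activations. Below is a rough sketch of that style of estimator; it is my own simplification under stated assumptions (uniform binning of activations, each training input treated as a unique value of X), and the helper names discrete_mi and layer_mi are hypothetical, not the authors' code.

import numpy as np

def discrete_mi(x_ids: np.ndarray, y_ids: np.ndarray) -> float:
    """Mutual information I(X;Y) in bits from two aligned arrays of discrete ids."""
    joint = np.zeros((x_ids.max() + 1, y_ids.max() + 1))
    for xi, yi in zip(x_ids, y_ids):
        joint[xi, yi] += 1                      # empirical joint counts
    joint /= joint.sum()                        # joint distribution p(x, y)
    px = joint.sum(axis=1, keepdims=True)       # marginal p(x)
    py = joint.sum(axis=0, keepdims=True)       # marginal p(y)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (px @ py)[nz])).sum())

def layer_mi(activations, x_ids, y_ids, n_bins=30):
    """Estimate (I(X;T), I(T;Y)) for one layer by binning its activations."""
    edges = np.linspace(activations.min(), activations.max(), n_bins)
    binned = np.digitize(activations, edges)    # discretize each unit's activation
    _, t_ids = np.unique(binned, axis=0, return_inverse=True)  # one id per distinct row
    return discrete_mi(x_ids, t_ids), discrete_mi(t_ids, y_ids)

# Toy usage with hypothetical data: 512 samples, a 10-unit hidden layer.
rng = np.random.default_rng(0)
acts = rng.standard_normal((512, 10))           # layer activations T
x_ids = np.arange(512)                          # each input treated as unique
y_ids = rng.integers(0, 2, size=512)            # binary labels Y
print(layer_mi(acts, x_ids, y_ids))

Plotting these two numbers per layer over training epochs is what produces the information-plane trajectories discussed in [1].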

More information can be found in their original paper [1] and in much simplified write-ups of it online.

[1] R. Shwartz-Ziv and N. Tishby, "Opening the Black Box of Deep Neural Networks via Information", ICRI-CI, 2017.

