
Sunday, October 28, 2018

On Creativity and Abstractions of Neural Networks

"Are GANs just a tool for human artists? Or are human artists at
the risk of becoming tools for GANs?"
Today we had a guest lecture titled "Creativity and Abstractions of Neural Networks" by David Ha (@HardMaru), Research Scientist at Google Brain, facilitated by Michal Fabinger.

Among all the interesting topics he discussed, such as Sketch-RNN, Kanji-RNN and world models, what captivated me most were his ideas about abstraction, machine creativity and evolutionary models. What he discussed on those topics (as I understood it) is:


  • Generating images from the latent vectors of an autoencoder is a useful way to understand how the network forms abstract representations of the data. In world models [1], he used an RNN to predict the next latent vector, which can be thought of as an abstract representation of reality (see the sketch after this list).
  • Creative machines learn and form new policies to survive or to perform better. This can be somewhat evolutionary (maybe not within the lifetime of a single agent). The agents can also adapt to different scenarios by modifying themselves (self-modifying agents).
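To make the first point concrete, here is a minimal sketch (my own illustration, not the architecture from the world models paper, which uses a VAE encoder plus a mixture-density RNN) of an RNN that learns to predict the next latent vector of a sequence; all sizes and data below are made up.

# A minimal sketch (not the paper's code): an RNN predicting the next latent
# vector z_{t+1} from z_t, the way a world model predicts the next abstract
# "state" of reality. Latent size and data are made up for illustration.
import torch
import torch.nn as nn

latent_dim, hidden_dim, seq_len, batch = 32, 128, 20, 8

class LatentPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(latent_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, latent_dim)  # map hidden state back to latent space

    def forward(self, z_seq):
        h, _ = self.rnn(z_seq)              # (batch, seq_len, hidden_dim)
        return self.head(h)                 # predicted next latent for every step

model = LatentPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# Pretend these latents came from a pretrained autoencoder's encoder.
z = torch.randn(batch, seq_len, latent_dim)
pred = model(z[:, :-1])                          # predict from z_1 .. z_{T-1}
loss = nn.functional.mse_loss(pred, z[:, 1:])    # target is the following latent
loss.backward()
opt.step()
print(float(loss))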
Some other quotes or facts about human perception that (I think) have inspired his work:

Sketch-RNN [2]:

"The function of vision is to update the internal model of the world inside our head, but what we put on a piece of paper is the internal model" ~ Harold Cohen (1928 -2016), Reflections of design and building AARON

World Models:

"The image of the world around us, which we carry in our head, is just a model. Nobody in their head imagines all the world, government or country. We have only selected concepts, and relationships between them, and we use those to represent the real system." ~ Jay Write Forrester (1918-2016), Father of system dynamics

[1] https://worldmodels.github.io/
[2] https://arxiv.org/abs/1704.03477

Wednesday, July 4, 2018

teamLab: Blurring the Boundaries between Art and Science

My Lizard Painting
Yesterday, we visited the MORI Building Digital Art Museum: teamLab Borderless (the name is quite long and hard to remember in the right order :P), which opened recently in Odaiba... making Odaiba, or Tokyo for that matter, even greater!

Even though some exhibits looked a bit trivial at first (I felt that the exhibition ticket was somewhat overpriced, even at the discounted rate, and regardless of the fact that we did not pay for it), a second thought after further reading left me overwhelmed, impressed and fascinated by the extent of innovation, creativity and philosophical thought they have put into each piece of art.

This museum gives a great feel for how digital technology can nicely complement traditional forms of art and overcome their inherent limitations. The museum is built around a few great concepts. One concept that highly captivated my curious mind (... well, curious about perception, in all its forms) is their notion of ultra subjective space. Comparing that concept with the Western perspective used in paintings and ancient Japanese spatial recognition made the idea even more intriguing.

If you are planning to visit this museum, I highly recommend that you understand those concepts beforehand, in addition to the other "things you should read before you visit", to make your museum experience even better.

On the other hand, some activities were quite fun too. Look at how I painted a cute lizard and the way it came alive a little later with all those "natural lizard-like moves"!



Tuesday, July 3, 2018

Network Dissection to Divulge the Hidden Semantics of CNN

Needless to say, deep convolutional neural networks (CNNs) have gained immense popularity nowadays due to their ability to classify or recognize scenes and objects with reasonable accuracy. However, we also know that CNNs can be fooled by adversarial attacks: an image that a CNN recognized accurately can be altered in a way that is still easy for a human to recognize, yet makes the CNN fail [1]. So the natural question arises: "Are they genuinely learning about objects or scenes the way we humans do?"

Dissection
Researchers from MIT have recently conducted experiments along that line, as what happens in the hidden layers of CNNs still remains a mystery [2]. Their experiments aim to find out whether individual hidden units align with human-interpretable concepts such as objects in a scene or parts of an object, e.g., lamps (an object-detector unit) in place recognition, or bicycle wheels (a part detector) in object detection. If so, they need a way to quantify this emergent 'interpretability'. It's interesting that neuroscientists perform a similar task to uncover the behavior of biological neurons too.
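As I understand it, the paper quantifies this alignment with an intersection-over-union style score between a unit's strongly activated pixels and the pixels annotated with a concept. Below is a minimal numpy sketch of that idea (my own simplification, not the authors' code; the actual procedure picks the activation threshold per unit over the whole dataset and accumulates the score across many images).

# A toy sketch of an IoU-style interpretability score for one hidden unit and
# one concept on a single image (my simplification of Network Dissection).
import numpy as np

def interpretability_iou(activation, concept_mask, quantile=0.995):
    """activation: (H, W) activation map of one unit, upsampled to image size;
    concept_mask: (H, W) boolean ground-truth mask for one concept."""
    # Only the unit's strongest responses count as the unit "firing".
    threshold = np.quantile(activation, quantile)
    unit_mask = activation >= threshold
    intersection = np.logical_and(unit_mask, concept_mask).sum()
    union = np.logical_or(unit_mask, concept_mask).sum()
    return intersection / union if union > 0 else 0.0

# Toy example: a unit that responds mostly inside a "lamp" region.
act = np.random.rand(64, 64)
lamp = np.zeros((64, 64), dtype=bool)
lamp[20:40, 20:40] = True
act[20:40, 20:40] += 2.0
print(interpretability_iou(act, lamp))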

The researchers have also conducted experiments to find which factors (e.g., the representation's axis alignment, training techniques) influence the interpretability of those hidden units. They found that interpretability is axis-dependent: if a random rotation (a change of basis) is applied to a layer's representation, the individual units become far less interpretable, even though the network's discriminative power is unchanged. Further, training techniques such as dropout and batch normalization have an impact on interpretability too.

You can find more details on this research here.

[1] https://kaushalya.github.io/DL-models-resistant-to-adversarial-attacks/
[2] D. Bau*, B. Zhou*, A. Khosla, A. Oliva, and A. Torralba. "Network Dissection: Quantifying Interpretability of Deep Visual Representations." Computer Vision and Pattern Recognition (CVPR), 2017. Oral.

Sunday, July 1, 2018

Taskonomy: Disentangling Task Transfer Learning


image source: Taskonomy [1]
Common computer vision tasks such as depth estimation and edge detection are usually performed in isolation.

While scanning through this year’s CVPR papers, I noticed this interesting research [1] (CVPR Best Paper award winner!) that introduced a term called “Taskonomy” (Task + Taxonomy).

Taskonomy focuses on deriving the relationships between these common computer vision tasks, so that representations learned for certain tasks can be reused for others, saving computation time and/or reducing the amount of labeled data required.

This is also known as ‘task transferability’.  Some interesting visualizations and more information on this research can be found here.
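My rough mental model of what such a transfer looks like in practice (this is only an illustration, not the Taskonomy codebase): freeze an encoder that was trained on a source task and train only a small readout head for the target task on top of its features, which is where the savings in labeled data and compute come from.

# A minimal sketch (my own illustration): reuse a frozen encoder trained on a
# source task (say, depth estimation) and train only a small readout head for
# a target task (say, surface normals). All shapes and data are made up.
import torch
import torch.nn as nn

class Encoder(nn.Module):                      # stand-in for a pretrained source-task encoder
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

encoder = Encoder()
for p in encoder.parameters():                 # freeze: the source-task features are reused as-is
    p.requires_grad = False

readout = nn.Conv2d(32, 3, 1)                  # small target-task head, cheap to train
opt = torch.optim.Adam(readout.parameters(), lr=1e-3)

images = torch.randn(4, 3, 64, 64)             # made-up image batch
targets = torch.randn(4, 3, 64, 64)            # made-up target-task labels

features = encoder(images)                     # no gradients flow into the frozen encoder
loss = nn.functional.l1_loss(readout(features), targets)
loss.backward()
opt.step()
print(float(loss))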

[1] http://taskonomy.stanford.edu/

Seeing is 'Not Necessarily' Believing?

Recently, I read the Nature article "Our useful inability to see the reality", which introduced a book called "Deviate: The Science of Seeing Differently". I haven't read the book yet, but it seems to shed some light on the idea of perception. The basic idea seems to be that we see what we want to see, based on our past experience, and not necessarily what's out there in reality. In other words, the information we acquire from our eyes has much less to do with how we derive its actual meaning (in relation to ourselves, of course) than we might think. Specifically, the article suggests that roughly 90% of the neurons responsible for making sense of what we see lie outside the visual areas of the brain.

Saturday, June 30, 2018

Favorite quotes from "Lab Girl" by Hope Jahren

"While looking at the graph, I thought about how I now knew something for certain that only an hour ago had been an absolutely unknown, and I slowly began to appreciate how my life had just changed. I was the only person in an infinite exploding universe who knew that the powder was made of opal. In a wide, wide world, full of unimaginable numbers of people, I was - in addition to being small and insufficient - special. I was not only a quirky bundle of genes, but I was also unique existentially, because of the tiny detail that I knew about Creation, because of what I had seen and then understood. Until I phoned someone, the concrete knowledge that opal was the mineral that fortified each seed on each hackberry tree was mine alone. Whether or not this was something worth knowing seemed another problem for another day. I stood and absorbed this revelation as my life turned a page, and my first scientific discovery shone, as even the cheapest plastic toy does when it is new."

"I had worked and waited for this day. In solving this mystery I had also proved something, at least to myself, and I finally knew what real research would feel like."

"Afterward I reward myself by sitting in my office choosing and ordering chemicals and equipment, feeling like a giddy bride picking out her gift registry"

"As research scientists, we will never ever be secure."

"He knows about the many nextstory.doc files in my hard drive; He knows how I like to sift through the thesaurus for hours; he knows that nothing feels better to me than finding the exactly right word that stabs cleanly at the heart of what you are trying to say."

Thursday, June 28, 2018

Look Closer to See Better

image source: wikipedia
Hearing about this recent research made me feel a little dumb, and hopefully you will feel the same too. But anyway, it's quite impressive to see the advanced tasks that machines are becoming capable of. What we usually hear is that even though recognizing a cat is a simple task for humans, it is quite a challenging task for a machine, or let's say... for a computer.

Now, try to recognize what's in this image. If I were given this task, I would have just said it's a 'bird', and hopefully you would too, unless you are a bird expert or enthusiast. Of course it's a bird, but what if your computer is smart enough to say that it's a 'Laysan albatross'? 😂 Not feeling dumb enough yet? It seems the computer is also aware of which features in which areas of its body make it a 'Laysan albatross'.

Even though there exists some promising research on region detection and fine-grained feature learning (e.g., finding which regions of this bird contain features that discriminate it from other bird species and then learning those features, so that we can recognize the species in a new, previously unseen image), it still has some limitations.

So this research [1] proposes a method where two components, namely attention-based region detection and fine-grained feature learning, strengthen or reinforce each other through mutual feedback so that they perform better as a whole. The first component starts by looking at the coarse-grained features of a given image to identify which areas deserve more attention. Then the second component further analyzes the fine-grained details of those areas to learn which features make them unique to this species. If the second component struggles to make a confident decision about the bird species, it informs the first component, since the selected region might not be very accurate.
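The part of this that is easiest to write down is the feedback signal between the two scales: a pairwise ranking loss that asks the finer scale to be at least a margin more confident about the true class than the coarser scale. Here is a minimal sketch of that loss (my simplification; the full RA-CNN also has an attention proposal network that learns where to crop and zoom).

# A toy sketch of the inter-scale ranking loss that couples the two components
# (not the RA-CNN implementation): the finer scale is pushed to be more
# confident about the ground-truth class than the coarser scale.
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(coarse_logits, fine_logits, labels, margin=0.05):
    p_coarse = F.softmax(coarse_logits, dim=1)
    p_fine = F.softmax(fine_logits, dim=1)
    # Probability assigned to the ground-truth class at each scale.
    pt_coarse = p_coarse.gather(1, labels.unsqueeze(1)).squeeze(1)
    pt_fine = p_fine.gather(1, labels.unsqueeze(1)).squeeze(1)
    # Penalize whenever the fine scale is not at least `margin` more confident.
    return torch.clamp(pt_coarse - pt_fine + margin, min=0).mean()

# Toy usage with made-up logits for a 200-class bird dataset.
labels = torch.randint(0, 200, (8,))
coarse_logits = torch.randn(8, 200)
fine_logits = torch.randn(8, 200)
print(float(pairwise_ranking_loss(coarse_logits, fine_logits, labels)))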

More information about this research can be found here.

[1] J. Fu, H. Zheng and T. Mei, "Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition," 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 4476-4484.


Sunday, June 24, 2018

What makes Paris look like Paris?

windows with railings in Paris
Cities have their own character. Maybe that's what makes some cities more notable than others. In her award-winning memoir "Eat, Pray, Love", Elizabeth Gilbert mentions that there's a 'word' for each city. She assigns 'Sex' to Rome, 'Achieve' to New York, and 'Conform' to Stockholm (to add more cities that I have been to, how about 'Tranquility' for Kyoto, 'Elegance' for Ginza and 'Vibrant' for Shibuya?). When terrorists attacked Paris in 2015, more than 7 million people shared their support for Paris under the #PrayForParis hashtag within 10 hours. Have you ever thought about what characteristics make a city feel the way it does? Can we build a machine that can 'feel' the same way about cities as humans do?

Maybe we are not there yet. Nevertheless, researchers from Carnegie Mellon University and Inria have taken an innovative baby step in this research direction by asking the question "What makes Paris look like Paris?" [1]. Is it the Eiffel Tower that makes Paris look like Paris? How can we tell whether a given image was taken in Paris if the Eiffel Tower is not present in it?


To start with, they asked people who had been to Paris to distinguish Paris from other cities such as London or Prague, and humans could do this with a significant level of accuracy. To build a machine that perceives a city the way a human does, we first need to figure out which characteristics of Paris help humans perceive Paris as Paris. So their research focuses on automatically mining the frequently occurring patterns or characteristics (features) that make Paris geographically discriminative from other cities. Even though there can be both local and global features, the researchers focused only on local, high-dimensional features: image patches at different resolutions, represented as HOG+color descriptors, are used in the experiments. The patches are labeled as two sets, Paris and non-Paris (London, Prague, etc.). Initially, the non-discriminative patches, things that can occur in any city such as cars or sidewalks, are eliminated using a nearest-neighbor approach: if a patch's nearest neighbors come from both the Paris set and the non-Paris set, that patch is considered non-discriminative, and vice versa.
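A toy sketch of that nearest-neighbor filtering step (my simplification, not the authors' code; the real system works on huge numbers of HOG+color patch descriptors): a Paris patch is kept as a candidate only if most of its nearest neighbors also come from the Paris set.

# A minimal sketch of discarding non-discriminative patches: keep a Paris
# patch only if its neighborhood in descriptor space is mostly Paris.
# The descriptors below are random stand-ins for HOG+color patch features.
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
paris = rng.normal(0.5, 1.0, size=(200, 64))
non_paris = rng.normal(0.0, 1.0, size=(200, 64))

descriptors = np.vstack([paris, non_paris])
is_paris = np.array([True] * len(paris) + [False] * len(non_paris))

nn_index = NearestNeighbors(n_neighbors=21).fit(descriptors)
_, idx = nn_index.kneighbors(paris)                # neighbors of each Paris patch
idx = idx[:, 1:]                                   # drop the patch itself

paris_fraction = is_paris[idx].mean(axis=1)        # how "Paris-heavy" each neighborhood is
discriminative = paris_fraction > 0.7              # seen mostly in Paris -> keep as candidate
print(int(discriminative.sum()), "candidate discriminative patches")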
Paris Window painting
by Janis McElmurry

However, the notion of "similarity" can be quite subjective when different visual aspects are involved, so the standard distance used in the nearest-neighbor step might not capture similarity between elements from different cities well. Accordingly, the researchers came up with a distance (similarity) metric that is learned iteratively from the available image patches so as to find discriminative features. This algorithm is run on images from different cities such as Paris and Barcelona to find the distinctive stylistic elements of each city.
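My rough reading of that iterative step, sketched with an off-the-shelf linear SVM (again only an illustration, not the paper's procedure, which runs such rounds with cross-validation over held-out patch sets): train a detector for a candidate visual element against the non-Paris patches, re-form the element from the Paris patches that fire the detector most strongly, and repeat.

# A toy sketch of iteratively refining one candidate "visual element"
# (not the paper's code): alternate between training a discriminative
# detector and re-selecting the element's member patches with it.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
paris = rng.normal(0.5, 1.0, size=(300, 64))       # made-up Paris patch descriptors
non_paris = rng.normal(0.0, 1.0, size=(300, 64))   # made-up London/Prague patch descriptors

element = paris[:5]                                 # initial cluster: a seed patch and its neighbors
for _ in range(3):                                  # a few refinement rounds
    X = np.vstack([element, non_paris])
    y = np.array([1] * len(element) + [0] * len(non_paris))
    detector = LinearSVC(C=0.1).fit(X, y)           # detector for this element vs. the non-Paris set
    scores = detector.decision_function(paris)      # how strongly each Paris patch fires the detector
    element = paris[np.argsort(scores)[-5:]]        # keep the top-scoring patches as the new members

print("final member patch indices:", np.argsort(scores)[-5:])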

An interesting fact about this research (well, at least for me) is that artists can use these findings as useful cues to better capture the style of a given place. More details about this research can be found here.

[1] Carl Doersch, Saurabh Singh, Abhinav Gupta, Josef Sivic, and Alexei A. Efros. What Makes Paris Look like Paris? ACM Transactions on Graphics (SIGGRAPH 2012), August 2012, vol. 31, No. 3.

Tuesday, June 19, 2018

Panoptic Segmentation


Panoptic segmentation is a topic we discussed in our lab seminar recently, because it could potentially improve scene understanding in autonomous vehicles that rely on vision sensors.

Successful approaches based on convolutional nets have previously been proposed for the semantic segmentation task. Further, methods based on object or region proposals have become popular for detecting individual objects as well.

Image source: [1]
The idea behind panoptic segmentation [1] is to unify the tasks of semantic segmentation (studying uncountable 'stuff' such as sky, grass and other regions) and instance segmentation using object detectors (studying countable 'things', e.g., different instances of cars).

A 'panoptic quality (PQ)' metric is proposed as a novel way to evaluate the approach. More details about this can be found here, and a simpler version here.
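For reference, a predicted segment and a ground-truth segment count as a match when their IoU exceeds 0.5, and PQ is the sum of matched IoUs divided by (TP + 0.5·FP + 0.5·FN). A tiny sketch of the formula (my own toy code, not the authors' evaluation script):

# A tiny sketch of the Panoptic Quality metric: segments match at IoU > 0.5,
# PQ = (sum of matched IoUs) / (TP + 0.5 * FP + 0.5 * FN).
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    tp = len(matched_ious)
    denom = tp + 0.5 * num_false_positives + 0.5 * num_false_negatives
    return sum(matched_ious) / denom if denom > 0 else 0.0

# Toy example: three matched segments, one spurious prediction, one missed segment.
print(panoptic_quality([0.9, 0.8, 0.6], num_false_positives=1, num_false_negatives=1))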

[1] A. Kirillov, K. He, R. Girshick, C. Rother, and P. Dollár, "Panoptic Segmentation."

Monday, June 18, 2018

Theoretical Understanding of Deep Learning using Information Theory


Information theory in communication
Information theory has proved useful in many applications such as digital communication, computer networking and multimedia compression. For example, the concept of entropy (average information content) can be used to determine the minimum average length to which a given message can be encoded before transmission, and the concept of mutual information can be used to determine the capacity of a given communication channel.
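As a tiny worked example of the entropy part (my own illustration): a source emitting three symbols with probabilities 0.5, 0.25 and 0.25 has entropy 1.5 bits, so no lossless code can use fewer than 1.5 bits per symbol on average.

# Entropy of a discrete source, H = -sum p * log2(p), in bits per symbol.
import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.25, 0.25]))   # 1.5 bits per symbol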
Information bottleneck
(analogous to optimal representations)

Despite the recent breakthroughs of deep learning, the question of why these models work the way they do still remains a mystery. Recently, I read about an interesting attempt [1] where the researchers tried to understand the inner workings of deep learning algorithms using notions from information theory. There, they apply a concept called the "information bottleneck" to describe the most relevant features or signals, with sufficient discriminative ability and with the noise stripped away, for solving a given problem. In other words, the goal of a deep neural network can be seen as optimizing the information bottleneck trade-off: compressing the representation of the input as much as possible while preserving the information needed for accurate prediction.
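If I write the trade-off down the way it is usually stated, the representation T is chosen to minimize I(X;T) − β·I(T;Y): squeeze out as much information about the input X as possible while keeping T informative about the label Y. A toy sketch with small discrete variables (my own illustration, not the paper's estimation procedure, which tracks these quantities layer by layer during training):

# A toy sketch of the information-bottleneck objective
#   minimize  I(X;T) - beta * I(T;Y)
# for made-up discrete joint distributions of (X,T) and (T,Y).
import numpy as np

def mutual_information(joint):
    """I(A;B) in bits for a 2-D joint probability table."""
    joint = joint / joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float((joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])).sum())

p_xt = np.array([[0.4, 0.1],    # made-up joint distribution of input X and representation T
                 [0.1, 0.4]])
p_ty = np.array([[0.35, 0.15],  # made-up joint distribution of representation T and label Y
                 [0.15, 0.35]])

beta = 4.0
ib_objective = mutual_information(p_xt) - beta * mutual_information(p_ty)
print(mutual_information(p_xt), mutual_information(p_ty), ib_objective)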

More information can be found here (their original paper) and here (a much simplified version!).

[1] Opening the Black Box of Deep Neural Networks via Information, Shwartz-Ziv & Tishby, ICRI-CI 2017

Image sources:
[1] http://www.nairaland.com/3943054/information-theory-made-really-simple
[2] https://www.techwell.com/techwell-insights/2017/05/finding-bottlenecks-agile-and-devops-delivery-cycle