Dev 007: ECCV 2020: My Takeaways

Wednesday, September 23, 2020


ECCV 2020 was my first virtual conference experience. There was a very fancy virtual conference environment that looked somewhat realistic :)

source: https://eccv.6connex.eu/

One workshop I attended used Gather Town, where I even got a funny avatar of my own. I was mindlessly running here and there among the virtual booths just because it was so much fun 😂. This time I wanted to focus on domains that I'm not familiar with, so I chose sessions accordingly. Having so many interesting sessions was quite overwhelming (in a good way), so I first had to browse through everything and prioritize which ones to attend. Both the main conference site and the 'workshops & tutorials' site were well organized, so that wasn't difficult. Honestly, I felt that a virtual conference is more effective and efficient in many ways, if we forget about the sightseeing aspect of live conferences ;).

I mainly attended two workshops.

Computer vision for medical imaging: my research mainly focuses on human-like scene understanding, and a digital camera sees the world somewhat like a human does. So it was interesting to look at machine perception from a different perspective, one where we can see beyond the visible spectrum. There was a nice introductory session for newcomers to medical imaging that explained comprehensively how X-ray, ultrasound, gamma, MRI and PET scans are captured.

How AI will transform health-care?: A (stroke) clinician's view
Challenges and pitfalls in medical image analysis

Video Turing test: Toward Human-level Video Story Understanding: I attended two main sessions in this workshop.

10 questions for a theory of vision by Marco Gori: Using motion invariance [1] as a fundamental way to incorporate scale, rotation and deformation invariance, etc. is quite innovative. The talk also discussed how the "time" aspect is ignored in visual recognition, a criticism I only partly agree with. Of course, I agree that the temporal dimension is important for scene understanding. However, I feel that humans can clearly recognize a single image, and even an action or activity to some extent, without needing multiple frames.
Common sense intelligence by Yejin Choi: common-sense reasoning is a brave topic, as it is hard to define clearly yet is a very important aspect of realizing AI. I'm not exaggerating when I say this talk completely blew my mind. She discussed the gap between perception-level and cognition-level visual understanding, and how to infer dynamic state changes of the world. The paper related to this talk is [2]. So glad to see that this talk somewhat supports my view about still image recognition (..feeling relieved lol 👻).

Highlights from the regular sessions:

A unifying mutual information view of metric learning: cross-entropy vs. pairwise losses [3]:

I attended this session because metric learning is my new-found love 😍 The core idea is that deep metric learning can be approached with either pairwise losses or cross-entropy. In both cases, minimizing the loss can be interpreted as approximately maximizing the mutual information between the learned features and the labels.
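To make the two loss families mentioned above a bit more concrete, here is a minimal pure-Python sketch of a classic pairwise contrastive loss and a softmax cross-entropy loss computed on the same toy features. This is only my own illustration, not the formulation or code from [3]: the 2-D features, the `margin` value, and the hand-picked classifier `weights` are all assumptions made up for the example.

```python
import math
from itertools import combinations


def contrastive_loss(features, labels, margin=1.0):
    """Pairwise contrastive loss averaged over all distinct pairs.

    Same-class pairs are pulled together (squared distance);
    different-class pairs are pushed apart up to `margin`.
    """
    total, count = 0.0, 0
    for i, j in combinations(range(len(features)), 2):
        dist = math.dist(features[i], features[j])
        if labels[i] == labels[j]:
            total += dist ** 2
        else:
            total += max(0.0, margin - dist) ** 2
        count += 1
    return total / count


def cross_entropy_loss(features, labels, weights):
    """Softmax cross-entropy of a linear classifier on top of the features.

    `weights[k]` is the (hypothetical) weight vector of class k.
    """
    total = 0.0
    for x, y in zip(features, labels):
        logits = [sum(w_d * x_d for w_d, x_d in zip(w, x)) for w in weights]
        top = max(logits)  # subtract the max for numerical stability
        log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_norm - logits[y]
    return total / len(features)


# Toy data (assumed): two tight clusters of 2-D features.
features = [[-1.5, -1.5], [-1.4, -1.5], [1.5, 1.5], [1.6, 1.5]]
good_labels = [0, 0, 1, 1]  # labels agree with the clusters
bad_labels = [0, 1, 0, 1]   # labels cut across the clusters
weights = [[-1.0, -1.0], [1.0, 1.0]]  # hand-picked, one vector per class

for name, labels in [("good", good_labels), ("bad", bad_labels)]:
    print(name,
          contrastive_loss(features, labels),
          cross_entropy_loss(features, labels, weights))
```

When the features carry information about the labels (the "good" labeling), both losses come out small; when they don't (the "bad" labeling), both are large. That is the intuition behind reading either loss as a proxy for the mutual information between features and labels.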

Grounded situation recognition [4] (AllenAI):

This paper clearly pointed out the problem with using image captioning to evaluate models in a human-like way. I also feel that when humans see a scene, they understand semantic concepts rather than grammatically correct sentences. In addition to the main situation recognition task, the authors propose a few additional tasks, such as conditional localization and grounded semantic chaining.

Women in computer vision

Panel discussions and mentoring sessions were quite useful to keep us motivated during challenging times. I'm so grateful that role models in the computer vision field (both men and women) took some time and effort to share their experience with us.

Push forward. You will find some way.
If you want something, just ask for it. Be prepared for rejection.
First, you should do some good work. Attention comes next.
Quality over quantity (don't try to be a paper factory)
Don't go after low-hanging fruit (e.g., incremental research). Do something different and new.
History is important in any field.
Hobbies not only help you relax; they also help your work.
Never give up :)

Industrial booths:

I got to know the Voxel51 tool. If you are doing object detection research, it might come in handy for analyzing your detection results easily and in depth. Loved the "confidence slider" and "uniqueness" features [5].

Overall, it was such a great experience and kept me fascinated during the pandemic. It was like an "academic vacation" 💃. Kudos to the organizers who successfully organized a virtual conference for the first time.

[1] Betti, A., Gori, M. and Melacci, S., 2020. Learning visual features under motion invariance. Neural Networks.

[2] Park, J.S., Bhagavatula, C., Mottaghi, R., Farhadi, A. and Choi, Y., 2020. VisualCOMET: Reasoning about the Dynamic Context of a Still Image.

[3] https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123510545.pdf

[4] https://arxiv.org/abs/2003.12058

[5] https://voxel51.com/docs/fiftyone/user_guide/brain.html