Sunday, December 2, 2012

Content Extraction and Context Inference based Information Retrieval

Final year research project


At present, most information retrieval mechanisms consider only exact matches of textual metadata such as topics, manual tags and descriptions. These methods do not yet deliver results that match the relevance a person would judge intuitively. The main reason is that the content and the context of the available data are not assessed for relevance in a unified manner.

Extracting semantic content and inferring knowledge from low level features has always been a major challenge because of the well-known semantic gap issue.

The proposed solution aims to overcome this difficulty with a framework-based approach that uses machine learning and knowledge representation, so that the right information can be retrieved regardless of content format or contextual discrepancies. Since information can be embedded in any content format, the proposed framework analyzes the content and provides a set of content and context descriptors that can be used in any information retrieval application.

Key Words:
Information Retrieval, Computer Vision, Ontology, Development Framework

Literature Review

In spite of the exponential growth of information, which comes in various forms such as video, image and audio, modern information retrieval mechanisms still use manual, text-based metadata such as topic, tags and description to provide relevant results to user queries.

The actual content and its associated context are not considered by information retrieval mechanisms when assessing the relevance of a search result to a given user query.

Accordingly, modern information retrieval systems are still far from the way human instinct would assess the relevance of information.

Among existing products, Query By Example (QBE) methods have been suggested to find similar content from a sample image or audio clip by analyzing and matching the actual content. However, this approach matches only low-level features.

It is difficult for a user to express a search intention in terms of the corresponding low-level features, which results in a significant loss of semantic information. Ontology-driven methods were introduced to provide semantic analysis of information using an underlying knowledge base. However, these solutions are based on either text or web content.

The actual content and its associated context carry equal weight when judging the relevance of information to the user. Because of the information lost when only text is considered, the relevance of the results is greatly reduced. Further, to bridge the gap between the human cognitive way of retrieving information and automatic information retrieval, machines should simulate this behavior by analyzing the available audio and visual content.

Even though it is evident that visual information such as images and video helps a user understand a concept more realistically and with less effort, no research was found on a collective way of processing visual information along with text and audio and inferring the associated context.
An extensive study of the research in this area found that extracting semantic concepts from low-level observable features has always been a major challenge because of the well-known semantic gap.

For feature detection and extraction on visual information, local features are preferred over global features because they are less affected by background clutter. With the advent of scale-invariant feature detectors and descriptors such as SIFT and SURF, object detection and recognition have improved drastically, with robustness to changes in scale, rotation and illumination. The high dimensionality of these descriptors also greatly improves their distinctiveness. The Bag of Visual Words model builds on these advantages in its application to semantic object detection.

During the evaluation of different content processing mechanisms for image, video and audio, it was found that although the processing mechanism differs for each content format, all of them share a common flow: pre-processing, feature detection, feature extraction and semantic concept detection.
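
The four stages named above can be pictured as a simple chain. The following is only an illustrative sketch: the stage names come from the paragraph above, while `ContentPipeline` and the toy text stages are hypothetical and are not taken from the project's framework design.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ContentPipeline:
    """Chains the four stages shared by image, video and audio processing."""
    preprocess: Callable[[Any], Any]
    detect_features: Callable[[Any], Any]
    extract_features: Callable[[Any], Any]
    detect_concepts: Callable[[Any], List[str]]

    def run(self, raw: Any) -> List[str]:
        cleaned = self.preprocess(raw)
        features = self.detect_features(cleaned)
        descriptors = self.extract_features(features)
        return self.detect_concepts(descriptors)


# Toy text pipeline, used only to show the flow (all stages are assumptions).
text_pipeline = ContentPipeline(
    preprocess=lambda s: s.lower(),
    detect_features=lambda s: s.split(),
    extract_features=lambda tokens: [t.strip(".,") for t in tokens],
    detect_concepts=lambda words: sorted({w for w in words if len(w) > 4}),
)

concepts = text_pipeline.run("A river bank near the Forest.")
print(concepts)
```

A processing module for another content format would supply its own four callables and reuse the same `run` method, which is one way to read the "common flow" observation.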

Once the semantic concepts are extracted from the low level features, context should be inferred for a given set of concepts to further refine the solution. 

A wide range of context inference approaches was considered, namely Natural Language Processing (NLP), logic-based methods (formal logic, predicate logic), fuzzy reasoning, probabilistic methods (Bayesian reasoning) and semantic networks.

Semantic networks were chosen as the main technique for inferring context, yet no existing algorithm was found that met the exact requirement. Further, the author found that fuzzy reasoning can be used in information retrieval to express relevance as a gradual transition from falsehood to truth.
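
To make the idea of a gradual transition concrete, here is a minimal sketch of a fuzzy membership function. The ramp shape, the `low` and `high` bounds and the function names are assumptions chosen for illustration; the post does not specify the membership functions used in the project.

```python
def relevance_degree(score: float, low: float = 0.25, high: float = 0.75) -> float:
    """Map a raw similarity score to a degree of relevance in [0, 1].

    Scores at or below `low` are fully "not relevant" (0.0), scores at or
    above `high` are fully "relevant" (1.0), and scores in between rise
    linearly instead of flipping at a single cut-off.
    """
    if score <= low:
        return 0.0
    if score >= high:
        return 1.0
    return (score - low) / (high - low)


def fuzzy_and(a: float, b: float) -> float:
    """Standard fuzzy conjunction: the weaker of the two degrees."""
    return min(a, b)


def fuzzy_or(a: float, b: float) -> float:
    """Standard fuzzy disjunction: the stronger of the two degrees."""
    return max(a, b)


# Combine a content match and a context match for one item.
content_match = relevance_degree(0.9)
context_match = relevance_degree(0.5)
both = fuzzy_and(content_match, context_match)
```

Compared with a hard threshold, an item that scores just under the cut-off still receives partial relevance, so ranking degrades smoothly.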

Considering the above, it is concluded that the suggested solution is unique and feasible for the problem domain, yet immensely challenging because of the limitations of current feature extraction techniques in bridging the semantic gap, especially for a wide domain. Further, the limited availability of algorithms for context inference made the work even more time-consuming and thought-provoking.

High Level Design

Rich Picture in Information Retrieval Application

High Level Architecture

Data Flow

Class Diagram (Framework Design)


Content Negotiation

Content Extraction

  • Image/video processing tool: OpenCV (through EmguCV, its .NET wrapper)
  • Algorithms: Bag of Visual Words model; SIFT/SURF for visual feature detection and extraction; K-means for feature clustering; Naive Bayes for classification
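
The way these algorithms fit together can be sketched in a few lines. This is a toy illustration in plain Python, not the project's EmguCV code: the 2-D "descriptors" stand in for real SIFT/SURF vectors, the k-means seeds are given by hand, and the data, labels and function names are all made up for the example.

```python
import math
from collections import Counter


def nearest(vec, centroids):
    """Index of the centroid closest to vec (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, centroids[i])))


def kmeans(points, seeds, iterations=10):
    """Plain k-means; `seeds` are the starting centroids (toy initialisation)."""
    centroids = [list(s) for s in seeds]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def histogram(descriptors, centroids):
    """Visual word histogram: how many descriptors map to each centroid."""
    counts = Counter(nearest(d, centroids) for d in descriptors)
    return [counts.get(i, 0) for i in range(len(centroids))]


def train_naive_bayes(histograms, labels):
    """Multinomial Naive Bayes with add-one smoothing over visual words."""
    model = {}
    for label in set(labels):
        rows = [h for h, lab in zip(histograms, labels) if lab == label]
        totals = [sum(col) + 1 for col in zip(*rows)]
        norm = sum(totals)
        model[label] = (math.log(len(rows) / len(labels)),
                        [math.log(t / norm) for t in totals])
    return model


def classify(hist, model):
    """Return the label with the highest log-posterior for this histogram."""
    def score(label):
        prior, log_probs = model[label]
        return prior + sum(c * lp for c, lp in zip(hist, log_probs))
    return max(model, key=score)


# Made-up training "images": each is a list of 2-D descriptors.
train = {
    "sky": [[(0.0, 0.1), (0.1, 0.0), (0.2, 0.1)], [(0.1, 0.2), (0.0, 0.0), (5.0, 5.0)]],
    "grass": [[(5.0, 5.1), (5.1, 4.9), (4.9, 5.0)], [(5.2, 5.0), (4.8, 5.1), (0.1, 0.1)]],
}
all_descriptors = [d for images in train.values() for img in images for d in img]
vocabulary = kmeans(all_descriptors, seeds=[(0.0, 0.1), (5.0, 5.1)])

histograms, labels = [], []
for label, images in train.items():
    for img in images:
        histograms.append(histogram(img, vocabulary))
        labels.append(label)
model = train_naive_bayes(histograms, labels)

query = [(0.1, 0.1), (0.0, 0.2), (4.9, 4.9)]
query_hist = histogram(query, vocabulary)
predicted = classify(query_hist, model)
```

In a real setting the vocabulary would hold hundreds or thousands of visual words, and the descriptors would come from a detector such as SIFT or SURF rather than being typed in by hand.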

SIFT/SURF algorithm comparison

Visual word histograms for similar concepts

Context Inference

Context inference for ambiguous scenarios

Semantic network: WordNet
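
As a rough illustration of context inference over a semantic network, the sketch below resolves an ambiguous concept by measuring how close each candidate sense lies to the other detected concepts. The tiny hand-made graph, the sense labels and the scoring rule are assumptions for the example only; it does not use WordNet data and is not the project's algorithm.

```python
from collections import deque

# Hand-made toy network (NOT WordNet data): undirected "related-to" links.
EDGES = [
    ("bank_river", "river"), ("river", "water"), ("water", "boat"),
    ("bank_finance", "money"), ("money", "loan"), ("money", "account"),
]

GRAPH = {}
for a, b in EDGES:
    GRAPH.setdefault(a, set()).add(b)
    GRAPH.setdefault(b, set()).add(a)


def distance(start, goal):
    """Shortest path length in links (breadth-first search), or None."""
    if start not in GRAPH or goal not in GRAPH:
        return None
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for nxt in GRAPH[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


def infer_sense(senses, context):
    """Pick the candidate sense that is closest, on average, to the context."""
    def score(sense):
        if not context:
            return 0.0
        hops = [distance(sense, c) for c in context]
        # Unreachable concepts contribute nothing; near ones contribute more.
        return sum(1.0 / (1 + h) for h in hops if h is not None) / len(context)
    return max(senses, key=score)


candidates = ["bank_river", "bank_finance"]
sense_near_water = infer_sense(candidates, ["water", "boat"])
sense_near_money = infer_sense(candidates, ["loan", "account"])
```

The same concept label resolves to different senses depending on which other concepts were detected alongside it, which is the kind of ambiguity the context inference step has to handle.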

Application in Information Retrieval

Once content and context descriptors are retrieved from the framework, they can be used in many applications. One such application is given here. 

The literal relatedness between the user context and the content context can be derived using a relevance decision factor.

This measure can be used to assess the relevance of the available content to a particular user.

For example, user context can be presented in different forms such as profession, personal interests and the present short-term search intention. Content context can be the metadata of the available content.
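
The post does not give a formula for the relevance decision factor, so the following is only a stand-in to show the shape of such a measure: it scores literal overlap between user-context terms and content-context terms with the Jaccard index. The function name and the sample terms are hypothetical.

```python
def overlap_factor(user_context, content_context):
    """Jaccard overlap between two sets of context terms (case-insensitive).

    This is an assumed stand-in for a relevance decision factor, not the
    measure used in the project.
    """
    user = {term.lower() for term in user_context}
    content = {term.lower() for term in content_context}
    if not user or not content:
        return 0.0
    return len(user & content) / len(user | content)


# Hypothetical user context: profession, interest and current search intention.
user_terms = ["Photographer", "wildlife", "Sri Lanka"]
# Hypothetical content context taken from an item's metadata.
content_terms = ["wildlife", "leopard", "sri lanka", "safari"]

factor = overlap_factor(user_terms, content_terms)
```

A measure based on a semantic network would also credit related but non-identical terms, which plain set overlap cannot do.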

Testing and Evaluation

Precision and recall have been used in many information retrieval applications to assess relevance.

For testing, test cases were derived according to the given criteria for each component. Training and ground truth images were taken from Google Images and Flickr.

The accuracy of concept detection was tested on images with variations in scale, illumination and background clutter.

The context inference component was tested with different threshold values to obtain quantitative measures of relevance such as precision and recall.
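
For readers unfamiliar with the two measures, the sketch below shows how precision and recall change as a relevance threshold moves. The scores, item names and ground truth are invented for illustration and are not the project's test data.

```python
def precision_recall(scores, relevant, threshold):
    """Compute precision and recall for items scoring at or above threshold.

    scores:   dict mapping item -> relevance score
    relevant: set of items that are truly relevant (ground truth)
    """
    retrieved = {item for item, s in scores.items() if s >= threshold}
    hits = len(retrieved & relevant)
    # By convention here, precision is 0.0 when nothing is retrieved.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Invented example data.
scores = {"img1": 0.9, "img2": 0.7, "img3": 0.4, "img4": 0.2}
relevant = {"img1", "img3"}

results = {t: precision_recall(scores, relevant, t) for t in (0.3, 0.5, 0.8)}
for threshold, (p, r) in results.items():
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
```

Raising the threshold tends to trade recall for precision, which is why testing across several threshold values gives a fuller picture than a single setting.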

Then, non-functional testing was performed. The test results indicate that the prototype implementation is successful.

Further, a critical evaluation was performed with the participation of domain experts from academic or technical backgrounds. Participants confirmed that the suggested approach helps improve the relevance of information retrieval and that it addresses a timely need.

Future Enhancements

    • Support different content processing mechanisms for the same content type (e.g., research papers and newspapers, both given as text content)
    • Different processing modules, such as audio and web content, can be implemented and plugged into the framework
    • Fusion of audio words with visual content can be used to improve the accuracy of semantic concept recognition in video content
    • Scale and evaluate the performance on a realistic database

