Final year research project
Abstract
At present, most information retrieval mechanisms consider only exact matches of textual
metadata such as topics, manual tags and descriptions. These methods have yet to deliver
results that match human, intuition-driven judgements of relevance, largely because they do
not assess the relevance of the content and the context of the available data in a unified
manner.
Extracting semantic content and inferring knowledge from low-level features has always been a major challenge because of the well-known semantic gap.
The proposed solution strives to overcome this difficulty through a framework-based approach that uses machine learning and knowledge representation, so that the right information can be retrieved regardless of content format or contextual discrepancies. Given that information can be embedded in any content format, the proposed framework analyzes the content and provides a set of content and context descriptors that can be used in any information retrieval application.
Key Words:
Information Retrieval, Computer Vision, Ontology, Development Framework
Literature Review
Despite the exponential growth of information in various forms such as video, image and
audio, modern information retrieval mechanisms still rely on manual, text-based metadata
such as topics, tags and descriptions to provide relevant results for user queries.
The actual content and its associated context are not considered when assessing the
relevance of a search result to a given user query.
Consequently, modern information retrieval systems remain far from the way human intuition assesses the relevance of information.
Considering existing products, Query By Example (QBE) methods have been proposed to find
content similar to a given sample image or audio clip by analyzing and matching the actual
content. In this approach, however, matching is performed between low-level features.
It is difficult for a user to express a search intention in terms of such low-level features,
which results in significant loss of semantic information. Ontology-driven methods have been
introduced to provide semantic analysis of information using an underlying knowledge base,
but these solutions are limited to text or web content.
Each representation of the actual content and its associated context carries equal weight
when judging the relevance of information to the user, so the information lost by considering
text alone greatly reduces the relevance of the results. Further, to bridge the gap between
the human, cognitive way of retrieving information and automatic information retrieval,
machines should simulate this behavior by analyzing the available audio and visual content.
Even though it is evident that visual information such as images and video helps users
understand a particular concept more realistically and with less effort, no research was found
on a collective way of processing visual information together with text and audio and inferring
the associated context.
An extensive study of the research in this area shows that extracting semantic concepts from
low-level observable features has always been a major challenge, owing to the well-known
semantic gap.
For feature detection and extraction from visual information, local features are preferred
over global features to avoid background clutter. With the advent of scale-invariant feature
detectors and descriptors such as SIFT and SURF, object detection and recognition have
improved drastically, gaining invariance to scale, rotation and illumination. The high
dimensionality of these descriptors also greatly improves their distinctiveness. The Bag of
Visual Words model exploits these advantages in its application to semantic object detection.
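To make the idea concrete, the following is a minimal sketch of the Bag of Visual Words pipeline using OpenCV's Python bindings (the project itself uses EmguCV); the function names, vocabulary size and clustering parameters are illustrative assumptions, not the framework's actual API.

    import cv2
    import numpy as np

    sift = cv2.SIFT_create()  # scale- and rotation-invariant detector/descriptor

    def sift_descriptors(image_path):
        """Return the 128-D SIFT descriptors of all local keypoints in an image."""
        gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        _, descriptors = sift.detectAndCompute(gray, None)
        return descriptors

    def build_vocabulary(pooled_descriptors, k=200):
        """Cluster descriptors pooled from the training images into k visual words."""
        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
        _, _, centers = cv2.kmeans(np.float32(pooled_descriptors), k, None,
                                   criteria, 5, cv2.KMEANS_PP_CENTERS)
        return centers  # k x 128 visual-word vocabulary

    def bovw_histogram(descriptors, vocabulary):
        """Quantize an image's descriptors into a normalized visual-word histogram."""
        dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
        words = dists.argmin(axis=1)  # nearest visual word for each descriptor
        hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
        return hist / hist.sum()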
An evaluation of content processing mechanisms for image, video and audio shows that, even
though the processing mechanism for each content format differs, all of them follow a common
flow: pre-processing, feature detection, feature extraction and semantic concept detection.
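A minimal sketch of how this common flow could be expressed as a shared interface is given below; it is written in Python with illustrative class and method names, whereas the actual framework design is shown in the class diagram later in this document.

    from abc import ABC, abstractmethod

    class ContentProcessor(ABC):
        """Common flow shared by image, video and audio processors
        (names are illustrative, not the framework's actual API)."""

        @abstractmethod
        def preprocess(self, raw_content): ...

        @abstractmethod
        def detect_features(self, preprocessed): ...

        @abstractmethod
        def extract_features(self, detected): ...

        @abstractmethod
        def detect_concepts(self, feature_vectors): ...

        def process(self, raw_content):
            """Run the shared pipeline and return the detected semantic concepts."""
            preprocessed = self.preprocess(raw_content)
            detected = self.detect_features(preprocessed)
            vectors = self.extract_features(detected)
            return self.detect_concepts(vectors)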
Once the semantic concepts have been extracted from the low-level features, the context of a given set of concepts should be inferred to further refine the solution.
A wide range of context inference approaches was considered: Natural Language Processing
(NLP), logic-based methods (formal logic, predicate logic), fuzzy reasoning, probabilistic
methods (Bayesian reasoning) and semantic networks.
Semantic networks were selected as the main technique for inferring context, yet no existing
algorithm was found that meets the exact requirement. Further, the author found that fuzzy
reasoning can be used to assess relevance as a gradual transition between falsehood and truth
when applied to information retrieval.
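As a simple illustration of that fuzzy view of relevance (a sketch only, with assumed thresholds rather than the project's actual membership function), a crisp similarity score can be mapped onto a graded relevance value between 0 and 1:

    def fuzzy_relevance(similarity, low=0.2, high=0.8):
        """Piecewise-linear membership function: fully irrelevant below `low`,
        fully relevant above `high`, and a gradual transition in between."""
        if similarity <= low:
            return 0.0
        if similarity >= high:
            return 1.0
        return (similarity - low) / (high - low)

    # e.g. fuzzy_relevance(0.5) == 0.5: partially relevant rather than a hard yes/no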
Considering the above aspects, it is concluded that the suggested solution to the problem
domain is unique and feasible, yet immensely challenging because of the limitations of
feature extraction techniques in bridging the semantic gap, especially over a wide domain.
Further, the limited availability of algorithms for context inference makes the work even more
time-consuming and thought-provoking.
High Level Design
Rich Picture in Information Retrieval Application
High Level Architecture
Data Flow
Class Diagram (Framework Design)
Implementation
Content Negotiation
Content Extraction
- Image/video processing tool: OpenCV (EmguCV)
- Algorithms: Bag of Visual Words model, SIFT/SURF for visual feature detection and extraction, K-means for feature clustering, Naive Bayes for classification (a sketch of the classification step follows)
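A minimal sketch of that classification step, assuming visual-word histograms like those outlined in the literature review have already been computed for the training images, and using scikit-learn's MultinomialNB as a stand-in for the prototype's Naive Bayes classifier (the concept labels and data shapes are placeholders):

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB

    # Placeholder training data: one 200-bin visual-word histogram per image,
    # labelled with the semantic concept it depicts (labels are illustrative).
    train_histograms = np.random.rand(60, 200)
    train_labels = ["beach"] * 20 + ["forest"] * 20 + ["city"] * 20

    classifier = MultinomialNB()
    classifier.fit(train_histograms, train_labels)

    query_histogram = np.random.rand(1, 200)   # histogram of a previously unseen image
    predicted_concept = classifier.predict(query_histogram)[0]
    print(predicted_concept)                   # e.g. "forest"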
SIFT/SURF algorithm comparison
Visual word histograms for similar concepts
Context Inference
Context inference for ambiguous scenarios
Semantic network: WordNet
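As an illustration of how WordNet can serve as the semantic network for context inference, and assuming NLTK's WordNet interface is available, the shared context of a set of detected concepts can be approximated through their common hypernyms; the function name and heuristic below are assumptions, not the project's exact algorithm.

    from nltk.corpus import wordnet as wn   # requires nltk.download('wordnet')

    def infer_context(concepts):
        """Approximate the joint context of detected concepts by walking up WordNet
        to their lowest common hypernyms (a rough, illustrative heuristic)."""
        synsets = [wn.synsets(c, pos=wn.NOUN)[0]
                   for c in concepts if wn.synsets(c, pos=wn.NOUN)]
        if len(synsets) < 2:
            return synsets
        common = synsets[0].lowest_common_hypernyms(synsets[1])
        for s in synsets[2:]:
            if not common:
                return []
            common = common[0].lowest_common_hypernyms(s)
        return common

    # e.g. infer_context(["dog", "cat"]) -> [Synset('carnivore.n.01')]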
Application in Information Retrieval
Once content and context descriptors have been retrieved from the framework, they can be
used in many applications; one such application is described here.
The literal relatedness between the user context and the content context can be derived using
a relevance decision factor.
This measure can then be used to assess the relevance of the available content to a
particular user.
For example, the user context can take different forms such as profession, personal interests
and the present short-term search intention, while the content context can be the metadata of
the available content.
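A hedged sketch of such a relevance decision factor is given below, assuming WordNet path similarity as the literal-relatedness measure; the weighting scheme and function names are illustrative assumptions rather than the exact formulation used in the project.

    from nltk.corpus import wordnet as wn

    def relatedness(term_a, term_b):
        """WordNet path similarity between the first noun senses of two terms (0..1)."""
        a, b = wn.synsets(term_a, pos=wn.NOUN), wn.synsets(term_b, pos=wn.NOUN)
        if not a or not b:
            return 0.0
        return a[0].path_similarity(b[0]) or 0.0

    def relevance_decision_factor(user_context, content_context):
        """Average, over the user-context terms (profession, interests, search
        intention), of each term's best match against the content-context terms."""
        scores = [max(relatedness(u, c) for c in content_context) for u in user_context]
        return sum(scores) / len(scores)

    # e.g. relevance_decision_factor(["photography", "wildlife"], ["camera", "safari"])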
Testing and Evaluation
Precision and recall have been used in many information retrieval applications to assess
relevance.
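For reference, the two measures can be computed for a retrieved result set as follows (a generic sketch, not the project's test harness):

    def precision_recall(retrieved, relevant):
        """precision = |retrieved AND relevant| / |retrieved|
           recall    = |retrieved AND relevant| / |relevant|"""
        retrieved, relevant = set(retrieved), set(relevant)
        true_positives = len(retrieved & relevant)
        precision = true_positives / len(retrieved) if retrieved else 0.0
        recall = true_positives / len(relevant) if relevant else 0.0
        return precision, recall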
For testing, test cases were derived according to the given criteria for each component. Training and ground-truth images were taken from Google Images and Flickr.
The accuracy of concept detection was tested on images with variations in scale, illumination and background clutter.
The context inference component was tested with different threshold values to obtain quantitative measures of relevance such as precision and recall.
Non-functional testing was then performed. The test results indicate that the prototype implementation is successful.
Further, a critical evaluation was performed with the participation of domain experts from academic and technical backgrounds. The participants confirmed that the suggested approach helps to improve the relevance of information retrieval and is therefore a timely need.
Future Enhancements
- Support different content processing mechanisms for the same content type (e.g., research papers and newspaper articles both supplied as text content)
- Implement additional processing modules, such as audio and web content, and plug them into the framework
- Fuse audio words with the visual content to improve the accuracy of semantic concept recognition in video
- Scale the solution and evaluate its performance on a realistic database