This is an example scenario for understanding the basic concepts behind SOLR/Lucene indexing and search, using an advertising web site [4].
Use case:
Searcher: I want to search cars by different aspects such as car model, location, transmission and special features. I also want to see similar cars of the same model as recommendations.
SOLR uses an index, which is a data structure optimized for fast retrieval.
To create an index, we need to come up with a set of documents, each made up of fields. How do we create a document for the following advertisement?
Title: Toyota Rav4 for sale
Category: Jeeps
Location: Seeduwa
Model: Toyota Rav4
Transmission: Automatic
Description: (given in the SOLR document below)
SOLR document:
document 1
title: Toyota Rav4 for sale
category: Jeeps
location: Seeduwa
model: Toyota Rav4
transmission: Automatic
description: Brought Brand New By Toyota Lanka-Toyota Rav4 ACA21, YOM-2003, HG-XXXX Auto, done approx 79,500 Km excellent condition, Full Option, Alloy Wheels, Hood Railings, call No brokers please.
Some more documents based on advertisements...
document 2
title: Nissan March for sale
category: Cars
location: Dankotuwa
model: K11
transmission: Automatic
description: A/C, P/S, P/W, Center locking, registered year 1998, full option, Auto, New battery, Alloys, 4 doors, Home used car, Mint condition, Negotiable
document 3
title: Nissan March K12 for rent
category: Cars
location: Galle
model: K12
transmission: Automatic
description: A/C, P/S, P/W, Center locking, registered year 2004, full option, Auto, New battery, Alloys, 4 doors, cup holder, Doctor used car, Mint condition, Negotiable
Inverted Index
SOLR then creates an inverted index like the one below (taking the title field as the example):
toyota doc1(1x)
rav4 doc1(1x)
sale doc1(1x) doc2(1x)
nissan doc2(1x) doc3(1x)
march doc2(1x) doc3(1x)
k12 doc3(1x)
rent doc3(1x)
1x is the term frequency: the number of times the term occurs in that field of the document.
Lucene Analyzers
Note that the term “for” does not appear in the index: Lucene text analysers remove it as a stop word. You can also write your own analyser to suit your needs.
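To make the idea concrete, here is a minimal sketch of an inverted index over the title field. It is a toy model, not how Lucene stores its index internally: the class name InvertedIndexSketch, the tiny stop-word list and the tokenising rule (lowercase, split on non-alphanumeric characters) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy inverted index over a single field (hypothetical helper, not a Lucene API).
class InvertedIndexSketch {

    // Assumed stop-word list for this example only.
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("for", "a", "the"));

    // Returns term -> (document id -> term frequency in the indexed field).
    static Map<String, Map<String, Integer>> build(Map<String, String> titlesByDocId) {
        Map<String, Map<String, Integer>> index = new TreeMap<String, Map<String, Integer>>();
        for (Map.Entry<String, String> doc : titlesByDocId.entrySet()) {
            // "Analysis" step: lowercase, split into terms, drop stop words.
            String[] terms = doc.getValue().toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
            for (String term : terms) {
                if (term.isEmpty() || STOP_WORDS.contains(term)) {
                    continue;
                }
                Map<String, Integer> postings = index.get(term);
                if (postings == null) {
                    postings = new LinkedHashMap<String, Integer>();
                    index.put(term, postings);
                }
                Integer tf = postings.get(doc.getKey());
                postings.put(doc.getKey(), tf == null ? 1 : tf + 1);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> titles = new LinkedHashMap<String, String>();
        titles.put("doc1", "Toyota Rav4 for sale");
        titles.put("doc2", "Nissan March for sale");
        titles.put("doc3", "Nissan March K12 for rent");
        for (Map.Entry<String, Map<String, Integer>> e : build(titles).entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

Running main prints one line per term in alphabetical order, for example sale {doc1=1, doc2=1}. A search for a term then becomes a single map lookup instead of a scan over every document.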
Field configuration and search
In schema.xml you configure which fields your documents can contain and how those fields are handled when documents are added to the index or when the fields are queried.
For example, to index the description field and also make its value retrievable during search, declare the field in schema.xml with both indexed and stored enabled [1].
Now, assume a user searches for a vehicle.
Search query: “nissan cars for rent”
The SOLR query would be /solr/select/?q=title:"nissan cars for rent"
OK, but what about the other fields (category, location, transmission, etc.)?
By default, the SOLR standard query parser searches only one field for each term (the default field, unless the query names another). To search several fields at once, such as title and description, and weight them by their significance (boosts), we should use the Dismax parser [2, 3]. Put simply, the Dismax parser lets you make the title field count for more than the description field.
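As a rough sketch of what such a request can look like, the snippet below builds a select URL that asks for the Dismax parser through the defType parameter and lists the fields to search, with boosts, in the qf parameter. The boost values (title^2.0, description^1.0), the base URL and the class name DismaxQuerySketch are assumptions chosen for illustration; suitable boosts depend on your data.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that assembles a Dismax request URL.
class DismaxQuerySketch {

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a standard JVM.
            throw new IllegalStateException(e);
        }
    }

    // Searches title and description together, weighting title twice as much.
    static String buildUrl(String baseUrl, String userQuery) {
        String queryFields = "title^2.0 description^1.0"; // assumed boosts
        return baseUrl + "/select"
                + "?defType=dismax"
                + "&q=" + encode(userQuery)
                + "&qf=" + encode(queryFields);
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("http://localhost:8983/solr", "nissan cars for rent"));
    }
}
```

The user's text goes into q unchanged, without any field prefix; which fields are searched and how much each one counts is decided by qf.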
Anatomy of a SOLR query
q - main search statement
fl - fields to be returned
wt - response writer (response format)
http://localhost:8983/solr/select?q=*:*&wt=json - select all the advertisements
http://localhost:8983/solr/select?q=*:*&fl=title,category,location,transmission&sort=title desc - select title,category,location,transmission and sort by title in descending order
wt parameter = response writer
http://localhost:8983/solr/select?q=*:*&wt=json - display results in JSON format
http://localhost:8983/solr/select?q=*:*&wt=xml - display results in XML format
http://localhost:8983/solr/select?q=category:cars&fl=title,category,location,transmission - Give results related to cars only
More options can be found at [5].
Coming up next...
Extending SOLR functionality using RequestHandlers and Components
In my opinion, to implement the methods of an abstract class you need to inherit from it. One of the key benefits of inheritance is that it minimises duplicate code by implementing common functionality in parent classes. So if the abstract class has some common, generic behaviour that can be shared with its concrete classes, an abstract class is the better choice.
However, if all the methods are abstract and they do not represent any unique or significant behaviour of the class instances, it may be better to use an interface instead.
Use abstract classes to define planned inheritance hierarchies. Classes that already belong to an inheritance hierarchy can use interfaces to extend their behaviour in terms of the “roles” they can play, roles that are not common to their parents or to all the other children. Abstract classes do not help in this situation, because a Java class cannot extend more than one class.
A key difference between an interface and an abstract class is that interfaces simulate multiple inheritance in languages that do not support it because of the “Deadly Diamond of Death” problem.
How do interfaces avoid the “Deadly Diamond of Death” problem?
Interface methods, unlike methods inherited from a class, carry no implementation of their own. Several interfaces may therefore declare the same method signature without causing the problem: a class can supply only one implementation of that signature, because duplicate methods do not compile.
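The sketch below illustrates the distinction using the advertisement domain from earlier. All the names (Advertisement, Negotiable, CarAdvertisement, JeepAdvertisement) are hypothetical: the abstract class holds behaviour every advertisement shares, while the interface models a role that only some classes play.

```java
// Planned hierarchy: every advertisement shares the same summary logic.
abstract class Advertisement {
    private final String title;

    Advertisement(String title) {
        this.title = title;
    }

    // Common behaviour, implemented once in the parent class.
    String summary() {
        return category() + ": " + title;
    }

    // Each concrete class supplies only the part that differs.
    abstract String category();
}

// A role, not part of the hierarchy: only some advertisements are negotiable.
interface Negotiable {
    boolean acceptsOffer(long offer);
}

class CarAdvertisement extends Advertisement implements Negotiable {
    private final long lowestPrice;

    CarAdvertisement(String title, long lowestPrice) {
        super(title);
        this.lowestPrice = lowestPrice;
    }

    String category() {
        return "Cars";
    }

    public boolean acceptsOffer(long offer) {
        return offer >= lowestPrice;
    }
}

// Same parent, but this class does not play the Negotiable role.
class JeepAdvertisement extends Advertisement {
    JeepAdvertisement(String title) {
        super(title);
    }

    String category() {
        return "Jeeps";
    }
}

class AdvertisementDemo {
    public static void main(String[] args) {
        Advertisement car = new CarAdvertisement("Nissan March for sale", 900000L);
        Advertisement jeep = new JeepAdvertisement("Toyota Rav4 for sale");
        System.out.println(car.summary());
        System.out.println(jeep.summary());
        System.out.println(car instanceof Negotiable);
        System.out.println(jeep instanceof Negotiable);
    }
}
```

Both classes reuse summary() from the parent, while only CarAdvertisement takes on the extra role. Had Negotiable been a second abstract class, CarAdvertisement could not have extended both.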
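A small sketch of that point, with hypothetical names (Printable, Scannable, MultiFunctionDevice): two interfaces declare the same signature, and the implementing class provides the one and only body.

```java
interface Printable {
    String describe();
}

interface Scannable {
    String describe();
}

// Both interfaces declare describe(), but neither brings a body with it,
// so there is nothing ambiguous to inherit: this class supplies the single
// implementation, which serves both interfaces.
class MultiFunctionDevice implements Printable, Scannable {
    public String describe() {
        return "prints and scans";
    }
}

class DiamondDemo {
    public static void main(String[] args) {
        MultiFunctionDevice device = new MultiFunctionDevice();
        Printable asPrinter = device;
        Scannable asScanner = device;
        // Whichever type the caller sees, the same method runs.
        System.out.println(asPrinter.describe());
        System.out.println(asScanner.describe());
    }
}
```

Adding a second describe() method with the same signature to MultiFunctionDevice would be rejected by the compiler, which is what guarantees a single implementation per class.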
If it is obvious to you that the error has nothing to do with access permissions, check for version incompatibilities in the .class file or the related class.
The java.nio.file package is a new addition in Java 1.7, so this error appears when the default JDK is an older version. However, when I run java -version, it reports java version "1.7.0_45".
If your Maven application contains Java-version-specific code, set the source and target versions of the maven-compiler-plugin (1.7 in this case) in your pom.xml.
The build may still fail with the following error:
[ERROR] Failed to execute goal X.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project X: Compilation failure
[ERROR] Failure executing javac, but could not parse the error:
[ERROR] javac: invalid target release: 1.7
[ERROR] Usage: javac <options> <source files>
[ERROR] use -help for a list of possible options
To solve this issue, set the JAVA_HOME variable using either of the following methods:
# Set JAVA_HOME for the current session only
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home
OR
# Set JAVA_HOME permanently
vim ~/.bash_profile
# add this line to the file, then save it:
export JAVA_HOME=$(/usr/libexec/java_home)
# reload the profile and check the value
source ~/.bash_profile
echo $JAVA_HOME
Now compile the application.
For those who are curious...
The path specified in JAVA_HOME decides which JDK is used for compiling. Here's how to check it:
echo $JAVA_HOME
If JAVA_HOME is not set, the following command shows where the java command is located on your machine:
which java
It will give something like this: /usr/bin/java
Try this to find out where that command points:
ls -l /usr/bin/java
This is a symbolic link to the path /System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands
Now try the following command:
cd /System/Library/Frameworks/JavaVM.framework/Versions
ls
Check where "CurrentJDK" is linked to (right click > Get Info). Mine was linked to /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents.
The version that "CurrentJDK" points to determines which of the available JVMs is used.
So this is why I got "package java.nio.file does not exist" in the first place: the default referenced JDK was older than 1.7.
How to point CurrentJDK to the correct version?
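As a quick cross-check, a small Java program can report which runtime it is executing on and whether java.nio.file is present. This is only a sketch (the class name JdkCheckSketch is made up), and it inspects the JVM that runs it, not the compiler Maven picks, so run it with the same java command and JAVA_HOME that your build uses.

```java
// Hypothetical diagnostic: reports the running JVM and checks for java.nio.file.
class JdkCheckSketch {

    // True when the running JVM can load java.nio.file.Files (added in Java 1.7).
    static boolean hasNioFile() {
        try {
            Class.forName("java.nio.file.Files");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Version and install directory of the JVM executing this program.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("java.home    = " + System.getProperty("java.home"));
        // Environment variable; prints null when it is not set.
        System.out.println("JAVA_HOME    = " + System.getenv("JAVA_HOME"));
        System.out.println("java.nio.file available: " + hasNioFile());
    }
}
```

If java.home points into a 1.6 installation while java -version reports 1.7 in another shell, the two are resolving different JDKs, which matches the symptom described above.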
cd /System/Library/Frameworks/JavaVM.framework/Versions
sudo rm CurrentJDK
sudo ln -s /Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents CurrentJDK
Additional info...
Also, just for fun, the following command shows which binary java -version actually runs. :)
sudo dtrace -n 'syscall::posix_spawn:entry { trace(copyinstr(arg1)); }' -c "/usr/bin/java -version"
It will output something like this:
dtrace: description 'syscall::posix_spawn:entry ' matched 1 probe
dtrace: pid 7584 has exited
CPU ID FUNCTION:NAME
2 629 posix_spawn:entry /Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/bin/java