Alfresco Summit 2014 started on 7th October (which was also my birthday and my first-year anniversary at Zaizi!!!) at the London Hilton Metropole hotel.
This was my first Alfresco summit experience, which turned out to be very exciting and rewarding.
On the first day, I got the opportunity to attend a full-day Alfresco training session by Rui Fernandes, an Alfresco international trainer, to get hands-on experience with web scripts.
Then, we had the "partner drinks" event, a networking event for all the Alfresco partners. (Special thanks to my Zaizi London friends who wished me a happy birthday and sang the happy birthday song at the event to make me feel at home, and also to my Zaizi Colombo friends who sent me surprise birthday gifts!)
There, people were interested in Zaizi's "Connecting Intelligence" slogan, and they were curious to know what we are doing with semantics, NLP and machine learning at Zaizi.
The summit sessions officially started the next day with the product keynote by Thomas DeMeo, the VP of Product Management at Alfresco.
The first session I attended was "Activiti in the Event-Driven Architecture" by Robin Bramley, Chief Scientific Officer at Ixxus. It was about using complex event processing in BPM for business activity monitoring (inferring correlated events or patterns). The solution uses Activiti, Apache Camel and Esper. It was interesting to see how organisations can gain meaningful insight into business operations from their day-to-day events.
The next session was "What's new in Alfresco 5 search?" by Andy Hind and Mike Farman from Alfresco. There, the presenters explained the new search enhancements, such as Alfresco faceted search, auto-suggest, spelling suggestions and performance improvements, achieved using new SOLR4 features.
The next session, "Declarative configuration of Share" by Dave Draper from Alfresco, focused on creating dynamic UI components using the Aikau framework. There was thunderous applause from the audience when the speaker created a simple 'things to do' UI application in three minutes.
Next was the lightning talks session. It was a challenging session for the speakers, because the slides were auto-playing and each speaker got only five minutes to talk, yet it was fun for the audience. The "easy edit" application, which helps open documents for editing in their native application (e.g., an MS Word document in MS Office), was a cool feature. I do not exactly remember if this was the session where the speakers took off their shirts to launch their new product. ;)
The last session of the day was "Top 10 tricks to improve Alfresco performance" by Sergio from Zaizi. Drawing on his experience with Alfresco, he explained the different SOLR settings, Alfresco configuration settings and clustering mechanisms for Share and Alfresco that should be in place to improve Alfresco performance.
He also mentioned one more important thing, which applies not only to Alfresco performance but to anything we do in our lives: "Whatever you do, do it with passion!"
After that, we attended the awesome evening party at the Palace Suite, with the band Sway Allstars.
Too bad I missed the Zaizi after-party at Apres London, as I had to go back to the hotel to prepare my speech for the next day. #LastMinutePanic
The next day, I presented the session "Set your content straight", which talks about applying machine learning techniques in Alfresco to enhance metadata and search capabilities.
Then Ainga, CEO of Zaizi, explained a few customer use cases where we can take advantage of technologies such as content analytics, data linking, intelligent search and machine learning to solve business problems, in his session "Content Intelligence: Advanced Information Technologies for Alfresco".
Another session that I was interested in was "Indexing and searching speech contained in audio and video content" by Romain Guinot from Alfresco.
There, they integrated Pocketsphinx, a speech-to-text engine, with Alfresco transformations to enable indexing and search of audio and video content. Audio and video help convey a particular idea effectively, yet they consume more network bandwidth than other media such as text. So, this provides a nice solution to that issue by indexing the actual spoken content of audio and video files and making it available for search.
Then, Lucas presented the session on "Scale Audit & Reporting with a NoSQL Architecture".
Last but not least: bringing Alfresco to Google Glass! This presentation demonstrated how Google Glass can be used as a wearable device to publish Alfresco content, using the GDK (Glass Development Kit). After the session, we could try out Google Glass too! :)
On the whole, it was a great opportunity to see how people are using cutting edge technologies and customising Alfresco to change the way enterprises manage and interact with their documents.
If you missed this event, the recorded sessions will be available on the Alfresco Summit site and YouTube after some time, so they're worth downloading and watching.
When extending this game as a multi-user game, another baby-fish player is added to the game. In addition to escaping from enemy fish and meeting friendly fish, the baby-fish players should compete with each other to reach home sooner than the others while collecting the maximum number of points.
Design
Client-server architecture
A client-server architecture is used instead of a peer-to-peer approach due to its simplicity and ease of development.
In a client-server architecture, all the game players (baby fish), or "clients", are connected to a central machine, the Fish game server.
The server is responsible for important decisions such as creating the friend/enemy fish collection, managing state and broadcasting this information (x, y coordinates of players and non-players) to the individual clients.
As a result, the server becomes a key bottleneck for both bandwidth and computation, and this approach consumes more network bandwidth overall than a peer-to-peer design.
Concurrent game playing using multi-threading
A multi-threading approach is used to enable multiple users to play the game concurrently. A separate thread represents each client.
Network Access using socket communication
TCP/IP socket communication (low-level network communication) is used for two-way communication between the server and the clients. The Remote Method Invocation (RMI) approach is not used here, as it would incur additional processing overhead.
Using encapsulation to support multiple network protocols
The FishServer class and FishClient interface do not include any TCP/IP-specific networking code. This generic design will support different network protocols without changing the core game logic.
Class Diagram
Sequence Diagram
Implementation
Threading in Java
Synchronized keyword
Since the game server accepts requests and messages from different threads, which access the same resources (e.g., objects, variables), the synchronized keyword is used to prevent thread interference and memory consistency errors.
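As a small sketch of this, assume a hypothetical ScoreBoard class shared by all client threads (the class and method names are illustrative, not taken from the actual game code). Without synchronized, concurrent updates could interleave and lose increments:

```java
// Hypothetical shared server state: a score board updated by many client threads.
public class ScoreBoard {
    private int totalScore = 0;

    // synchronized ensures only one client thread updates the score at a time,
    // preventing lost updates and memory consistency errors
    public synchronized void addPoints(int points) {
        totalScore += points;
    }

    public synchronized int getTotalScore() {
        return totalScore;
    }

    public static void main(String[] args) throws InterruptedException {
        ScoreBoard board = new ScoreBoard();
        Thread[] clients = new Thread[10];
        for (int i = 0; i < clients.length; i++) {
            // each "client" thread adds 1 point, 1000 times
            clients[i] = new Thread(() -> {
                for (int j = 0; j < 1000; j++) {
                    board.addPoints(1);
                }
            });
            clients[i].start();
        }
        for (Thread t : clients) {
            t.join();
        }
        System.out.println(board.getTotalScore());
    }
}
```

With synchronized in place the final total is always 10 × 1000; without it, increments from different threads can be lost.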
Runnable interface
The Runnable interface is used to implement what each player-client thread is supposed to do once executed.
Networking in Java
Serializable Interface
We need to send game information, such as game scores and player/non-player x, y coordinates, across the network.
To achieve this, the state of the objects is transmitted across the network by converting the objects to byte arrays.
Classes implementing the Serializable interface are serialized before being sent over the network and deserialized once received.
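A minimal sketch of this round trip, with a hypothetical GameState class standing in for the real game objects:

```java
import java.io.*;

// Illustrative serialization round trip: GameState and its fields are
// hypothetical, not taken from the actual game code.
public class SerializationDemo {
    static class GameState implements Serializable {
        private static final long serialVersionUID = 1L;
        int score;
        int x, y; // player coordinates

        GameState(int score, int x, int y) {
            this.score = score;
            this.x = x;
            this.y = y;
        }
    }

    // serialize: object -> byte array (this is what travels over the network)
    static byte[] toBytes(GameState state) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(state);
        }
        return bos.toByteArray();
    }

    // deserialize: byte array -> object (what the receiver reconstructs)
    static GameState fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (GameState) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        GameState sent = new GameState(42, 10, 20);
        GameState received = fromBytes(toBytes(sent));
        System.out.println(received.score + " " + received.x + " " + received.y);
    }
}
```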
Socket and ServerSocket
A ServerSocket object is created to listen for incoming game-player client connections.
The accept method is used to wait for incoming connections.
The accept method returns an instance of the Socket class, which represents the connection to the new client.
The ObjectOutputStream/ObjectInputStream methods (writeObject, readObject) are used to obtain object streams for writing to and reading from the new client.
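The whole flow can be sketched over the loopback interface. The port is chosen by the OS and the String message is illustrative; the real game would exchange serialized game-state objects instead:

```java
import java.io.*;
import java.net.*;

// Minimal ServerSocket/accept/object-stream sketch over loopback.
public class SocketDemo {
    // starts a one-shot server, connects a client, returns the message received
    static String roundTrip(String message) throws Exception {
        ServerSocket serverSocket = new ServerSocket(0); // port 0: OS picks a free port
        int port = serverSocket.getLocalPort();

        Thread server = new Thread(() -> {
            try (Socket client = serverSocket.accept(); // block until a client connects
                 ObjectOutputStream out = new ObjectOutputStream(client.getOutputStream())) {
                out.writeObject(message); // send state to the new client
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        server.start();

        String received;
        try (Socket socket = new Socket("localhost", port);
             ObjectInputStream in = new ObjectInputStream(socket.getInputStream())) {
            received = (String) in.readObject();
        }
        server.join();
        serverSocket.close();
        return received;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("welcome, player!"));
    }
}
```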
The Lost Fish is a simple educational game for children to learn about the behaviors and characteristics of different sea creatures, such as fish and plants, in a fun and challenging way.
E.g., a dolphin is an innocent, friendly sea animal, whereas a shark is harmful.
Also, it is intended to make children aware of the common characteristics of sea creatures that belong to a particular category.
E.g., harmful animals will scream furiously and look angry
In this game, the player is a baby fish that is lost in the sea. The baby fish has to find its way home past different barriers, while escaping from harmful animals.
Game Rules
Assume that, before the game starts, the baby fish is given some knowledge about the sea creatures by its mother. But the mother fish is not able to tell it about all the sea creatures.
If the baby fish collides with a harmful creature, it gets weak (loses energy).
If the baby fish identifies and meets a friendly fish, it gains energy. If the energy level drops below zero, the baby fish dies and the game is over.
The baby fish gets bonus points if it finds home quickly.
Win the Game
The baby fish has to reach home with the maximum energy level and the maximum bonus points to win the game.
Design
This game is designed with the intention of extending it further with a variety of sea animals with different appearances and behaviors. This will keep the players interested and also make the game a good learning resource.
Design Decisions
OOP concepts such as encapsulation, inheritance and polymorphism are used to improve the reusability, maintainability and the extendibility of the game.
Strategy Design Pattern
The Strategy design pattern is used to effectively extend the game with new sea creatures with diverse behaviors, to keep the player entertained. Also, different behaviors can be invoked dynamically using this approach.
An example scenario is given below:
Assume we have to add two new sea animals (e.g., a whale and a jellyfish) with new or existing sound behaviors to the game, without making major changes to the core game design and without duplicating code. If we used the traditional inheritance approach, where all the sea-animal behaviors (e.g., sound/scream) are inherited from the parent class SeaAnimal, the code would be duplicated for similar behaviors. The given approach solves that problem by using interface implementations for similar behaviors.
The current design supports the above scenario in two different ways.
• Inheritance: New sea creatures can be created by extending “SeaAnimal” abstract class
• Polymorphism: Novel sound behaviors can be added or existing sound behaviors can be reused using “Sound” interface.
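A small sketch of this design follows. The Whale/JellyFish classes and the sound strings are hypothetical examples, not the actual game code:

```java
// Strategy pattern sketch: sound behaviour is a pluggable interface,
// so similar animals reuse one implementation instead of duplicating it.
interface Sound {
    String makeSound();
}

// reusable behaviours shared across animals
class FuriousScream implements Sound {
    public String makeSound() { return "furious scream"; }
}

class SilentBubbles implements Sound {
    public String makeSound() { return "silent bubbles"; }
}

abstract class SeaAnimal {
    private Sound sound;

    protected SeaAnimal(Sound sound) { this.sound = sound; }

    // behaviour can be swapped at runtime (dynamic invocation)
    public void setSound(Sound sound) { this.sound = sound; }

    public String scream() { return sound.makeSound(); }
}

// new animals extend SeaAnimal (inheritance) and plug in an
// existing or novel Sound (polymorphism) -- no duplicated code
class Whale extends SeaAnimal {
    Whale() { super(new SilentBubbles()); }
}

class JellyFish extends SeaAnimal {
    JellyFish() { super(new SilentBubbles()); } // reuses the same behaviour
}
```

Adding a new animal is then just a new subclass plus, if needed, a new Sound implementation; existing behaviours are reused as-is.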
Using Constants
Constant values are used wherever applicable to improve reusability and maintainability.
Class Diagram
Sequence diagrams
UI Design
Here's the video: https://www.youtube.com/watch?v=ipv_6yYAUw4
This is an example scenario for understanding the basic concepts behind SOLR/Lucene indexing and search, using an advertising web site [4].
Use case:
Searcher: I want to search for cars by different aspects such as car model, location, transmission, special features, etc. Also, I want to see similar cars that belong to the same model, as recommendations.
SOLR uses an index, which is an optimized data structure for fast retrieval.
To create an index, we need to come up with a set of documents with fields in it. How do we create a document for the following advertisement?
Title: Toyota Rav4 for sale
Category: Jeeps
Location: Seeduwa
Model: Toyota Rav4
Transmission: Automatic
Description: find later
SOLR document:
document 1
Title: Toyota Rav4 for sale
Category: Jeeps
Location: Seeduwa
Model: Toyota Rav4
Transmission: Automatic
Description: Brought Brand New By Toyota Lanka - Toyota Rav4 ACA21, YOM-2003, HG-XXXX, Auto, done approx 79,500 Km, excellent condition, Full Option, Alloy Wheels, Hood Railings, call. No brokers please.
Some more documents based on advertisements...
document 2
Title: Nissan March for sale
Category: Cars
Location: Dankotuwa
Model: K11
Transmission: Automatic
Description: A/C, P/S, P/W, Center locking, registered year 1998, full option, Auto, New battery, Alloys, 4 doors, Home used car, Mint condition, Negotiable
document 3
Title: Nissan March K12 for rent
Category: Cars
Location: Galle
Model: K12
Transmission: Automatic
Description: A/C, P/S, P/W, Center locking, registered year 2004, full option, Auto, New battery, Alloys, 4 doors, cup holder, Doctor used car, Mint condition, Negotiable
Inverted Index
Then SOLR creates an inverted index as given below (let's take the Title field as an example):
toyota doc1(1x)
rav4 doc1(1x)
sale doc1(1x) doc2(1x)
nissan doc2(1x) doc3(1x)
march doc2(1x) doc3(1x)
k12 doc3(1x)
rent doc3(1x)
Here, 1x means the term frequency of the term in that document for that particular field.
Lucene Analyzers
Note that the term "for" is eliminated here during Lucene's stop-word removal process, using Lucene text analyzers. You can also come up with your own analyzer based on your preferences.
Field configuration and search
Using schema.xml, you can configure which fields your documents can contain and how those fields should be dealt with when adding documents to the index or when querying those fields.
For example, if you need to index the description field as well, and the value of the field should be retrievable during search, you need to add a field definition for description in schema.xml [1].
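The exact schema line was not preserved in the original post; assuming the stock text_general field type, a typical definition for an indexed, stored description field would look something like this:

```xml
<field name="description" type="text_general" indexed="true" stored="true"/>
```

indexed="true" makes the field searchable, and stored="true" makes its value retrievable in search results.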
Now, assume a user search for a vehicle.
Search query: “nissan cars for rent”
The SOLR query would be /solr/select/?q=title:"nissan cars for rent"
OK, what about the other fields (category, location, transmission, etc.)?
By default, the SOLR standard query parser can only search one field. To use multiple fields such as title and description, and to give them weights (boosts) to consider during retrieval based on their significance, we should use the DisMax parser [2, 3]. Simply put, using the DisMax parser you can make the title field more important than the description field, e.g., q=nissan cars for rent&defType=dismax&qf=title^2 description.
Anatomy of a SOLR query
q - main search statement
fl - fields to be returned
wt - response writer (response format)
http://localhost:8983/solr/select?q=*:*&wt=json - select all the advertisements
http://localhost:8983/solr/select?q=*:*&fl=title,category,location,transmission&sort=title desc - select title,category,location,transmission and sort by title in descending order
wt parameter = response writer
http://localhost:8983/solr/select?q=*:*&wt=json - display results in JSON format
http://localhost:8983/solr/select?q=*:*&wt=xml - display results in XML format
http://localhost:8983/solr/select?q=category:cars&fl=title,category,location,transmission - Give results related to cars only
More options can be found at [5].
Coming up next...
Extending SOLR functionality using RequestHandlers and Components
In my opinion, to implement the methods in an abstract class, you need to inherit from the abstract class. One of the key benefits of inheritance is minimising the amount of duplicate code by implementing common functionality in parent classes. So, if the abstract class has some common, generic behaviour that can be shared with its concrete subclasses, then using an abstract class would be optimal.
However, if all the methods are abstract and they do not represent any unique or significant behaviour related to the class instances, it may be better to use an interface instead.
Use abstract classes to define planned inheritance hierarchies. Classes with an already defined inheritance hierarchy can extend their behavior in terms of the "roles" they can play, which are not common to their parent or all the other children, using interfaces. Abstract classes will not help in this situation because of the multiple-inheritance restriction in the Java language.
A key difference between an interface and an abstract class is that interfaces simulate multiple inheritance in languages where multiple inheritance is not supported due to the "Deadly Diamond of Death" problem.
How do interfaces avoid the "Deadly Diamond of Death" problem?
Since interface methods do not carry an underlying implementation, unlike inherited class methods, the problem does not arise: there can be multiple identical method signatures, but only one implementation per class, because duplicate method implementations would not compile.
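A small illustration with hypothetical class names (this describes classic, pre-Java-8 interfaces, before default methods were introduced): two interfaces declare the same method signature, but the implementing class provides exactly one body, so there is no ambiguity about which implementation is inherited:

```java
// Two interfaces happen to declare the same method signature.
interface Swimmer {
    String move();
}

interface Crawler {
    String move(); // same signature as Swimmer.move()
}

// One class, one implementation: the single move() body satisfies
// both interfaces, so there is no "diamond" to resolve.
class Turtle implements Swimmer, Crawler {
    public String move() {
        return "turtle moves";
    }
}
```

Had Swimmer and Crawler been classes carrying their own move() implementations, Java could not decide which one Turtle should inherit, which is exactly why class-based multiple inheritance is disallowed.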
If it is obvious to you that this has nothing to do with an access-granting issue, check for version incompatibilities of the .class file or the related class.
The java.nio.file package is a new addition in Java 1.7, so if the default JDK is set to an older version, this exception will be thrown. However, when I checked java -version, it said java version "1.7.0_45".
If you have Java-version-specific code in your Maven application, add the maven-compiler-plugin source/target configuration to your pom.xml.
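The pom.xml section itself was not preserved in the original post; the usual maven-compiler-plugin configuration for Java 1.7, matching the plugin version shown in the error output below, would be:

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <version>2.5.1</version>
      <configuration>
        <source>1.7</source>
        <target>1.7</target>
      </configuration>
    </plugin>
  </plugins>
</build>
```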
Still it will give the following error:
[ERROR] Failed to execute goal X.plugins:maven-compiler-plugin:2.5.1:compile (default-compile) on project X: Compilation failure
[ERROR] Failure executing javac, but could not parse the error:
[ERROR] javac: invalid target release: 1.7
[ERROR] Usage: javac
[ERROR] use -help for a list of possible options
To solve this issue, set the JAVA_HOME variable using either of the following methods:
# Set JAVA_HOME for one session
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home
OR
# Set JAVA_HOME permanently
vim ~/.bash_profile
export JAVA_HOME=$(/usr/libexec/java_home)
source ~/.bash_profile
echo $JAVA_HOME
Now compile the application
For those who are curious...
When deciding which JVM to use for compiling, the path specified in JAVA_HOME is used. Here's how to check that:
echo $JAVA_HOME
If it is not specified in JAVA_HOME, you can see where the JDK is located on your machine using the following command:
which java
It will give something like this: /usr/bin/java
Try this to find where the command points:
ls -l /usr/bin/java
This is a symbolic link to the path /System/Library/Frameworks/JavaVM.framework/Versions/Current/Commands
Now try the following command:
cd /System/Library/Frameworks/JavaVM.framework/Versions
ls
Check where the "CurrentJDK" version is linked to (right click > Get Info). Mine was /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents.
The version specified as "CurrentJDK" determines which JVM is used from the available JVMs.
So, this is why I got "package java.nio.file does not exist" in the first place: the default referenced JDK was older than 1.7.
How do we point CurrentJDK to the correct version?
cd /System/Library/Frameworks/JavaVM.framework/Versions
sudo rm CurrentJDK
sudo ln -s /Library/Java/JavaVirtualMachines/jdk1.7.0_21.jdk/Contents/ CurrentJDK
Additional info...
Also, use the following command to verify where java -version is read from. (Just for fun! :))
sudo dtrace -n 'syscall::posix_spawn:entry { trace(copyinstr(arg1)); }' -c "/usr/bin/java -version"
It will output something like this:
dtrace: description 'syscall::posix_spawn:entry ' matched 1 probe
dtrace: pid 7584 has exited
CPU ID FUNCTION:NAME
2 629 posix_spawn:entry /Library/Java/JavaVirtualMachines/jdk1.7.0_45.jdk/Contents/Home/bin/java
Named Entity Recognition (NER) is a significant method for extracting structured information from unstructured text and organizing it in a semantically accurate form for further inference and decision making.
NER has been a key pre-processing step for most natural language processing applications, such as information extraction, machine translation, information retrieval, topic detection, text summarization and automatic question answering.
In NER tasks, frequently detected entities are Person, Location, Organization, Time, Currency, Percentage, Phone number and ISBN.
E.g., when translating Sinhala text to English, we need to figure out which words are person names or locations, so that we can avoid the overhead of finding corresponding English translations for them. This is also helpful in question-answering scenarios such as "Where was Enrique born?"
Different methods such as rule-based systems, statistical methods and gazetteers have been used for the NER task; however, statistical approaches have been more prominent, and the other methods are used to refine the results as a post-processing mechanism.
In computational statistics, NER has been identified as a sequence labeling task, and Conditional Random Fields (CRFs) have been successfully used to implement it.
In this article, I will use CRF++ to explain how to implement a named entity recognizer with a simple example.
Consider the following input sentence:
"Enrique was born in Spain"
Now, by looking at this sentence, any human can understand that Spain is a Location. But machines are unable to do so without prior learning.
So, to teach the computer, we need to identify a set of features that link what we observe in this sentence to the class we want to predict, in this case "Location". How can we do that?
Considering the token/word "Spain" by itself is not sufficient to decide that it is a location in a generic manner. So, we consider its "context" as well, which includes the previous/next word, its POS tag, the previous NE tag, etc., to infer the NE tag for the token "Spain".
Feature Template and Training dataset
In this example, I will use the "previous word" as the feature. So, we will define this in the feature template as given below:
# Unigram
U00:%x[-1,0]
# Bigram
B
U00 is a unique id to identify the feature.
I will explain %x[row, column] using the following sentence, which we are going to use to train the model:
I live in Colombo
First, we need to define the sentence according to the following format (train.data):
I O
live O
in O
Colombo Location
current word: Colombo
-1: in
0: first column (here, I have given only one column, but new columns are added when we define more features such as POS tags)
In the above training file, the last column holds the answer labels we provide to model the NE task.
So, this feature indicates to the model that after the word "in", it is "likely" to find a "Location".
Now we train the model:
crf_learn template train.data model
The model file is generated from the feature template and the training data file.
Inference
Now we need to know if the following sentence has any important entities such as Location.
"Enrique was born in Spain"
We need to format input file also according to the above format. (test.data)
Enrique
was
born
in
Spain
Now we use the following command to test the model.
crf_test -m model test.data
Outcome would be the following:
Enrique O
was O
born O
in O
Spain Location
Likewise, the model will give predictions on the entities present in the input files, based on the given features and the available training data.
Note: Check the Unicode compatibility for different languages. E.g., for Sinhala Language it's UTF-7.
To load a resource file such as x.properties for program use, the first thing we would consider is specifying the absolute file path, as given below:
InputStream input = new FileInputStream("/Users/jwithanawasam/some_dir/src/main/resources/config.properties");
However, whenever we move the project to another location, this path has to be changed, which is not acceptable.
FileInputStream (Relative path)
So, the next option would be to use the relative file path as given below, instead of giving absolute file path:
InputStream input = new FileInputStream("src/main/resources/config.properties");
This approach seems to solve the above mentioned concern.
However, the problem with this is that the relative path depends on the current working directory from which the JVM was started. In this scenario, it is "/Users/jwithanawasam/some_dir". But in a different deployment setting this may change, which forces the given relative path to change accordingly. Moreover, we as developers do not have much control over the JVM's current working directory.
In either of the above cases, we will get a java.io.FileNotFoundException, which is a familiar exception for most Java developers.
class.getResourceAsStream
At runtime, the JVM checks the class path to locate any user-defined classes and packages. (In Maven, build artifacts and dependencies are stored under the path given by the M2_REPO class path variable, e.g., /Users/jwithanawasam/.m2/repository.) The .jar file, which is the deployable unit of the project, will be located here.
The JVM uses a class loader to load the Java libraries specified in the class path.
So, the best thing we can do is load the resource by specifying a path relative to the class path, using the class loader. The specified relative path will then work irrespective of the actual disk location where the package is deployed.
The resource can be read through the class loader using Class.getResourceAsStream.
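A sketch of such a method; the config.properties name is an assumption here, taken to sit directly under src/main/resources (i.e., at the class-path root):

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Loads a properties resource via the class loader rather than the file system.
public class ResourceLoader {

    public static Properties loadConfig(String name) throws IOException {
        // the leading "/" resolves the name from the root of the class path,
        // not relative to this class's package
        try (InputStream input = ResourceLoader.class.getResourceAsStream("/" + name)) {
            if (input == null) {
                throw new IOException("Resource not found on class path: " + name);
            }
            Properties props = new Properties();
            props.load(input);
            return props;
        }
    }

    public static void main(String[] args) throws IOException {
        // works wherever the jar is deployed, with no absolute or cwd-relative paths
        Properties config = loadConfig("config.properties");
        System.out.println(config.stringPropertyNames());
    }
}
```

Note that getResourceAsStream returns null (rather than throwing) when the resource is missing, so the explicit null check converts that into a clear error.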
Usually, in Java projects, resources such as configuration files and images are located under the src/main/resources/ path. So, if we add a resource immediately inside this folder, during packaging the file will be located at the root of the .jar file.
We can verify this using the following command to extract content of jar file:
jar xf someproject.jar
If you place the resources in another sub folder, then you have to specify the path relative to src/main/resources/ path.
So, using this approach we can load resources using relative paths, independently of the hard disk location. Once we package the application, it is ready to be deployed anywhere, as it is, without the overhead of having to validate resource file paths, thus improving the portability of the application.
ServletContext.getResourceAsStream for web applications
For web applications, use the following method:
ServletContext context = getServletContext();
InputStream is = context.getResourceAsStream("/filename.txt");
Here, the file path is taken relative to your web application folder (the unzipped version of the .war file).
E.g., mywebapplication.war (unzipped) will have a hierarchy similar to the following.
mywebapplication
META-INF
WEB-INF
classes
filename.txt
So, "/" means the root of this web application folder.
This method allows servlet containers to make a resource available to a servlet from any location, without using a class loader.
I want to pursue "Artificial Intelligence" as my future career, so I was thinking of a way to improve my maths skills. Then I found this awesome site!
https://www.khanacademy.org
Khan Academy will first give you a maths pretest to assess your skills. Based on that, you will earn some points. Then you get to do the other organized sets of questions one by one. If you get stuck on a question, they will give you a hint. If that does not work, there is a video demo for you to learn the area related to that question.
I find it a very effective way to learn, so try it out!
Assume you have a folder named "foldername" in your local file system and you need to add that folder with its content to an SVN repository.
The following operation is a client-side operation and won't have any impact on the server: svn add foldername
Once you commit using the following command, the files and folders within the given folder will be added to the repository: svn commit foldername
You can do both of the above given commands in one step using "svn import" as well.
Additional notes:
If you want to use the commit message saved in svn-commit.tmp instead of typing one, use the following command: svn commit Sensefy -F svn-commit.tmp
If you get the following error:
svn: Commit failed (details follow):
svn: Could not use external editor to fetch log message; consider setting the $SVN_EDITOR environment variable or using the --message (-m) or --file (-F) options
svn: None of the environment variables SVN_EDITOR, VISUAL or EDITOR are set, and no 'editor-cmd' run-time configuration option was found
Use the following command to set vi as the default editor: export SVN_EDITOR=vim