Dev 007: 2013

Thursday, December 19, 2013

Topic Modeling: Infer topics for documents using Latent Dirichlet Allocation (LDA)

Introduction to Latent Dirichlet Allocation (LDA)

In the LDA model, you first build a vocabulary and learn a probabilistic term distribution for each topic from a set of training documents.

In a simple scenario, assume there are 2 documents in the training set and their content has the following unique, important terms. (Important terms are extracted using TF vectors, as described later.)

Document 1: "car", "hybrid", "Toyota"
Document 2: "birds", "parrot", "Sri Lanka"

Using the above terms, LDA learns a probabilistic term distribution for each topic, as given below. Here we specify that 2 topics should be formed from this training content.

Topic 1: car: 0.7, hybrid: 0.1, Toyota: 0.1, birds: 0.02, parrot: 0.03, Sri Lanka: 0.05

Topic 1: Term-Topic distribution

Topic 2: car: 0.05, hybrid: 0.03, Toyota: 0.02, birds: 0.4, parrot: 0.4, Sri Lanka: 0.1

Topic 2: Term-Topic distribution

The topic model is created from the above training data and is later used for inference.

For a new document, you need to infer the probabilistic topic distribution over the document. Assume the document content is as follows:

Document 3: "Toyota", "Prius", "Hybrid", "For sale", "2003"

For the above document, the probabilistic topic distribution over the document will (roughly!) be a value like this:

Topic 1: 0.99, Topic 2: 0.01

Topic distribution over the new document

So, we can use the high-probability terms of the dominant topic (e.g., car, hybrid) as metadata for the document, which can be used in different applications such as search indexing, document clustering, business analytics etc.
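To make the inference step concrete, here is a minimal plain-Java sketch (no Mahout involved). It takes the term-topic probabilities from the toy example above, restricted to the car-related terms, and repeatedly re-estimates the topic distribution of Document 3. The class and method names are made up for illustration, and the update rule is a simplified stand-in for what a real LDA implementation does, not Mahout's algorithm.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ToyTopicInference {

    /**
     * Estimates the topic distribution of one document from fixed term-topic
     * probabilities. Terms that are missing from the vocabulary are ignored.
     */
    static double[] infer(List<String> docTerms, List<Map<String, Double>> topics, int iterations) {
        int k = topics.size();
        double[] theta = new double[k];
        Arrays.fill(theta, 1.0 / k); // start from a uniform guess

        for (int it = 0; it < iterations; it++) {
            double[] next = new double[k];
            int known = 0;
            for (String term : docTerms) {
                double[] weight = new double[k];
                double norm = 0.0;
                for (int t = 0; t < k; t++) {
                    Double p = topics.get(t).get(term);
                    weight[t] = (p == null) ? 0.0 : theta[t] * p;
                    norm += weight[t];
                }
                if (norm == 0.0) {
                    continue; // term is not in the training vocabulary
                }
                known++;
                for (int t = 0; t < k; t++) {
                    next[t] += weight[t] / norm; // share of this term given to topic t
                }
            }
            if (known == 0) {
                return theta; // no known terms, keep the current guess
            }
            for (int t = 0; t < k; t++) {
                theta[t] = next[t] / known;
            }
        }
        return theta;
    }

    public static void main(String[] args) {
        // Values copied from the toy example (lower-cased), car-related terms only
        Map<String, Double> topic1 = new HashMap<>();
        topic1.put("car", 0.7);
        topic1.put("hybrid", 0.1);
        topic1.put("toyota", 0.1);
        Map<String, Double> topic2 = new HashMap<>();
        topic2.put("car", 0.05);
        topic2.put("hybrid", 0.03);
        topic2.put("toyota", 0.02);

        List<String> doc3 = Arrays.asList("toyota", "prius", "hybrid", "for sale", "2003");
        double[] theta = infer(doc3, Arrays.asList(topic1, topic2), 20);
        System.out.println(Arrays.toString(theta)); // topic 1 clearly dominates
    }
}
```

"prius", "for sale" and "2003" never appeared in training, so they simply drop out; only "toyota" and "hybrid" decide the result.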

Pre-processing

Preparing input TF vectors

To bring out the important words within a document, we normally use TF-IDF vectors. However, LDA takes plain TF vectors as input instead of TF-IDF vectors, so that it can recognize the co-occurrence or correlation between words.
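As a rough illustration of what a TF vector holds, the following plain-Java sketch counts raw term frequencies for one document. In Mahout this job is done by DictionaryVectorizer (shown later); the class below is only a hypothetical stand-in to show the idea.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TermFrequency {

    /** Counts how often each token occurs in one document (raw term frequency). */
    static Map<String, Integer> count(List<String> tokens) {
        Map<String, Integer> tf = new TreeMap<>();
        for (String token : tokens) {
            Integer old = tf.get(token);
            tf.put(token, old == null ? 1 : old + 1);
        }
        return tf;
    }

    public static void main(String[] args) {
        // prints {car=2, hybrid=1, toyota=1}
        System.out.println(count(Arrays.asList("car", "hybrid", "car", "toyota")));
    }
}
```

Unlike TF-IDF, these counts are not reweighted by how many documents contain the term.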

(In vector space model [VSM] it is assumed that occurrences of the words are independent of each other, but this assumption is wrong in many cases! n-gram generation is a solution for this problem)

Convert input documents to SequenceFile format

A SequenceFile is a flat file consisting of binary key/value pairs. It is used as the input/output file format for MapReduce jobs in Hadoop, the underlying framework that Mahout runs on.

        Configuration conf = new Configuration();
        HadoopUtil.delete(conf, new Path(infoDirectory));
        SequenceFilesFromDirectory sfd = new SequenceFilesFromDirectory();

        // input: directory contains number of text documents
        // output: the directory where the sequence files will be created
        String[] para = { "-i", targetInputDirectoryPath, "-o", sequenceFileDirectoryPath };
        sfd.run(para);

Convert sequence files to TF vectors

Configuration conf = new Configuration();

Tokenization and Analyzing

During tokenization, the document content is split into a set of terms/tokens. Different analyzers may use different tokenizers. Stemming and stop word removal can be done and customized in this stage. Please note that both stemming and stop words are language dependent.
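To show what this stage produces, here is a tiny plain-Java sketch that lower-cases the text, splits it on non-alphanumeric characters and drops stop words. It is not the Lucene analyzer; the class name and the very short stop word list are assumptions for illustration, and no stemming is done.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class SimpleTokenizer {

    // Tiny, hypothetical English stop word list; real analyzers ship much longer ones
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "the", "is", "for", "of", "in"));

    /** Splits text into lower-case tokens and removes stop words. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ENGLISH).split("[^a-z0-9]+")) {
            if (!raw.isEmpty() && !STOP_WORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [toyota, prius, hybrid, car, sale]
        System.out.println(tokenize("The Toyota Prius is a hybrid car for sale"));
    }
}
```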

You can specify your own analyzer if you want to control how the terms are extracted. It has to extend the Lucene Analyzer class.

StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_43);

DocumentProcessor.tokenizeDocuments(new Path(sequenceFileinputDirectoryPath + "/" + "part-m-00000"), analyzer.getClass().asSubclass(Analyzer.class),
                new Path(infoDirectory + "/" + DocumentProcessor.TOKENIZED_DOCUMENT_OUTPUT_FOLDER), conf);
        analyzer.close();

There are a couple of important parameters for generating TF vectors.

In Mahout, the DictionaryVectorizer class is used for TF weighting and n-gram collocation.

// Minimum frequency of the term in the entire collection to be considered as part of the dictionary file. Terms with lesser frequencies are ignored.
        int minSupport = 5;

// Maximum size of n-grams to be selected. For more information, visit: ngram collocation in Mahout
        int maxNGramSize = 2;

// Minimum log likelihood ratio (This is related to ngram collocation. Read more here.)
// This works only when maxNGramSize > 1 (Less significant ngrams have a lower score here)
        float minLLRValue = 50;

// Parameters for Hadoop map reduce operations
        int reduceTasks = 1;
        int chunkSize = 200;
        boolean sequentialAccessOutput = true;

    // Same tokenized-documents path as used in tokenizeDocuments above (note the "/" separator)
    DictionaryVectorizer.createTermFrequencyVectors(new Path(infoDirectory + "/" + DocumentProcessor.TOKENIZED_DOCUMENT_OUTPUT_FOLDER),
                new Path(infoDirectory), DictionaryVectorizer.DOCUMENT_VECTOR_OUTPUT_FOLDER, conf, minSupport, maxNGramSize, minLLRValue,
                -1.0f, false, reduceTasks, chunkSize, sequentialAccessOutput, true);

Once the TF vectors are generated for each training document, the model can be created.

Training

Generate term distribution for each topic and generate topic distribution for each training document
(Read about the CVB algorithm in mahout here.)

CVB0Driver cvbDriver = new CVB0Driver();

I will explain the parameters and how to assign values to them. Before that, you need to read the training dictionary into memory as given below:

Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path(
                dictionaryFilePath), conf);
        Text key = new Text();
        IntWritable val = new IntWritable();
        // Terms of the training dictionary, in the order they are stored in the file
        ArrayList<String> dictLst = new ArrayList<String>();
        while (reader.next(key,val)) {
            System.out.println(key.toString()+" "+val.toString());
            dictLst.add(key.toString());
        }
        reader.close();
        String[] dictionary = dictLst.toArray(new String[dictLst.size()]);

Then, you have to convert vector representation of documents to a matrix, like this.
        RowIdJob rowidjob = new RowIdJob();
       String[] para = { "-i", inputVectorPath, "-o",
               TRAINING_DOCS_OUTPUTMATRIX_PATH };
       rowidjob.run(para);

Now, I will explain each parameter and the factors you should consider when deciding its value.

// Input path to the above created matrix using TF vectors
Path inputPath = new Path(TRAINING_DOCS_OUTPUTMATRIX_PATH + "/matrix");

// Path to save the model (Note: You may need this during inferring new documents)
Path topicModelOutputPath = new Path(TRAINING_MODEL_PATH);

// Number of topics (#important!) A lower value results in broader topics and a higher value may result in niche topics. The optimal value for this parameter can vary depending on the given use case. A large number of topics may cause the system to slow down.
int numTopics = 2;

// Number of terms in the training dictionary. Here's the method to read that:
private static int getNumTerms(Configuration conf, Path dictionaryPath) throws IOException {
    FileSystem fs = dictionaryPath.getFileSystem(conf);
    Text key = new Text();
    IntWritable value = new IntWritable();
    int maxTermId = -1;
    for (FileStatus stat : fs.globStatus(dictionaryPath)) {
      SequenceFile.Reader reader = new SequenceFile.Reader(fs, stat.getPath(), conf);
      while (reader.next(key, value)) {
        maxTermId = Math.max(maxTermId, value.get());
      }
      reader.close();
    }

    return maxTermId + 1;
}

int numTerms = getNumTerms(conf, new Path(TRAINING_DOCS_ROOT_PATH + "dictionary.file-0"));

// Smoothing parameters: alpha smooths the document-topic distribution p(topic|document) and eta smooths the topic-term distribution p(term|topic)
        double alpha = 0.0001;
       double eta = 0.0001;
       int maxIterations = 10;
       int iterationBlockSize = 10;
       double convergenceDelta = 0;
       Path dictionaryPath = new Path(TRAINING_DOCS_ROOT_PATH + "dictionary.file-0");

// Final output path for probabilistic topic distribution training documents
       Path docTopicOutputPath = new Path(TRAINING_DOCS_TOPIC_OUTPUT_PATH);

// Temporary output path for saving models in each iteration
       Path topicModelStateTempPath = new Path(TRAINING_MODEL_TEMP_PATH);

       long randomSeed = 1;

// Perplexity is a measurement of how well a probability distribution or probability model predicts a sample. LDA is a generative model: you start with a model and refine its parameters until it explains the data well. Perplexity values can be used to evaluate the performance of the model.
       boolean backfillPerplexity = false;

       int numReduceTasks = 1;
       int maxItersPerDoc = 10;
       int numUpdateThreads = 1;
       int numTrainThreads = 4;
       float testFraction = 0;

       cvbDriver.run(conf, inputPath, topicModelOutputPath,
               numTopics, numTerms, alpha, eta, maxIterations, iterationBlockSize, convergenceDelta, dictionaryPath, docTopicOutputPath, topicModelStateTempPath, randomSeed, testFraction, numTrainThreads, numUpdateThreads, maxItersPerDoc, numReduceTasks, backfillPerplexity)   ;
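Since perplexity is mentioned in the parameters above, here is a small plain-Java sketch of the textbook definition: the exponential of the negative average log probability that the model assigns to the held-out tokens. The class name and the probabilities are made up for illustration; this is not how Mahout computes it internally.

```java
final class Perplexity {

    /** perplexity = exp(-(1/N) * sum of log p(token)) over N held-out tokens. */
    static double of(double[] tokenProbabilities) {
        double logSum = 0.0;
        for (double p : tokenProbabilities) {
            logSum += Math.log(p);
        }
        return Math.exp(-logSum / tokenProbabilities.length);
    }

    public static void main(String[] args) {
        // A model that gives every token probability 0.25 is as unsure as a fair 4-way choice
        System.out.println(of(new double[] {0.25, 0.25, 0.25, 0.25}));
    }
}
```

A lower perplexity means the model predicts the held-out text better.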

Once this step is completed, the training phase of topic modeling is over. Now, let's see how to infer topics for new documents using the trained model.

Topic Inference for new document

To infer the topic distribution for a new document, you need to repeat the steps described earlier for the new document:

Pre-processing - stop word removal
Convert the document to sequence file format
Convert the content in the sequence file to TF vectors

There is an important step here. (I missed this step the first time and got wrong results as the outcome :( )

We need to map the new document's dictionary to the training documents' dictionary and identify the common terms that appear in both. Then, a TF vector needs to be created for the new document with the cardinality of the training documents' dictionary. This is how you should do that.

        //Get the model dictionary file (term -> term id)
                HashMap<String, Integer> modelDictionary = new HashMap<>();
                SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path("reuters-dir/dictionary.file-0"), conf);
                Text keyModelDict = new Text();
                IntWritable valModelDict = new IntWritable();
                int cardinality = 0;
                while(reader.next(keyModelDict, valModelDict)){
                    cardinality++;
                    modelDictionary.put(keyModelDict.toString(), valModelDict.get());
                }
                reader.close();

        //The new vector must have the cardinality of the training dictionary
                RandomAccessSparseVector newDocVector = new RandomAccessSparseVector(cardinality);

        //Get the new document dictionary file
                ArrayList<String> newDocDictionaryWords = new ArrayList<>();
                reader = new SequenceFile.Reader(fs, new Path("reuters-test-dir/dictionary.file-0"), conf);
                Text keyNewDict = new Text();
                IntWritable newVal = new IntWritable();
                while(reader.next(keyNewDict,newVal)){
                    System.out.println("Key: "+keyNewDict.toString()+" Val: "+newVal);
                    newDocDictionaryWords.add(keyNewDict.toString());
                }
                reader.close();

                //Get the term frequency counts of the new document
                HashMap<String, Double> newDocTermFreq = new HashMap<>();
                reader = new SequenceFile.Reader(fs, new Path("reuters-test-dir/wordcount/ngrams/part-r-00000"), conf);
                Text keyTFNew = new Text();
                DoubleWritable valTFNew = new DoubleWritable();
                while(reader.next(keyTFNew, valTFNew)){
                    newDocTermFreq.put(keyTFNew.toString(), valTFNew.get());
                }
                reader.close();

                //perform the process of term frequency vector creation
                for (String string : newDocDictionaryWords) {
                    Double tf = newDocTermFreq.get(string);
                    //keep only terms that are in the training dictionary and have a count
                    if(modelDictionary.containsKey(string) && tf != null){
                        int index = modelDictionary.get(string);
                        newDocVector.set(index, tf);
                    }
                }
                System.out.println(newDocVector.asFormatString());

Read the model (Term distribution for each topic)

// Dictionary is the training dictionary

    double alpha = 0.0001; // default: doc-topic smoothing
    double eta = 0.0001; // default: term-topic smoothing
    double modelWeight = 1f;

TopicModel model = new TopicModel(conf, eta, alpha, dictionary, 1, modelWeight, new Path(TRAINING_MODEL_PATH));

Infer topic distribution for the new document

The final result, which is the probabilistic topic distribution over the new document, will be stored in this vector.
If you have a prior guess as to what the topic distribution should be, you can start with it here instead of the uniform prior.

        Vector docTopics = new DenseVector(new double[model.getNumTopics()]).assign(1.0/model.getNumTopics());

Empty matrix holding intermediate data - Term-Topic likelihoods for each term in the new document will be stored here.

        Matrix docTopicModel = new SparseRowMatrix(model.getNumTopics(), newDocVector.size());

int maxIters = 100;
        for(int i = 0; i < maxIters; i++) {
            model.trainDocTopicModel(newDocVector, docTopics, docTopicModel);
        }
    model.stop();

To be continued...

References: Mahout In Action, Wikipedia

Wednesday, December 18, 2013

How to resolve "import java.nio.file cannot be resolved" error?

You will get the "import java.nio.file cannot be resolved" error with the following imports:
import java.nio.file.Files;
import java.nio.file.Paths;

To resolve that, do the following:
Right click on the project > Properties > Java Compiler > Set the Compiler compliance level to 1.7 (the java.nio.file package was added in Java 7)

Refresh the project
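If you want to check quickly that the project now compiles against java.nio.file, a small test class like the one below should do. It is only a sketch; the class name and the temp file prefix are made up. It writes a string to a temporary file and reads it back using Files and Paths.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class NioCheck {

    /** Writes the text to a temporary file, reads it back and deletes the file. */
    static String roundTrip(String text) throws IOException {
        Path tmpDir = Paths.get(System.getProperty("java.io.tmpdir"));
        Path file = Files.createTempFile(tmpDir, "nio-check", ".txt");
        try {
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("java.nio.file works"));
    }
}
```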

Tuesday, December 17, 2013

Preview not responding in Mavericks

To resolve the issue, do the following:

Force quit the non-responding Preview application
In Finder, Go > Go to Folder
Type the following path in Go to Folder: ~/Library/
Check the following folders for com.apple.Preview.* folders/files and move any you find to the Trash:

Caches
Containers
Preferences
Saved Application State

Then, restart the computer

Thursday, December 12, 2013

Difference between Topic Modeling and Document Clustering

Topic modeling is one way of implementing clustering for a document collection. In this article, by the term "clustering" I mean a popular clustering mechanism such as K-means, fuzzy K-means etc.

So, the difference lies in how the two mechanisms are implemented. Even though both of them return a similar type of outcome, the actual data/knowledge embedded in the outcome can be different.

In topic modeling, each document is represented as a distribution over topics, and each topic is essentially a probability distribution over words. In document clustering, by contrast, a cluster is composed of a collection of documents (not topics).
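The difference in the outcome can be sketched in a few lines of plain Java. Plain K-means gives each document exactly one cluster id (the nearest centroid), while a topic model gives a whole distribution that only collapses to a single label if you take the most probable topic. The class name and the numbers are made up for illustration.

```java
final class SoftVersusHard {

    /** Hard assignment as in K-means: index of the nearest centroid (squared Euclidean distance). */
    static int nearestCentroid(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int i = 0; i < point.length; i++) {
                double diff = point[i] - centroids[c][i];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** Collapses a topic distribution to its most probable topic. */
    static int dominantTopic(double[] topicDistribution) {
        int best = 0;
        for (int t = 1; t < topicDistribution.length; t++) {
            if (topicDistribution[t] > topicDistribution[best]) {
                best = t;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centroids = { {1, 1}, {9, 9} };
        System.out.println("cluster: " + nearestCentroid(new double[] {2, 2}, centroids));

        double[] docTopics = {0.7, 0.3}; // the 0.3 is information a hard cluster id cannot carry
        System.out.println("dominant topic: " + dominantTopic(docTopics));
    }
}
```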

.. to be continued!

Issues with examples in Mahout In Action (Hello World program for clustering) with mahout 0.9

I encountered following issues and here's how I fixed them:

The method getIdentifier() is undefined for the type Cluster:

Exception in thread "main" java.io.IOException: wrong value class: org.apache.mahout.clustering.kmeans.Kluster is not interface org.apache.mahout.clustering.Cluster

Replace the erroneous code with the following:

         SequenceFile.Writer writer
              = new SequenceFile.Writer(
                  fs, conf,      path, Text.class, Kluster.class);

            Kluster cluster = new Kluster(vec, i, new EuclideanDistanceMeasure());
            writer.append(new Text(cluster.getIdentifier()), cluster);

Exception in thread "main" java.io.IOException: wrong value class: 0.0: null is not class org.apache.mahout.clustering.classify.WeightedPropertyVectorWritable

Replace import org.apache.mahout.clustering.classify.WeightedVectorWritable; with import org.apache.mahout.clustering.classify.WeightedPropertyVectorWritable; and replace the related types as well.

The corrected code is given below.

package org.apache.mahout.jaytest;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.clustering.Cluster;
import org.apache.mahout.clustering.classify.WeightedPropertyVectorWritable;

import org.apache.mahout.clustering.kmeans.KMeansDriver;
import org.apache.mahout.clustering.kmeans.Kluster;
import org.apache.mahout.common.distance.EuclideanDistanceMeasure;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

public class helloworld {

    public static final double[][] points = { {1, 1}, {2, 1}, {1, 2},
        {2, 2}, {3, 3}, {8, 8},
        {9, 8}, {8, 9}, {9, 9}};

    // Write data to sequence files in Hadoop (write the vector to sequence file)
    public static void writePointsToFile(List<Vector> points, String fileName,
            FileSystem fs,
            Configuration conf) throws IOException {

                    Path path = new Path(fileName);
                    SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf,
                    path, LongWritable.class, VectorWritable.class);
                    long recNum = 0;
                    VectorWritable vec = new VectorWritable();

                    for (Vector point : points) {
                        vec.set(point);
                        writer.append(new LongWritable(recNum++), vec);
                    }

                    writer.close();
    }

    // Read the points to vector from 2D array
    public static List<Vector> getPoints(double[][] raw) {
        List<Vector> points = new ArrayList<Vector>();
        for (int i = 0; i < raw.length; i++) {
            double[] fr = raw[i];
            Vector vec = new RandomAccessSparseVector(fr.length);
            vec.assign(fr);
            points.add(vec);
        }
        return points;
        }

    public static void main(String args[]) throws Exception {

        // specify the number of clusters
        int k = 2;

        // read the values (features) - generate vectors from input data
        List<Vector> vectors = getPoints(points);

        // Create input directories for data
        File testData = new File("testdata");

        if (!testData.exists()) {
            testData.mkdir();
        }
        testData = new File("testdata/points");
        if (!testData.exists()) {
            testData.mkdir();
        }

        // Write initial centers
        Configuration conf = new Configuration();

        FileSystem fs = FileSystem.get(conf);

        // Write vectors to input directory
        writePointsToFile(vectors,
              "testdata/points/file1", fs, conf);

        Path path = new Path("testdata/clusters/part-00000");

        SequenceFile.Writer writer
              = new SequenceFile.Writer(
                  fs, conf,      path, Text.class, Kluster.class);

        for (int i = 0; i < k; i++) {
            Vector vec = vectors.get(i);

            // write the initial center here as vec
            Kluster cluster = new Kluster(vec, i, new EuclideanDistanceMeasure());
            writer.append(new Text(cluster.getIdentifier()), cluster);
        }

        writer.close();

        // Run K-means algorithm
        KMeansDriver.run(conf, new Path("testdata/points"), new Path("testdata/clusters"),
        new Path("output"), new EuclideanDistanceMeasure(),
                0.001, 10, true, 0, false);
        SequenceFile.Reader reader
              = new SequenceFile.Reader(fs,
                  new Path("output/" + Cluster.CLUSTERED_POINTS_DIR
                      + "/part-m-00000"), conf);
        IntWritable key = new IntWritable();

        // Read output values
        WeightedPropertyVectorWritable value = new WeightedPropertyVectorWritable();
        while (reader.next(key, value)) {
            System.out.println(
                value.toString() + " belongs to cluster "
                    + key.toString());
        }
        reader.close();
    }

}

Thursday, November 28, 2013

Issues with using Huawei Model E1550 with Mac OSX 10.9

Use the following guide, if you are having problems using Huawei HSDPA dongle in OSX 10.9 Mavericks.

Download Mobile Partner for Mac OSX using the link below: (File name: MOBILE_CONNECT.ISO)
http://huaweifirmwares.com/download/mobile-partner-for-mac-os-x/

Double click on the MOBILE_CONNECT.ISO, and installation process will start. (P.S. Mac OS X supports ISO files natively without the need for any third-party software.)

Complete the installation process. A detailed step by step guide can be found here:
http://www.modemunlock.com/huawei-mobile-partner-for-mac-download.html

Now if you plug in the USB modem and try to connect using Mobile Partner, you will get the following error and the program will crash:
Connection Terminated!

To resolve that, follow the steps given below:

Open the terminal

Type the following command:

cd /etc/ppp

Point-to-Point Protocol (PPP) is a method of connecting a computer to the Internet. Working in the data link layer of the OSI model, PPP sends the computer's TCP/IP packets to a server that puts them onto the Internet.
(Ref: http://www.webopedia.com/TERM/P/PPP.html)

You are inside the ppp folder now. Type the ls -l command. You should see the file named "options". Make sure your user has the rights required to edit the file. If not, use the following command to take ownership of it. (You need to provide admin credentials.)

sudo chown {your username} options

Then edit the file using the following command:
vi options

Go to 'insert' mode and type the following in that file:

+pap
-chap

+pap = Force PAP authentication (This immediately connects an incoming call to pppd, and then uses PAP (Password Authentication Protocol) to authenticate the client.)

Then press Esc and save the file using :wq.

Restart the OS.

Then try again to connect using Mobile Partner.

Bluetooth file receiving failure in OSX 10.9 from other devices

To solve the issue, do the following:
System Preferences > View > Sharing > Check "Bluetooth Sharing": On

Install Maven Integration plugin (m2eclipse) for eclipse

Open Eclipse
Go to Help > Install New Software
Under available software, give the following URL as Work with: site
http://download.eclipse.org/technology/m2e/releases
Hit Enter

Application can't be opened because it is from an unidentified developer

Apple menu > System Preferences > Security and Privacy > General > Allow apps downloaded from > "Anywhere"

Monday, November 25, 2013

How to troubleshoot "Missing artifacts" issues in Maven

Check whether the artifacts given in the pom.xml file match the artifacts installed in the local repository (.m2/repository).

I encountered this error once due to last update date appending to the artifact name as given below:

org.apache.stanbol.enhancer.servicesapi-0.11.0-20131021.093418-197.jar
org.apache.stanbol.enhancer.servicesapi-0.11.0-20131021.093418-197.jar.sha1
org.apache.stanbol.enhancer.servicesapi-0.11.0-20131021.093418-197.pom
org.apache.stanbol.enhancer.servicesapi-0.11.0-20131021.093418-197.pom.sha1
org.apache.stanbol.enhancer.servicesapi-0.11.0-SNAPSHOT.jar
org.apache.stanbol.enhancer.servicesapi-0.11.0-SNAPSHOT.pom

Whereas pom.xml expected a different dependency, as given below:

<dependency>
      <groupId>org.apache.stanbol</groupId>
      <artifactId>org.apache.stanbol.enhancer.servicesapi</artifactId>
      <version>0.11.0-SNAPSHOT</version>
</dependency>

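To resolve the issue, delete all the files in the local repository folder of the artifact in question using the following command. (Put the artifact's folder in front of /*; never run rm /* as it is.)
rm {path to the artifact's folder in the local repository}/*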

Then install the artifacts again using the goal clean install.
E.g, in Eclipse right click on the project > Run as > Maven build... > Goal "clean install" > Run
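To spot this situation quickly, you could list the file names in the artifact's folder and flag the ones that carry a timestamp. Below is a small plain-Java sketch of that check, assuming the usual "-yyyyMMdd.HHmmss-buildNumber" pattern seen in the file names above. The class name is made up, and the file names in main are the ones from this post.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class SnapshotNames {

    // Matches the "-yyyyMMdd.HHmmss-buildNumber." part of a timestamped snapshot file name
    private static final Pattern TIMESTAMPED = Pattern.compile(".*-\\d{8}\\.\\d{6}-\\d+\\..*");

    /** Returns the file names that contain a snapshot timestamp. */
    static List<String> timestamped(List<String> fileNames) {
        List<String> result = new ArrayList<>();
        for (String name : fileNames) {
            if (TIMESTAMPED.matcher(name).matches()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "org.apache.stanbol.enhancer.servicesapi-0.11.0-20131021.093418-197.jar",
                "org.apache.stanbol.enhancer.servicesapi-0.11.0-20131021.093418-197.pom",
                "org.apache.stanbol.enhancer.servicesapi-0.11.0-SNAPSHOT.jar",
                "org.apache.stanbol.enhancer.servicesapi-0.11.0-SNAPSHOT.pom");
        for (String name : timestamped(names)) {
            System.out.println("timestamped: " + name);
        }
    }
}
```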

active developer path ("/Volumes/Xcode/Xcode.app/Contents/Developer") does not exist, use xcode-select to change

Try the following command:
sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer

If the above command works, do not read further :)

If you get the following error,
error: invalid developer directory '/Applications/Xcode.app/Contents/Developer'

Copy the Xcode.dmg to the Applications directory and double click it to mount it. Then copy Xcode to the Applications directory and try the above command again.

Friday, November 22, 2013

Error: JAVA_HOME is not set.

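JAVA_HOME must point to the Java installation directory, not to the java executable. The following command only shows where the java launcher is:
which java

Mine is /usr/bin/java (OSX 10.9), which is not the installation directory. On OS X, the following command outputs the installation directory:
/usr/libexec/java_home

Then set JAVA_HOME using the command given below:
export JAVA_HOME=$(/usr/libexec/java_home)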

Thursday, November 21, 2013

Install Maven in Mac OSX 10.9 (Mavericks)

In OSX 10.9 (Mavericks), Maven is not installed by default. So, here's how to do that.

If you haven't installed Homebrew, install that first using the following command:

Install homebrew
ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go/install)"

Then, install Maven.
brew install maven

Type the below command to verify that Maven is installed successfully.
mvn -version

Tuesday, November 12, 2013

Temporal Information Extraction using OpenNLP

Friday, November 8, 2013

Create a patch using SVN for newly added file

I submitted my first svn patch as a contributor to Apache. :) Here's how I did that.

Go to the correct directory where you have done the changes.

Add the file to the repository:
svn add {filename}
E.g., svn add file1.java

First check the differences:
svn diff

Create the patch file:
svn diff > {patch name}.patch
E.g., svn diff > patchforissue01.patch

Wednesday, November 6, 2013

How to fix java.net.ConnectException: Connection refused?

A "connection refused" error happens when you attempt to open a TCP connection to an IP address / port where there is nothing currently listening for connections. Check whether the IP and the port connected is correct.

E.g., Connection to http://localhost:8765 refused
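A quick way to check whether anything is listening is to try opening a socket yourself. The following is a minimal sketch; the class name is made up, and the host and port in main are just the ones from the example message above.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class PortCheck {

    /** Returns true if a TCP connection to host:port can be opened within the timeout. */
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // ConnectException ("Connection refused") and timeouts both end up here
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("localhost:8765 listening? " + isListening("localhost", 8765, 1000));
    }
}
```

If this prints false, start the server (or fix the IP/port) before looking for the problem in the client code.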

Debug Java Applications using JPDA in Eclipse

JPDA (Java Platform Debugger Architecture) consists of three parts.

JVM TI (JVM Tool Interface) is the native interface that the VM running the debugged program has to provide for debugging.

JDI (Java Debug Interface) is a high level interface that defines the information and requests used for remote debugging. IDEs such as Eclipse implement JDI.

JDWP (Java Debug Wire Protocol) defines the format of the debug information and requests exchanged between the debugger and the target VM (communication channel).

To make the target VM debuggable, start it with the following options:
-Xdebug -Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n

-Xdebug - Enables debugging support
-Xrunjdwp - Loads JDWP in the target VM (to communicate with the debugger application)
transport - The transport mechanism being used (dt_socket = sockets)
server - y: the target application listens for a debugger to attach
address - Transport address for the connection (here, port 8787)
suspend - n: the target VM is not suspended until the debugger application connects (with y it waits for the debugger before running)

Remote debugging in Eclipse

Run > Debug Configurations > Remote Java Application > Create new debug configuration and specify the port (address given above)
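If you are not sure whether the target application was really started with the debug options, you can check the JVM arguments from inside the application. A minimal sketch follows; the class name is made up. It also accepts -agentlib:jdwp, which is the newer spelling of -Xrunjdwp.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class DebugAgentCheck {

    /** Returns true if any JVM argument loads the JDWP agent. */
    static boolean hasJdwpAgent(List<String> jvmArgs) {
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xrunjdwp") || arg.startsWith("-agentlib:jdwp")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Arguments the current JVM was started with (not the program arguments)
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("JDWP agent loaded: " + hasJdwpAgent(jvmArgs));
    }
}
```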

java.lang.NoSuchFieldError: DEF_CONTENT_CHARSET in Apache Stanbol Integration Tests

Check for multiple versions of httpcore .jar in the class path and remove if any. If there are duplicates, it reads the one which appears first in your classpath.

- Check classpath in Eclipse: Run > Run Configurations > Classpath

- Leave only project dependencies: Select the project > Edit > Edit Runtime classpath > check "Only include exported entries" > Ok > Run

Monday, November 4, 2013

Fixing java.lang.OutOfMemoryError: PermGen space error in Apache Stanbol

If you get the above error when launching the full launcher in debug mode use the following command to overcome the issue:

java -Xms256m -Xmx1024m -XX:PermSize=512m -XX:MaxPermSize=512m -Xdebug -Xrunjdwp:transport=dt_socket,address=8787,server=y,suspend=n -jar org.apache.stanbol.launchers.full-0.10.0-SNAPSHOT.jar -p 8080

Some more info. on the Java PermGen space for those who want to go beyond fixing the issue :).

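The permanent generation (PermGen) is a memory area reserved for long-lived data such as class metadata, and it is collected far less often than ordinary heap objects. PermGen is not part of the general heap, so if it fills up, setting -Xmx won't help; it is sized with -XX:PermSize and -XX:MaxPermSize instead. A common reason for an OutOfMemoryError in PermGen is a class loader memory leak, or simply an application that loads more classes than the default PermGen size can hold.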
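To see how full each memory area is at runtime, you can list the JVM's memory pools. The sketch below uses the standard java.lang.management API; the class name is made up. Pool names depend on the JVM and garbage collector: on HotSpot with Java 7 or older you should see a pool with "Perm Gen" in its name, while Java 8 and later report "Metaspace" instead, because PermGen was removed there.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

final class MemoryPools {

    /** Formats one pool as "name: used / max" in megabytes. */
    static String describe(MemoryPoolMXBean pool) {
        MemoryUsage usage = pool.getUsage();
        if (usage == null) {
            return pool.getName() + ": not available";
        }
        long mb = 1024L * 1024L;
        // getMax() returns -1 when no maximum is defined for the pool
        String max = usage.getMax() < 0 ? "no limit" : (usage.getMax() / mb) + " MB";
        return pool.getName() + ": " + (usage.getUsed() / mb) + " MB used / " + max;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.println(describe(pool));
        }
    }
}
```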

Wednesday, June 5, 2013

TEST post for my final year project (Please Ignore)

Sale in Colombo!!! 10% off for selected items!

Don't miss this opportunity!

Venue: Town hall, Union Place
Date: 25th Jan 2013

Visit us on www.pcproducts.com

Tuesday, April 16, 2013

Out of Memory Exception in Visual Studio 2012

Check for following warning:
2. Warning 1 There was a mismatch between the processor architecture of the project being built "MSIL" and the processor architecture of the reference "CECIIR", "x86". This mismatch may cause runtime failures. Please consider changing the targeted processor architecture of your project through the Configuration Manager so as to align the processor architectures between your project and references, or take a dependency on references with a processor architecture that matches the targeted processor architecture of your project. C:\Windows\Microsoft.NET\Framework\v4.0.30319\Microsoft.Common.targets 1578

Go to Configuration Manager and set the target platform to "Any CPU".

Monday, April 15, 2013

Unable to load DLL 'opencv_core242': The specified module could not be found. Emgu CV

Set Project's Platform target as x86(correct value) and copy all the bin/x86 native c++ dlls in EmguCV folder to output directory of the application.

Configuration.AppSettings getting null in C# class library

Default configuration for class library is the configuration file of the application that is consuming it. For example, if it is web application, then move your configurations to web.config.

A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections.

Try following! It worked for me. :)

Go to sqlservr.exe in the task manager processes
Terminate the process
Go to Services
Start SQL Server(SQL Express)
Try to connect again

Friday, April 5, 2013

The controller for path was not found or does not implement IController in WebAPI

You will get this exception when you are trying to access WebApi controller through a jquery client as Ajax request. To fix this issue, in the url property give path to controller as given below:

url: '/api/controllername/actionname'

Friday, March 15, 2013

The resource cannot be found. in VS 2012 WebAPI

Funny solution!

If you get this error when running the app from Visual Studio 2012, go back to app and make sure HTML file is not selected in the solution explorer. It worked for me! Don't know the reason behind that though.

Wednesday, March 13, 2013

Missing Entity Framework dll in VS2012

Right click references
Manage Nuget Packages
Select Entity Framework

For more info:
http://msdn.microsoft.com/en-us/data/ee712906

Wednesday, March 6, 2013

An unhandled exception of type 'Emgu.CV.Util.CvException' occurred in Emgu.CV.dll Additional information: OpenCV: Unknown array type

To fix this issue, check whether the number of columns, rows and channels of source matrix match with the destination matrix.

When converting from OpenCV to EmguCV:
CV_32FC1 means 32 bit floating point single channel matrix

Saturday, March 2, 2013

An unhandled exception of type 'Emgu.CV.Util.CvException' occurred in Emgu.CV.dll Additional information: OpenCV: type == src2.type() && src1.cols == src2.cols && (type == CV_32F || type == CV_8U) in BOWImgDescriptorExtractor

Make sure the feature extractor used to generate vocabulary and the extractor given for BOWImgDescriptorExtractor are of same type. (E.g., SIFT)

More info:
http://stackoverflow.com/questions/15181699/error-in-bowimgdescriptorextractor-in-emgucv
http://code.opencv.org/issues/1879

Friday, February 22, 2013

View Property manager in VS2010

Tools > Import and Export Settings > Reset all settings > Visual C++ Development Settings

Saturday, February 16, 2013

.mat files associated with MS Access short cut

Control Panel > Default Programs > Associate a file type or protocol with a specific program > Select .mat > Change > Matlab.exe

http://www.mathworks.in/support/solutions/en/data/1-VW0R7/

Wednesday, February 13, 2013

Load configuration settings dynamically using Matlab Eval function

Configuration script file = config.m

% Test function to call configuration setting dynamically

function test (config)

eval (config)
....

% script to call function 'test'
test('config');
