n-gram

Introduction:
Wikipedia defines an n-gram as "a subsequence of n items from a given sequence". The items in question can be phonemes, syllables, letters, words or base pairs, according to the application.
An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (or, less commonly, a "digram"); size 3 is a "trigram"; and size 4 or more is simply called an "n-gram".
Usage
N-grams are used in various areas of statistical natural language processing and genetic sequence analysis.
Examples:
Below are word-level 3-grams and 4-grams from the Google n-gram corpus, with the number of times each one appeared.
3-grams
  • ceramics collectables collectibles (55)
  • ceramics collectables fine (130)
  • ceramics collected by (52)
  • ceramics collectible pottery (50)
  • ceramics collectibles cooking (45)
4-grams
  • serve as the incoming (92)
  • serve as the incubator (99)
  • serve as the independent (794)
  • serve as the index (223)
  • serve as the indication (72)
  • serve as the indicator (120)
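To make the definition concrete, here is a minimal Python sketch that extracts n-grams from a sequence and counts them, roughly the kind of counting behind the numbers above. The function names `ngrams` and `count_ngrams` are my own, and splitting on whitespace is a simplifying assumption; the Google corpus was built with its own tokenizer.

```python
from collections import Counter


def ngrams(items, n):
    """Return all contiguous n-grams of `items` as tuples.

    `items` can be any sequence: a list of words, a string of letters,
    a string of base pairs, and so on.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


def count_ngrams(text, n):
    """Count word-level n-grams, assuming simple whitespace tokenization."""
    return Counter(ngrams(text.split(), n))


# Word-level 3-grams of a short phrase
print(ngrams("serve as the independent".split(), 3))
# [('serve', 'as', 'the'), ('as', 'the', 'independent')]

# Letter-level bigrams work the same way on a plain string
print(ngrams("gene", 2))
# [('g', 'e'), ('e', 'n'), ('n', 'e')]
```

Running `count_ngrams` over a large body of text and keeping the most frequent entries gives a table like the Google examples above.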
References:
1. Wikipedia: N-Gram

America's Got Talent: Impossible

Facebook friends share many good links. They generally point to short videos, advertisements and articles. This time I really liked a link shared by Kancho: a video of a dance known as "America's Got Talent: Impossible". It's amazing. Enjoy!!

Page Rank Algorithm

Google uses the PageRank algorithm (PRA) to rank web pages, which makes it possible to find a needle in the web's haystack. Here is a good tutorial that explains how it finds our pages on the web.
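The basic idea behind PageRank can be sketched in a few lines of Python: each page splits its current rank equally among the pages it links to, and the ranks are recomputed repeatedly until they settle (power iteration). This is a simplified sketch, not Google's production system. The damping factor of 0.85 is the value usually quoted from the original PageRank paper, and spreading the rank of a page with no outgoing links evenly over all pages is one common convention that I am assuming here; the function name and the tiny example web are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Compute PageRank scores by power iteration.

    `links` maps each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share (1 - damping) / n
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank equally among the pages it links to
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# A hypothetical four-page web: D links to C, but nobody links to D
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
for page, score in sorted(pagerank(web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

In this small example, page D receives no links, so it keeps only the random-jump share, while C, which is linked from three pages, ends up with the highest score.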



Fun with mathematics II

We can also multiply numbers visually. I have found a video and an article that show how to multiply numbers with ANY NUMBER OF DIGITS using a visual method.

A. Video
I am a hundred percent sure that you will enjoy this video. Watch it first, and then we will discuss the principle behind this approach.

B. Article

Note: I have copied the following paragraphs and a picture from [1]. If anything here is unclear, please visit that site.

Example: Multiply 22 by 13.

Note: This figure is taken from [1]

Draw 2 lines slanted upward to the right, and then move downward to the right a short distance and draw another 2 lines upward to the right (see the magenta lines in Figure 1). Then draw 1 line slanted downward to the right, and then move upward to the right a short distance and draw another 3 lines slanted downward to the right (the cyan lines in Figure 1).

Now count up the number of intersection points in each corner of the figure. The number of intersection points at left (green-shaded region) will be the first digit of the answer. Sum the number of intersection points at the top and bottom of the square (in the blue-shaded region); this will be the middle digit of the answer. The number of intersection points at right (in the yellow-shaded region) will be the last digit of the answer.

This works for multiplying any two two-digit numbers, but if any of the green, blue, or yellow sums has 10 or more points, be sure to carry the tens digit to the left, just as you would when adding.

C. Understanding the LOGIC
Below, I've described the way we did multiplication at school. If you look closely, the same thing is happening in the visual method. Here it goes:

Ex1:
            2    2
 x          1    3
   ---------------
            6    6
       2    2    x
   ---------------
       2    8    6

Ex2:
                 1    5    6
 x               3    5    8
   -------------------------
                 8   40   48
            5   25   30    x
       3   15   18    x    x
   -------------------------
       3   20   51   70   48   [column sums]
       5    5    8    4    8   [carry the tens digit to the left while adding]

We know that two non-parallel lines always meet at exactly one point. So when a group of lines representing a one-digit number (e.g. five lines for 5) crosses another group of lines representing another one-digit number, the number of intersection points equals the product of the two digits.
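The same reasoning can be checked with a short Python sketch: each pair of digits contributes digit × digit intersection points to one diagonal group of the figure (one column in the school method), and the tens are then carried to the left. The function name `line_multiply` is my own, and it assumes non-negative integers.

```python
def line_multiply(a, b):
    """Multiply two non-negative integers the way the line method does.

    Returns the intersection count of each diagonal group (left to right)
    and the final product after carrying.
    """
    digits_a = [int(d) for d in str(a)]
    digits_b = [int(d) for d in str(b)]
    groups = [0] * (len(digits_a) + len(digits_b) - 1)
    for i, x in enumerate(digits_a):
        for j, y in enumerate(digits_b):
            # x parallel lines crossing y parallel lines meet in x * y points
            groups[i + j] += x * y
    # Carry the tens of each group to the left, starting from the rightmost group
    result_digits = []
    carry = 0
    for count in reversed(groups):
        count += carry
        result_digits.append(count % 10)
        carry = count // 10
    while carry:
        result_digits.append(carry % 10)
        carry //= 10
    product = int("".join(str(d) for d in reversed(result_digits)))
    return groups, product


print(line_multiply(22, 13))    # ([2, 8, 6], 286)
print(line_multiply(156, 358))  # ([3, 20, 51, 70, 48], 55848)
```

The group counts are exactly the column sums of the school method, which is why the two approaches always agree.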

I understood the logic by comparing it with the way I used to do the calculation. If my explanation is not clear, it is better to visit the website and read the article.

References:

  1. Su, Francis E., et al. "Squaring Quickly." Mudd Math Fun Facts

Fun with mathematics I

I have found many interesting methods for solving mathematical problems. On this blog, I'll post some of them.

How to find the square of a number quickly?
Multiplying two numbers is easy if one of them is a multiple of 10. Also, most people know the squares of small numbers by heart. Combining these two ideas gives a quicker way to find the square of a number.
Sample problems and solutions [1]:
It's based on the algebra identity for the difference of squares, but with a twist!
54^2 = 50 * 58 + 4^2 = 2916.
42^2 = 40 * 44 + 2^2 = 1764.
37^2 = 34 * 40 + 3^2 = 1369.
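Logic:
a^2 = (a - b)(a + b) + b^2, where b is the distance from a to a nearby multiple of 10, so that either a - b or a + b is a multiple of 10 and b^2 is small.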
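Here is a small Python sketch of the trick, assuming we always pick b as the distance from a to the nearest multiple of 10 (rounding up when a ends in 5). The function name `quick_square` is hypothetical; it returns the square together with the worked step in the same form as the examples above.

```python
def quick_square(a):
    """Square `a` using a^2 = (a - b)(a + b) + b^2.

    b is the distance from `a` to the nearest multiple of 10, so one of
    the factors (a - b) or (a + b) is a multiple of 10.
    """
    nearest = 10 * ((a + 5) // 10)  # nearest multiple of 10 (ties round up)
    b = abs(a - nearest)
    steps = f"{a}^2 = {a - b} * {a + b} + {b}^2"
    return (a - b) * (a + b) + b * b, steps


for n in (54, 42, 37):
    print(quick_square(n))
# (2916, '54^2 = 50 * 58 + 4^2')
# (1764, '42^2 = 40 * 44 + 2^2')
# (1369, '37^2 = 34 * 40 + 3^2')
```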

How to find the square of a TWO-DIGIT number that ENDS in 5?
Here again, I've taken the examples from [1]. It's very easy to calculate the square of a two-digit number that ends in 5.
Remember three things:
* Two digit number
* Ends in 5
* Find square
Sample problems and Solutions
45^2 = 2025
85^2 = 7225, etc.
Logic:
If the first digit is N and the second digit is 5, then the last 2 digits of the answer will be 25, and the preceding digits will be N*(N+1).
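The rule follows from (10N + 5)^2 = 100N^2 + 100N + 25 = 100 * N(N + 1) + 25, so "write N(N + 1), then append 25" is exact. A minimal Python sketch, with a hypothetical function name, that follows the post in accepting only two-digit numbers ending in 5:

```python
def square_ending_in_5(n):
    """Square a two-digit number ending in 5 with the N*(N+1) rule."""
    if not (10 <= n <= 99 and n % 10 == 5):
        raise ValueError("expected a two-digit number ending in 5")
    tens = n // 10  # N, the first digit
    # N*(N+1) gives the leading digits; the last two digits are always 25
    return tens * (tens + 1) * 100 + 25


print(square_ending_in_5(45))  # 2025
print(square_ending_in_5(85))  # 7225
```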

References:
  1. Su, Francis E., et al. "Squaring Quickly." Mudd Math Fun Facts

Using Multiple Search Engines

I just found a useful website. It saves the user's time by presenting the results of a query from two different search engines. In other words, with the same input effort, one gets results from two search engines. For example, if I search for NEPAL in Google, I would be happy to see the results for NEPAL from another search engine, e.g. Yahoo, side by side. This is even more useful if your screen is big enough.





Happy browsing !!

Using TreeTagger

I recently used TreeTagger to get the part-of-speech (POS) tags of English and French texts. As mentioned on its official website [1], TreeTagger is a language-independent tool for annotating text with part-of-speech and lemma information.

Installation is pretty easy: just follow the instructions given on the website. After installation, it will tell you something like:

You should add /home/nobal/TreeTagger/cmd and /home/nobal/TreeTagger/bin to the command search path.

Here is how you can add these directories to the command search path (in Ubuntu):

sudo gedit /etc/bash.bashrc

Then add the following lines at the end of the file:

PATH=$PATH:~/TreeTagger/bin:~/TreeTagger/cmd
export PATH


Don't forget to restart the terminal for the change to take effect. To verify, use this command:

echo $PATH
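Once the path is set, TreeTagger can also be called from a script. Below is a minimal Python sketch, assuming that the `tree-tagger-english` wrapper script from the cmd directory is on the PATH, reads text from standard input, and prints one token per line as tab-separated word, POS tag and lemma (this is how the wrapper scripts behaved in my setup; check yours). The helper names are my own; pass `script="tree-tagger-french"` for French text.

```python
import shutil
import subprocess


def parse_treetagger_output(output):
    """Parse TreeTagger's tab-separated output into (word, tag, lemma) tuples.

    Assumes one token per line: word<TAB>tag<TAB>lemma. Lines with a
    different number of fields are skipped.
    """
    tokens = []
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) == 3:
            tokens.append(tuple(fields))
    return tokens


def tag_text(text, script="tree-tagger-english"):
    """Run a TreeTagger wrapper script on `text` and return parsed tokens."""
    if shutil.which(script) is None:
        raise RuntimeError(f"{script} not found; is TreeTagger's cmd directory on PATH?")
    result = subprocess.run(
        [script], input=text, capture_output=True, text=True, check=True
    )
    # Progress messages go to stderr, so only stdout is parsed
    return parse_treetagger_output(result.stdout)


# Example (requires TreeTagger to be installed):
# for word, tag, lemma in tag_text("Hello world!"):
#     print(word, tag, lemma, sep="\t")
```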

External Links:
[1]. TreeTagger

Lexical Similarity Measure

My last post described Levenshtein distance, which measures how different two sequences are by means of the "edit distance" metric. Recently, I read about an algorithm that uses the same metric to calculate a lexical similarity measure between two strings Li and Lj:

SM(Li, Lj) := max(0, (min(|Li|, |Lj|) - ed(Li, Lj)) / min(|Li|, |Lj|))

where

|Li| and |Lj| are the lengths of Li and Lj,
ed(Li, Lj) is the edit distance between Li and Lj
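A minimal Python sketch of this measure, with a compact two-row Levenshtein distance so that the snippet is self-contained. The function names are my own. The formula is undefined when one of the strings is empty (division by zero); returning 0.0 in that case is my assumption, not something stated in the paper.

```python
def edit_distance(a, b):
    """Levenshtein distance using two rows of the dynamic-programming table."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def lexical_similarity(li, lj):
    """SM(Li, Lj) = max(0, (min(|Li|, |Lj|) - ed(Li, Lj)) / min(|Li|, |Lj|))."""
    shorter = min(len(li), len(lj))
    if shorter == 0:
        # Assumption: the formula divides by zero here, so return 0.0
        return 0.0
    return max(0.0, (shorter - edit_distance(li, lj)) / shorter)


print(lexical_similarity("kitten", "sitting"))  # 0.5
print(lexical_similarity("a", "xyz"))           # 0.0
```

For "kitten" and "sitting", the shorter length is 6 and the edit distance is 3, so SM = (6 - 3) / 6 = 0.5. The max(0, ...) clamps the result when the edit distance exceeds the shorter length.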

This method can also be applied to compare two sets of strings. Details can be found in the paper: A. Maedche and S. Staab, "Measuring Similarity between Ontologies", Karlsruhe, Germany.

Levenshtein distance

Levenshtein distance, also known as edit distance, is a metric that measures the amount of difference between two sequences. According to Wikipedia, the Levenshtein distance between two strings is the minimum number of operations needed to transform one string into the other, where an operation is an insertion, deletion, or substitution of a single character.

Usage: It is often used in applications that need to determine how similar, or different, two strings are, such as spell checkers.

Examples:
The Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:
  1. kitten → sitten (substitution of 's' for 'k')
  2. sitten → sittin (substitution of 'i' for 'e')
  3. sittin → sitting (insert 'g' at the end).
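The distance is usually computed with dynamic programming (often credited to Wagner and Fischer): fill a table where cell (i, j) holds the distance between the first i characters of one string and the first j characters of the other. A minimal Python sketch of that textbook formulation (not code taken from Wikipedia; the function name is my own):

```python
def levenshtein(source, target):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `source` into `target`."""
    rows, cols = len(source) + 1, len(target) + 1
    # dist[i][j] = distance between source[:i] and target[:j]
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all i characters
    for j in range(cols):
        dist[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[-1][-1]


print(levenshtein("kitten", "sitting"))  # 3
```

The table needs O(|source| × |target|) time and space; if only the distance is needed, keeping two rows at a time is enough.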

Ontology Search Engines

While looking for an ontology, I found that there are many ontology search engines. The beauty of these search engines is that they search only ontologies, unlike Google and Yahoo, which are general web search engines.
 
Why do we need ontology search engines?
They help find ontologies that suit given user requirements, so that existing knowledge bases can be reused.

Examples
  • SWOOGLE: An ontology search engine.
  • SCARLET: Discovering relations between two concepts.
  • WATSON: Search Ontology and Semantically Related Documents

Bloggers are the sensors

I prefer to say: "BLOGGERS ARE THE SENSORS". Sensors are triggered as soon as they sense changes in the environment; bloggers are triggered as soon as they sense popular news or stories in society. Just as sensors serve humankind, a number of bloggers can easily steer society by sharing their views. Think of it from another angle: what if the sensors of a machine don't work? The answer is simple: the machine may not work properly. Similarly, if bloggers don't blog and don't raise social issues and views, society may not be directed the way it should be.

Definition: Feed distillation task

The task of identifying the most relevant feed for a given topic or query term is known as the “feed distillation task”.

My Technical Blog gets momentum

I decided to give my technical blog some momentum too. I started it around the time of my MS thesis proposal defense, but for a while I couldn't find the time to update it. Now it's time again to share technical stuff with the blogosphere's visitors, so I've started posting technical articles again...

How is blogosphere growing ?

According to some estimates, “the size of the Blogosphere continues to double every six months”, and there are over seventy million blogs (many of them actively posting). However, some studies indicate that of all these blogs and feeds, relatively few really matter.

A nice saying of Mahatma Gandhi

Last week I read an article that quoted a saying of Mahatma Gandhi, the famous Indian political leader. The apothegm was: "You may never know what results come of your action, but if you do nothing, there will be no result".

The adage is simple but carries a lot of meaning. It is true that we may or may not know the outcomes of our actions. For example, we won't know what benefits we get from investing in a firm until we actually receive them. If we don't invest in it, what do we expect? Of course, no benefit! The results provide feedback on our actions, and thus progress can be made. In our example, based on the results of the investment, i.e. revenues, we can think of investing in similar firms. This saying, therefore, encourages people not to fear taking risks. The more you fear, the less you accelerate!

Retrospect: Flash back to 11 August, 2006

I’m not going to talk about 9/11, because it was not 9/11. Rather, it was 8/11, i.e. August 11, 2006. Maybe you guys have forgotten this date. If so, let us look back at it. Now you have already loaded the date into your memory, haven’t you? You are right: it was the day we took a flight to Thailand to join our master’s program at AIT. I guess that was the first international flight for most of us.

It was exactly three years ago that we took the flight of RNAC (now NAC), the government-owned national flag carrier of Nepal, from Tribhuvan International Airport (TIA), Kathmandu, to Don Muang International Airport (DMK), Bangkok. Shortly afterwards (Sept 28, 2006), DMK was closed to international flights as Suvarnabhumi Airport opened. Consequently, our batch became the last AIT batch from Nepal to fly from TIA to DMK.

Six of us had taken a group ticket; we didn’t know the others. We started introducing ourselves at TIA and later found that many friends were going to AIT. Here is a list:
  • Bharat Shrestha, Sailesh Bharati, Bishnu Acharya, Pratik Shrestha, Binod Chaudhari, & I
  • Sangita Sharma, Abha Baidhya, Mukta Sapkota, Lalita Thakali (?)
  • Prabal Khanal, Agni Nepal, Prakash Gupta
If we look at the statistics, the flight was carrying more than 30% of all the students from Nepal who registered for AIT’s master’s program in August 2006. That is what made the flight so exciting.

A few interesting things happened during the flight. For example, as it was his first flight (domestic or international), “Kale” was quite nervous when there was turbulence :). He was babbling: ... oh God ... ... :D. I won't go further into this event, as anybody can easily guess ... :). To me, the international flight was quite relaxing because there was less turbulence than on domestic flights.

People say that any first or last event is unforgettable, and our first international flight fits that saying. Since then, time has made us juniors, then seniors, and now alumni. At each of these stages we have gained a lot of friends (national and international), experiences (both good and bad), and, more importantly, the power to think (about any topic). In addition, some people learned how to cook ;) during their AIT life. Finally, no matter where and in what situation we are living now, that day gave us a window through which we started watching the world. I hope reading this note reminded you of that special day once again. Missing all your company and love!

***

On this occasion, I would like to dedicate a song, the one we played at AIT's welcome show, to all my friends from the August 2006 batch who took the same flight. Please check it out: