Monday, December 05, 2011

DEL-ORD

Seven years went under the bridge like time standing still...

Funny how, at times, life moves backwards and takes you to places you've been trying to run away from all this while. Perhaps, it's just life's way of reminding you of things that you've been missing and yearning for deep inside.

Tuesday, November 01, 2011

life

Guess I woke up on the wrong side of the bed today - feeling a little down and out. Don't know why but I have these phases where I just feel run down - times when I look back at my life and don't feel really great about where I'm headed. I know that it's just a passing phase and soon I'll be back to normal - living life day by day. The thing about living your life in a way which doesn't excite you or keep you involved every day is that it's just not sustainable; one day, you'll break out of it for sure. There's no point in continuing with something which broke long ago. The issue is with finding out the thing that moves you, that makes you feel good, but as I've always said - "I can't tell you what I want from life but I do know for sure what I don't want!" Maybe I do know what I want, but you can't always have what you want, can you?

Monday, September 19, 2011

Windows Phone 7–A Review

It’s been more than 3 months now that I’ve been using a WP7 phone, and I guess I now have a good enough idea of what I like about WP7 and what I don’t. This post is a mini-review of the OS and I will not touch upon the hardware much. Also, I haven’t upgraded to Mango (WP 7.5) so I won’t talk about Mango much, though most of what I write here should apply to Mango as well.

Preamble – I’ve been a Windows Mobile user for quite a while and have had Windows Mobile powered devices since Windows Mobile 2003 SE. I still own quite a few Windows Mobile powered devices, from Windows Mobile 5 to Windows Mobile 6.5. I have also played with iOS 4 and Android (2.3) powered devices.

The Likes -

Firstly the good things about WP7 – the OS is actually quite snappy, and live tiles are a pretty cool and refreshingly different idea compared to almost all the other mobile devices out in the market. The lock screen on WP7 is the best, with all the important stuff right there on the lock screen itself, including missed calls, email/text notifications and calendar stuff. Native integration with the cloud and social networks (Facebook, Twitter, LinkedIn) is pretty slick too.

The Dislikes or things that need to be better -

This is going to be a much bigger list than the “likes” list – not because there are more things to dislike about this OS but because some of the things are simply annoying and make you feel like pulling out your hair :)

  1. No Outlook Backup - I have been used to syncing my older WM phones with Outlook – everything on my phone was backed up using Outlook, including Contacts and Tasks/Calendar. I did try backing up to the cloud using Live Mesh but somehow was more comfortable with backing up to a local computer which I use daily. The first thing that I did when I got my new WP7 phone was to connect it to the computer and, heck, I figured out there is no Outlook sync – so there was no way for me to get my contacts back on the phone. Quite stupid given that Outlook is still the best personal manager (email, tasks, contacts and calendar) available and I use it almost daily. It can't get much more stupid than this: getting your contacts synced from Facebook/Google/Hotmail takes a matter of minutes but syncing with your own personal computer is impossible. So, I had to figure out a way to sync my Outlook contacts/calendar with the cloud – in the end I had to use GO Contact Sync to sync my contacts with my Google a/c so that I could get them on my phone.
  2. No Text Message Backup - The text messages on my WM 6.5 were backed up using MS MyPhone. In WP7 there is no way to backup text messages so one hard-reset and all your messages are lost forever!
  3. Look ma no paste – I know there is copy-paste support since NoDo but what I am talking about is the paste support in the dialer app. I don’t know what the devs who disabled this support were smoking but there have been numerous instances where the phone number on a website is not recognized by the OS and the only way to currently dial such number is to write it down somewhere (or use speech) and then dial it manually. Yes, there are 3rd party apps which let you paste a number, but seriously, why can’t the native dialer app have this?
  4. No Tasks Support – Guess fixed in Mango but really silly that there is no way to sync your tasks and there is no native task app (with reminders) in WP 7.0.
  5. Text Contact Details – If you want to share a contact’s details with somebody else on WP7 then you are quite out of luck – guess sharing contacts via text messages is not considered very social any longer – you should just share them on the cloud somewhere!
  6. International Assist – Perhaps I should have RTFMed before making a phone call but this feature gave me headaches when I tried reaching a local 1-800 number and the dialler kept on adding a + before the 1 (i.e. making it an international call). A frantic Google search made me realize that it’s a feature! I hate it when some apps try to outsmart the user. Ideally, the International Assist should be turned off by default and not turned on.
  7. Mail an attachment – The only attachment that you can send from the Pocket Outlook (or whatever it’s called) is of type pictures! I know I can send Office documents using the Office hub but seriously why I can only attach pictures from the mail client is beyond me.
  8. Half Baked Call History – The call history on WP7 is one endless list of calls without any support for filters like Incoming, Outgoing and Missed. I can perhaps live without the filters, but there is no way in the call history to tap on a number and view the call log just for that number. The call history by number is important when you get a missed call from an unknown number and you want to check whether you have received any other calls from that number.
  9. Lack of Customization/General UI observation – Let’s face it, users love customizing their phones – customization is what makes a phone truly yours. Unfortunately, the only changes to the appearance that you can make are altering the accent color (from a list of predefined ones) and choosing a light vs. dark theme.

    The live tiles home screen wastes a bit of real estate by always showing the right arrow. I have never clicked on the arrow and have always swiped to get to the app list – so I'm not sure that arrow is needed. Perhaps the navigation arrow can be removed altogether, giving more horizontal space to the tiles. Maybe it’s just me, but I somehow preferred the honeycomb UI of 6.5 for the app list over the endless list (with jumplist in Mango) of WP7.

  10. Capacitive Hardware Buttons – Maybe I am nit-picking, but there should be some way of at least disabling these buttons while you are playing a game like Fruit Ninja – so many times I’ve had Bing Search open because I accidentally moved my finger onto the Search icon while playing.
  11. Zune Now Playing Artist Info – The Zune Now Playing integration thingie (where it pulls up artist information from the marketplace) has never worked for me even though I have a US Zune a/c with a US mailing & billing address. Funnily, Zune has no issues pulling artist information from the marketplace on my computer.

That’s pretty much it – looking at the “dislikes” list it might look like I have more to bitch about in WP7 than to like, but the fact is I actually find WP7 to be quite slick, and if the above inconveniences are fixed then it’s going to be quite a force to reckon with in the mobile space.

* Image courtesy – Dell.

Saturday, July 16, 2011

Another (Elegant) Pairing Function

 

In my last post I talked about the Cantor Pairing function for uniquely encoding two integers into a single number. Unfortunately, I ran into some overflow issues with it when generating pairs for very big integers like int.MaxValue – luckily, I don’t have to deal with such large numbers in my application!

Anyway, in case you do have to deal with very large numbers and are looking at reducing them into a single number, then there’s another pairing function called the Elegant Pairing Function (warning pdf), which lets you reduce 2 non-negative integers to a single number.
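For reference, the C# code further down implements the following definition. It is reconstructed here from that code rather than quoted from the paper, so treat the notation as an assumption:

```latex
\mathrm{ElegantPair}(x, y) =
\begin{cases}
x^2 + x + y & \text{if } x \ge y \\
y^2 + x     & \text{if } x < y
\end{cases}
```

As a quick check, (3, 5) maps to 5^2 + 3 = 28 while (5, 3) maps to 5^2 + 5 + 3 = 33, so the order of the two inputs is preserved.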


I did run into some rounding errors while computing the square root, due to the limited precision of the double data type (Math.Sqrt takes a double), which caused the Math.Floor value to be one too high. Basically, even though the square root of 1152921506754330623 (the paired value of 1073741824 & 1073741823) is 1073741824.9999999990686774262519 (for which Math.Floor should return 1073741824), Math.Sqrt returns 1073741825. Once I found out that the problem was with Math.Sqrt, the fix was easy: since flooring the square root might be off by 1, just check whether floor * floor > number and, if so, decrement the floor by 1. With that check you can successfully calculate the elegant pair and reverse it for any non-negative integers up to int.MaxValue. Below is the C# code for the calculation of the Elegant Pair along with the reversal -

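    // Needs: using System; (for Math)

    // Encodes two non-negative integers into one number:
    // x*x + x + y when x >= y, otherwise y*y + x.
    static ulong ElegantPair(uint x, uint y)
    {
        if (x >= y)
        {
            return (ulong)x * x + x + y;
        }
        else
        {
            return (ulong)y * y + x;
        }
    }

    // Recovers { x, y } from a value produced by ElegantPair (inputs up to int.MaxValue).
    static uint[] ElegantReverse(ulong z)
    {
        uint[] pair = new uint[2];
        double preciseZ = Math.Sqrt(z);
        ulong floor = (ulong)Math.Floor(preciseZ);
        // Math.Sqrt works on doubles, so the floor can come out one too high for large z.
        if (floor * floor > z)
        {
            floor--;
        }
        ulong t = z - floor * floor;
        if (t < floor)
        {
            pair[0] = (uint)t;
            pair[1] = (uint)floor;
        }
        else
        {
            pair[0] = (uint)floor;
            pair[1] = (uint)(t - floor);
        }
        return pair;
    }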
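A round-trip check is a cheap way to convince yourself that the off-by-one correction holds up. The sketch below is a slightly more defensive variant, not the code from this post: the names ElegantPairing, Pair, FloorSqrt and Unpair are made up for illustration, it corrects the square root in both directions, and it assumes inputs no larger than int.MaxValue so that the squares stay inside ulong.

```csharp
using System;

static class ElegantPairing
{
    // Same formula as ElegantPair above: x*x + x + y when x >= y, else y*y + x.
    public static ulong Pair(uint x, uint y)
        => x >= y ? (ulong)x * x + x + y : (ulong)y * y + x;

    // Integer floor of sqrt(z). Math.Sqrt only gives a double estimate,
    // so nudge it down or up until r*r <= z < (r+1)*(r+1).
    // Assumption: z <= Pair(int.MaxValue, int.MaxValue), so (r+1)*(r+1) cannot overflow.
    public static ulong FloorSqrt(ulong z)
    {
        ulong r = (ulong)Math.Sqrt(z);
        while (r * r > z) r--;               // estimate was too high
        while ((r + 1) * (r + 1) <= z) r++;  // estimate was too low
        return r;
    }

    // Reverses Pair for values built from inputs up to int.MaxValue.
    public static (uint X, uint Y) Unpair(ulong z)
    {
        ulong r = FloorSqrt(z);
        ulong t = z - r * r;
        return t < r ? ((uint)t, (uint)r) : ((uint)r, (uint)(t - r));
    }
}
```

Calling ElegantPairing.Unpair(ElegantPairing.Pair(x, y)) for a handful of values around the extremes (0, 1, int.MaxValue - 1, int.MaxValue) is usually enough to catch a rounding problem like the one described above.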

Thursday, June 09, 2011

Cantor Pairing Function and Reversal

Update - In case you have to pair very large non-negative integers, do read my post on Elegant Pairing Function.

In my last post on Dice Coefficients I talked about how a nested NxN loop for finding similarity can be changed to a shrinking inner loop, since the Similarity Score between X,Y is symmetric i.e. Sim{X,Y} = Sim{Y,X}. This means if you have 26 sets in the universe (from A-Z), a full nested loop would hit the same pair twice while calculating the Dice Coefficient between them. If we store the similarity already calculated between two sets (say A,B), then we don’t have to re-calculate the similarity between (B,A). Hence, our nested loop can be rewritten as-
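    // universe is assumed to be an array of sets
    for (int i = 0; i < universe.Length; i++)
    {
        // j starts at i + 1, so each unordered pair {i, j} is visited exactly once
        for (int j = i + 1; j < universe.Length; j++)
        {
            // calc similarity between set[i] & set[j]
        }
    }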


By the way, for a very large value of N (universe.Length), the complexity of the above algorithm is still O(N^2), but at least we have cut the number of iterations roughly in half.
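To put a number on the saving: the inner loop runs N-1, N-2, ..., 0 times as i goes up, so the total number of similarity calculations is

```latex
\sum_{i=0}^{N-1} (N - 1 - i) = \frac{N(N-1)}{2}
```

For the 26 sets (A-Z) mentioned earlier, that is 26 * 25 / 2 = 325 calculations instead of 26^2 = 676 for the full nested loop.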


Now that we have a similarity score between two sets, we need a mechanism to store this score and tag it with the pair for which the similarity was calculated. It would be nice if the scores could be stored in a hashtable (as retrieval is on average O(1)), with the hashtable key uniquely identifying the pair for which the score was calculated. In short, we need some way to uniquely encode two docIds into a single number – enter the “Cantor Pairing Function”. A pairing function is a reversible way to uniquely encode two natural numbers into a single number. Calculating the “Cantor Pair” is quite easy but the documentation on the reverse process is a little convoluted. Anyway, below is the C# code for generating the unique number and then reversing it to get back the original numbers (for x, y >= 0).


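    // Needs: using System; (for Math)
    static int CantorPair(short x, short y)
    {
        // Sum in long: (x + y) * (x + y + 1) can exceed int.MaxValue before the division by 2.
        long s = x + y;
        return (int)(s * (s + 1) / 2 + y);
    }

    static short[] Reverse(int z)
    {
        short[] pair = new short[2];
        // 8D keeps the multiplication in double, so 8 * z cannot overflow int.
        long t = (long)Math.Floor((-1D + Math.Sqrt(1D + 8D * z)) / 2D);
        long x = t * (t + 3) / 2 - z;
        long y = z - t * (t + 1) / 2;
        pair[0] = (short)x;
        pair[1] = (short)y;
        return pair;
    }


As you can see, CantorPair() returns an int for two shorts so that the result always fits; the intermediate product is still computed in long, because (x + y) * (x + y + 1) can overflow an int on its own before the division by 2. If you are trying to generate the pair number for two ints, you would need to use long.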
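To tie this back to the original goal, here is a minimal sketch of a score store keyed by the Cantor pair of two docIds. The SimilarityCache class and its Store/TryGet methods are hypothetical names, not part of the project described here, and the sketch assumes the docIds are small enough that the product inside Key fits in a long.

```csharp
using System;
using System.Collections.Generic;

static class SimilarityCache
{
    // One entry per unordered pair of documents.
    static readonly Dictionary<long, double> scores = new Dictionary<long, double>();

    // Cantor pair of the two ids, computed in long.
    // The ids are ordered first so that (a, b) and (b, a) share one key.
    // Assumption: ids are modest in size; s * (s + 1) overflows long for ids near int.MaxValue.
    static long Key(int a, int b)
    {
        long x = Math.Min(a, b), y = Math.Max(a, b);
        long s = x + y;
        return s * (s + 1) / 2 + y;
    }

    public static void Store(int docA, int docB, double score)
    {
        scores[Key(docA, docB)] = score;
    }

    public static bool TryGet(int docA, int docB, out double score)
    {
        return scores.TryGetValue(Key(docA, docB), out score);
    }
}
```

Ordering the ids inside Key is a design choice: since Sim(X,Y) = Sim(Y,X), a score stored for (3, 7) is then found again when looked up as (7, 3).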

Saturday, June 04, 2011

Dice Coefficient–A naive Similarity engine using Set Theory

One of the ways to keep a user engaged on your site is to build some sort of recommendation engine, wherein the application automatically recommends similar “content” based on the user's browsing history. One classic example of this kind of recommendation engine is the one at Amazon, which shows you a list of recommendations based on your browsing history. Another approach is to suggest a list of “similar” content based on the item that the user is currently viewing, e.g., if you are on an online music store and currently viewing the “Dark Side of the Moon” album by Pink Floyd, then perhaps the application can show you a list of similar albums, e.g. “Wish You Were Here” by Pink Floyd & “Led Zeppelin IV” by Led Zeppelin. The similarity between two items in these cases is generally calculated based on some attributes which define the items. Obviously, similarity is a relative term, so to find which items are more similar to a given item we need to score them: the one with the higher score is “more similar” than the one with the lower score. Hence, there is a need to calculate a similarity score between two items.

We had a similar problem to solve on our site - build a similarity engine based on the categories that the items are associated with. Categories are nothing more than “tags”, so to speak, in the current web 2.0 semantics. One very naive approach to calculating a similarity score between two items X and Y, which have tags Tx & Ty respectively, would be to count the tags that are common to both, so the similarity score between (X,Y) would be -
Sim(X,Y) = |{Tx} ∩ {Ty}|
For e.g. if
{Tx} = {"music", "rock", "pink floyd", "cult"} for X="Dark Side of the Moon", where Tx = list of tags for that album.
and
{Ty} = {"music", "led zeppelin", "cult", "rock"} for Y="Led Zeppelin IV", where Ty = list of tags for that album.
Then Sim(X,Y) = 3.

Similarly, we can calculate the similarity between all N items in our collection (if you notice, it is an O(N^2) operation - more on optimizing this later) and then easily find the Top-K most similar items for a given item, since you would generally only be showing the top 10 or so similar items. Below is the sample C# code snippet -

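    // Needs: using System.Collections.Generic;
    static int Similarity(IList<string> doc1Tags, IList<string> doc2Tags)
    {
        HashSet<string> tx = new HashSet<string>(doc1Tags);
        HashSet<string> ty = new HashSet<string>(doc2Tags);
        // IntersectWith modifies tx in place, leaving only the tags common to both
        tx.IntersectWith(ty);
        return tx.Count;
    }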


One big drawback with the above similarity engine is that it is heavily biased towards items with a larger number of tags, as items with more tags are more likely to have a higher number of common elements. The other drawback is that the scores are not on a common scale: a score can be any non-negative integer, making it impossible to set any kind of threshold or do cross-similarity analysis. For e.g. you might want to limit the top-k similar documents to only documents which are really similar (to cut out the noise), or you might want to know if Doc A is more similar to Doc B than Doc C is to Doc D.

A better way of finding similarity would be to length normalize the similarity score; this way a. the score is not biased towards documents with more tags & b. the similarity score is always between 0 & 1, so it’s easier to set thresholds if required. So how do we length normalize our scores? Below is one way of doing it: divide twice the size of the intersection by the sum of |Tx| and |Ty|, where |Tx| = number of tags for X. This gives us:

Sim(X,Y) = (2*|Tx ∩ Ty|)/(|Tx|+|Ty|)


This is what the Dice Coefficient is: a similarity measure between two sets. So let's change our earlier code to find the Dice Coefficient:



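    // Needs: using System.Collections.Generic;
    static float Similarity(IList<string> doc1Tags, IList<string> doc2Tags)
    {
        HashSet<string> tx = new HashSet<string>(doc1Tags);
        HashSet<string> ty = new HashSet<string>(doc2Tags);
        // Capture the sizes before IntersectWith shrinks tx in place.
        int lenTx = tx.Count;
        int lenTy = ty.Count;
        // Two untagged items would give 0/0 (NaN); treat them as not similar.
        if (lenTx + lenTy == 0)
        {
            return 0f;
        }
        tx.IntersectWith(ty);
        float diceCoeff = (float)(2 * tx.Count) / (float)(lenTx + lenTy);
        return diceCoeff;
    }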
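As a worked example, the two tag lists from the naive approach share 3 tags and have 4 tags each, so the Dice Coefficient comes out as 2 * 3 / (4 + 4) = 0.75. The sketch below repeats the calculation in a self-contained form; the DiceExample class and its method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

static class DiceExample
{
    // Dice coefficient of two tag lists: 2 * |Tx ∩ Ty| / (|Tx| + |Ty|).
    public static float Dice(IList<string> doc1Tags, IList<string> doc2Tags)
    {
        var tx = new HashSet<string>(doc1Tags);
        var ty = new HashSet<string>(doc2Tags);
        int total = tx.Count + ty.Count;
        if (total == 0) return 0f;   // assumption: two untagged items score 0
        tx.IntersectWith(ty);        // tx now holds only the common tags
        return 2f * tx.Count / total;
    }

    // The two albums from the example: 3 common tags, 4 tags each.
    public static float AlbumExample()
    {
        var x = new[] { "music", "rock", "pink floyd", "cult" };
        var y = new[] { "music", "led zeppelin", "cult", "rock" };
        return Dice(x, y);           // 2 * 3 / (4 + 4)
    }
}
```

Identical tag lists score 1 and lists with nothing in common score 0, which is what makes a fixed threshold meaningful.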


This is pretty much what we did on our project, apart from a few minor adjustments and optimizing the entire thing so that we don’t iterate N^2 times (hint – Sim(X,Y) = Sim(Y,X)). Also, I believe F# sets are better equipped for this than the HashSet<T> class that is part of the BCL, as F# sets are immutable i.e. when you do an intersection with another set, you get a completely new set instead of modifying the first set in-place – this is important when you are calculating similarity in a loop, as you don’t want the original set to be touched. The other thing is to use a Heap structure for storing the Top-K documents for a given document, instead of storing all the similar documents and then picking only the top K. Also, the loop for calculating the similarity of a given document with the other n documents is a great candidate for parallelization using the TPL – pity we are still on 3.5! Have fun!
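Two of the ideas above can be sketched in a few lines: an intersection that leaves the original sets untouched, and a bounded structure that only ever holds the K best scores. This is a rough illustration, not what ran on our project. It assumes a newer C# than the 3.5 mentioned above (tuples), uses LINQ's Intersect instead of F# sets, and uses a SortedSet as a stand-in for a heap; TopKSketch and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TopKSketch
{
    // Dice coefficient that does not modify either input set:
    // Enumerable.Intersect yields a new sequence instead of shrinking tx.
    public static double Dice(HashSet<string> tx, HashSet<string> ty)
    {
        int total = tx.Count + ty.Count;
        if (total == 0) return 0.0;  // assumption: two untagged items score 0
        return 2.0 * tx.Intersect(ty).Count() / total;
    }

    // Indices of the k documents most similar to docs[target], best first.
    public static List<int> TopK(IList<HashSet<string>> docs, int target, int k)
    {
        // Ordered by (score, index); never grows beyond k + 1 entries.
        var best = new SortedSet<(double Score, int Index)>();
        for (int i = 0; i < docs.Count; i++)
        {
            if (i == target) continue;
            best.Add((Dice(docs[target], docs[i]), i));
            if (best.Count > k) best.Remove(best.Min);  // drop the current worst
        }
        return best.Reverse().Select(e => e.Index).ToList();
    }
}
```

Because the index is part of each entry, two documents with the same score are both kept rather than collapsing into one.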

Sunday, May 29, 2011

ASP.Net Cache and updating the value using the Indexer

What happens if you update an object already in the cache using the indexer, something like -

HttpRuntime.Cache["someKey"] = value; ?

A. The cached object is overwritten with the new value while retaining the other information like expiration (be it absolute or sliding) and cache dependencies.

B. The cached object is overwritten and the other cache information is reset to defaults.

The correct answer is B: setting a value through the indexer on the cache object internally calls Cache.Insert(key, value), which passes no CacheDependency and uses the default expiration policy (i.e. the object never expires unless the server is low on memory). So the next time you find that your objects in cache are not honoring the cache TTLs or not getting evicted on dependency changes, make sure that you are not updating the object somewhere in the code using the indexer. I learnt it the hard way while debugging an issue with a “forever-cached” object on our production site!
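One way to avoid the trap is to route every update through a small helper that always states the expiration explicitly. The sketch below is a minimal example under a few assumptions: CacheHelper and Put are hypothetical names, it uses an absolute expiration with no dependency, and you would substitute whatever policy the item was originally cached with. It needs a reference to System.Web.

```csharp
using System;
using System.Web;
using System.Web.Caching;

static class CacheHelper
{
    // Re-inserts the value with an explicit time-to-live instead of relying
    // on the indexer, which would silently drop any expiration policy.
    public static void Put(string key, object value, TimeSpan timeToLive)
    {
        HttpRuntime.Cache.Insert(
            key,
            value,
            null,                             // no CacheDependency in this sketch
            DateTime.UtcNow.Add(timeToLive),  // absolute expiration
            Cache.NoSlidingExpiration);
    }
}
```

With this in place, CacheHelper.Put("someKey", value, TimeSpan.FromMinutes(10)) replaces the indexer assignment and the item still expires after ten minutes.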

Sunday, May 15, 2011

heartbeat...

Wow, been a while since I last posted anything over here (not that I've been regular at posting somewhere else either). Guess I've been running a little out of steam lately when it comes to writing stuff. It's not like I don't have anything to share of late, it's just that I don't have the energy to blog about it these days - & that pretty much sums up how life has been meandering along all this while. So this post is pretty much a beacon just to let people know that I am still alive!

Anyway, I've been working on a couple of "different" things of late. On our ASP.Net website we've been using the built-in In-Proc Cache to cache our data-objects, and when you have a pretty decent cluster size, you run into data-consistency issues, apart from having to figure out how to purge caches which are now scattered all over your web-farm. I've been thinking of getting some centralized caching engine for a while and after giving MS Velocity (or whatever it's called now) & Memcached a trial, I've decided to go with memcached along with the enyim client library. We haven't rolled it out on production yet, but in our Dev environment things actually look quite good - no exceptions, good cache-hit rates. I'll post the entire experience with memcached and integrating it with asp.net in a separate blog.

The other thing that I've been looking into, technically, is reverse-proxying our web servers. I compared both Squid & Varnish, and zeroed in on Varnish based on the online material available. We already have Varnish set up in front of our solr-slaves on production (because it's quite easy to configure varnish to do that and you get about a 40-50% reduction in response times), and we haven't seen any issues with it. Our Varnish cache-hit rates have actually been quite low, around 30%, which is something that we need to look at, because to benefit from a reverse-proxy you would want your cache-hits to be around 60-70% on average. One drawback with Varnish is the limited documentation available on VCL (Varnish Configuration Language), but I guess if you are clear about your caching strategy, it's not that hard to write one.
Varnishing solr is actually a piece of cake (as you don't have to worry about anonymous v/s authenticated users, session cookies, persistent cookies etc) but when it comes to Varnishing your public web-servers, things can get quite tricky. I'll post in detail the experiences & things that I learned while Varnishing our web front-ends in another post. By the way, we've been running Varnish in front of IIS on our dev environment and things have been looking quite alright so far.
One last thing on the technical stuff - we've been monitoring the performance of our memcached & varnish using munin - more on this in some other post as usual. I know this is a lot of Linux stuff for a .net shop but the fact is - I didn't find any good Windows alternative for the above stuff which was free and had proved itself to be quite scalable.

Moving on - as some of you would know, photography has been a hobby & a stress-reliever for me. On the photography front - after quite a bit of contemplation I got myself the Cactus triggers so that I can move my Vivitar 285HV off-camera. Well, so far they have been working like a charm. The first shot that I tried, armed with my new off-camera flash capabilities, was freezing a water splash - and after 54 shots in the darkness, I did get one keeper -

lemonade!

I'll post about the setup and how I went about this in another post. The other news on the photography front is that I did manage to sell one of my images on fotolia (no, I am not a millionaire yet!).

Lastly, I have started playing the guitar again - it's been so many years since I last strummed so it's like I am learning anew!

Well, that pretty much sums up what I've been up to all this while, hopefully, I'll be a little bit more frequent with my posting habits.