Wednesday, November 20, 2013

A system for recommending similar reviews

If you have been a regular visitor to Solomon Says in the past, you may have noticed that the similar reviews section (for the uninitiated, on the right bar) was a little disappointing. The reviews shown as similar were all too often not similar at all. When they were, the similarity was either too general (all fantasy books are similar to all other fantasy books) or more rarely, accurate by chance. All in all, the "Similar reviews" section wasn't something you could trust.

No more! Today I have put out the first version of a recommendation system that shows genuinely similar reviews so you can discover more, and more importantly, better content. I have been surfing the site all evening, and the quality of recommendations on every page looks solid.



Now for the details. Recommendation systems broadly come in two kinds - content-based systems, which make suggestions based on the nature of the items themselves, and collaborative systems, which make suggestions by looking at how users interact with different items. The older recommendation generator was meant to be the second kind. It made the simple assumption that if a visitor goes from one review to another, the two reviews are linked in some way. Nothing was known about the nature of the link, except that it existed and was assumed to be unidirectional. So we maintained mappings of all such source-destination pairs along with a count of the number of times each transition happened. Given any review, the top n most heavily visited reviews from that page were shown as similar reviews.
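For the curious, the old system's bookkeeping can be sketched in a few lines. This is an illustrative reconstruction, not the site's actual code; all names are made up:

```python
from collections import Counter

class TransitionRecommender:
    """Behaviour-based recommender: count source -> destination
    transitions and surface the top-n destinations for a review."""

    def __init__(self):
        # (source_id, dest_id) -> number of times the transition happened
        self.transitions = Counter()

    def record_visit(self, source_id, dest_id):
        self.transitions[(source_id, dest_id)] += 1

    def similar(self, source_id, n=3):
        # Top-n destinations most often reached from this review.
        counts = Counter({
            dst: c for (src, dst), c in self.transitions.items()
            if src == source_id
        })
        return [dst for dst, _ in counts.most_common(n)]
```
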

The assumptions of this system are justifiable, but as it turns out, only at a very large scale. When the majority of the traffic is new visitors with a high bounce rate (as ours is), visitors surf to random reviews to explore the site rather than systematically following genuine similarities. This creates all kinds of source-destination mappings, most of which do not mean anything. My assumption was that large amounts of traffic would weed out the anomalies and strengthen genuinely similar relationships (it has), but this hasn't worked well enough with my current visitor stats.

The new system falls in the first category described above. The approach is simple. We tag each reviewed item with its characteristics (e.g. books might be tagged with 'epic fantasy', 'light read', 'capitalism', etc.). These tags are shared between items with similar characteristics. This is not too difficult since the scale is not very large and the tagging is done while creating the review itself. Some discipline in creating and assigning tags suffices to maintain good item-to-tag relationships. Since an item can have arbitrarily many tags, I can describe items in a fine-grained way ("Modern George R. R. Martin style fantasy" instead of just "Fantasy"). For a given item, all the new system has to do is find other items that have the maximum number of tags in common with it. These are shown in the 'Similar Reviews' section.
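The tag-overlap ranking can be sketched like this. The data and function names are invented for illustration; the real implementation queries the site's database:

```python
def similar_by_tags(item, item_tags, n=3):
    """Rank other items by the number of tags shared with `item`.

    item_tags: dict mapping item name -> set of tags.
    Returns up to n items, most shared tags first, ignoring
    items with no overlap at all.
    """
    target = item_tags[item]
    overlaps = [
        (len(target & tags), other)
        for other, tags in item_tags.items()
        if other != item
    ]
    # Sort by shared-tag count, highest first.
    overlaps.sort(key=lambda pair: -pair[0])
    return [other for count, other in overlaps[:n] if count > 0]
```
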

The tags themselves can be seen at the end of each review as "Related Topics" and can be clicked to see all the items they apply to. This is a further means of exploring niche corners of the website's content.


To be fair, this isn't rocket science backed by vast amounts of data, such as what an Amazon or a GoodReads might run. But even a lukewarm recommendation generator is better than none, and the difference in the results shown to the user is extremely striking. Content that was hitherto invisible (because nobody ever saw or read those reviews) now appears in many places. This gives users a powerful new avenue to explore and discover content which didn't exist before. For Solomon Says, it means (hopefully) increased user retention and engagement.

As an example, check out the recommendations for "A Wizard of Earthsea". As of this writing, two belong to the same series (which is good), and the others belong to four different fantasy series, all of which have something in common with "A Wizard of Earthsea" (which is fantastic). Some of these recommended reviews haven't received much traffic on the site; the new system increases their visibility to visitors.

Check out the new gizmo, dear reader, and drop me a line in the comments or here about what you think.

Thursday, May 16, 2013

YOU’VE BEEN……memcached!

Listen to this song. This is a great song.


That was in no way relevant to this post.

Further in pursuit of making SolomonSays faster, I have been looking into caching solutions for a while now. After going through a ton of blog posts, I decided to go with memcached as the caching back-end. I started this yesterday, and owing to the extreme ease of installation and use (and my own, personal awesomeness), SolomonSays today runs on memcached.
The expected benefits are:
  1. Fewer queries being run means snappier performance. This will matter more and more as the site gains visitors because Django doesn’t support database connection pooling out of the box.
  2. A direct consequence of #1 is that the load on our MySQL database server reduces. This is pertinent because the site runs on an EC2 micro instance (free tier) and computational resources are minimal.
On the Linux production system, the process was as simple as:
  1. yum install memcached
  2. memcached -d -m 128 (to run memcached as a daemon with 128MB of memory)
  3. Configure memcached as the caching backend for Django as described here.
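For reference, the Django configuration in step 3 boils down to pointing the default cache at the memcached daemon in settings.py. The address below assumes memcached is running locally on its default port:

```python
# settings.py -- memcached as Django's default cache backend,
# assuming the daemon listens on localhost's default port 11211
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}
```
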
After that it was just a matter of analyzing what needed to be cached in the application and using the cache for it. Currently I cache popular reviews (for the right panel of most screens), the data for building the top menu, and reviews by their ids.
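The caching itself follows the usual get-or-compute pattern. A minimal sketch, with a plain dict standing in for the memcached client and an invented function in place of the real database query:

```python
# A plain dict stands in for the memcached client here; in the real
# app, Django's cache API plays this role. Names are illustrative.
cache = {}

def fetch_review_from_db(review_id):
    # Stand-in for the actual database query.
    return {"id": review_id, "title": "Review %d" % review_id}

def get_review(review_id):
    """Return a review, hitting the database only on a cache miss."""
    key = "review:%d" % review_id
    if key not in cache:
        cache[key] = fetch_review_from_db(review_id)  # miss: query the DB
    return cache[key]
```
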

The tricky bit was setting memcached up for my development environment which is Windows. As Zurmo.org mentions:
Memcache was designed with Linux in mind and not windows, so it has posed some installation issues because Windows users are not so familiar with having to compile code from source as memcache does not come with any installation software.
However, it all worked out in the end with the help of the link above and this.

Hopefully you are now experiencing a website which is faster than it was before.
Thoughts?  Still think it’s too slow? Feel free to drop me a line.

Wednesday, May 15, 2013

Solomon Says: Need for Speed

Long time no see!!!

I have been away from SolomonSays for most of this year (the reasons for which will soon be discussed elsewhere). Over the last week or so, however, the mists have lifted and I have returned to the fun and games with a vengeance.

The speed of the website has been one of the biggest concerns for me over the last few months.  Speed tests at WebPageTest showed that the home page was taking ~11 seconds to load completely. Not cool at all! So this was the first order of business.


Two of the biggest sluggards on the site were:

1. The auto-completing search box – a jQuery UI autocomplete component which took the complete list of reviews as JSON input at page load. Basically _everything_ in the system was queried on each page request. To make matters worse, this was a blocking call (synchronous request) due to some other implementation issues, so page loading couldn’t progress till this part was complete.

Instead of tweaking my implementation of the search, I chose to replace it with Google site search. This gives a twofold benefit:
  1. All the performance overhead described above goes away.
  2. The search functionality becomes much more powerful. The older search only worked if you typed the exact name of the item; the new component provides full-blown Google search functionality.
2. The site uses a bunch of JavaScript components and loads a whole bunch of .js and .css files. Optimization-101 says to combine them all into a single .js file and a single .css file. So this is what I finally did using django-compress. I love the simplicity of usage – just put whatever you want to compress between {% compress js/css %} and {% endcompress %} tags and voila, you are good to go. Almost entirely non-intrusive.

Not everything worked as expected (of course):

  1. The scripts being loaded from external sources like Google, Addthis etc. are not compressed and have to be loaded as before.
  2. Some of the JavaScript components, like TinyMCE (used for accepting user reviews) and carouFredSel (used for the scrolling image gallery in each review), didn’t like being compressed independently of the rest of their packages. So I was obliged to keep them out of the great squeeze.
Even so, I am now serving 1 js file instead of 4 and 1 css file instead of 7.

WebPageTest now reports a complete load time of ~6 seconds. Hurray!!!