If you have been a regular visitor to Solomon Says in the past, you may have noticed that the similar reviews section (for the uninitiated, it is in the right sidebar) was a little disappointing. The reviews shown as similar were all too often not similar at all. When they were, the similarity was either too general (all fantasy books are similar to all other fantasy books) or, more rarely, accurate only by chance. All in all, the "Similar reviews" section wasn't something you could trust.
No more! Today I have put out the first version of a recommendation system that shows genuinely similar reviews so you can discover more, and more importantly, better content. I have been surfing the site all evening, and the quality of recommendations on every page looks solid.
Now for the details. Recommendation system theory describes two kinds of systems: one that gives suggestions based on the nature of the items involved, and one that gives suggestions by looking at how users interact with different items. The older recommendation generator was meant to be the second kind. It made the simple assumption that if a visitor goes from one review to another, the two reviews are linked in some way. Nothing was known about the nature of the link except that it existed, and it was assumed to be unidirectional. So we maintained a mapping of all such source-destination pairs, along with a count of how many times each transition happened. Given any review, the top n most frequently visited reviews from that page were shown as similar reviews.
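The older, traffic-based generator can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the site's actual code: the class name `TransitionRecommender`, its methods and the review slugs are hypothetical, and page visits are assumed to arrive as (source, destination) pairs.

```python
from collections import Counter, defaultdict


class TransitionRecommender:
    """Hypothetical sketch of the old system: count review-to-review visits."""

    def __init__(self):
        # source review -> Counter of destination reviews and how often each was visited
        self.transitions = defaultdict(Counter)

    def record_visit(self, source, destination):
        # Links are treated as unidirectional: A -> B says nothing about B -> A.
        if source != destination:
            self.transitions[source][destination] += 1

    def similar(self, review, n=5):
        # The top n most frequently visited destinations from this review.
        # .get() avoids creating empty entries for reviews nobody has left yet.
        counts = self.transitions.get(review, Counter())
        return [dest for dest, _ in counts.most_common(n)]


rec = TransitionRecommender()
for _ in range(3):
    rec.record_visit("a-wizard-of-earthsea", "the-tombs-of-atuan")
rec.record_visit("a-wizard-of-earthsea", "a-random-cookbook")

print(rec.similar("a-wizard-of-earthsea", n=2))  # ['the-tombs-of-atuan', 'a-random-cookbook']
print(rec.similar("the-tombs-of-atuan"))         # []
```

The single stray hop to the cookbook review is enough to earn it a "similar" slot, which is the kind of noise that low traffic cannot wash out.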
The assumptions of this system are justifiable, but as it turns out, only at a very large scale. If the majority of the traffic is new visitors with a high bounce rate (as ours is), visitors tend to hop between random reviews to explore the site instead of browsing it systematically. This creates all kinds of source-destination mappings, most of which do not mean anything. My assumption was that large amounts of traffic would weed out the anomalies and strengthen genuinely similar relationships (it has), but this hasn't worked well enough with my current visitor stats.
The new system falls in the first category of systems described above. The approach is simple. We tag each reviewed item with its characteristics (e.g. books might be tagged with 'epic fantasy', 'light read', 'capitalism', etc.). Items with similar characteristics share tags. This is not too difficult, since the scale is not very large and the tagging is done while creating the review itself. Some discipline in creating and assigning tags is enough to maintain good item-to-tag relationships. Since an item can have arbitrarily many tags, I can describe each one in a fine-grained way ("Modern George R. R. Martin style fantasy" instead of just "Fantasy"). All the new system has to do for a given item is find the other items that have the most tags in common with it. These are shown in the 'Similar Reviews' section.
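Here is a minimal Python sketch of the tag-overlap approach, assuming tags are stored as a mapping from each item to a set of tag strings. The function names and the sample catalogue are hypothetical; the post does not say how the site actually stores or queries its tags.

```python
from collections import Counter, defaultdict


def build_tag_index(item_tags):
    """Invert {item: tags} into {tag: items}, like a "Related Topics" page."""
    index = defaultdict(set)
    for item, tags in item_tags.items():
        for tag in tags:
            index[tag].add(item)
    return index


def similar_items(item, item_tags, tag_index, n=5):
    """Return up to n other items, ranked by how many tags they share with `item`."""
    shared = Counter()
    for tag in item_tags.get(item, ()):
        for other in tag_index.get(tag, ()):
            if other != item:
                shared[other] += 1
    # Most shared tags first; ties broken alphabetically so the order is stable.
    ranked = sorted(shared.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:n]]


# Hypothetical catalogue; slugs and tags are illustrative only.
item_tags = {
    "a-wizard-of-earthsea": {"epic fantasy", "coming of age", "light read"},
    "the-tombs-of-atuan": {"epic fantasy", "coming of age", "light read"},
    "the-name-of-the-wind": {"epic fantasy", "coming of age"},
    "a-game-of-thrones": {"epic fantasy", "political intrigue"},
    "capital": {"capitalism"},
}
index = build_tag_index(item_tags)
print(similar_items("a-wizard-of-earthsea", item_tags, index))
# ['the-tombs-of-atuan', 'the-name-of-the-wind', 'a-game-of-thrones']
```

Counting raw overlaps favours heavily tagged items; if that becomes a problem, a normalised score such as Jaccard similarity (shared tags divided by the size of the union of both tag sets) is a common alternative.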
The tags themselves can be seen at the end of each review as "Related Topics", and each can be clicked to see all the items it applies to. This is a further means of surfing the niche corners of the website's content.
To be fair, this isn't rocket science backed by vast amounts of data, like the systems an Amazon or a Goodreads might run. But even a lukewarm recommendation generator is better than none, and the difference in the results shown to the user is striking. Content that was hitherto invisible (because nobody ever saw or read those reviews) now appears in many places. This gives users a powerful new avenue, which didn't exist before, for exploring and discovering content. For Solomon Says, it (hopefully) means increased user retention and engagement.
As an example, check out the recommendations for "A Wizard of Earthsea". As of this writing, two belong to the same series (which is good), and the others belong to four different fantasy series, all of which have something in common with "A Wizard of Earthsea" (which is fantastic). Some of these recommended reviews haven't received much traffic on the site; the new system makes them more visible to visitors.
Check out the new gizmo, dear reader, and drop me a line in the comments or here about what you think.
Hey Kislay,
Yeah, tags are the most intuitive signal/feature to use for recommendations. You might want to experiment with a couple of other features/signals in addition to tags to further fine-tune the recommendations. In my opinion, the following features could also be included, with some weights:
1) Document similarity of two reviews. A naive way is to find the number of common words between two reviews. Similar books should have similar words.
2) A user who likes books from a single author might also like other books from the same author.
3) Keep a list of the most popular books in your collection. This will be useful when you don't have enough data for a specific tag and still want to make recommendations. This is called the cold start problem (http://en.wikipedia.org/wiki/Recommender_system).
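Shishir's first suggestion, counting the words two reviews have in common, might look like this in Python. It is a sketch only: the stopword list, function names and sample texts are hypothetical, and a real version would more likely use a proper tokenizer and stopword corpus (NLTK, mentioned in the reply below, ships both).

```python
import re

# A tiny, hypothetical stopword list; without one, words like "the" dominate the count.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}


def content_words(text):
    """Lowercase the text, split it into alphabetic words and drop stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def common_word_count(review_a, review_b):
    """Naive document similarity: the number of distinct content words both reviews use."""
    return len(content_words(review_a) & content_words(review_b))


def rank_by_common_words(review_id, reviews, n=5):
    """Rank the other reviews in {id: text} by their common-word count with review_id."""
    target = reviews[review_id]
    scores = {
        other: common_word_count(target, text)
        for other, text in reviews.items()
        if other != review_id
    }
    # Highest count first; ties broken alphabetically.
    return sorted(scores, key=lambda other: (-scores[other], other))[:n]


reviews = {
    "earthsea": "A young wizard learns the price of magic and names.",
    "tombs": "A priestess meets a wizard in the labyrinth of magic.",
    "cookbook": "Recipes for bread and soup.",
}
print(common_word_count(reviews["earthsea"], reviews["tombs"]))  # 2
print(rank_by_common_words("earthsea", reviews))  # ['tombs', 'cookbook']
```

Raw counts also favour long reviews, so in practice this score would probably be one weighted feature alongside the tag overlap, as Shishir suggests, rather than a replacement for it.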
Thanks for the suggestions, Shishir.
1. I have been looking into NLTK for running proper semantic correlation analysis between reviews to aid the recommendations. But the damn thing is large and is taking a lot of time. I'd rather do that than simply count common words, but your suggestion might have some merit for v1.2. I'll try playing with it and see the results.
2. It's a good idea, but specific only to books (admittedly the majority of my content). I think it's a better fit for a section like "Other books by this author" than for a recommendation stream.
3. This is already the default. Both the old and the new system show the most viewed reviews of the same category when they can't generate enough recommendations.
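The fallback described in the third reply can be sketched as a small padding step in Python. The function name and its inputs are hypothetical: it assumes the recommender returns an ordered list of review IDs and that a separate query supplies the category's reviews ordered by view count.

```python
def with_fallback(review_id, similar, most_viewed_in_category, n=5):
    """Top up a short similar-reviews list with the category's most viewed reviews.

    `similar` is whatever the recommender produced (best first);
    `most_viewed_in_category` is assumed to be sorted by view count, highest first.
    """
    picks = [r for r in similar if r != review_id][:n]
    for review in most_viewed_in_category:
        if len(picks) >= n:
            break
        # Skip the review being viewed and anything already recommended.
        if review != review_id and review not in picks:
            picks.append(review)
    return picks


print(with_fallback("earthsea", ["tombs"], ["earthsea", "dune", "tombs", "hobbit"], n=3))
# ['tombs', 'dune', 'hobbit']
```

Because genuine recommendations always come first, the popular reviews only fill the slots the recommender could not.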