Archive for April, 2008

Connected Attribute Lists for Improved Collaborative Filtering

Tuesday, April 22nd, 2008

The following idea applies to reddit.com and to other sites that use collaborative filtering, such as Netflix.

I propose that collaborative filtering sites quiz users on their beliefs, feelings, and opinions about many random things, much like a dating site or a “What X are YOU?” meme.  After quizzing users, the site can use their answers to build secondary attributes through feature extraction on the questionnaire across all users.  It then finds which secondary attributes correlate with primary attributes, through a feature extraction of users’ primary attributes that uses the secondary attributes as the training set.

In other words, suppose a question says “I think Ubuntu is better than OS X” and the options are “Strongly agree?  Somewhat agree?  No opinion?  Somewhat disagree?  Strongly disagree?”  If I answer “somewhat agree”, then we compare that answer against the reddit.com attributes.  Say we find an attribute we name “Apple News Story Preference attribute”.  (Note that these attributes aren’t actually defined in the primary dataset; they are generated through feature extraction.)  My result may show that I dislike Apple news stories, given my answer.  Or it may show that I don’t hate them, because by providing an opinion (and not strongly agreeing) I showed that I actually care about technology in general.  (Therefore the answer “no opinion” should sit on a different dimension, and not be part of the 1-5 scale that applies to preference.)
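One way to keep “no opinion” off the preference scale is to store each answer as two numbers: whether the user has an opinion at all, and how strongly they lean.  The sketch below is only an illustration.  The `LIKERT` table, the signed -2 to 2 values, and the name `encode_answer` are my assumptions, not something reddit.com or any other site actually uses.

```python
from typing import Tuple

# Hypothetical signed strengths for the four opinionated options.
# "No opinion" is deliberately absent: it is handled on its own dimension.
LIKERT = {
    "strongly agree": 2,
    "somewhat agree": 1,
    "somewhat disagree": -1,
    "strongly disagree": -2,
}


def encode_answer(answer: str) -> Tuple[int, int]:
    """Return (has_opinion, preference) for one quiz answer.

    has_opinion is 1 for any agree/disagree option and 0 for "no opinion".
    preference is the signed strength, and 0 when there is no opinion.
    """
    key = answer.strip().lower()
    if key == "no opinion":
        return (0, 0)
    # Raises KeyError for an answer that is not one of the known options.
    return (1, LIKERT[key])
```

With this encoding, “somewhat agree” becomes `(1, 1)` and “no opinion” becomes `(0, 0)`, so a learner can treat “cares about the topic” and “which side” as separate signals.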

In contrast, some attributes will NOT have correspondences, such as the answer to whether or not you agree with the statement “I like the letter 352.”  It is arbitrary and random, and few attributes in the primary attribute set will relate to it, so stating any opinion on this quiz question provides no adjustment to the recommendation algorithm.  Note that the correspondence of answers to primary set preferences will change as culture changes, so we shouldn’t compare 2008 answers to 2010 answers, and so on.
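A rough way to tell a useful question from an arbitrary one is to measure how strongly its answers correlate with a primary attribute, and to give the question zero weight when the correlation is weak.  This is a minimal sketch that uses plain Pearson correlation as a stand-in for the feature extraction described above.  The function names and the 0.3 threshold are assumptions picked for illustration.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def question_weight(answers: Sequence[float],
                    attribute: Sequence[float],
                    threshold: float = 0.3) -> float:
    """Weight of one quiz question for one primary attribute.

    answers[i] and attribute[i] belong to the same user.  Questions whose
    correlation is weaker than the (assumed) threshold contribute nothing.
    """
    r = pearson(answers, attribute)
    return r if abs(r) >= threshold else 0.0
```

Because culture drifts, the weights would be recomputed per time window rather than pooled across years.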

In summary, I want the recommendation algorithm to pair me with other people who have similar primary dataset attributes, and I believe we can use a secondary dataset to boost correspondences in a sparse matrix.
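To make “boost correspondences in a sparse matrix” concrete, here is one possible blend: when two users share few voted stories, lean on their quiz similarity, and shift toward vote similarity as the overlap grows.  Everything here is an assumption for illustration, including the cosine measure, the `full_trust_overlap` cutoff of 10, and the function names.

```python
from math import sqrt
from typing import Dict


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity over the keys both users have in common."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(a[k] ** 2 for k in shared))
    norm_b = sqrt(sum(b[k] ** 2 for k in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def blended_similarity(votes_a: Dict[str, float],
                       votes_b: Dict[str, float],
                       quiz_a: Dict[str, float],
                       quiz_b: Dict[str, float],
                       full_trust_overlap: int = 10) -> float:
    """Mix primary (vote) and secondary (quiz) similarity.

    trust grows from 0 to 1 as the number of co-voted stories approaches
    full_trust_overlap, so the quiz only fills in where votes are sparse.
    """
    overlap = len(votes_a.keys() & votes_b.keys())
    trust = min(1.0, overlap / full_trust_overlap)
    return trust * cosine(votes_a, votes_b) + (1 - trust) * cosine(quiz_a, quiz_b)
```

Two users with no stories in common but identical quiz answers would score close to 1 here, which is exactly the pairing the sparse vote matrix alone could not find.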

PS

From a user interface standpoint, one random question could appear on the sidebar every day for users to vote up or down (either a brand new question, or one drawn at random from a set of 50).  This would give the learner a steadily growing, less sparse dataset, and if you wanted to decay the strength of “old opinions” in the dataset, this’d be a great way to do it.  Basically: “Which half of the reddit users online today am I like?  (And does that correspond to any attribute in the primary dataset?)”
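Decaying “old opinions” could be as simple as multiplying each stored answer by a weight that halves after a fixed number of days.  This is a sketch only; the 180-day half-life and the function names are assumptions, and any other decay curve would work as well.

```python
def decay_weight(age_days: float, half_life_days: float = 180.0) -> float:
    """Weight in (0, 1] that halves every half_life_days."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 0.5 ** (age_days / half_life_days)


def decayed_answer(value: float, age_days: float,
                   half_life_days: float = 180.0) -> float:
    """Scale a signed answer strength by how old the answer is."""
    return value * decay_weight(age_days, half_life_days)
```

A daily sidebar question keeps fresh answers arriving, so the old ones can fade without the dataset thinning out.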

Gimp & Usability

Tuesday, April 15th, 2008

I’m really excited right now, because I just found a Gimp internal website geared towards Human-Computer Interaction for the GUI.  I think The Gimp’s GUI needs a lot of work.  It hasn’t changed very much, but it’s been slowly improving.  They’ve allowed some widgets to connect.  This is unexpected for me; I didn’t think the creators were interested in entertaining differing viewpoints.

The biggest Gimp flamewar always occurs when people discuss single document interface (SDI) versus multiple document interface (MDI).  (Note the lack of a container application in SDI.)  Windows users tend to like MDI.  The Gimp programmers often argued that people only liked it because Windows has a failure of a window manager.  To me, this was always an ad hominem fallacy that never addressed the real issue of why one is better than the other.  SDI has a real cost: if you misclick outside an image in Gimp, you click the application behind it.  I find this frustrating.  Wikipedia contains strange arguments against MDI, such as “Cannot be used successfully on desktops using multiple monitors.”

Anyways, it looks like The Gimp is investigating the benefits, and hopefully they may even integrate MDI optionally into The Gimp.  In the past, developers always claimed you could get that functionality with “Gimpshop” or deweirdifier, but these applications and plugins are outdated and hard to get working.  Personally, I wish The Gimp’s interface were similar to paint.net, which I find incredibly easy to use.