Connected Attribute Lists for Improved Collaborative Filtering

April 22nd, 2008

The following idea applies to reddit.com and other sites that use collaborative filtering, such as Netflix.

I propose that collaborative filtering sites quiz users on their beliefs, feelings, and opinions about many random things, much like a dating site or a “What X are YOU?” meme.  After quizzing users, the site can use their answers to build secondary attributes through feature extraction over the questionnaire across all users.  It can then find which secondary attributes correlate with primary attributes, through a feature extraction of users’ primary attributes that uses the secondary attributes as the training set.

In other words, suppose a question states “I think Ubuntu is better than OSX” and the options are “Strongly agree”, “Somewhat agree”, “No opinion”, “Somewhat disagree”, and “Strongly disagree”.  If I answer “somewhat agree”, then we compare that answer against the reddit.com attributes.  Say we find an attribute we name the “Apple News Story Preference attribute”.  (Note that these attributes aren’t actually defined in the primary dataset; they are generated through feature extraction.)  My result may show that I dislike Apple news stories, given my answer.  Or it may show that I don’t hate them: by providing an opinion at all (and not strongly agreeing), I showed that I actually care about technology in general.  (Therefore the answer “no opinion” should sit on a different dimension, and not be part of the 1-5 scale that measures preference.)

In contrast, some secondary attributes will NOT have correspondences, such as the answer to whether or not you agree with the statement “I like the letter 352.”  The question is arbitrary and random, and few attributes in the primary attribute set will relate to it, so stating any opinion on this quiz question provides no adjustment to the recommendation algorithm.  Note that the correspondence between answers and primary set preferences will change as culture changes, so we shouldn’t compare 2008 answers to 2010 answers, etc.
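Here is a rough Python sketch of the two points above: keeping “no opinion” on its own axis, and ignoring questions that don’t correlate with a primary attribute.  The encoding, the plain Pearson correlation, and the 0.3 threshold are all assumptions made for illustration; a real system would use whatever feature extraction it already has.

```python
from statistics import mean

# Hypothetical encoding: agreement runs from -2 to +2, and "no opinion"
# lives on a separate has_opinion axis instead of being the scale midpoint.
AGREEMENT = {
    "strongly disagree": -2,
    "somewhat disagree": -1,
    "somewhat agree": 1,
    "strongly agree": 2,
}


def encode_answer(answer):
    """Return (has_opinion, agreement) for one quiz answer."""
    key = answer.strip().lower()
    if key == "no opinion":
        return (0, 0)
    return (1, AGREEMENT[key])


def pearson(xs, ys):
    """Plain Pearson correlation; 0.0 when either side is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def question_weight(answers, preference, threshold=0.3):
    """Weight of one question for one primary attribute.

    answers: one answer string per user.
    preference: that user's score on the primary attribute, same order.
    Returns 0.0 (no adjustment) when the correlation is too weak.
    """
    agreement = [encode_answer(a)[1] for a in answers]
    r = pearson(agreement, preference)
    return r if abs(r) >= threshold else 0.0
```

A question like the “letter 352” one would be expected to come out with a weight of 0.0 and drop out of the recommendation step entirely.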

In summary, I want the recommendation algorithm to pair me with other people who have similar primary dataset attributes, and I believe we can use a secondary dataset to boost correspondences in a sparse matrix.
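One way to picture the “boost” is a blended similarity score: trust the primary votes when two users share a lot of them, and lean on the quiz answers when they don’t.  This is only a sketch under my own assumptions (cosine similarity, a linear trust ramp, and the `full_trust` cutoff are all made up for the example).

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity over the keys two sparse dicts share.

    Returns (similarity, number_of_shared_keys).
    """
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0, 0
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = sqrt(sum(a[k] ** 2 for k in shared))
    norm_b = sqrt(sum(b[k] ** 2 for k in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0, len(shared)
    return dot / (norm_a * norm_b), len(shared)


def blended_similarity(votes_a, votes_b, quiz_a, quiz_b, full_trust=20):
    """Blend vote similarity with quiz similarity.

    votes_*: story id -> +1 / -1 vote (the sparse primary data).
    quiz_*:  question id -> encoded answer (the secondary data).
    full_trust: hypothetical number of shared votes at which the
                quiz answers stop mattering.
    """
    vote_sim, overlap = cosine(votes_a, votes_b)
    quiz_sim, _ = cosine(quiz_a, quiz_b)
    trust = min(overlap / full_trust, 1.0)  # 0.0 means no shared votes
    return trust * vote_sim + (1.0 - trust) * quiz_sim
```

With no shared votes the score is just the quiz similarity, which is exactly the sparse-matrix case the secondary dataset is meant to help with.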

PS

From a user interface standpoint, one random question could appear on the sidebar every day for users to vote up or down (either one random new question, or a random one from a set of 50).  This would give the learner a nice, steadily growing, less sparse dataset, and if you wanted to decay the strength of “old opinions” in the dataset, this would be a great way to do it.  Basically: “Which half of the reddit users online today am I like?  (And does that correspond to any attribute in the primary dataset?)”
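Decaying “old opinions” could be as simple as an exponential half-life on each vote.  A small sketch, assuming votes are stored as (day number, +1 or -1) pairs and that 90 days is a sensible half-life (both are placeholders):

```python
def decayed_weight(age_days, half_life_days=90.0):
    """Weight of a vote that is age_days old; halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def decayed_opinion(votes, today, half_life_days=90.0):
    """Weighted average of a user's votes on one question.

    votes: list of (day_number, value) pairs, value is +1 or -1.
    Newer votes count for more; returns 0.0 if there are no votes.
    """
    total = 0.0
    weight_sum = 0.0
    for day, value in votes:
        weight = decayed_weight(today - day, half_life_days)
        total += weight * value
        weight_sum += weight
    return total / weight_sum if weight_sum else 0.0
```

A user who voted down 90 days ago and up today ends up mildly positive rather than neutral, because the newer vote carries twice the weight.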

Gimp & Usability

April 15th, 2008

I’m really excited right now, because I just found a Gimp internal website geared towards Human-Computer Interaction for the GUI.  I think The Gimp GUI needs a lot of work.  It hasn’t changed very much, but it’s been slowly improving.  They’ve allowed some widgets to connect.  The site is unexpected for me; I didn’t think the creators were interested in entertaining differing viewpoints.

The biggest Gimp flamewar always occurs when people discuss single document interface (SDI) versus multiple document interface (MDI).  (Note the lack of a container application in SDI.)  Windows users tend to like MDI.  The Gimp programmers often argued that people only liked it because of Windows’ failure of a window manager.  To me, this was always an Ad Hominem fallacy that never addressed the real question of why one was better than the other.  MDI provides something important: a container.  Without one, if you “mis-click” outside an image in Gimp, you click the application behind it.  I find this frustrating.  Wikipedia contains strange arguments against MDI, such as “Cannot be used successfully on desktops using multiple monitors.”

Anyways, it looks like The Gimp is investigating the benefits, and hopefully they may even integrate an optional MDI-style container window into The Gimp.  In the past, developers always claimed you could get that functionality with “Gimpshop” or deweirdifier, but these applications and plugins are outdated and hard to get working.  Personally, I wish The Gimp’s interface was similar to paint.net, which I find incredibly easy to use.

Microsoft Arghffice, docx

March 31st, 2008

So I’ve been dealing this week with the new Microsoft Office 2007.  The interface is the most usable interface I’ve ever dealt with.  It needs to be implemented horizontally across all of Vista.  Apple seems to be able to manage this task; Microsoft should too.

The problem I’ve been having with Office is docx conversions.  Converting docx to Word 2003 doc or to RTF does not work as well as it should.  Instead of just removing features that don’t exist in the target format, it butchers features that do exist.  For example, tables come out completely malformed.  To be fair, this may be partially an issue with how OpenOffice parses Office 2007’s “Office 2003 doc” type.

Microsoft’s document specification, OOXML, is about to be ratified by ISO, and I’m not sure that this is a good thing.  My unease is born not of the reasons mentioned above, but of fears that other programs will be unable to implement OOXML parsers in the future.  Essentially, we need a standard doc type that can be parsed by any document parser.  Additions to the document should be extensions, and parsers should be able to parse the base document and output it without the extensions.
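To make the “base document plus extensions” idea concrete, here is a small Python sketch.  It does NOT use the real OOXML schema: the `urn:example:base-doc` namespace and the sample elements are invented for the example, and the only assumption is that extensions live in their own XML namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace for the base document format (not a real standard).
BASE_NS = "urn:example:base-doc"


def strip_extensions(element):
    """Remove, in place, every descendant that is outside the base namespace."""
    for child in list(element):
        if child.tag.startswith("{" + BASE_NS + "}"):
            strip_extensions(child)
        else:
            # Unknown extension: drop the whole subtree (and its tail text).
            element.remove(child)


def to_base_document(xml_text):
    """Parse a document and return only the parts a base parser understands."""
    root = ET.fromstring(xml_text)
    strip_extensions(root)
    return root


SAMPLE = (
    '<doc xmlns="urn:example:base-doc" xmlns:x="urn:example:vendor-ext">'
    "<p>Hello</p>"
    "<x:sparkle>shiny</x:sparkle>"
    "<p>World<x:glow/></p>"
    "</doc>"
)
```

Calling `to_base_document(SAMPLE)` keeps the two paragraphs and silently drops the vendor elements.  Extension attributes are not handled here; this is only meant to show the shape of the idea.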

Ubuntu Community Theme Competition, Scope Change

March 10th, 2008

Canonical is running an Ubuntu 8.10 Intrepid Ibex Theme Competition on DeviantArt.  This idea came from the Digg-esque site Brainstorm.  The original idea called for a general theme competition; however, they’ve decided to limit the scope of the competition to just wallpapers.  A wallpaper competition is an interesting idea, but there’s a huge interest in creating new themes for Ubuntu.  A lot of good ideas were made for Hardy Heron.

I posted a new idea to Brainstorm, objecting to their change to the original idea.

If I had to design the competition, the rules would be something like this:

  • Users submit a part of the look and feel of Ubuntu, any part.
  • Mockups are encouraged, ranging from highly prototypical, to mockup themes, to fleshed out themes, to wallpapers.
    In other words, this should be a mockup competition, setting the plans for Canonical’s Ubuntu look and feel.
  • Allow multiple kinds of winning: you could win for innovative ideas that aren’t implementable with current software, and you could win for an idea that will make it into the next Ubuntu.

Many mockups wouldn’t make it into Ibex, but would potentially set a path toward a look and feel that is more competitive with other operating systems.

C-Sharp

February 21st, 2008

I’ve been studying C# as a hobby, and I’ve found that it is actually a lot cooler than I originally thought.  At first, I thought it was just Java with everything and the kitchen sink thrown in.  Now I see that it is actually a fairly serious attempt at a well designed programming language that closely hugs Microsoft’s Common Language Infrastructure.  C# does have a lot of random junk in it, but I think that it is mostly good junk, so it should be interesting to use.  If Java and C++ had a love child, C# would be it.

I am planning on using XNA to produce some output for my AI course projects.  XNA seems pretty simple, but I wish it worked with Visual Studio 2008.

Game mode for Halo3

November 13th, 2007

I just uploaded a game mode for Halo 3 called “Cursed Rune”.  It’s pretty fun, and you can find it at this link:

“Cursed Rune is a slayer gametype that utilizes oddball’s skulls as power-ups. There are two skulls. The skull acts as a powerful melee weapon: you get instant kills while holding it, and holding it makes you much faster and somewhat more armored. Yet, for every second you hold it, you lose a score point! Do NOT hold it when you are not getting kills!

Fear not! Kills with the skull net you 3 times as many points as a normal kill.”

Cool looking laptop

November 5th, 2007

For a while, I’ve been looking for a craptop, a laptop I can bring with me to class or to a campus cafeteria and just browse the web on, take notes, send emails, and look at slides.

Also to play a couple of basic, crappy 2D games.

The closest thing so far for me has been the Apple MacBook, for its size and value, but Asus finally came out with a PC called the Eee PC 701, and NotebookReview gave it an October Editor’s Choice review. I plan on buying an Eee PC some day, but the main thing I’m worried about is the battery life. I’d like a PC that lasts at least 6 hours, if not more. That way, I can use it on bus trips from where I live to where my family lives.

I might go for the second generation if this thing takes off. Maybe by then they’ll have the battery life up, and expanding the screen real-estate an inch or so would be nice.

Update: Wikipedia says the Intel Merom version, with lower power consumption, comes in April.  Why isn’t it the Intel Penryn (one generation after Merom) version?  I don’t know, maybe Penryn is too new?…

Whoops

November 1st, 2007

Just realized the script for my little hyperlinks sub-site was totally broken. Fixed now, let me know if any links aren’t working.

Captcha Sample

October 23rd, 2007

If you see this text, either you're reading this from a system that has images off, or you loaded the captcha too many times, and must wait 15 minutes to see it again.

This dynamic image will only load 8 times, then you have to wait 15 minutes to be able to see it again. I tried to get the image to distort like Facebook’s login captcha, but I need to work more on my rasterized distortion functions. The image will probably be taken down in a few days as I make more adjustments to the site that uses it.
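For the curious, the “8 loads, then wait 15 minutes” rule can be sketched in a few lines of Python.  This is not the code running on the site; the class name, the client id key, and the sliding-window behaviour are assumptions made for the example.

```python
from collections import defaultdict, deque

MAX_LOADS = 8             # loads allowed per window
WINDOW_SECONDS = 15 * 60  # 15 minutes


class CaptchaLoadLimiter:
    """Sliding-window limit on captcha image loads per client."""

    def __init__(self, max_loads=MAX_LOADS, window=WINDOW_SECONDS):
        self.max_loads = max_loads
        self.window = window
        # client id (e.g. an IP address) -> timestamps of recent loads
        self._loads = defaultdict(deque)

    def allow(self, client_id, now):
        """Return True and record the load if the client is under the limit.

        now is a timestamp in seconds, passed in so the logic is easy to test.
        """
        recent = self._loads[client_id]
        # Forget loads that have aged out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_loads:
            return False  # caller should serve the "wait 15 minutes" text
        recent.append(now)
        return True
```

In a real request handler `now` would come from something like `time.time()`, and a refused load would fall back to the alternate text shown above.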

Most modern captchas put a line through the center of the image to prevent more advanced algorithms from using image segmentation. I plan to do this, but not until I also get distortions working.

Captcha

October 19th, 2007

I totally just programmed a captcha system for my loopPlay Games site (unfinished). It generates an image with a garbled string of text to force new users to prove they are humans. Some important things I was considering while programming it were: limiting the number of captcha generation requests, limiting the number of failed attempts at solving it, and putting restrictions on the time between captcha requests.
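The other two restrictions (failed attempts and time between requests) could look something like the Python sketch below.  Again, this is an illustration and not the site’s actual code; `CaptchaGuard`, the limit of 5 failures, and the 2 second gap are all hypothetical.

```python
class CaptchaGuard:
    """Tracks failed solves and request spacing per client."""

    def __init__(self, max_failures=5, min_interval=2.0):
        self.max_failures = max_failures
        self.min_interval = min_interval  # seconds between captcha requests
        self._failures = {}               # client id -> failed attempts so far
        self._last_request = {}           # client id -> time of last accepted request

    def may_request(self, client_id, now):
        """Allow a new captcha only if enough time has passed since the last one."""
        last = self._last_request.get(client_id)
        if last is not None and now - last < self.min_interval:
            return False
        self._last_request[client_id] = now
        return True

    def check(self, client_id, expected, typed):
        """Return True for a correct solve; lock out after too many failures."""
        if self._failures.get(client_id, 0) >= self.max_failures:
            return False  # locked out, even if this answer is right
        if typed == expected:
            self._failures.pop(client_id, None)
            return True
        self._failures[client_id] = self._failures.get(client_id, 0) + 1
        return False
```

This sketch never unlocks a client; a real version would presumably expire the failure count after some time.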

I haven’t implemented an email verification system for my website, and I don’t really like them. I’m sure bots can break them, so email verification just delays bots the same way my captcha does. Even so, I wonder if I’d be better off with both?

On Facebook Applications

October 16th, 2007

I’ve been playing around with Facebook application development. Facebook applications can be really neat.  Yet the privacy restrictions Facebook imposes are much stricter than those of other social networking sites.  LiveJournal is less restrictive, for example.  (Although to be fair, I haven’t read LiveJournal’s TOS lately.)

The Facebook Terms of Service make it impossible to make any application that shows one friend aggregated statistical data about your friends overall. You can look at that data yourself, but if your friends see that data, then you are breaking the TOS. In other words, if I want to implement a find-max-clique algorithm, I would not be breaking the Facebook TOS by showing the max-cliques to the user who asked to see them, but I would be if I made a neat little list of friends within their max-clique and pushed it onto the website.
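For reference, here is what a find-max-clique pass over a friend graph might look like in Python.  It uses the basic Bron-Kerbosch algorithm (my choice for the sketch, without pivoting) and a plain dict of friend sets; it doesn’t touch the Facebook API at all.

```python
def maximal_cliques(friends):
    """Yield every maximal clique of an undirected friendship graph.

    friends maps each person to the set of people they are friends with.
    Friendship is assumed to be mutual, and nobody is their own friend.
    """
    def expand(clique, candidates, excluded):
        # No one left to add and no one skipped: the clique is maximal.
        if not candidates and not excluded:
            yield clique
            return
        for person in list(candidates):
            yield from expand(
                clique | {person},
                candidates & friends[person],
                excluded & friends[person],
            )
            candidates = candidates - {person}
            excluded = excluded | {person}

    yield from expand(frozenset(), frozenset(friends), frozenset())


def max_clique(friends):
    """Return one largest clique (an empty frozenset for an empty graph)."""
    return max(maximal_cliques(friends), key=len, default=frozenset())


FRIENDS = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob", "dan"},
    "dan": {"cat"},
}
```

On `FRIENDS`, `max_clique` picks out ann, bob, and cat, since they are all friends with each other while dan only knows cat.  Worst-case running time is exponential, so this only makes sense for friend lists of modest size.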

This is incredibly restrictive.  I’m not really sure how top friends, friend sets, and a lot of other applications on Facebook don’t break the TOS. Maybe Facebook just hasn’t cracked down on them yet? I’m not saying this restriction isn’t a good thing. Honestly, I don’t want a company storing my Facebook profile just because I’m friends with some jerk who installed a profile data collector, for example some evil spyware application collecting emails. The restriction certainly hurts my ability to create viral Facebook applications though.

Also, some other things of note about me and kaddar.net:

1) I’ve been applying for some jobs.

2) Because I am messing around with Facebook applications, I upgraded the backend of this site to PHP 5, so I’ve also been playing with that a little bit.

3) The Java Web Start 2D game engine hasn’t made much progress, but I’ve been working on the skeleton template for sound clip loading and playing.

Random Links Move

October 10th, 2007

I find a lot of random links to cool websites on the intarwebs, but I decided to move the links to the sidebar, so that they aren’t blog post items, and I can still keep track of them.