Friday, July 23, 2010

Algorithmic culture and the bias of algorithms

Via Alan Jacobs, I came across a thought-provoking blog post by Ted Striphas on "algorithmic culture."  The issue is the algorithm behind Amazon's "Popular Highlights" feature.  (In short, Amazon collects all the passages in its Kindle books that readers have marked, collates this information, and displays it on its website and/or on the Kindle.  So you can now see what other people have found interesting in a book and compare it with what you found interesting.)
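
Amazon hasn't said how the collation works, but the heart of it is presumably a counting exercise.  Here is a minimal sketch in Python of what the aggregation step might look like; the function name and the (reader, passage) data format are my own assumptions, not anything Amazon has disclosed:

    from collections import Counter

    def popular_highlights(highlights, top_n=10):
        # `highlights` is an iterable of (reader_id, passage) pairs,
        # where a passage is identified by, say, its location range
        # in the book.  Both names are hypothetical.
        # Deduplicate so one reader marking a passage twice doesn't
        # inflate its popularity, then count readers per passage.
        counts = Counter(passage for _, passage in set(highlights))
        return counts.most_common(top_n)

    sample = [
        ("alice", "loc 120-134"),
        ("bob",   "loc 120-134"),
        ("carol", "loc 120-134"),
        ("bob",   "loc 301-310"),
    ]
    print(popular_highlights(sample))
    # [('loc 120-134', 3), ('loc 301-310', 1)]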

Striphas brings up two problems, one minor, one major.  The minor one:
When Amazon uploads your passages and begins aggregating them with those of other readers, this sense of context is lost. What this means is that algorithmic culture, in its obsession with metrics and quantification, exists at least one level of abstraction beyond the acts of reading that first produced the data.
This is true, but it could easily be remedied.  Kindle readers can also annotate passages in the text, and those who feel like it could upload their annotations along with the passages they have marked.  That would supply the context of why the passages were highlighted. (Of course, this brings up another thorny question: what algorithm to use to aggregate these annotations.)
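
To make that concrete: the easy half is grouping every note under the passage it annotates, as in the sketch below (again with a made-up data format); the thorny half, which this sketch deliberately dodges, is how to rank or summarize the notes once they are grouped.

    from collections import defaultdict

    def annotations_by_passage(annotations):
        # `annotations` is an iterable of (passage, note) pairs,
        # a hypothetical format; Kindle annotations aren't actually
        # uploaded this way.
        grouped = defaultdict(list)
        for passage, note in annotations:
            grouped[passage].append(note)
        # The thorny, unsolved part: which notes in
        # grouped[passage] to show, and in what order?
        return dict(grouped)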

But he brings up another far more important point:
What I do fear, though, is the black box of algorithmic culture. We have virtually no idea of how Amazon’s Popular Highlights algorithm works, let alone who made it. All that information is proprietary, and given Amazon’s penchant for secrecy, the company is unlikely to open up about it anytime soon.
This is a very good point, and it brings up what I often call the "bias" of algorithms.  Algorithms, after all, are made by people, and they exhibit all the biases that their designers put into them.  In fact, it's misleading to call them "biases," since these are precisely what make the algorithm work!  Consider Google's search engine.  You type in a query and Google claims to return the links that you will find most "relevant."  But "relevant" here means something different from the way you use the word in your day-to-day life.  "Relevant" here means "relevant in the context of Google's algorithm" (a.k.a. PageRank).
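
PageRank itself, by the way, is not a secret: the original 1998 paper describes it, and a toy version fits in a dozen lines.  Here is a power-iteration sketch in Python.  Bear in mind this is the published core idea only; Google's production ranking combines it with many other signals.

    def pagerank(links, damping=0.85, iterations=50):
        # `links` maps each page to the list of pages it links to.
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}
        for _ in range(iterations):
            # Every page keeps a baseline (1 - d) / n of rank...
            new_rank = {p: (1.0 - damping) / n for p in pages}
            for page, outlinks in links.items():
                if outlinks:
                    # ...and passes the damped rest to its outlinks.
                    share = damping * rank[page] / len(outlinks)
                    for target in outlinks:
                        new_rank[target] += share
                else:
                    # A dangling page spreads its rank everywhere.
                    for target in pages:
                        new_rank[target] += damping * rank[page] / n
            rank = new_rank
        return rank

    web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    print(pagerank(web))  # "c" ends up with the highest rank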

The problem is that this distinction is lost on people who just don't use Google all that much.  I spend a lot of time programming, and Google is indispensable to me when I run into bugs.  So it is fair to say that I am something of an "expert" when it comes to using Google.  I understand that to use Google optimally, I need to use the right keywords, often the right combination of keywords along with the various operators that Google provides.  I am able to do this because:
  1. I am in the computer science business and have some idea of how the PageRank algorithm works (although I suspect not all that much), and
  2. I use Google a lot in my day-to-day life.

I suspect that (1) isn't at all important but (2) is.  

But (2) also has a silver lining.  In his post, Striphas comments:
In the old paradigm of culture — you might call it “elite culture,” although I find the term “elite” to be so overused these days as to be almost meaningless — a small group of well-trained, trusted authorities determined not only what was worth reading, but also what within a given reading selection were the most important aspects to focus on. The basic principle is similar with algorithmic culture, which is also concerned with sorting, classifying, and hierarchizing cultural artifacts. [...] 

In the old cultural paradigm, you could question authorities about their reasons for selecting particular cultural artifacts as worthy, while dismissing or neglecting others. Not so with algorithmic culture, which wraps abstraction inside of secrecy and sells it back to you as, “the people have spoken.”
Well, yes and no.  There's a big difference between the black box of algorithms and the black box of elite preferences.  Algorithms may be opaque, but they are still rule-based.  You can still figure out how to use Google to your own advantage by playing with it.  For any query you give it, Google will return the exact same response (well, for a certain period of time at least).  So you can play with it and find out what works for you and what doesn't.  The longer you play with it and the more familiar you become with its features, the less opaque it seems.

Not so with what Striphas calls "elite culture," which, if anything, is far more opaque and far less amenable to this kind of trial-and-error practice.  (That's because the actions of experts aren't really rule-based.)

I am not sure where I am going with this, and I am certainly not sure whether Amazon's Kindle aggregation mechanism will become as transparent through trial and error as Google's search algorithm has.  But my point is that it's too soon to give up on algorithmic culture.


Postscript: My deeper worry is that when we reach the point where algorithms are used far more than they are now, the world will be divided into two types of people: those who can exploit the biases of an algorithm to make it work well for them (as I do with PageRank), and those who can't.  It's a scary thought, although since I have no clue what such a world will look like, this is still an empty worry.
