Algorithms

The Bestseller Code

Photo: sirtravelalot/Shutterstock.com

Matthew Jockers, an English professor at the University of Nebraska, and editor Jodie Archer have cracked the riddle of the bestseller. Using an algorithm, they break the contents of books down into more than 2,800 individual features, a technique known as text mining. In their book “The Bestseller Code”, the two explain which characteristics a book needs in order to become a bestseller. We spoke with Matthew Jockers.

By Anna Palm and Jil Frangenberg

What is your favorite book? Is it a bestseller?

Matthew Jockers: I’ll tell you what the last book I read was: “Melmoth the Wanderer” by Charles Robert Maturin, an old 19th-century novel, so early Gothic. It is not a bestseller; it is a classic.

But your algorithm measures bestseller potential, right?

That’s correct, and if I were to test it on the book I’m reading right now, it would score very badly. That is because the bestseller-ometer identifies books that are likely to become bestsellers in today’s market, as measured by examples from the New York Times Bestseller List. “Melmoth the Wanderer” was published in 1820 in a very different style, not at all what the current market wants.

How did you come up with the idea of creating an algorithm to analyze the bestseller potential of books?

For a long time I’ve been working on tools that combine text analysis and machine learning. One day my former student and current coauthor Jodie Archer walked into my office and said: “You know, I’ve worked in publishing for a number of years, and I’ve always suspected that there are common patterns in books that hit the bestseller list.”
Jodie doesn’t have a technical background, but she had read many bestsellers in her work as an editor, so she had intuitive ideas; she had noticed things while reading many of these books. So the challenge for me was: Could we operationalize what she had noticed? Could we find a way for computers to notice the things that set bestsellers apart from the mass of other books?

So what does your algorithm analyze?

The algorithm analyzes 2,800 features, and it begins by reading a book into memory. Then it cuts the book apart in different ways: it looks, for example, at features that have to do with words, at others that have to do with sentences and paragraphs, and at others that have to do with themes. So it may notice that there are a lot of romantic themes in the book, or a lot of people travelling to outer space. There are 500 different themes. Then it notices different styles, for example the relation of nouns to verbs.
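To make the idea of “features” concrete, here is a minimal Python sketch of a few word- and sentence-level style measurements. It is not the authors’ software and covers only a handful of simple features, not 2,800; the function name extract_features is hypothetical. Measuring the noun-to-verb ratio Jockers mentions would additionally require a part-of-speech tagger, which this sketch leaves out.

```python
import re
from statistics import mean


def extract_features(text: str) -> dict[str, float]:
    """Compute a few simple word- and sentence-level style features (illustrative only)."""
    # Crude sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Lowercased word tokens, allowing one internal apostrophe ("don't").
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not words:
        raise ValueError("text contains no words")
    return {
        "num_words": float(len(words)),
        "num_sentences": float(len(sentences)),
        "avg_word_length": float(mean(len(w) for w in words)),
        "avg_sentence_length": len(words) / len(sentences),
        # Share of distinct words among all words, a rough measure of vocabulary richness.
        "type_token_ratio": len(set(words)) / len(words),
    }


print(extract_features("She ran. He wanted the key. They needed to leave now!"))
```

A real system would feed thousands of such numbers per book into a classifier; this sketch only shows what a single feature row might look like.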


Is there a bestseller code? With his algorithm, Matthew Jockers at least comes pretty close to cracking it. Image: Matthew Jockers/Jodie Archer

Concerning the results of your research: What does a book need in order to become a bestseller?

Some themes do very well and some do very badly. Themes about monsters, goblins and trolls don’t do well. They may occasionally, as in the bestseller “Game of Thrones”, but they are not consistent bestsellers. The single most important theme we found for bestsellers is something we describe as human closeness. It could be romantic love, but not romantic sex; it is a topic that involves human relationships, whether between a mother and a daughter, a husband and a wife, or two friends. Bestsellers have moments where the action pauses and characters connect with each other on this personal level of human closeness. In “The Da Vinci Code”, for example, Sophie and Robert are in the middle of a chase scene, and then Dan Brown stops the action to give us a setting where the two share some intimate moments.

Which phrases and words does a potential bestseller need?

Well, it is complicated, because it has to be contextualized along with all these other features. But we analyzed, for example, action verbs associated with characters doing things. One of the things we found is that bestselling characters have more agency. They act in a more confident and knowing way. So, for example, the verbs “want” and “need” are more important in bestsellers, whereas in non-bestsellers we have characters who “wish” or “desire”. Using the words “want” and “need” shows that the character knows what he wants, knows what he needs and is going to do things to get it. In non-bestsellers, we have many more characters we call wishy-washy. They wish for things rather than acting and getting them.
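The “want/need” versus “wish/desire” contrast can be illustrated by counting word forms per 1,000 words. This Python sketch is a hypothetical simplification: the word lists are assumptions, it counts raw word forms instead of verbs attributed to specific characters, and it cannot tell the verb “need” from the noun. The analysis Jockers describes ties verbs to the characters performing them, which this sketch does not attempt.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration; a real study would lemmatize and POS-tag.
AGENCY_FORMS = {"want", "wants", "wanted", "wanting", "need", "needs", "needed", "needing"}
WISHING_FORMS = {"wish", "wishes", "wished", "wishing", "desire", "desires", "desired", "desiring"}


def agency_profile(text: str) -> dict[str, float]:
    """Rate of want/need forms versus wish/desire forms per 1,000 words."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    agency = sum(counts[w] for w in AGENCY_FORMS)
    wishing = sum(counts[w] for w in WISHING_FORMS)
    scale = 1000 / len(words) if words else 0.0
    total = agency + wishing
    return {
        "agency_per_1000": agency * scale,
        "wishing_per_1000": wishing * scale,
        # Fraction of matched forms that are want/need; 0.5 (neutral) if none occur.
        "agency_share": agency / total if total else 0.5,
    }


print(agency_profile("She wanted the map. He wished for rain. They needed help."))
```

Normalizing per 1,000 words keeps long and short books comparable.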

Your algorithm makes clear statements about dos and don’ts. So does the algorithm endanger authors’ creativity?

Well, that’s a good question. If you’re an author and you give your manuscript to a friend to read, do you endanger your creativity if you make changes based on what your friend said?
I don’t think any real author would think their creativity is endangered, and here is why. Two reasons: First, I think serious authors want as much information as possible about their product. Sometimes they ask their friends or their editor for that information. They all seek feedback. Second, I believe in the magic of creativity. I believe that some authors simply have a special genius, and there is nothing my computer can do to recreate that genius. My computer can see that genius, but it can’t recreate it.

The New York Times Bestseller List shows the books with the highest sales. But can your algorithm actually measure literary quality?

I think we will never agree on a definition here, but the version we used in the book is: quality is what sells millions of copies (laughs).

So can literary quality be measured at all?

Well, yes and no. No, because quality is ultimately a subjective term. But yes, we can measure it. If you give me, for example, a collection of your 40 favorite books, I could develop your personal book signal, and with this signal I could find other books that you would like.
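One simple way to picture such a personal signal, assuming each book has already been turned into a numeric feature vector, is to average the vectors of the favorite books and rank other books by cosine similarity to that average. This is a hypothetical Python sketch, not necessarily how Jockers would build it; the function names and the three-dimensional toy vectors are invented for illustration.

```python
import math


def centroid(vectors: list[list[float]]) -> list[float]:
    """Average the feature vectors of the favorite books, dimension by dimension."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(favorites: list[list[float]], candidates: dict[str, list[float]], k: int = 3) -> list[str]:
    """Rank candidate titles by similarity to the reader's book signal (the centroid)."""
    signal = centroid(favorites)
    ranked = sorted(candidates, key=lambda title: cosine(signal, candidates[title]), reverse=True)
    return ranked[:k]


# Toy data: in practice the favorites list would hold, say, 40 books.
favorites = [[0.9, 0.1, 0.3], [0.8, 0.2, 0.4]]
candidates = {"Book A": [0.85, 0.15, 0.35], "Book B": [0.1, 0.9, 0.2]}
print(recommend(favorites, candidates, k=1))
```

Averaging is the simplest choice; a trained classifier per reader would be a more flexible alternative.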

Can you imagine that agents and editors will consult your algorithm before making a call on a manuscript?

To be honest, Jodie and I have already had some of those discussions. Neither of us would recommend a book purely on the basis of numbers that a machine spits out. Even though we go through our data over and over again, we know there is still an error rate of twenty percent.
But imagine a major publishing house that receives a thousand manuscripts every month. They can’t possibly read them all. And if you’re a new author and you send your book to one of these houses, what are the odds that they actually read it? Now imagine it went through a machine and got flagged with a little star that says “89 percent”. As an editor, you might say: “Oh, this one has a lot of qualities like books that have been on the bestseller list. I’d better take a second look at it.”
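The triage scenario can be sketched in Python, assuming a trained model that turns a manuscript’s features into a score between 0 and 1, so that the “89 percent” star corresponds to a score of 0.89. Here a logistic-regression-style formula stands in for that model; the weights, bias, threshold and manuscript data are all hypothetical. Given the error rate Jockers mentions, a flag only means “take a second look”.

```python
import math
from dataclasses import dataclass


@dataclass
class Manuscript:
    title: str
    features: list[float]  # hypothetical output of a feature extractor


def bestseller_score(features: list[float], weights: list[float], bias: float) -> float:
    """Logistic score in (0, 1); weights and bias would come from a trained model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1 / (1 + math.exp(-z))


def triage(manuscripts: list[Manuscript], weights: list[float], bias: float,
           threshold: float = 0.8) -> list[tuple[str, float]]:
    """Return (title, score) pairs at or above the threshold, highest score first."""
    scored = [(m.title, bestseller_score(m.features, weights, bias)) for m in manuscripts]
    flagged = [pair for pair in scored if pair[1] >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


inbox = [
    Manuscript("Draft A", [1.5, 0.0]),
    Manuscript("Draft B", [0.0, 1.0]),
    Manuscript("Draft C", [1.0, 0.0]),
]
for title, score in triage(inbox, weights=[2.0, -1.0], bias=0.0):
    print(f"{title}: {score:.0%}")  # prints "Draft A: 95%", then "Draft C: 88%"
```

The threshold trades workload against missed manuscripts: lowering it sends more drafts to human readers.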

How should big data technology – like your algorithm – be used in literature?

That’s what I have built my whole career on, so obviously I think it is a good thing. I am not concerned about any ethical issues at this point. I don’t think there is inherently an ethical question associated with text mining. If you are talking about mining people’s e-mail or their personal information, obviously that is an ethical issue.

What if the algorithm creates a situation where we lose diversity in the marketplace?

What will happen, of course, is that once the market becomes saturated with a theme, it will no longer be interesting, so the market will evolve. This is what happened with vampires, and we’re tired of them now. You cannot keep doing the same thing over and over again if you are an artist. But if you are a romance writer who wants to make a heck of a lot of money, then you keep writing the same story with different characters over and over again.

You don’t see harmful consequences of big data technologies in literature. But how much of human life can big data actually calculate?

Well, we don’t know yet, but this is a frontier and we keep pushing it. I am not at all afraid of that science, and I can’t tell you how curious I am about it.
There is a numerical way of understanding everything. Of course that is provocative, it is scary and science-fictiony, but it is also beautiful to go in search of those patterns and to discover what makes us tick.

After all the time you have spent analyzing books: How differently do you read them today?

On one hand nothing has changed; on the other hand everything has changed, because now I read a book very conscious of the things the machine has taught me. We just published an article on how certain verbs get associated with male and female characters. So every time I see a verb in a book, I have to highlight it in pink if it is a female-inflected word or in blue if it is male. So now my books look very colorful.

Thank you for your time!

And these are the authors

Anna Palm: Inventing stories and capturing other people’s stories, literary and journalistic storytelling: both are close to my heart. I want to be a novelist and a journalist. After my traineeship at WDR and my journalism studies in Dortmund, which are coming to an end soon, what I am looking forward to most right now is three months of "Work and Travel" in Canada. After that, I’ll see how I can combine my two passions.

Jil Frangenberg: Born in 1994. From my home in the Bergisches Land, I moved to Dortmund in the Ruhr area to study journalism. After starting out at the Rheinische Post, I completed my traineeship at the Hessische/Niedersächsische Allgemeine in Kassel. I get especially excited about unusual stories, film and television, and turtles.