EagerEyes.org


Review: Manuel Lima, The Book of Trees

Mon, 04/14/2014 - 03:57

Categories:

Visualization

Trees. They’re everywhere. And not just in the physical world, but in data visualization and knowledge representation as well. This is not a new phenomenon; it goes back thousands of years. Manuel Lima’s new book, The Book of Trees, gives an overview.

Setting Expectations

This review is an example of priming. The first time I learned of the book was when Ben Shneiderman mentioned it to me as we talked at IEEE VIS in Atlanta last year. In our conversation, he referred to it as “a coffee-table book.” I don’t think he did this on purpose, but that did set my expectations.

There are many similarities between The Book of Trees and Lima’s previous book, Visual Complexity, which I reviewed for Science two years ago. The major difference is that Lima doesn’t attempt the same kind of taxonomy he did in Visual Complexity, which ended up being mostly disappointing. There are also no over-the-top endorsements on the back of the book that promise way too much. The result is a book that feels more coherent and complete.

Beyond The Coffee Table

Having been primed to think of it as a coffee-table book, I did not expect a deep theoretical treatment, but lots of pictures. And that is what I got. In addition, the book has a very nice introduction that describes the importance of trees throughout all cultures and religions, both in terms of their physical uses and as metaphors for knowledge, life, etc.

There is also a short chapter titled Timeline of Significant Characters, which consists of 12 short bios, starting with Aristotle and ending with Ben Shneiderman (Ben also wrote the foreword, and the book includes many examples of treemaps). It seems a bit misplaced early in the book, and might have made more sense as an appendix.

In the introduction, Lima argues that we need to look at a much longer history of visual representations than just information visualization (and “not be overly infatuated by the work created in the last decade alone”). I agree with that. However, a clearer line could have been drawn between actual data visualizations and trees that depict ideas of structure (like Darwin’s illustration for On the Origin of Species, which did not describe the evolutionary history of any particular species, but the general idea of evolution).

Chapters

In addition to the introduction, there are eleven chapters talking about different kinds of tree diagrams:

  • Figurative Trees. These are the most tree-like in the way they are drawn, and many of the oldest examples are in this chapter. This is also the longest chapter.
  • Vertical Trees. Upside-down trees, the way they are commonly drawn in computer science. It turns out that there is quite a bit of precedent for these, going back many hundreds of years.
  • Horizontal Trees. All but one of these are drawn left-to-right, and there are a few that grow in both directions. This chapter also includes, at the very end, tree-browser concepts similar to the Mac Finder and Windows Explorer, respectively. Lima credits these to himself, which seems an odd choice.
  • Multidirectional Trees. Trees drawn in different directions are included here. The most obvious examples are the result of force-directed layouts, but there are also historical examples and more modern hand-drawn ones (like Stefanie Posavec’s Writing Without Words).
  • Radial Trees. Trees laid out on concentric circles are a common idea in visualization for a variety of reasons. This chapter seems like a bit more of a mish-mash, because the layout within the circles can be very different, affording different ways of reading, interaction, etc.
  • Hyperbolic Trees. Giving these their own chapter is an interesting choice, because they are really a subset of multidirectional trees. This is a nod to interaction, which is otherwise missing in the book. It’s a short chapter, since hyperbolic trees never really took off (partly because they were patented), and never really proved to be all that useful.
  • Rectangular Treemaps. This is the first of three treemaps chapters. It starts with some historical precedents, though I doubt anybody would have recognized them as part of one class before Shneiderman’s paper. Then it’s treemaps: the original slice-and-dice treemap, squarified treemaps, the Map of the Market, cushion treemaps, and many examples of using treemaps for different kinds of data.
  • Voronoi Treemaps. It was a surprise to see a whole chapter on this niche treemap type, but they are of course very attractive. Surprising is also the number of examples Lima has managed to dig up.
  • Circular Treemaps. Calling these treemaps is at least a stretch, since they are not actually space-filling. In the introduction to the chapter, Lima first refers to them as space-filling, but then complains about their waste of space. I’d rather have seen a chapter on interaction than these mostly useless visualizations.
  • Sunbursts. Speaking of useless: sunburst diagrams are one of those neat ideas that don’t really work out in practice. The examples are all weak, in particular the 3D Sunburst.
  • Icicle Trees. Icicle trees are clearly more useful than sunbursts, since they are easier to label and navigate than the circular sunbursts. The latter are also arguably just icicles laid out in a circle. It’s kind of difficult to compete with the treemap, so these last two chapters feel a bit forlorn.
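
To make the "icicles laid out in a circle" point concrete, here is a minimal sketch, not taken from the book. The toy tree, its leaf sizes, and the function names are all made up for illustration. The icicle layout gives every node a horizontal interval and a row; the sunburst simply reinterprets the interval as an angle and the row as a ring.

```python
import math

# Hypothetical toy tree: (name, size, children). Leaf sizes are made up.
tree = ("root", 0, [
    ("a", 0, [("a1", 1, []), ("a2", 1, [])]),
    ("b", 2, []),
])

def total(node):
    """Sum of the leaf sizes below a node (a leaf counts its own size)."""
    _, size, children = node
    return size if not children else sum(total(c) for c in children)

def icicle(node, x0=0.0, x1=1.0, depth=0, out=None):
    """Give each node a horizontal interval [x0, x1] and a row (depth)."""
    if out is None:
        out = {}
    name, _, children = node
    out[name] = (x0, x1, depth)
    span, whole, start = x1 - x0, total(node), x0
    for child in children:
        # Each child gets a share of the parent's interval proportional to its size.
        width = span * total(child) / whole
        icicle(child, start, start + width, depth + 1, out)
        start += width
    return out

def sunburst(layout):
    """Reuse the icicle layout: x becomes an angle, depth becomes a ring."""
    return {name: (2 * math.pi * x0, 2 * math.pi * x1, depth)
            for name, (x0, x1, depth) in layout.items()}

rows = icicle(tree)     # rows["b"] is (0.5, 1.0, 1)
rings = sunburst(rows)  # the same node now spans the angles pi to 2*pi on ring 1
```

Nothing about the hierarchy changes between the two; only the coordinate system does, which is also why the labeling and navigation problems of the sunburst come from the circle rather than from the underlying layout.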

Each chapter has a little diagram showing how the type of visualization is constructed for a tree with one, two, and three levels. This is surprisingly effective, and similar to some of the illustrations in Isabel Meirelles’ book.

What’s Missing

Is this a visualization book? Not really. It doesn’t go into any detail on the actual techniques, doesn’t compare them, and more than half of its pages are devoted to tree diagrams that aren’t useful for visualizing data today.

It also entirely ignores interaction. The only time Lima talks about it is in the introduction to the Hyperbolic Trees chapter, where he says that these don’t appear much in print because they are useful when there is interaction (and only then, I might add), and are thus confined to “their natural digital domain.”

It’s too bad Lima didn’t venture a bit further into that domain to illustrate more of the really interesting interactive tree visualization tools. Tamara Munzner’s TreeJuxtaposer is never mentioned, and neither is the SpaceTree, etc. There are many other examples of work that is missing, and I don’t think that Lima was going for completeness here. But ignoring interaction entirely seems like a big gap, even if it doesn’t lend itself that well to a printed book.

Conclusions

The book provides plenty of good material. Lima has unearthed many examples that most people likely have never seen before, both ancient and relatively recent. His introduction has also given me a new appreciation of trees as a structural metaphor. I hope that somebody will use all the examples Lima has collected for both of his books to develop a deeper understanding of the design space, beyond a list of examples.

The book succeeds as a coffee-table book, and I mean that in the best possible way. It provides a beautiful, visual overview of a large and important part of our cultural and intellectual heritage, and thus is a fantastic resource to draw inspiration from. The visualization examples are not complete, but there are many lesser-known ones that can be great starting points when researching tree visualization work – or when simply wanting to understand the history and context of tree metaphors when depicting information.

Manuel Lima, The Book of Trees. Princeton Architectural Press, 2014.

The publisher sent me a free copy of the book for this review.

Story: A Definition

Mon, 04/07/2014 - 03:52

Categories:

Visualization

What makes a story? What does a story do? In part one of this little series, I argued that stories and worlds are not opposites, but complements. In this part, I try to explain the differences between worlds and stories, and present a definition.

What Is a Story?

Lynn Cherny has written a great summary of some of the research about narrative, and in particular implied stories.

I will take the two very brief stories she quotes and use them to illustrate the difference between exploring a world and telling a story. Lynn starts with the well-known “shortest story ever” by Ernest Hemingway:

For sale: baby shoes, never worn.

It’s powerful stuff, and you can easily make up stories. Many of them may be sad, but I can actually think of positive stories too (the parents got too many shoes at a baby shower, they bought both boys and girls shoes and are selling the ones they don’t need, okay I’ll stop now).

But the point is: this is not a story. There is no narrative, no characters, no conflict, no inciting incident, no arc, nothing. It’s not a story. It’s a situation, a vignette. It’s very evocative, and that clouds people’s judgment. But look at it again and tell me where you see the story. There is none, you get to make up your own.

Now let’s look at an actual story that isn’t much longer. It’s not as exciting, but it is a story:

The baby cried. The mommy picked it up.

Now we have a story. There are characters (the baby and the mommy), there’s a narrative, there’s an inciting incident (the crying), there is action (the baby being picked up), etc. This is a story.

What is the difference? And what does this have to do with visualization?

Stories in Visualization

The Hemingway vignette is the typical way visualization works, and what Moritz Stefaner described in his posting a while ago: give people a world to explore rather than a path through it. They have to do the work themselves, but are also free to do whatever they want.

The actual story of the baby and the mommy is the equivalent of a story in visualization. It guides you. The author has done the work of building an actual narrative for you. It also means a particular point of view: the facts have been arranged to tell a particular story. You may not agree. But what you are told is not just a list of disconnected facts, but a path through them.

There are of course differences when looking at stories in journalism, for example. But that is beyond the scope of this posting; I will come back to it at some other time.

Two Definitions of Story

So here’s a definition. A story consists of:

  • Facts. These are the atoms the story is made of. And they’re typically related and coherent, not just randomly thrown together.
  • Causal relationships. These tie the facts together. Now it’s important not to get hung up here on the exact causal mechanisms: causation can be implied, it can be hypothesized, it can be claimed. But causal relationships are crucial for a story to work – in other words: in a story, everything happens for a reason.
  • Narrative sequence. The actual telling of a story puts the facts and their interactions into some sequence. That sequence does not have to mirror the temporal ordering of the facts, in fact many stories are told out of order. But no matter the order the story is told in, the temporal and causal direction must always be clear.

Now that may seem very theoretical, but once you start looking at actual stories, you will find these elements everywhere. And they do apply to well-crafted stories about data just as they apply to traditional stories about people.

Here is another attempt, this time looking at what a story does, rather than what it is. A story

  • ties facts together. There is a reason why this particular collection of facts is in this story, and the story gives you that reason.
  • provides a narrative path through those facts. In other words, it guides the viewer/reader through the world, rather than just throwing them in there.
  • presents a particular interpretation of those facts. A story is always a particular path through a world, so it favors one way of seeing things over all others.

That last point will surely get people up in arms, but give it a moment: the point is not that the story misleads or is biased, but it’s simply one point of view. The strength of visualization is not just to give you a story, but also give you a world. If you don’t agree with the story, or if you want to explore further, you can. Take the visualization and the data and explore for yourself.

Teaser image by Bethany King, used under Creative Commons.

Stories Are Gateways Into Worlds

Mon, 03/24/2014 - 03:58

Categories:

Visualization

Moritz Stefaner recently wrote a posting titled Worlds, not stories. He basically argues that while there is a clear role for the designer of a visualization, the result should be a world that users can explore, rather than a story that they’re told. I have a few things to say about this, and will do so in two parts. This is part one.

Moritz views an audience that watches a story as mere consumers, while he wants them to be active. He has also said that visualizations don’t need punchlines. I don’t disagree with any of that. However, I think Moritz has a view of what a story is and how it interacts with the visualization that is much too narrow.

Interactive Worlds

Worlds that the user can explore are nothing new in visualization. This is the way visualization has worked since it got that name: show the user data, give them some tools to navigate around (whether in 3D space or by using filters, etc.), and let them explore.

Exploration and analysis work like that, and they’re obviously useful. There is nothing wrong with using a tool to open up such a world, or with providing such a world to a user who has a sense of what to do there.

But many times, a bit more guidance is helpful. Perhaps the user is reading a news story and has never thought about this type of data before, and doesn’t know what questions to ask or where to start exploring. Perhaps the user is a colleague who doesn’t know the specific data you’re working with and has no idea which of the fifty visualizations you’re sending him is the most important, and where to start. Perhaps the user doesn’t actually know much about visualization, so you have to provide some introduction and guidance not just for the data, but also how to read the visualization, how to interact, etc.

The Power of Story

There is no better way to illustrate what it means to tell a clear and powerful story using data than Hans Rosling’s famous TED 2007 talk. If you haven’t seen it before, watch it. If you have seen it before, and you think you know it, watch it again. It’s a revelation every time I watch it, and I have seen it dozens of times.

Notice what he does: he tells his story entirely based on numbers. There are no pictures of starving children or empty deserts. Not a single photograph! It’s amazing how powerful this story is, despite its distance and lack of a clearly identified individual (which you’d typically find in stories because we relate much better to individuals than to abstract groups).

And where does he start from? The problem is not that people haven’t heard about poverty or the difference in life expectancy. The problem is not ignorance, but preconceived ideas that are outdated and wrong. Would you explore this data if you thought you already knew what was in it? The story here grabs you, shakes you awake, and makes you pay attention even though you thought you already knew the answer.

Again, watch it. You won’t find many better uses for 18 minutes of your time.

Stories That Lead Into Worlds

It is often important to lead people into a world. Rosling does that so people question their preconceived ideas and pay attention. Journalists do that when they try to tell you about something they want you to know and care about, but which you may never have heard about. And activists and non-profits do that when they want you to pay attention to the cause they are pursuing.

Right now, we mostly get one or the other: a great story with very little exploration at the end, or an exploratory tool with little or no introduction. That makes it a bit more difficult to see where things are headed, but I am sure that we’ll soon see good examples that are strong on both ends.

Since I mentioned Hans Rosling above: Gapminder World (requires Flash) actually lets you explore the data yourself, and you can take his talk as the introductory story leading you into it. It’s not quite the same, since the talk is much better than the tool, but it illustrates the idea.

Stories That Guide and Support

Stories are great vehicles to get people interested, to give them some orientation, and to guide them far enough into a world so that they can do their own exploration.

There is no contradiction between stories and exploration. Not only can they coexist, they enhance each other. The story pulls you in and gently pushes you along, the exploratory visualization lets you uncover new findings and stories yourself.

Teaser image by Anthony Albright, used under Creative Commons.

NewsVis.org, The Directory of News Visualizations

Mon, 03/03/2014 - 04:49

Categories:

Visualization

When I was in Portland over the holidays a few weeks ago, I noticed a visualization in the local newspaper, The Oregonian. I had never heard of that before, nor of Mark Friesen, who created it. Wondering how many visualizations I might be missing, I decided to build a website that would collect them all: newsvis.org.

There are of course the usual suspects who we all know, and who do a lot of great work: The New York Times, The Washington Post, etc. But is that it? Are there not many others that create data visualizations for journalism?

Also, it is close to impossible to find news visualizations. I remember that scatterplot-like thing showing groups of voters who were going to vote for Romney vs. McCain in the Republican primaries in 2008, but where was it? And when? For a while, The New York Times was downright hiding its graphics: you’d see them on their front page for a short time, and then you’d never be able to find them again. Too bad, you’re too late; it’s gone! This has changed, and there are now Twitter accounts and tumblrs to follow, but none of them are searchable in any reasonable way.

There are also many other questions you might ask about news visualizations. When was the first scatterplot published? How many timelines have there been about sports in the last five years? Does The Washington Post create more bar charts or line charts?

Enter NewsVis.org

NewsVis.org can’t answer all those questions quite yet, but it’s a start. The site is fairly basic right now, but in the spirit of kaizen, I have decided to publish it and start collecting material and feedback for improvements.

There are three main parts to it:

  • The front page, which lists visualizations in reverse chronological order (by their publication date).
  • The sidebar, with filters to pick particular visualization types, media, etc.
  • The submission form – easily the most important part of the site.

The sidebar is currently quite ugly, but it serves its purpose. It allows you to see how many items there are in each category (by clicking on the drop-downs), and to filter to just one or a combination of them by picking them and then hitting the Search button.

This is the part I want to replace the most, but I decided to prioritize releasing the site over redoing it. I have some ideas for what I want that to look like, but that will have to wait until after the InfoVis deadline (end of March).

Submissions

The key to making this work is the submission form. I can’t possibly populate the site with all the work out there by myself. I also depend on readers to find the hidden gems that I’m not aware of.

There is a trade-off between making this form too complicated and collecting enough data to make the site useful. While it may seem a bit overwhelming at first, it’s actually quite quick to fill out and submit a graphic.

The required information currently is the following:

  • The title of the piece
  • The byline, which is split into two parts. The first part contains a search field that has a few people already in its list. This will be expanded over time, so it will be easier to submit work by the same people. For authors who are not yet listed there, there is a separate input field. I will add all the missing names to the top field when I publish a piece.
  • Publication date. When was this published? If you can’t figure it out, a reasonable guess also works.
  • The link to the piece.
  • The medium. Similar to the above, there’s a quick search field and a field for media that are not yet listed.
  • The topic. This is a taxonomy that I’ve built fairly ad-hoc and that I intend to keep as small as possible. I will expand it if necessary, and I certainly take suggestions. But I’m not trying to build The Ultimate Taxonomy of News here.
  • The visualization technique. Same applies as above, especially since news visualizations often don’t nicely fit into particular chart types.
  • The language. This is also a bit of a proxy for the country/region. I’m still debating if it makes sense to include countries, states, regions, political bodies (European Union, etc.), continents, etc. This can easily snowball into an unwieldy mess, so I’m sticking to languages right now.
  • Interactivity. Since this is meant to provide inspiration, I also want to be able to filter to more or less interactive pieces.
  • A notes field. This is mostly to suggest things that don’t fit anywhere else (like new topics). It won’t be included in the actual published visualization page.
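
As a rough sketch of what one submission might look like as a record, and how the sidebar filters could match against it: the class, the field names, and the example values below are my own placeholders, not the site's actual WordPress schema. Interactivity is modeled as a plain yes/no, which is an assumption.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Submission:
    title: str
    byline: list[str]    # known names plus any newly entered ones
    published: date      # publication date of the original piece
    link: str
    medium: str
    topic: str
    technique: str
    language: str
    interactive: bool    # assumption: a simple yes/no flag
    notes: str = ""      # for the editor only, not shown on the published page

def matches(item, **wanted):
    """True if the item has the requested value for every given field."""
    return all(getattr(item, field) == value for field, value in wanted.items())

# Placeholder values for illustration only.
example = Submission(
    title="Example chart",
    byline=["A. Author"],
    published=date(2014, 1, 15),
    link="https://example.com/chart",
    medium="Example Gazette",
    topic="Politics",
    technique="Bar chart",
    language="English",
    interactive=False,
)

print(matches(example, medium="Example Gazette", language="English"))  # True
print(matches(example, technique="Scatterplot"))                       # False
```

Keeping the controlled fields (medium, topic, technique, language) as single values is what makes combined filtering this simple; free-form notes stay out of it.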

There is no limit on how much you can submit or whose work you submit. Submit stuff you like, or stuff you hate. Submit your own work! No reason to be shy, just submit it. You can provide a name, but there is no requirement. Provided submitter names are also not shown for now, but that might change.

Gatekeeping

The goal of this site is to be as complete as possible in a very narrowly-defined area: visualizations used in the news. I have some rules listed on the About page about what I consider news, but it’s pretty simple: if it’s published by a news medium, it’s news. If not, things get a bit more complicated and ad-hoc.

Every submission will get some loving hand-tweaking from me, and I will only publish submissions that fit the spirit of the site. I intend for this to be a high-quality site, with consistent standards for the images (cropping, resolution, etc.) and metadata. That’s really the only way to make this useful and not drown in noise.

How to Contribute and Follow

Contributing is easy: just go to the submission form and submit stuff. It’s much simpler and faster than it looks.

You can follow the site via the RSS feed and on Twitter. Both will get every new submission. Since I use the publication date of the visualization as the date of the posting, you will see items appear in the feed that seem to be coming from the past. By having just one date, I avoid confusion, and the date the item was published on newsvis isn’t really all that interesting. This also makes it much easier to always keep the list sorted in chronological order of publication date (of the original), rather than submission date.
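
A small sketch of that ordering, with made-up titles and dates: each item carries both dates, but only the original publication date is used for sorting, so a freshly submitted piece can land far down the list.

```python
from datetime import date

# (title, original publication date, date submitted to the directory)
# All values are placeholders for illustration.
items = [
    ("Chart A", date(2014, 2, 20), date(2014, 3, 1)),
    ("Chart B", date(2008, 2, 5), date(2014, 3, 2)),
    ("Chart C", date(2013, 11, 6), date(2014, 3, 3)),
]

# Sort by the original's publication date, newest first.
front_page = sorted(items, key=lambda item: item[1], reverse=True)
titles = [title for title, _, _ in front_page]
print(titles)  # ['Chart A', 'Chart C', 'Chart B']
```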

While the visualizations are their own content type on the site, there is also a blog. Blog posts will appear in the feed and on Twitter. I don’t intend to write much there though, just notes about house-keeping and major changes or additions.

Under The Hood

I built this site using WordPress, even though Drupal is probably a more logical choice for this sort of database-centric site. After discovering Gravity Forms and seeing some documentation on Custom Post Types in WordPress, I decided to go with that, though. It wasn’t exactly a walk in the park: the WordPress documentation can easily compete with Drupal’s in terms of disorganization and lack of reasonable navigation. There is also an incredible amount of noise when searching for answers, with lots of people simply repeating the same bits of information but never digging any deeper. But I think overall the model is still simpler, even if also much more limited than in Drupal.

Either way, I will keep improving and growing the site, and I hope that you will find it useful and contribute!

NewsVis – The Directory of News Visualizations

The Mirrored Line Chart Is A Bad Idea

Wed, 01/29/2014 - 05:28

Categories:

Visualization

The mirrored line chart is a pet peeve of mine. It’s very common close to elections when there are two parties or candidates: one’s gains are at the other’s expense. But it becomes even more egregious when there are two categories that have to sum up to 100% by their very definition.

In her coverage of President Obama’s State of the Union address, The Guardian finance and economics editor Heidi N. Moore tweeted the following chart, which came from a report by the National Institute on Retirement Security (which, despite its official-sounding name, is a think tank):

What do the two lines here show? Or rather, what does the second line add? Nothing, that’s what. Each of the labeled pairs of values sums to 100.0%. The two lines mirror each other exactly.

It’s obvious even without looking at the lines. The two categories here are “employer sponsors plan” and “employer does not sponsor plan” – that doesn’t exactly leave room for a third option. They either do or they do not (insert obligatory Yoda joke here).

So what is the motivation for the second line here? Why add that when it contributes nothing? My guess is that the chart simply looked too empty and uninteresting with just a single line. It’s the same reason many visualizations get overloaded with too much data. If it looks like there just isn’t enough substance, even if it shows exactly what is needed, people often feel a need to add more to make it look more serious.

But what, you might ask, does it hurt? It’s not made-up data, it’s just the other category. The problem is that it adds clutter, and that it creates the impression of a strong inverse correlation when there is none. The two categories have to sum up to 100% by definition, there is no third option.

When the data is coming from polling results, at least there are undecided voters who add a bit of interest. Though even that is often misrepresented or downright hidden. But here, there isn’t even uncertainty. It’s a simple sum of two numbers. It’s redundant information.
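
If you want to check a chart's data for this before drawing the second line, a quick test is enough. This is a minimal sketch; the function name and the tolerance are my own choices, and the numbers are made up rather than taken from the chart above.

```python
def is_mirror(series_a, series_b, total=100.0, tol=0.05):
    """True if every pair of values adds up to `total` within `tol`."""
    return len(series_a) == len(series_b) and all(
        abs(a + b - total) <= tol for a, b in zip(series_a, series_b)
    )

# Made-up percentages for "sponsors plan"; the complement follows by definition.
sponsors = [61.0, 58.5, 55.2, 53.0]
does_not = [100.0 - value for value in sponsors]
print(is_mirror(sponsors, does_not))  # True: the second line adds nothing

# Made-up poll numbers with undecided voters: the pairs do not sum to 100.
candidate_a = [47.0, 48.5]
candidate_b = [45.0, 44.0]
print(is_mirror(candidate_a, candidate_b))  # False
```

When the check comes back true, one line carries all the information, and the other can be read off as 100% minus that value.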

Data Stories Podcast: 2013 in Review, Outlook to 2014

Mon, 01/27/2014 - 05:12

Categories:

Visualization

The Data Stories podcast starts the new year with Andy Kirk and me as guests. With the hosts, Enrico Bertini and Moritz Stefaner, we discuss the major developments of 2013 and look ahead to what 2014 has in store.

You can listen to the podcast episode directly on its page, but you should really subscribe using your favorite podcast app. If you don’t have one, at least subscribe to the datastori.es feed.

Andy and I are Data Stories veterans, with Andy having appeared four times now, and this being my third time (the first one was a year ago, and Enrico and I recorded one episode at VIS). Pros that we are, we were able to seamlessly talk over some of the glitches and lost connections. Mostly.

Many of the things we discussed are also covered in my State of Information Visualization posting from two weeks ago. And while we were obviously discussing matters of great seriousness, there were also some lighter moments.

Peer Review, Part 5: The Importance of Gatekeepers

Fri, 01/24/2014 - 05:11

Categories:

Visualization

The purpose of peer review is to separate the wheat from the chaff, the good from the bad, the brilliant from the clinically insane – you get the picture. But why? Why filter and not just let anybody publish whatever they want?

Why Gatekeepers? And Why Gates?

In the old days, there was the resource argument. A journal that’s published six times a year with so many pages per issue can only print some limited number of papers. The same is true for a conference: there are only so many sessions in which to present, and printing the proceedings also puts limits on the number of papers that can be included.

In the new era of electronic proceedings, online libraries, etc., none of these is true anymore. There are no limits to paper length because it doesn’t matter if your paper is 10 pages long or 100. It also doesn’t matter if a journal issue contains 10 papers or 100. So why insist on limiting the number of papers?

The answer is simple: time. It’s the only resource we can’t make more of, and it is the one that limits what we can possibly consume. Being able to find work that has been vetted, rather than having to vet it all yourself, is hugely valuable. You can now trust that the work has at least a minimum quality level, and is not just a video of somebody’s sleeping cat.

If you want to see what kinds of stuff you end up getting when there is no gatekeeping, just take a stroll around YouTube. Most of the content there is simply awful, pointless dreck. New stuff is also added at a rate that ensures that you could never possibly watch it all, even if you wanted to. And while there is no formalized system for it, most people (like me) don’t waste their time watching random videos, because they are mostly bad. Instead, we wait for the masses of users to find the ones that are worthwhile, and post those on Twitter and other social media. But the end result is the same: somebody has to wade through the flood of crap to find the few gold nuggets.

That is also what can make reviewing so frustrating. Your job as a reviewer is to weed out the bad 75–80% of papers, so the good 20–25% will be accepted. That means that for every one good paper, you will see three to four bad ones. But the result is that the papers in the journal or conference will be of a much higher quality.
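
The three-to-four figure follows directly from the rejection rate. Writing r for the fraction of papers rejected (my notation, not a standard one), the number of bad papers per good one is:

```latex
\frac{\text{bad}}{\text{good}} = \frac{r}{1 - r},
\qquad
\frac{0.75}{0.25} = 3,
\qquad
\frac{0.80}{0.20} = 4
```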

Alternatives

Today, anybody can publish whatever they want, at no cost. It’s called the web. It’s accessible and easy. So why bother with the gatekeepers and their walled gardens? The answer is, again, quality and time. If you just randomly search for things and don’t pick authoritative sources, you’re likely to end up with things that are not very good.

That is not to say that there isn’t great work out there that isn’t going through peer review. Bret Victor has never published an academic paper, and his work is amazing. But he is an exception. Work does not have to be vetted to be good. But if you’re looking for good work, you end up going to the places that rigorously select and filter.

The science world is looking for some sort of middle ground, too. There are places like arXiv.org, which provides a reasonably structured (though entirely non-reviewed) place to make papers available and discuss them. PLOS ONE, which the Quilt Plots paper was published in, has less stringent reviewing criteria and tends to err on the side of accepting more, rather than less. And there are certainly more.

The issue is not that there aren’t alternatives, it’s that they mostly only work in addition to the established journals, not really as actual alternatives. A paper that is only published on arXiv will not get a lot of attention (unless somebody trustworthy spots it and can make a strong case for it).

Does the Peer Review System Work?

Yes, it does. You can always complain about this paper or that paper getting in or not getting in. But overall, it certainly works. The visualization conferences are competitive enough to produce new and interesting work every year, yet not so insanely competitive that acceptance comes down to luck. There is a lot of variety, and people who have been publishing for a long time don’t automatically get their work accepted.

Reviewers don’t always spot problems, as Stephen Few helpfully points out. But there are few published papers with egregious errors in them. Sure, there should be more rigor in reviewing, and there should be ways of retracting papers when things go wrong. We haven’t had a big plagiarism or other ethics scandal in visualization, but I doubt that we have the mechanisms in place to get such papers out of the respective digital libraries if and when they happen.

If anything, reviewers in visualization (and computer science in general) are some of the harshest and most finicky reviewers around. Maria Zemankova once said, “We are computer scientists, we’re trained to find bugs!” I also remember Robert Moorhead speaking at the opening or closing sessions of Vis/VisWeek a number of years ago and asking people not to be so “cut-throat” in their reviews. It sometimes seems like a miracle that anything gets published at all.

Wrapping Up

What I hope this little series has shown is that peer review is a complex process that has many things going for it. While the paper that got me to write over 4,000 words on the topic was clearly not one of the brighter spots in the history of peer review, it’s a good system overall. There are many ways a paper can be bad, but there are also good reasons why those get submitted. I’ve glossed over many issues and details in describing the process, and there are easily ten times as many words to be written about all the issues in academic publishing, especially outside of the mostly well-functioning technical sciences.

Ultimately, the quality of the papers that get accepted is the measure of whether the process works or not. And while I’m the first one to jump on a paper I don’t like, the visualization field as a whole is producing good work and moving forward at a steady pace. But it only does that thanks to a functioning peer review system.

This is the last part of a five-part series on peer review in visualization. One posting a day was posted throughout this week. One or two further parts may follow in the coming weeks.

Teaser image by Paolo Del Signore, used under Creative Commons.

Peer Review, Part 4: Good Reasons for Bad Papers

Thu, 01/23/2014 - 04:22

Categories:

Visualization

As a reviewer, you might sometimes ask yourself why people write so many bad papers. And why they bother submitting them. I certainly do. But where do they come from? Who submits bad papers? And why? It may come as a surprise, but there are good reasons to submit bad papers for review.

To Get Feedback

This is one you will hear advisors talk about quite a bit. “I didn’t expect the paper to be accepted, but I figured we’d get some useful feedback!” It’s not unreasonable to do this, though the question clearly is: when do you think there is actually value in the feedback you’ll likely get from reviewers?

If you know that the paper is unfinished or even bad work, it makes no sense to send it in. But when you’re looking for direction or want to get an idea if you’re on the right track, reviews can be helpful. Of course, they can also be brutal.

The key is to act on the feedback, not just send the same bad or unfinished paper to another conference. I’ve seen people shop papers around without making any significant changes. In a field like visualization, where there is a lot of overlap in reviewers between different conferences, and between conferences and journals, this can bite you in the ass. I have no patience with people who do this, because they just waste the reviewers’ time. Respect the reviewers and the reviews, and work on your paper before you submit it again!

To Get It Out

There’s only so much you can do as an advisor to get a student to improve a paper before the deadline, short of just rewriting the whole thing. At some point, either the deadline is here, or you just decide that the remaining improvements are not worth the time. So send it in and hope for the best.

That’s not the best reason to submit a paper, to be sure, but it can help get a student back on track. It also provides him or her with a sense of accomplishment – even if that may be short-lived, only to be crushed by the negative reviews a few weeks later.

The realities of conference deadlines also contribute to this. A deadline can be a great motivator to get a lot of work done in a short amount of time, or to finish that project that has been dormant for months. But deadlines also lead to a lack of reflection and time for refinement, which can sometimes be quite obvious. I once reviewed a paper where the first half was really well written, but the results, discussion, and conclusions had tons of typos, factual mistakes, and even notes the authors had put in for themselves to rewrite some parts. That paper did not get accepted.

To Just Give It A Shot

In theory, a paper should be submitted when it’s ready and its authors think it can’t be improved anymore. But the reality is that most people chase deadlines and need to get papers published at some reasonable rate if they want to advance in their jobs (or be considered relevant).

There are also small pieces of work that would be nice to publish: a class project, a master’s thesis, etc. Those are often overlooked in visualization, where it can be hard to find a place for something small but worthwhile. The short papers track at EuroVis might help with that, and there are always posters. A small contribution does not mean a bad paper, but the risk is certainly higher that it might be considered trivial.

Where to Draw the Line

The above was written with the assumption that the paper is (mostly) written by a student, and the advisor has a good sense of the quality of the paper. That might not always be the case, but I think it often is.

Getting work rejected is no fun, even when it is expected. There is a certain social contract between authors and reviewers though that should require authors to be very careful not to submit work they don’t reasonably believe could be accepted. If it’s obviously bad, don’t submit it.

Unfortunately, there is no good way to get feedback on a paper before it is ready. Sending it to colleagues might work, but they might never read it. And they will be much nicer when asked to review a paper for a colleague (or colleague’s student) than when writing an anonymous review.

Ultimately, there are still valid, if not always very good, reasons to submit papers that the authors know aren’t good enough.

This is part of a five-part series on peer review in visualization. One posting a day will be posted throughout this week.

Teaser image by bubbletea1, used under Creative Commons.

Peer Review, Part 3: A Taxonomy of Bad Papers

Wed, 01/22/2014 - 05:40

Categories:

Visualization

Reviewing is great when you get a good paper where you can make some suggestions to make it even better, and everybody’s happy. Bad papers are much less fun, but they are also much more common. Here are some examples I’ve seen and that I keep seeing.

  • The completely insane. I once got a paper to review that was two pages long, with the second page not even used completely. There was a table on the second page, and the rest was completely unintelligible writing. Tamara Munzner likes to refer to such papers as “written in crayon” – a mental image I find particularly amusing. The good thing about these is that they don’t usually take much time to spot and reject.
  • The deeply flawed study. These are becoming more and more common as more people are doing studies (user studies, perceptual studies, etc.). While more studies are a good thing, they need to be done with care. I’ve seen quite a few papers lately where the study design is terrible, or where the study doesn’t actually prove the claim.
  • The marketing fluff piece. There are actually several variants of this. One is essentially a marketing piece about some piece of software that is disguised (often badly) as a systems or application paper. There’s nothing wrong with a good application paper (in fact, there should be more), but there needs to be content and a contribution. Also, the language needs to be a bit more academic and a bit less over the top.
  • The complete lack of details fluff piece. Another variation of this is the one where everything is written with such a complete lack of any specific details, that it’s impossible to know if any of it is real or not. Like a wet bar of soap, the paper slips through your fingers and won’t let you get a clear grasp on what it’s trying to tell you. These can be maddening, because they make you feel like you’re too dumb to get them, when it’s really the paper’s fault. Authors sometimes don’t want to be too specific when they’re not sure or they’re afraid that they’ve made a bad decision – but trying to paper over these issues just makes them a bigger target for reviewers.
  • The bait-and-switch. Make promises in the title, abstract, and introduction that you then don’t end up keeping: that’s a major no-no, and a guarantee for bad reviews. You’re setting expectations, and you get people excited. If you then don’t deliver, you really only have yourself to blame for getting rejected. Things change as a paper gets written and evolves, but that needs to be reflected throughout the paper.
  • The reinvention of the wheel. This happens more than you’d think. The Quilt Plots paper is just a particularly egregious example, but there are many more. Sometimes, people get carried away working on something without realizing that it has been done, or the project changes direction in a way that leads them down the same path somebody else has gone before. And sometimes, people think they have invented something new for a field they’re not familiar with, and want to publish that work there. But none of these reasons make the work acceptable.
  • The math graveyard. This is less common in information visualization, though it does happen: a paper filled with lots of gratuitous math that not only doesn’t help explain much, but makes the whole thing much harder to read. There is a lot more math in scientific visualization papers, and there it is also usually more justified. Sometimes, however, authors think they will look smart if everything is expressed in equations and mathematical symbols. That doesn’t work, though: people see through that and will call you out on it.
  • The good idea. I find myself writing lots of reviews that say: good idea, bad paper! Many ideas start with good questions, but then veer off somewhere insane or useless or the study doesn’t work, etc. These are the most disappointing reviews to write, but they are also incredibly common. Take a break from paper writing, get some fresh air, and then look at your paper as if it were written by somebody else: does it do what you think it does?

Some of these are easier to avoid than others. I totally get the bait-and-switch, for example: you’re trying to do something big and awesome, but you just didn’t get it done in time, or your study didn’t give you the results you had hoped for. But then you need to go back and make sure the different parts of your paper fit together. Or do more work.

But the key is that bad papers are not usually the result of malice, but are often the result of a lack of judgment and quality control. There are also good reasons to submit bad papers, which I will get to in the next installment.

This is part of a five-part series on peer review in visualization. One posting a day will be posted throughout this week.

Teaser image by Troy Tolley, used under Creative Commons.

Peer Review, Part 2: How It Works

Tue, 01/21/2014 - 05:12

Categories:

Visualization

Peer review is one of the central pillars of academic publishing. But how does it actually work? What is blind review, and what is it good for? This part will answer those questions, and then tell you how to be a good reviewer yourself.

The Process

The basic process is this: you have some work written up that you want to publish at a conference or in a journal. In some fields, you might submit an abstract to a conference, but in visualization, they all want full papers. So you submit your paper, usually through some sort of review management system that lets you upload a PDF, potentially supporting files (video, datasets, etc.), and fill in some data like the kind of work you’re presenting (research vs. application vs. study, etc.).

The paper goes through some minimal checks and is then sent out to reviewers. In the case of a journal, an associate editor picks those reviewers and asks them for the review. In the case of a conference, this is done by the papers chairs. For the VIS conferences and EuroVis, there is an extra layer here, where the papers chairs pick two reviewers for each paper (members of the program committee), who then each pick one additional reviewer. This helps offload some of the work, since the papers chairs have to organize reviews for hundreds of papers in a short amount of time.
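The two-tier assignment described above can be sketched in a few lines of Python. This is only an illustration: the function name, the round-robin selection, and the reviewer names are all assumptions made for the example, and real assignments take expertise and conflicts of interest into account.

```python
from itertools import cycle


def assign_reviewers(papers, committee, externals):
    """Two-tier assignment: the chairs pick two committee members per paper,
    and each of those two picks one additional reviewer.

    Round-robin selection is an assumption made to keep the sketch short;
    it needs at least two committee members to give each paper two
    different committee reviewers.
    """
    pc = cycle(committee)
    ext = cycle(externals)
    assignments = {}
    for paper in papers:
        # First tier: chosen directly by the papers chairs.
        primary, secondary = next(pc), next(pc)
        # Second tier: one additional reviewer picked by each committee member.
        assignments[paper] = {
            "chosen_by_chairs": [primary, secondary],
            "chosen_by_committee": [next(ext), next(ext)],
        }
    return assignments


# Hypothetical names, purely for illustration.
result = assign_reviewers(
    papers=["paper-1", "paper-2"],
    committee=["A", "B", "C"],
    externals=["w", "x", "y", "z"],
)
for paper, reviewers in result.items():
    print(paper, reviewers)
```

In this sketch the chairs make two picks per paper while each paper still ends up with four reviewers, which is the offloading the extra layer is meant to provide.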

The result is that a paper is usually reviewed by three to five reviewers. Some low-quality conferences make do with fewer, but this number is typical. In some cases, there might also be more, for example when outside reviewers are asked because a paper makes claims the usual reviewers don’t feel comfortable judging (e.g., in a particular application domain, field of science, etc.).

Once the reviews come back, the associate editor or primary reviewer makes a recommendation that the editor or papers chairs can follow or not. They typically do, though in the case of a conference there can be some back and forth when trying to balance the number of papers accepted with the demands of the program.

There are basically two outcomes: accept the paper or reject it (journals can also require revisions). The way the decision is made is democratic in a way, because the reviewers weigh in, but is ultimately up to the editor or papers chair (who are senior people in the field).

Blind and Double-Blind Review

Almost all reviews are blind, which means that as the paper author, you don’t know who is reviewing your papers. There is also double-blind review, which means that the reviewers don’t know who wrote the paper, either. To do this, you remove all identifying information from the submitted paper. While that’s obvious for things like the list of authors, it becomes a bit tricky when referencing your own prior work. Some people simply ignore this, and it’s generally not required in visualization (i.e., you are free to not do it if you don’t care).

The goal of “blinding” the reviewers is that it helps judge the work on its own merits, rather than having the reviewers be biased by who wrote the paper, or the institution they are from. That bias can be both positive and negative, and neither is good. Since you don’t know who’s reviewing your work, you can’t retaliate against bad reviews, either. This is not a big issue in visualization, where most people are fair and reasonable, but it certainly helps keep things a bit more objective.

Of course, if you know your way around the field and know the people, you can often tell who was likely involved in a particular paper you’re reviewing. You might even find some hints as to who some of the reviewers are by looking at their style or if they suggest you read lots of papers by a particular author. But in general, there aren’t enough clues, and that’s a good thing.

The Peers

The reason for peer reviewing is that there is no central authority to decide which work is good and which isn’t. You can’t leave it to some arcane committee of the elders of the field or similar, that would be bad (plus they want to publish too!). Instead, every member of a particular field is expected to not just publish papers, but also assist in the review process.

Not having a single person (or a small group) make all the decisions is clearly a good thing, and it works better in some areas than in others. In visualization, the main conferences change papers chairs every year (they’re on two-year terms, so there’s always a new one and one who has done it before), and program committee members get rotated off after three years or so. Journal associate editors and editors also have limited terms. As a result, the field is able to change and evolve, with new people moving up the ranks and taking over the reins.

Being A Reviewer

Anybody can sign up to be a reviewer, though don’t expect to be given work to review if nobody knows you. You typically have to have something published before people will have a sense of what you do and will ask you to do reviews. Once you’ve done a few, you will get more requests than you probably want, though.

It’s important to be realistic about reviewing workload and to be able to say no. Decide on a number of reviews you’re willing to take on per year or per semester, and then simply say no when you get more. The key is to say no right away, rather than sit on a request forever. By saying no, you give the associate editor or primary reviewer a clear signal that he or she has to look for another reviewer. It’s not a big deal, people say no all the time. But if you drag the decision out because you don’t want to do the review but also don’t want to say no, you’re just wasting everybody’s time. So make a quick decision and click the appropriate link in the email.

Reviewing can be interesting and it can be annoying. But either way, it’s an important part of the scientific process. Without peer review, there is no way to judge the quality of work, and to decide which papers are worthy of publication, and which ones are not.

This is part of a five-part series on peer review in visualization. One posting a day will be posted throughout this week.

Teaser image by Dirk Schaefer, used under Creative Commons.

Peer Review, Part 1: Quilt Plots

Mon, 01/20/2014 - 05:14

Categories:

Visualization

What is peer review? How does it work? And is it really as flawed as people claim it is? In this little series, I will talk about all that, and then end up arguing that peer review does, in fact, work – at least in visualization. But first an example where it didn’t.

A paper made the rounds last week for its poor quality: Quilt Plots: A Simple Tool for the Visualisation of Large Epidemiological Data. It was peer-reviewed and accepted by an editor at PLOS ONE, which is an online science journal (covering all the sciences). PLOS ONE is an open-access online journal with the goal of publishing work faster and without trying to assess importance (which is difficult). That’s not a bad idea in principle, but this example shows that their rigorous peer review might need some work.

The paper simply presents a way to create a heatmap. It’s not just that the reviewers should be expected to know what a heatmap is, they should also see through the odd way the whole thing is argued: the heatmap function the authors were using in R had too many options, so they stripped out the dendrogram and clustering, and presented just the color-coded table as a new thing.

How anybody could think this was a valid contribution is beyond me. You can do this in Excel or Tableau with a few clicks, and it’s pretty easy even in R. What’s even more annoying is that the authors provide their implementation of their “technique” as R code – as screenshots inside a Word document.
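To make concrete how little is involved, here is a rough sketch of a color-coded table in plain Python. It is not the authors’ R code: the function names, the white-to-red color ramp, and the numbers are all made up for illustration.

```python
def value_to_hex(value, lo, hi):
    """Map a number to a white-to-red hex color by linear interpolation."""
    # Position of the value within [lo, hi]; a constant table maps to white.
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    # Green and blue fade from 255 (white) down to 0 (pure red).
    fade = round(255 * (1 - t))
    return "#ff{0:02x}{0:02x}".format(fade)


def color_coded_table(rows):
    """Return a table of hex colors with the same shape as the input table."""
    flat = [v for row in rows for v in row]
    lo, hi = min(flat), max(flat)
    return [[value_to_hex(v, lo, hi) for v in row] for row in rows]


# Made-up numbers, purely for illustration.
data = [[0, 5, 10], [2, 8, 4]]
for row in color_coded_table(data):
    print(" ".join(row))
```

There is no clustering and no dendrogram here: each cell gets a color according to where its value falls between the smallest and largest number in the table, and that is all a color-coded table needs.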

This would be okay as a posting to an R mailing list perhaps, or as a short blog posting. And those are perfectly valid ways of publishing this sort of thing, without having to go through review. The point of peer review is to filter out the bad, nonsensical, and trivial stuff, so that you can expect to find good work when reading a journal or conference proceedings. It doesn’t always work, but it mostly does.

This is part of a five-part series on peer review in visualization. One posting a day will be posted throughout this week.

The State of Information Visualization, 2014

Mon, 01/13/2014 - 05:22

Categories:

Visualization

2013 was another exciting year for visualization. Between many new developments in data storytelling, a new wave of news graphics, new visualization blogs, better automated infographics, and visuals designed to hit you hard, it is difficult to decide what was most important. Here is a look back, and some ideas about where we’re going.

Storytelling

If there was a topic that clearly left its mark on 2013, it’s storytelling. Coming from me, this may not be a big surprise – after all, I predicted this at the beginning of the year. But if you’re a doubter, 2013 gave you lots of reason to doubt your, um, doubts.

First, the academic world. There were a number of papers at InfoVis on storytelling and related topics. Memorability, narrative, and real-world scales all made an appearance. I believe this was also the first time a session was actually named Storytelling in InfoVis. And who could forget the paper I wrote with Jock Mackinlay for IEEE Computer: Storytelling: The Next Step in Visualization? That paper has just been republished in a special issue of IEEE Computing Now on current trends in visualization, which also includes four other great papers (including Mike Bostock and Jeff Heer’s paper on D3). There is also a video of me talking about storytelling from an industry perspective and demoing story points.

Will this continue? Yes, of course! I know that there are a number of papers on storytelling under review for EuroVis right now, and I have no doubt that there will be a good number of submissions on the topic to InfoVis and VAST. Alberto Cairo also just announced that he will be the keynote speaker at IEEE VIS in Paris in November, which should be very interesting. No pressure, Alberto!

Then, there are the conferences. 2013 saw the first Tapestry conference, the first conference specific to storytelling with data; its second incarnation is just over a month away. Others have picked up on the topic, for example all the speakers at Visualized are now storytellers, and the new OpenVis conference led with Amanda Cox last year and will have Mike Bostock headlining this year. It’s great to see the cross-pollination between visualization people, designers, and news graphics folks. This is going to lead to many exciting new things.

And finally, products. Last year saw the introduction of the GEDViz storytelling tool which, while limited in scope and not a commercial product, is certainly pointing in the right direction. Tableau also announced the Story Points feature, which will be part of the upcoming 8.2 release. The competition isn’t asleep either; many of them have announced storytelling features for upcoming releases. Not all of them really are storytelling features in the way I understand the term, but the word is certainly being thrown around a lot.

I don’t see any of this losing steam, quite the opposite. 2014 is the year when a large number of people will have access to these new tools for the first time, and will start building stories. That is a qualitatively new thing, and it will be exciting to see what people end up creating.

Automated Infographics

An idea that I first saw a few years ago really took off in 2013: automated infographics. Rather than just visualizing the data on a sort of dashboard, why not make an information graphic with nice production values and fill in people’s own data? This was the original idea behind Visual.ly, and the possibilities are endless. Vizify creates a multi-page information graphic from your LinkedIn and other social network data, WordPress sent out a custom annual report to users of wordpress.com and Jetpack (for self-hosted blogs), Google created a personal video for Google+ users, etc.

WordPress’ annual report is quite neat. It uses the common scrolling format to show what are essentially pages of information: overall stats, the most popular postings, search terms, commenters, etc. At the top of the page (and behind the information pages as you scroll down), animated fireworks represent the blog postings. A lot of work went into this, and it looks great. Many companies would kill for an annual report as slick and well-designed as this one.

What is interesting about this is that there is apparently a need to make things more interesting than just plain visualizations. I use two different site trackers, but I still found the little summary in my WordPress annual report quite interesting. Plus, it’s much more fun to look at. Of course, that approach would not work for regular site stats: I want more depth for those, and I certainly don’t want to watch fireworks going off in the background whenever I want to see my site stats. But once a year, this is great.

I don’t believe that infographics are going to replace generic visualization, but I have no doubt that we will see a lot more context-specific graphics like that in the future. Not only are they more interesting, by standing out, they are also much more likely to be remembered.

News Graphics

Last year also saw some interesting new developments in news graphics. In particular, we’re seeing more use of relatively complex visualizations. I remember Matthew Ericson talking about scatterplots in his Vis keynote in 2007, and essentially saying that the New York Times would not print scatterplots.

How things have changed. Last year, we saw slope graphs/parallel coordinates, scatter plots, networks, as well as more elaborate visual data stories using the usual line and bar charts on the jobs report, spending, etc.

I particularly liked this scatterplot when I happened across it while in Portland recently. It’s sort of a meta-visualization: it shows the effect The Oregonian’s reporting has had on the shift lengths of bus and tram drivers in Portland.

This past year has also seen some very interesting movements of visual and data journalists, between different media as well as from media to other companies. Andy Kirk listed a good number of them in a posting on significant events of the latter half of last year (his posting on the first half is also worth a read). This is a good sign, because it means that there is an active labor market, and it also means a transfer of knowledge. People staying in the same place don’t help ideas move from place to place. I think we should see the results of many of these moves in the next months.

More Thinking About Visualization

I just wrote about WTFViz, ThumbsUpViz, and HelpMeViz, and how I see those new websites as a sign of a richer online visualization culture. You may not agree with all of them, or any of them, but they finally show some new directions. It will be interesting to watch them evolve over this year and see where they are going. They certainly provide a lot of food for thought about visualization best practices.

I’m also throwing in Isabel Meirelles’ book, Design for Information, here. I really liked the approach she took in her book, and I hope that this will finally get people to write more thoughtful, informative, and interesting books about visualization. The reliance on pretty pictures without much depth is getting old.

Visualization That Punches You In The Gut

Perhaps my favorite development of 2013 is what some have called emotive visualization. Behind that anemic term hides a category of visual storytelling that doesn’t just state facts, but wants you to feel them. Sure, they are based on numbers. But the point is not just to give you the numbers, but to hit you with them, and hard.

The pieces on drone strikes and gun deaths published last year achieved that goal and got a lot of people talking, as did a video discussing wealth inequality. I hope we will see more of this kind of work going forward. This is the wilder side of storytelling, and the more visceral one. While numbers may seem boring, making them visual makes them real, in a very powerful way.

2014 And Beyond

What all these developments have in common is that while they are visualizations at their heart, they add significant context to them. Whether there’s a story, a deeper concern, or infographic elements, there is a lot of added value. This goes against the established wisdom of minimalism and starkly empty visualizations, but it’s also a completely different use case. I hope that 2014 will be the year people will finally realize that presentation and analysis are vastly different, and that we need to understand those differences, and establish good criteria for work in presentation and storytelling.

Beyond that, I’m just looking forward to more exciting work. It’s a good time to be in data visualization.

WTFViz, ThumbsUpViz, and HelpMeViz

Mon, 01/06/2014 - 05:31

Categories:

Visualization

I have complained, repeatedly, about the lack of good online resources for visualization; in particular, when it comes to discussion and critical reflection. Also, where can you go to get help with a visualization project? A few recent websites are tackling these issues in different ways.

First, Drew Skau started WTFViz, which quickly became hugely popular. It collects small snippets from infographics that are bad in some way: they misrepresent data, they obscure the message, they dress up numbers as if there were more of them, etc. Posting bad examples to laugh about is entertaining and useful, if obviously not always appreciated by the people who created them. But it can be quite educational for people to look through and see if they find things that are similar to their own work there.

But in response, Ann Emery, Stephanie Evergreen, Jonathan Schwabish, and Rob Simmon started the much more positive ThumbsUpViz, which collects good examples. While bad examples are kind of easy and interesting even when they’re not very bad, good examples are typically expected to be exceptional. I don’t think that should be the criterion, though: simple but good examples should qualify just as well. Anyway, the blog, also hosted on tumblr, is slowly posting good examples.

A slight variation on this is accidental aRt by Kara Woo and Erika Mudrak, which collects the little mistakes that happen on the way to creating visualizations. Rather than fix the problem right away, spend a moment to consider if you’ve created something neat or interesting by mistake, and submit it there. This is a bit similar to Kevin Quealy’s fascinating and hilarious chartsnthings.

Finally, Jon Schwabish recently started HelpMeViz, which is meant to provide feedback and inspiration. You can send him a description of your data and your attempts, and readers can then suggest alternatives and even create visualizations for you. It’s a good idea, and the interaction so far is very promising, considering that the site is only a month old or so.

Where is this going? I have a feeling that we’re starting to see more interesting new websites dealing with visualization, both in terms of criticism (finally!) and in terms of hands-on help. It’s heartening to see that, and I hope that these sites will thrive and attract lots of visitors and followers.