EagerEyes.org

Video: Nigel Holmes on Humor in Visualization and Infographics

Wed, 02/11/2015 - 15:17

Categories:

Visualization

In this talk, Nigel Holmes talks about the value and use of humor in communicating with visualization. He also offers some interesting criticism of academic visualization research (and of some more artistic pieces). It’s a fun and interesting talk, as always with Nigel Holmes.

Link: Becksploitation: The Over-Use of a Cartographic Icon

Wed, 02/04/2015 - 15:17

Categories:

Visualization

The paper Becksploitation: The Over-Use of a Cartographic Icon by Kenneth Field and William Cartwright (free pre-print PDF) in The Cartographic Journal describes Harry Beck’s famous map of the London Underground and what makes it great. It also offers a collection of misuses that copy the map’s superficial structure, and critiques them. I wish we’d had papers (and titles!) like this in visualization.

The paper is available online for free for the next twelve months, along with a selection of other Editor’s Choice papers (including Jack van Wijk’s Myriahedral Projections paper – watch the video if you haven’t seen it).

Among the many interesting tidbits in the paper is this clever image by Jamie Quinn. The subway map to end all subway maps (other than for subways).

Spelling Things Out

Tue, 02/03/2015 - 04:55

Categories:

Visualization

When visualizing data, we often strive for efficiency: show the data, nothing else. But there can be tremendous value in redundancy to make a point and drive it home. Two recent examples from news graphics illustrate this nicely.

The first is this animated chart of global temperatures from 1881 to 2014. It shows more data than is really needed. Why show monthly data when talking about the yearly average? Why the animation of all those lines when you could just show a bar chart of the yearly averages?

But that is exactly what makes this chart work. By watching the yearly average increase, you get a much clearer (and more urgent!) sense of how temperatures are rising. The little indicator of when a new record is set doesn’t show up often at first, but then keeps going off. It’s a smart piece that takes the data and turns it into a statement.

If you haven’t seen the animated version, it’s well worth spending a minute watching. This is the difference between data analysis and communication.

The other example is a static image comparing two numbers. The numbers aren’t terribly difficult to understand or compare. They’re not even particularly big. One number is 4, the other 644. There’s clearly a difference between them, but just reading them you might not think that much of it. However, the point is driven home by actually showing the number as little icons of people.

The point of this article about politicians’ health priorities becomes much more urgent through this type of information graphic than it would by just throwing around abstract numbers. You can ignore a number you read, but you can’t ignore this visual comparison.

If anything, I think it’s a mistake to overlap the icons, which compresses them and makes it harder to appreciate the actual number. Spelling it out even more, with neatly aligned, non-overlapping figures, would drive the point home even more clearly.
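
If you want to play with that idea, here is a minimal matplotlib sketch of the non-overlapping version (my own toy reconstruction with plain dots instead of people icons; only the counts 4 and 644 come from the article, everything else is an arbitrary choice):

```python
# A toy sketch of the "spelled out" version suggested above: one marker per
# person, laid out on a fixed grid so nothing overlaps. The counts (4 and 644)
# come from the article; dots instead of people icons, grid width, and colors
# are just illustrative choices.
import matplotlib.pyplot as plt
import numpy as np

def icon_array(ax, count, per_row=40, color="#b22222"):
    """Draw `count` unit markers in a neatly aligned, non-overlapping grid."""
    idx = np.arange(count)
    ax.scatter(idx % per_row, -(idx // per_row), s=14, color=color)
    ax.set_title(str(count), loc="left")
    ax.set_aspect("equal")
    ax.axis("off")

fig, (top, bottom) = plt.subplots(
    2, 1, figsize=(8, 4), gridspec_kw={"height_ratios": [1, 5]})
icon_array(top, 4)
icon_array(bottom, 644)
plt.tight_layout()
plt.show()
```

Keeping every unit on a fixed grid keeps every unit countable, which is exactly what the overlapping icons give up.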

Efficiency clearly has its place in visualization, in particular in analysis. But knowing when the right choice is not the efficient one is what makes all the difference when it comes to communication.

Link: Tapestry 2015

Wed, 01/28/2015 - 15:17

Categories:

Visualization

Tapestry 2015 will take place March 4 in Athens, GA. This is the third time we are holding the conference, and it is again taking place on the day before NICAR. As in past years, we have a kick-ass line-up of speakers. The keynotes will be given by Hannah Fairfield (NY Times), Kim Rees (Periscopic), and Michael Austin (Useful Fictions). We also have a great set of short stories speakers: Chad Skelton (Vancouver Sun), Ben Jones (Tableau Public), Katie Peek (Popular Science), RJ Andrews (Info We Trust), and Kennedy Elliott (Washington Post).

On the website, you can watch a brief video summary (click on See what Tapestry is all about) or see all the talk videos from last year. We have also posted information on how to get there, and there will be a bus to take you to NICAR after the event.

Time is running out to apply for an invitation if you want to attend. Attendance is limited, and we’re trying to keep the event small and focused.

Seminal InfoVis Paper: Treisman, Preattentive Processing

Mon, 01/26/2015 - 04:26

Categories:

Visualization

A paper on a specific cognitive mechanism may seem like an odd choice as the first paper in this series, but it is the one that sparked the idea for it. It is also the one that turns 30 this year, having been published in August 1985. And it is an important paper, one that could play an even bigger role in visualization if properly understood and used.

Preattentive Processing in Vision

Anne Treisman’s work is unfortunately misunderstood about as often as her name is misspelled (it’s not i before e). Her paper, Preattentive Processing in Vision in the journal Computer Vision, Graphics, and Image Processing (August 1985, vol. 31, no. 2, pp. 157–177), describes a mechanism in our perceptual system that allows us to perform a small set of well-defined operations on certain, well-defined visual properties without the need for conscious processing or serial search.

This is often demonstrated with the so-called pop-out effect. Count the 9s in the following image (this example is stolen from Stephen Few).

Not easy, and in particular it requires scanning the image line by line. You can’t quickly find a shape like a 9 among very similar ones (like 8s, 6s, and 3s). Now let’s try this in a way that activates your preattentive processor.

Much easier! The 9s pop out. You can’t not see them. They’re there, easy to count. What you can do now is

  • detect their presence (or tell their absence) and point to where they were
  • estimate how many there were as a fraction of the total number of objects
  • detect boundaries between groups of objects that have similar properties (i.e., if the 9s were grouped in some way, you would perceive that as a shape).

All of this is possible even if you only saw the image for a fraction of a second (50–200ms), and your precision does not change significantly as the number of objects increases (up to a reasonable limit).

There are a number of visual properties that this works for, including color, size, orientation, certain texture and motion attributes, etc. Chris Healey has a great webpage with demos and a more complete list.

Combining preattentive features is problematic: if the digits came in two colors (blue and orange) and two weights (bold and regular), and I wanted you to count just the bold orange ones, you’d still have to search serially (this is called conjunctive search). If the features were combined so that the combination was unique (i.e., all bold digits were also orange, and nothing else was orange or bold), that would make things easier and would still be preattentive (disjunctive search).
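
To see the difference for yourself, here is a minimal matplotlib sketch (my own toy reconstruction, not Few’s image): the left panel uses a single feature (color), so the targets pop out; the right panel defines the targets by a conjunction of color and weight, with every distractor sharing one of the two features, so you are back to serial search.

```python
# A toy reconstruction of the two conditions described above.
# Left: the 9s are orange among gray digits, so they pop out preattentively.
# Right: a conjunctive condition where targets are bold AND orange, but every
# distractor shares one of those features, which forces serial search.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(7)
rows, cols = 12, 30
digits = rng.choice([3, 6, 8, 9], size=(rows, cols), p=[0.31, 0.31, 0.31, 0.07])

fig, (ax_pop, ax_conj) = plt.subplots(1, 2, figsize=(12, 4))

for r in range(rows):
    for c in range(cols):
        d = digits[r, c]
        target = d == 9  # the 9s double as target positions in both panels

        # Pop-out panel: one feature (color) separates targets from distractors.
        ax_pop.text(c, rows - r, str(d), ha="center", va="center", fontsize=9,
                    color="darkorange" if target else "0.4")

        # Conjunction panel: targets are bold and orange; distractors are
        # either orange (regular weight) or bold (gray), never both.
        if target:
            color, weight = "darkorange", "bold"
        elif rng.random() < 0.5:
            color, weight = "darkorange", "normal"
        else:
            color, weight = "0.4", "bold"
        ax_conj.text(c, rows - r, str(d), ha="center", va="center", fontsize=9,
                     color=color, fontweight=weight)

for ax, title in [(ax_pop, "Pop-out: find the orange digits"),
                  (ax_conj, "Conjunction: find the bold orange digits")]:
    ax.set_xlim(-1, cols)
    ax.set_ylim(0, rows + 1)
    ax.set_title(title)
    ax.axis("off")

plt.tight_layout()
plt.show()
```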

This is a very interesting effect for a number of reasons, and it can be put to good use in visualization. But it’s also important to understand its significant limitations. Used with great restraint, preattentive processing can be very effective. But not every use of a strong color contrast means that you’re using a preattentive feature.

Taking Things Further

Treisman’s paper is cited quite a bit in visualization, but it doesn’t always extend beyond lip service. One of the key issues seems to be a misunderstanding of what preattentive features really are, and what sorts of tasks preattentive processing can perform.

But more than that, it’s about restraint. Most visualization systems have way too much going on to be able to make use of preattentive features. A system could conceivably drop all its colors to gray when it wants to point something out using color, and only use color on those parts. Or it could provide certain types of filters or highlights that make use of specific features and are smart about not creating conjunctive searches. Or perhaps even just use it to highlight similar things when hovering.
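
As a concrete (and entirely hypothetical) version of the “drop everything to gray” idea, here is a minimal sketch that highlights one series in a line chart by desaturating all the others; the data and the choice of highlighted series are arbitrary:

```python
# A minimal sketch of highlighting by desaturation: everything gray except the
# one series being pointed out. Data is random walks, purely illustrative.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
series = rng.normal(0, 1, size=(12, 100)).cumsum(axis=1)
highlight = 4  # index of the series to point out (arbitrary choice)

fig, ax = plt.subplots(figsize=(8, 4))
for i, y in enumerate(series):
    if i == highlight:
        ax.plot(y, color="crimson", linewidth=2.5, zorder=3)
    else:
        ax.plot(y, color="0.8", linewidth=1, zorder=1)
ax.set_title("One preattentive feature (color), everything else muted")
plt.show()
```

Because only a single feature distinguishes the highlighted series, the result stays in the preattentive regime instead of creating a conjunctive search.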

I don’t believe that we have seen the real power of preattentive processing in visualization yet. What about using it to help people look for clusters in scatterplots? How about dense representations like heat maps? Perhaps there are even specific new techniques that could capitalize on these properties in ways existing ones can’t.

Thirty years after the discovery of the effect, there is still tremendous opportunity to unpack it, understand it, and make use of it in visualization.

Seminal InfoVis Papers: Introduction

Mon, 01/26/2015 - 03:01

Categories:

Visualization

Some of the most fundamental and important papers in information visualization are around 30 years old. This is interesting for several reasons. For one, it shows that the field is still very young. Most research fields go back much, much further. Even within such a short time frame, though, there is a danger of not knowing some of the most important pieces of research.

While 30 years is not much, it is also a lot. Some papers get cited over and over again, but more for convenience than with an eye towards truly building upon them and questioning them. They are treated as gospel a bit too much.

The goal of this little series is to describe a few of the most fundamental papers (not just ones that are that old, but also a few more recent ones). I don’t just want to summarize the papers, though, but also show the way forward: what work has been done since, what questions remain open, and what new work could be built on them?

A paper’s publication is only the beginning. Its value comes from the work that is built on top of it, questioning it, improving upon it – and, sometimes, proving it wrong.

Link: Data Stories Podcast 2014 Review

Thu, 01/22/2015 - 15:17

Categories:

Visualization

Episode 46 of the Data Stories podcast features Andy Kirk and yours truly in an epic battle for podcast dominance – also known as a review of the year 2014. It complements my State of Information Visualization posting well, and of course there is a bit of overlap (I wrote that posting after we recorded the episode – Moritz and Enrico are so slow). There are lots of differences though, and the podcast has the advantage of not being just me talking. We covered a lot of ground, starting from a generally down mood about the year and ending up finding quite a few things to talk about (just check out the long list of links in the show notes!).

Link: Data Viz Done Right

Wed, 01/21/2015 - 15:17

Categories:

Visualization

Andy Kriebel’s Data Viz Done Right is a remarkable little website. He collects good examples of data visualization and talks about what works and what doesn’t. He does have bits of criticism sometimes, but he always has more positive than negative things to say about his picks. Good stuff.

Why Is Paper-Writing Software So Awful?

Mon, 01/19/2015 - 03:58

Categories:

Visualization

The tools of the trade for academics and others who write research papers are among the worst software has to offer. Whether it’s writing or citation management, there are countless issues and annoyances. How is it possible that this fairly straightforward category of software is so outdated and awful?

Microsoft Word

The impetus for this posting came from yet another experience with one of the most widely used programs in the world. Among some other minor edits on the final version of a paper, I tried to get rid of the blank page after the last one. Easy, just delete the space that surely must be there, right? No, deleting the space does nothing. It doesn’t get deleted, or it comes back, or I don’t know what.

So I select the entire line after the last paragraph and delete that. Now the last page is gone, but the entire document was also just switched from a two-column layout to a single column. Great.

People on Twitter tell me that Word stores formatting information in invisible characters at the end of paragraphs. That may be the case, but I really do not care. That it’s possible for me to delete something I can’t see, and thus screw up my entire document, has to be some sort of cruel joke. Especially for a program that has been around for so long and is used by millions of people every day.

Word has a long history (it was first released in 1983, over 30 years ago), and carries an enormous amount of baggage. Even simple things like figure captions and references are broken in interesting ways. Placing and moving figures is problematic to say the least. Just how poorly integrated some of Word’s features are becomes apparent when you try to add comments to a figure inside a text box (you can’t) or replace the spaces before the square brackets inserted by a citation manager with non-breaking ones (Word replaces the entire citation rather than just the opening bracket, even though only the bracket matches the search).

In trying to be everything to everybody, Word does many things very, very poorly. I have tried alternatives, but they are universally worse. I generally like Pages, but its lack of integration with a citation manager (other than the godawful Endnote) makes it a no-go.

LaTeX

We all know that you write serious papers in LaTeX, right? Any self-respecting computer scientist composes his formula-laden treatises in the only program that can insert negative spaces exactly where you need them. LaTeX certainly doesn’t have the issues Word has, but it has its own set of problems that make it only marginally better (if at all).

It is also starting to seriously show its age. TeX, which is the basic typesetting system LaTeX is based on, was released in 1978 – almost 40 years ago. LaTeX made its debut in 1984, over 30 years ago. These are some of the oldest programs still in widespread use, and LaTeX isn’t getting anywhere near the development resources Word does.

While a lot of work has been done to keep it from falling behind entirely (just be thankful that you can create PDFs directly without even having to know what a dvi file is, or how bad embedded bitmap fonts were), there are also tons of issues. Need a reference inside a figure caption? Better know what \protect does, or the 1970s-era parser will yell at you. Forgot a closing brace? Too bad, you’ll have to find it by scanning through the entire document manually, even though TeX’s parser could easily tell you if it had been updated in the last 20 years. Want to move a figure? Spend 15 minutes moving the figure block around in the text and hope there’s a place where it’ll fall where you want it. And the list goes on.

And then there are the errors you can’t even fix directly. The new packages that insert links into the bibliography are great, except when the link breaks over a column boundary, which causes an error that you can’t avoid. All you can do is add or remove text so the column boundary falls differently. Great fun when this happens right before a deadline.

Citation Managers

In the old days, putting your references together was a ton of work: you had to collect them in one place, sort and format the entries, and maybe turn the placeholder references in the paper text into numbers. Any time you added or removed a reference, you had to do it all over again.

BibTeX

Enter bibliography software. In the dinosaur corner, we have BibTeX. As the name suggests, it works with (La)TeX. And it’s almost as old, having been released in 1985. It uses a plain text file with a very simple (and brittle) format for all its data, and you have to run LaTeX three times to make sure all references really come out correct. This puts even the old two-pass compilers to shame, but that’s how BibTeX works.

There are programs that provide frontends for these text files, and they’re mostly ugly and terrible. A notable exception here is BibDesk, especially if you’re in the life sciences. It works really well and doesn’t get in the way. It’s an unassuming little program, and it gets updated pretty continuously. What it does, it does really quite well.

But the rest of the field is as horrifying a train wreck as the writing part.

Mendeley

I can’t quite share in the doomsday-is-here wailing that started when Elsevier bought Mendeley, and I haven’t seen any terrible decisions yet. What drives me up the wall are simply the bugs and the slowness and the things you expect to work but don’t.

Why does All Documents not include all documents? Why do I have to drag a paper I imported into a group into All Documents so it shows up there? Why are papers in groups copies instead of references, so that when I update one, the other one doesn’t get updated? The most basic things are so incredibly frustrating.

To be fair, Mendeley is constantly improving and is nowhere near as terrible as it was a year or two ago. It still has a ways to go, though. And I really hope they get serious about that iPad app at some point.

Papers

I’m trying to love Papers. I really do. It’s a native Mac app (though there’s now also a Windows version). It looks good. But it manages to be buggy and annoying in many places where Mendeley works well.

For one, the search in Papers is broken. I cannot rely on it to find stuff. It’s an amazingly frustrating experience when you search for an author and can’t see a particular paper you’re sure is there, and then search for its title and there it is! The ‘All Fields’ setting in the search also doesn’t seem to include nearly all fields, like the author. And matching papers against the global database has its own set of pitfalls and annoyances (like being able to edit fields in a matched paper only to have your edits cheerfully thrown away when you’re not looking). The list goes on (don’t open the grid view if you have large PDFs in your collection, etc.).

Endnote

Listed only for completeness. Beyond terrible. Written by some sort of committee that understands neither paper writing nor software. I really can’t think of any non-academic commercial software that’s worse (within the category of software for academic users, it’s neck and neck with that nightmare that is Banner).

A Better Way?

How is it possible that the tools of the trade for academics are so outdated, insufficient, and just plain terrible? Is there really nothing in writing tools that is smarter than treating text as a collection of letters and spaces? Can’t we have a tool that combines reasonable layout (the stuff that LaTeX is good at, without the parts it sucks at) with a decent reference manager?

This isn’t rocket surgery. All these things have well-known algorithms and approaches (partly due to the work that went into TeX and other systems). There have also been advances since the days when Donald Knuth wrote TeX. Having classics to look back at is great, but being stuck with them is not. And it’s particularly infuriating in what is supposed to be high technology.

What I understand even less is that there are no tools that treat text in a more semantic way. Why can’t I treat paragraphs or sections as objects? Why doesn’t a section know that its title is part of it and thus needs to be included when I do something to it? Why don’t word processors allow me to fold a paragraph, section, or chapter, like some source code editors do? Why can’t figures float while I move them and anchor only to certain positions given the constraints in a template?

There are so many missed opportunities here, it’s breathtaking. There has to be a world beyond the dumb typewriters with fancy clipart we have today. Better, more structured writing tools (like Scrivener, but with a reference manager) have got to be possible and viable as products.

We can’t continue writing papers with technology that hasn’t had any meaningful updates in 30 years (LaTeX) or that tries to cover everything that contains text in some form (Word). There has got to be a better way.

Links: 2014 News Graphics Round-Ups

Wed, 01/14/2015 - 15:17

Categories:

Visualization

It used to be difficult to find news graphics from the major news organizations. In the last few years, they have started to post year-end lists of their work, which are always a treat to walk through. With the new year a few weeks behind us, this is a good time to look at these collections of news graphics.

Slightly different, but worth a special mention, is NZZ’s amazing visualization of all their articles from the year, Das Jahr 2014 in der «Neuen Zürcher Zeitung» (in German).

The State of Information Visualization, 2015

Mon, 01/12/2015 - 05:14

Categories:

Visualization

It seems to be a foregone conclusion that 2014 was not an exciting year in visualization. When we recorded the Data Stories episode looking back at 2014 last week (to be released soon), everybody started out with a bit of a downer. But plenty of things happened, and they point to even more new developments in 2015.

If this was such a boring year, how come Andy Kirk has a round-up of the first six months and another posting for the second half of the year, both with many good examples? Or how about Nathan Yau’s list of the best data vis projects of the year? So yeah, things happened. New things, even.

Academic InfoVis

I’m still awed by the quality of InfoVis 2014. It wasn’t even just the content of the papers that was really good, it was the whole package: present interesting new findings, present them well, make your data and/or code available. This had never happened with such consistency and at that level of quality before.

The direction of much of the research is also different. There were barely any new technique papers, which is largely a good thing. For a while, there were lots of new techniques that didn’t actually solve any real problems, but were assumed to be the way forward. Now we’re seeing more of a theoretical bent (like the AlgebraicVis paper), more basic research that looks very promising (e.g., the Weber’s Law paper), and papers questioning long-held assumptions (the bar charts perception paper, the error bars paper, the paper on staggered animation, etc.).

Thoughtfully replicating, critiquing, and improving upon oft-cited older papers should be a valid and common way of doing research in InfoVis. The only way forward in science is to keep questioning beliefs and ideas. It’s good to see more of this happening, and I hope that this trend continues.

Storytelling

I talked about storytelling at the beginning of last year, and 2014 was clearly a big one for that. Besides the Story Points feature in Tableau 8.2, there have been many interesting new approaches to building more compelling stories from data.

Some new formats are also emerging, like Bloomberg View’s Data View series (unfortunately, there doesn’t seem to be a way to list all of them). I’m not yet convinced by the ever more common “scrollytelling” format, and have seen some really annoying and distracting examples. I don’t entirely agree with Mike Bostock’s argument that scrolling is easier than clicking, but he at least has some good advice for people building these sorts of things.

There was also a bit of a discussion about stories between Moritz Stefaner and myself, with Moritz firing the first shot, my response plus a definition of story, and finally a Data Stories episode about data stories where we sorted it all out.

There is no doubt that we’ll see more of this in the coming years. The tools are improving and people are starting to experiment and learn what works and what doesn’t. I hope that we will also see more and deeper academic work in this area.

Non-Academic Conferences

Speaking of conferences – like InfoVis, only different: these may not be new, but they are continuing. Tapestry, OpenVis, Visualized, eyeo, etc. are all connecting people from different disciplines. People talking to each other is good. Conferences are good.

That all these conferences are viable (and eyeo is basically impossible to get into) is actually quite remarkable. There is an interest in learning more. The people speaking there are also interesting, because they are not all the usual suspects. Journalists in particular did not use to speak much outside of journalism conferences. They have interesting things to say. People want to hear it.

The Rise of Data Journalism

FiveThirtyEight. Vox. The Upshot. They all launched (or relaunched) last year. Has it all been good? No. Nate Silver’s vow to make the news nerdier is off to a good start, but there is still a long way to go. Vox has gotten too many things wrong and, quite frankly, needs to slow down and rethink their publish-first, check-later approach. There is also a bit of a cargo cult going on, where every story involving numbers is suddenly considered data journalism.

But even with some of the false starts and teething problems, it’s clear that data in journalism is happening, and it is becoming more visible.

What Else 2015 Will Bring

In addition to the above, I think it’s clear that the use of visualization for communication and explanation of data will continue outside of journalism as well. Analysis is not going away of course, but more of its results will be visual rather than turned into tables or similar. The value of visualization is hardly limited to a single person staring at a screen.

This is also being picked up on the academic side. I think we will see more research published in this direction, more focused on particular ideas and more useful than what has been done so far (which has been mostly analysis).

Finally, I’m looking forward to more good writing about visualization. Tamara Munzner’s book came out last year, but since I haven’t read it yet, I can’t say anything other than that I have very high expectations. Several other people are also working on books, including Cole Nussbaumer, Andy Kirk, and Alberto Cairo (the latter two are slated to come out in 2016, though).

I didn’t think that 2014 was a bad year for information visualization. And I think 2015 and beyond will be even better.

The Island of Knowledge and the Shoreline of Wonder

Mon, 01/05/2015 - 04:17

Categories:

Visualization

In his keynote at IEEE VIS in Paris two months ago, Alberto Cairo talked about journalism, visual explanations, and what makes a good news visualization. But mostly, he talked about curiosity.

When I wrote my IEEE VIS report for Tuesday of that week, I knew that I could either do a shoddy job of describing the keynote and get the posting done, or have to push the entire thing back by a few days. So I decided to turn this into a separate posting.

The goal of writing up the talk here is not to provide a full recap – even though I could probably give the talk for him now, having seen variations of it three times in as many months. Instead, I want to pick out a few topics I find particularly interesting and universally relevant.

Curiosity

He started the talk with questions his kids ask him, like one from his 7-year-old daughter: why don’t planets stop spinning? That’s an amazingly deep question when you think about it, and even more so for a 7-year-old.

Alberto then went through some explanations, at the end of which he drew an interesting comparison: he likened the momentum of a planet’s rotation to the way answers can set his daughter’s mind in motion to produce more questions. Both keep spinning unless there’s a force to slow them down.

I particularly like the succinct way he put it: Good answers lead to more good questions. That sounds a lot like data analysis to me. And also to science. It’s quite satisfying to see a unifying theme between explanation and analysis: curiosity.

More knowledge leading to more questions is a fascinating idea. Cairo uses a quote by Ralph W. Sockman (also the basis for a book by Marcelo Gleiser), The larger the island of knowledge, the longer the shoreline of wonder. The island of knowledge is surrounded by an infinite sea of mystery. As the island grows, so does its shoreline, which is where wonder and new ideas happen.

I love this because it describes exactly the way science works. More knowledge always leads to more questions. Curiosity feeds itself. And it goes contrary to the idea that science takes away the mystery or beauty of nature by explaining things.

It’s More Complicated Than That

Getting back to journalism, Alberto lists a series of principles for a good visualization. It has to be…

  • Truthful
  • Functional
  • Beautiful
  • Insightful
  • Enlightening

This set of criteria is strongly based on journalistic practice and principles, and I think it makes a great package for the evaluation of any kind of visualization. Some of the criteria will look odd to the typical visualization person, such as the inclusion of beauty. But this is also what makes Alberto’s book so useful in teaching visualization courses: it goes beyond the typical limited horizon of the technical and largely analytical (rather than communication-oriented) mindset that is still prevalent in visualization.

Another part of this section was my final take-away, another great little sentence that I think needs to be appreciated more when working with data: it’s more complicated than that. Many times, it’s hard to appreciate the complexity and complications in the data, especially when things look convincing and seem to all fit together. But simple explanations can often be misleading and hide a more complex truth. The curious mind keeps digging and asking more questions.

Images from Alberto Cairo’s slides, which he kindly allowed me to use.

eagereyes will be bloggier in 2015

Tue, 12/30/2014 - 04:17

Categories:

Visualization

I always mess with my site around the new year, and this year is no exception. In addition to a new theme, I’ve also been thinking about content. Here are some thoughts on what I want to do in 2015.

I don’t know what it is, but I always start hating my website theme after about a year. We’ll see if this one is any different. Either way, it’s new. If you’re curious, this is the new Twenty Fifteen Theme that’s part of WordPress 4.1, with some minor tweaks. It’s nice, simple, clean, and has a few subtle little features.

It’s also decidedly a blog theme, with a focus on images. I’ve been using teaser images for most postings for a while now, and will make a bigger effort to find good and fitting ones. These may not even show up in your newsreader, especially for link posts (though you will see them on Facebook and in Twitter cards). But they make the site a lot nicer to look at and navigate.

As for content, there are mainly two things. One is that I want to make some more use of the post formats in WordPress, in particular links. These are different in that their title link goes to the page I want to link to, rather than a posting. The text that goes with each will also be short, so you’ll be able to see the entire thing on the front page. If you care to comment, you can click on the image to go to the posting page.

I already posted the first one recently, and have a few more scheduled for the coming weeks. The idea is to post a few of these a month, in addition to the regular content. If you’re following me on Twitter, it’s likely that you will have seen these links there before, but there will be a tad more context here, and there won’t be nearly as many.

As for the other content, my plan is to make a clearer distinction between blog postings and articles. I already have that in the way the categories are set up, but that isn’t very visible. I’m aiming for more consistent posting (i.e., one posting a week, every week), with the blog postings being shorter and more informal, while the articles will be longer and more organized.

Link titles will start with “Link:” from now on, but I don’t want to do that for blog postings or articles. I’m not sure yet how I will indicate the distinction, but it should at least be clear from the length and maybe the tone.

The goal is to make the content easier to consume, since I know that anything beyond a few paragraphs is much less likely to be read in its entirety (or at all). And perhaps I’ll even find a use for those other post types, like quote, image, and aside.

Review: Wainer, Picturing the Uncertain World

Tue, 12/23/2014 - 06:01

Categories:

Visualization

Picturing the Uncertain World by Howard Wainer is a book about statistics and statistical thinking, aided by visual depictions of data. Each article in the collection starts by stating a question or phenomenon, which is then investigated further using some clever statistics.

I bought the book after Scott Murray pointed me to it as the source of his assertion that the best way to show uncertainty was to use blurry dots. I was surprised by that, since my own work had shown people to be pretty bad at judging blurriness, so it didn’t seem to be a particularly good choice (at least if you want people to be able to judge the amount of uncertainty).

The Author

I had never heard of Howard Wainer before reading this book. It turns out that he has been an outspoken critic of bad charts for a long time, much longer than blogs have been around to do that. In fact, Wainer wrote an article for American Statistician in 1984 that could have been the blueprint for blogs like junk charts.

And it turns out that there is even a connection between Wainer and Kaiser Fung, who runs junk charts.

@eagereyes Howard introduced me to Tufte principles in my first stats course almost 20 yr ago!

— Kaiser Fung (@junkcharts) December 9, 2014

This is also interesting because the book reminded me of Kaiser’s Numbers Rule Your World and Numbersense. It all makes sense.

The Book

After Scott pointed it out, the book immediately intrigued me: had somebody figured out how to show uncertainty well? How did I not know about this? Well, it turns out he hasn’t. But there is a lot of other good stuff in this book that makes it very worthwhile.

Wainer’s idea of uncertainty is much broader than the usual error metrics (though he addresses those as well). In fact, he describes statistics as the science of uncertainty. That makes a lot of sense, and he makes the case repeatedly about how statistics provides means of dealing with uncertainty about facts and observations.

As a consequence, the book is really about statistical thinking, aided by visual depictions of the data. In several chapters, Wainer takes data and either redraws an existing chart, or argues that by simply looking at the data the right way, it becomes much easier to understand what is going on.

The key chapter from my perspective was chapter 13, Depicting Error. Wainer shows a number of ways to depict error, from tables to a number of charts. Some of these are well-known, others not. They are all interesting, though there isn’t much that is surprising (especially after having seen the Error Bars Considered Harmful paper by Michael Correll and Michael Gleicher at InfoVis earlier this year).

There is a lot of other good stuff in the book too, though. Chapter 16, Galton’s Normal, talks about the way the normal distribution drops to very, very small probabilities in the tails. It’s a short chapter, but it really drove home a point for me about how hard it is to intuitively understand distributions, even the ubiquitous normal distribution.

The final chapter, The Remembrance of Things Past, is probably the best. It’s the deepest, most human, and I think it has the best writing. It describes the statistical graphics produced by the population of the Jewish ghetto in Kovno, Lithuania, during the Holocaust. It’s chilling and fascinating, and the charts they created are incredible. Wainer does an admirable job of framing the entire chapter and navigating between becoming overly sentimental and being too sterile in his descriptions.

The book is really a collection of articles Wainer wrote for Chance Magazine and American Statistician in the mid-2000s (with one exception from 1996). As a result, it isn’t really more than the sum of its parts: there isn’t much cohesion between the chapters. On the other hand, each chapter is a nicely self-contained piece, so it’s easy to pick the book up and read a chapter or two. Wainer also writes very well; his explanations of statistical phenomena and procedures are very good and easy to follow even if you don’t know much about statistics.

Ultimately, my question about the blurry dots was not answered, because Wainer points to Alan MacEachren’s book How Maps Work as the source of the blurriness argument. I can’t find my copy of that book at the moment though, so following this lead further will have to wait for another day.

VIS 2014 Observations and Thoughts

Tue, 11/18/2014 - 03:19

Categories:

Visualization

While I’ve covered individual talks and events at IEEE VIS 2014, there are also some overall observations – positive and negative – I thought would be interesting to write down to see what others were thinking.

I wrote summaries for every day I was actually at the conference: Monday, Tuesday, Wednesday, Thursday, and Friday. VIS actually now starts on Saturday with a few early things like the Doctoral Colloquium, and Sunday is a full day of workshops and tutorials.

Just to be clear: my daily summaries are by no means comprehensive. I did not go to a single VAST or SciVis session this year, only saw two out of five panels, did not go to a single one of the ten workshops, attended only one of the nine tutorials, and didn’t even see all the talks in some of the sessions I did go to. I also left out some of the papers I actually saw, because I didn’t find them relevant enough.

Things I Don’t Like

I’m starting with these, because I like a lot more things than I don’t, and listing the bad stuff at the end always makes these things sound like they are much more important and severe than they really are.

The best paper selection has been quite odd at InfoVis for a while. Some of the selections made a lot of sense, but some were just downright weird. This year’s best paper was not bad, but I don’t think it was the best one presented. Even more, some of the really good ones didn’t even get honorable mentions.

While it’s easy to blame the best paper committee, I think we program committee members also need to get better at nominating the good ones so they can be considered. I know I didn’t nominate any of the ones I was primary reviewer on, and I really should have for one of them. We tend to be too obsessed with criticizing the problems and don’t spend enough time making sure the good stuff gets the recognition it deserves.

Another thing I find irritating is the new organization of the proceedings. I don’t get why TVCG papers need to be in a separate category entirely, that just makes finding them harder. It also only reinforces the mess that is the conference vs. journal paper distinction at VAST. Also, why are invited TVCG papers listed under conference rather than TVCG? How does that make any sense? There has to be a better way both for handling VAST papers (and ensuring the level of quality) and integrating all papers in the electronic proceedings. There is just too much structure and bureaucracy here that I have no interest in and that only gets in the way. Just let me get to the papers.

Speaking of TVCG, I don’t think that cramming presentations for journal papers into an already overfull schedule is a great idea. That just takes time away from other things that make more sense for a conference (like having a proper session for VisLies). While I appreciate the fact that VIS papers are journal papers (with some annoying exceptions), I think doing the opposite really doesn’t make sense. Also, none of the TVCG presentations I saw this year were remarkable (though I admittedly only saw a few).

The Good Stuff

On to the good stuff. This was the best InfoVis conference in a while. There were a few papers I didn’t like, but they were outweighed by a large number of very strong ones, and some really exceptional ones. I think this year’s crop of papers will have a lasting impact on the field.

In addition to the work being good, presentations are also getting much better. I only saw two or three bad or boring presentations, most were very solid. That includes the organization of the talk, the slides (nobody seems to be using the conference style, which is a good thing), and the speaking (i.e., proper preparation and rehearsals). A bad talk can really distract from the quality of the work, and that’s just too bad.

Several talks also largely consisted of well-structured demos, which is great. A good demo is much more effective than breaking the material up into slides. It’s also much more engaging to watch, and leaves a much stronger impression. And with some testing and rehearsals, the risk that things will crash and burn is really not that great (still not a bad idea to have a backup, though).

A number of people have talked about the need for sharing more materials beyond just the paper for a while, and it is now actually starting to happen. A good number of presentations ended with a pointer to a website with at least the paper and teaser video, and often more, like data and materials for studies, and source code. After the Everything But The Chart tutorial, I wonder how many papers next year will have a press kit.

The number of systems that are implemented in JavaScript and run in the browser is also increasing. That makes it much easier to try them out without the hassle of having to download software. Since many of these are prototypes that will never be turned into production software, it doesn’t matter nearly as much that they won’t be as easily maintained or extended.

VIS remains a very friendly and healthy community. There are no warring schools of thought, and nobody tries to tear down somebody else’s work in the questions after a talk. The social aspect is also getting ever stronger with the increasing number of parties. That might sound trivial, but the main point of a conference is the communication and the connections that are made, not the paper presentations.

There is also a vibrant community on Twitter, at least for InfoVis and VAST talks. I wonder what it will take to get some SciVis people onto Twitter, though, or help them figure out how to use WordPress.

VIS 2014 – Friday

Fri, 11/14/2014 - 15:46

Categories:

Visualization

Wow, that was fast! VIS 2014 is already over. This year’s last day was shorter than in previous years, with just one morning session and then the closing session with the capstone talk.

Running Roundup

We started the day with another run. Friday saw the most runners (six), bringing the total for the week to 15, with a count distinct of about 12. I hereby declare the first season of VIS Runners a resounding success.

InfoVis: Documents, Search & Images

The first session was even more sparsely attended than on Thursday, which was really too bad. The first paper was Overview: The Design, Adoption, and Analysis of a Visual Document Mining Tool For Investigative Journalists by Matthew Brehmer, Stephen Ingram, Jonathan Stray, and Tamara Munzner, and it was great. Overview is a tool for journalists to sift through large collections of documents, like those returned from Freedom of Information Act (FOIA) requests. Instead of doing automated processing, it allows the journalists to tag and use keywords, since many of these documents are scanned PDFs. It’s a design study as well as a real tool that was developed over a long time and multiple releases. This is probably the first paper at InfoVis to report on such an extensively developed system (and the only one directly involved in somebody becoming a Pulitzer Prize finalist).

The Overview paper also wins in the number-of-websites category: in addition to checking out the paper and materials page, you can use the tool online, examine the source code, or read the blog.

How Hierarchical Topics Evolve in Large Text Corpora by Weiwei Cui, Shixia Liu, Zhuofeng Wu, and Hao Wei presents an interesting take on topic modeling and the ThemeRiver. Their system is called RoseRiver and is much more user-driven: it finds topics, but lets the user combine or split them, and work with them much more than in other systems I’ve seen.

I’m a bit skeptical about Exploring the Placement and Design of Word-Scale Visualizations by Pascal Goffin, Wesley Willett, Jean-Daniel Fekete, and Petra Isenberg. The idea is to create a number of ways to include small charts within documents to show some more information for context. They have an open-source library called Sparklificator to easily add such charts to a webpage. I wonder how distracting small charts would be in most contexts, though.

A somewhat odd paper was Effects of Presentation Mode and Pace Control on Performance in Image Classification by Paul van der Corput and Jarke J. van Wijk. They investigated a new way of doing rapid serial visual presentation (RSVP) for images, which continuously scrolls rather than flips through pages of images. It’s a mystery to me why they only tried sideways scrolling, which seems much more difficult than vertical scrolling.

Capstone: Barbara Tversky, Understanding and Conveying Events

The capstone was given by cognitive psychology professor Barbara Tversky. She talked about the difference between events and activities (events are delimited, activities are continuous), and how we think about them when listening to a story. She has done some work on how people delineate events, both at a high level and at a very detailed level.

This is interesting in the context of storytelling, and particularly in comics, which break up time and space using space, and need to do so at logical boundaries. Tversky also discussed some of the advantages and disadvantages of story: that it has a point of view, causal links, emotion, etc. She listed all of those as both advantages and disadvantages, which I thought was quite clever.

It was a very fast talk, packed with lots of interesting thoughts and information nuggets. It worked quite well as a counterpoint to Alberto Cairo’s talk, and despite the complete lack of direct references to visualization (other than a handful of images), it was very appropriate and useful. Many people were taking pictures of her slides during the talk.

Next Years

IEEE VIS 2015 will be held in Chicago, October 25–30. The following years had already been announced last year (2016: Washington, DC; 2017: Santa Fe, NM), but it was interesting to see them publicly say that 2018 might see VIS in Europe again.

This concludes the individual day summaries. I will also post some more general thoughts on VIS 2014 in the next few days.

VIS 2014 – Thursday

Fri, 11/14/2014 - 07:16

Categories:

Visualization

Thursday was the penultimate day of VIS 2014. I ended up only going to InfoVis sessions, and unfortunately missed a panel I had been planning to see. The papers were a bit more mixed, but there were again some really good ones.

InfoVis: Evaluation

Thursday was off to a slow start (partly because the after-effects of the party the night before left the room mostly empty at first), but eventually got interesting.

Staggered animation is commonly understood to be a good idea: don’t start all movement in a transition at once, but with a bit of delay between objects. It’s supposed to help people track the objects as they move. The Not-so-Staggering Effect of Staggered Animated Transitions on Visual Tracking by Fanny Chevalier, Pierre Dragicevic, and Steven Franconeri describes a very well-designed study that looked into that. They developed a number of criteria that make tracking harder, then tested those with regular motion. Having established their effect, they used Monte-Carlo simulation to find the best configurations for staggered animation of a field of points (since there are many choices to be made about which to move first, etc.), and then tested those. It turns out that the effect of staggering is very small, if it exists at all. That’s quite interesting.

Since they tested this on a scatterplot with identical-looking dots, it’s not clear how this would apply to, for example, a bar chart or a line chart, where the elements are easier to identify. But the study design is very unusual and interesting, and a great model for future experiments.

Another unexpected result comes from The Influence of Contour on Similarity Perception of Star Glyphs by Johannes Fuchs, Petra Isenberg, Anastasia Bezerianos, Fabian Fischer, and Enrico Bertini. They tested the effect of outlines in star glyphs, and found that the glyph works better without it, just showing the spokes. That is interesting, since the outline supposedly would help with shape perception. There are also some differences between novices and experts, which are interesting in themselves.

The only technique paper that I have seen so far this year was Order of Magnitude Markers: An Empirical Study on Large Magnitude Number Detection by Rita Borgo, Joel Dearden, and Mark W. Jones. The idea is to design a glyph of sorts to show orders of magnitude, so values across a huge range can be shown without making most of the smaller values impossible to read. The glyphs are fairly straightforward and require some training, but seem to be working quite well.

InfoVis: Perception & Design

While there were some good papers in the morning, overall the day felt a bit slow. The last session of the day brought it back with a vengeance, though.

Learning Perceptual Kernels for Visualization Design by Çağatay Demiralp, Michael Bernstein, and Jeffrey Heer describes a method for designing palettes of shapes, sizes, colors, etc., based on studies. The idea is to measure responses to differences, train a model to figure out which of them can be differentiated more or less easily, and then pick the best ones.

The presentation that took the cake for the day though was Ranking Visualizations of Correlation Using Weber’s Law by Lane Harrison, Fumeng Yang, Steven Franconeri, and Remco Chang. It’s known that scatterplots allow people to judge correlation quite well, with precision following what is called Weber’s Law (which describes which end of the scale is easier to differentiate). In their experiments, the authors found that this is also true for ten other techniques, including line charts, bar charts, parallel coordinates, and more. This is remarkable because Weber’s law really describes very basic perception rather than cognition, and it paves the way for a number of new ways to judge correlation in almost any chart.
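
For reference, Weber’s law in its classic form (this is the general statement from the perception literature, not the paper’s specific model) says that the just-noticeable change ΔI in a stimulus is a constant fraction of the stimulus magnitude I:

```latex
% Classic Weber's law: the just-noticeable difference \Delta I is a constant
% fraction k (the Weber fraction) of the stimulus intensity I.
\frac{\Delta I}{I} = k
```

Harrison et al. fit a model of this form to people’s just-noticeable differences in correlation, which is what allows them to rank the different chart types by how precisely correlation can be read from them.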

The Relation Between Visualization Size, Grouping, and User Performance by Connor Gramazio, Karen Schloss, and David Laidlaw looked at the role of mark size in visualizations, and whether it changes people’s performance. They found that mark size does improve performance, but only to a point. From there, it doesn’t make any more difference. Grouping also helps reduce the negative effect of an increase in the number of marks.

Everybody talks about visual literacy in visualization, but nobody really does anything about it. That is, until A Principled Way of Assessing Visualization Literacy by Jeremy Boy, Ronald Rensink, Enrico Bertini, and Jean-Daniel Fekete. They developed a framework for building visual literacy tests, and showed that this could work with an actual example. This is just the first step certainly, and there are no established visual literacy levels for the general population, etc. But having a way to gauge visual literacy would be fantastic and inform a lot of research, use of visualization in the media, education, etc.

The Podcasting Life

Moritz and Enrico asked me to help them record a segment for the VIS review episode of the Data Stories podcast. You can listen to that in all its raw, uncut glory by downloading the audio file.

VIS 2014 – Wednesday

Thu, 11/13/2014 - 14:29

Categories:

Visualization

Wednesday is more than the halfway point of the conference, and was clearly the high point so far. There were some great papers, the arts program, and I got to see the Bertin exhibit.

InfoVis: Interaction and Authoring

Revisiting Bertin matrices: New Interactions for Crafting Tabular Visualizations by Charles Perin, Pierre Dragicevic, and Jean-Daniel Fekete was the perfect paper for this year. They implemented a very nice, web-based version of Bertin’s reorderable matrix, very closely following the purely black-and-white aesthetic of the original. They are also starting to build additional things on top of that, though, using color, glyphs, etc.

The reason it fits so well is not just that VIS is in Paris this year (and Bertin actually lived just around the corner from the conference hotel), but it also ties in with the Bertin exhibit (see below). They also made the right choice in calling the tool Bertifier, a name I find endlessly entertaining (though they clearly missed the opportunity to name it Bertinator, a name both I and Mike Bostock suggested after the fact – great minds clearly think alike).

iVisDesigner: Expressive Interactive Design of Information Visualizations by Donghao Ren, Tobias Höllerer, and Xiaoru Yuan is a tool for creating visualization views on a shared canvas. It borrows quite a bit from Tableau, Lyra, and other tools, but has some interesting ways of quickly creating complex visualizations that are linked together so brushing between them works. They even showed streaming data in their tool. It looked incredibly slick in the demo, though I have a number of questions about some of the steps I didn’t understand. Since it’s available online and open-source, that’s easy to follow up on, though.

VIS Arts Program

I saw a few of the papers in the VIS Arts Program (oddly abbreviated VISAP), though not as many as I would have liked. There were some neat projects using flow visualization to paint images, some more serious ones raising awareness for homelessness with a large installation, etc.

The one that stood out among the ones I saw was PhysicSpace, a project where physicists and artists worked together to make it possible to experience some of the weird phenomena of quantum physics. The pieces are very elaborate and beautiful, and go way beyond simple translations. There is a lot of deep thinking and an enormous amount of creativity in them. It’s also remarkable how open the physicists seem to be to these projects. It’s well worth watching all the videos on their website; they’re truly stunning. This is the sort of work that shows how crossing the boundaries between art and science can produce amazing results.

InfoVis: Exploratory Data Analysis

This session was truly outstanding. All the papers were really good, and the presentations matched the quality of the content (almost all the presentations I saw yesterday were really good). InfoVis feels really strong this year, both in terms of the work and the way it is presented.

The Effects of Interactive Latency on Exploratory Visual Analysis by Zhicheng Liu and Jeffrey Heer looks at the effect latency has on people’s exploration of data. They added a half-second delay to their system and compared to the system in its original state. It turns out that the delay reduces the amount of interaction and people end up exploring less of the data. While that is to be expected, when asked people didn’t think the delay would affect them, and a third didn’t even consciously notice it.

Visualizing Statistical Mix Effects and Simpson’s Paradox by Zan Armstrong and Martin Wattenberg examines Simpson’s Paradox (e.g., the median increases for the entire population even though it decreases in every subgroup) in visualization. They have built an interesting visualization to illustrate why the effect occurs, and make some recommendations for mitigating it in particular techniques. This is an important consideration for aggregated visualization, which is very common given today’s data sizes.
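
For anyone who hasn’t run into the paradox before, here is a tiny worked example (made-up numbers, and using rates rather than the medians the paper discusses): each group’s rate drops from one year to the next, yet the overall rate rises because the population shifts toward the group with the higher rate.

```python
# Toy numbers for a mix effect: each group's rate drops from year 1 to year 2,
# yet the overall rate rises, because year 2's population is concentrated in
# the group with the much higher rate. (Made-up data, not from the paper.)
year1 = {"A": (90, 100), "B": (100, 1000)}    # (successes, total) per group
year2 = {"A": (800, 1000), "B": (5, 100)}

def rate(successes, total):
    return successes / total

def overall(groups):
    return sum(s for s, _ in groups.values()) / sum(t for _, t in groups.values())

for g in ("A", "B"):
    print(g, rate(*year1[g]), "->", rate(*year2[g]))      # 0.9 -> 0.8 and 0.1 -> 0.05
print("overall", round(overall(year1), 3), "->", round(overall(year2), 3))  # 0.173 -> 0.732
```

The reversal comes entirely from the changing mix of groups, which is exactly the effect the paper visualizes.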

Showing uncertainty is an important issue, and often it is done with error bars on top of bar charts. The paper Error Bars Considered Harmful: Exploring Alternate Encodings for Mean and Error by Michael Correll and Michael Gleicher shows why they are problematic: they are ambiguous (do they show standard error or a confidence interval? If the latter, which one?), asymmetric (points inside the bar appear to be more likely than points above the bar, at the same distance from the bar’s top), and binary (a point is either within the range or outside). Their study demonstrates the issue and then tests two alternative encodings, violin plots and gradient plots, which both perform better.
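
As a quick illustration of one of the alternatives (simulated data, my own toy example rather than the paper’s stimuli), here is a minimal matplotlib sketch showing the same sample as a bar with an error bar and as a violin plot:

```python
# A toy comparison (simulated data) of a bar chart with an error bar and a
# violin plot of the same sample; the violin shows the full distribution
# instead of a hard, asymmetric-looking interval sitting on top of a bar.
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
sample = rng.normal(loc=10, scale=2, size=200)

fig, (ax_bar, ax_violin) = plt.subplots(1, 2, figsize=(8, 4), sharey=True)

# Bar of the mean with an error bar of +/- one sample standard deviation.
ax_bar.bar([0], [sample.mean()], yerr=[sample.std(ddof=1)], capsize=8, color="0.7")
ax_bar.set_title("Bar with error bar")
ax_bar.set_xticks([])

ax_violin.violinplot(sample, positions=[0], showmeans=True)
ax_violin.set_title("Violin plot")
ax_violin.set_xticks([])

plt.show()
```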

My Tableau Research colleagues Justin Talbot, Vidya Setlur, and Anushka Anand presented Four Experiments on the Perception of Bar Charts. They looked at the classic Cleveland and McGill study of bar charts and asked why the differences it found occurred. Their study is very methodical, was presented very well, and opens up a number of further hypotheses and questions to look into. It has taken 30 years for somebody to finally ask the why question; hopefully we’ll see more reflection and follow-up now.

I unfortunately missed the presentation of the AlgebraicVis paper by Gordon Kindlmann and Carlos Scheidegger. But it seems like a really interesting approach to looking at visualization, and Carlos certainly won’t shut up about it on Twitter.

Bertin Exhibit

VIS being in Paris this week is the perfect reason to have an exhibit about Jacques Bertin. It is based on the reorderable matrix, an idea Bertin developed over many years. The matrix represents a numeric value broken down by two categorical dimensions, essentially a pivot table. The trick, though, is that it allows its user to rearrange and order the rows and columns to uncover patterns, find correlations, etc.

The exhibit shows several design iterations Bertin went through to build it so it would be easy to rearrange, lock, and unlock. Things were more difficult to prototype and animate before computers.

The organizers also built a wooden version of the matrix for people to play with. The basis for this was the Bertifier program presented in the morning session. While they say that it is a simplified version of Bertin’s, they also made some improvements. One is that they can swap the top parts of the elements by attaching them with magnets. That way, different metrics can be expressed quite easily, without having to take everything apart. I guess it also lets you cheat on the reordering if you only swap two rows.

They also have some very nice hand-drawn charts from the 1960s, though not done by Bertin. They are interesting simply because they show how much effort it was to draw charts before computers.

Note the amount of white-out used above to remove extraneous grid lines, and below to correct mistakes on the scatterplot.

I was also reminded of this in the Financial Visualization panel, where one of the speakers showed photos of the huge paper charts they have at Fidelity Investments for deep historical data (going back hundreds of years). Paper still has its uses.

In addition to being interesting because of Bertin’s influence and foresight, this exhibit is also an important part of the culture of the visualization field. I hope we’ll see more of these things, in particular based on physical artifacts. Perhaps somebody can dig up Tukey’s materials, or put together a display of Bill Cleveland’s early work – preferably without having to wait for him to pass away.

Running and Partying

The second VIS Run in recorded history took place on Wednesday, and that night also saw the West Coast Party, which is becoming a real tradition. The first session on Thursday morning was consequently quite sparsely attended.