EagerEyes.org

Why the Obsession with Tables?

Thu, 05/02/2013 - 04:12

Categories:

Visualization

Lots of data are still presented and released as tables. But why, when we know that visual representations are so much easier to read and understand? Eric Newburger from the U.S. Census Bureau has an interesting theory.

In a short talk on visualization at the Census Bureau, he describes how in the 1880s, the Census published maps and charts. Many of those are actually amazingly well done, even by today’s standards. But starting with the 1890 census, they were replaced with tables.

This, according to Newburger, was due to an important innovation: the Hollerith Tabulating Machine. The new machines were much faster and could slice and dice the data in a lot of new ways, but their output ended up in tables. Throughout the 20th century, the Census created an enormous number of tables, with only a small fraction of the data shown as maps or charts.

Newburger argues that people don’t bother trying to read tables, whereas visualizations are much more likely to catch their attention and get them interested in the underlying data. We clearly have the means to create any visualization we want today, and there is plenty of data available, so why keep publishing tables? It’s a matter of the attitudes towards data, and these can be hard to change after more than 100 years:

We were producing analysts who knew how to make tables. Really really good tables. But what we’re doing is making tables.

There are three short talks in this recorded webinar, which also go into some detail on the visualization efforts inside the Census, their visualization gallery, etc. It’s an interesting insight into the way the Census Bureau works and how a small group of people is trying to change the way the Census communicates information to the public.

Continuous Values and Baselines

Mon, 04/29/2013 - 05:23

Categories:

Visualization

One of the most common mistakes people make when creating charts is to cut off the vertical axis. But why is that a problem? And what can you do when you need to show data where the amount of change is small compared to the absolute values?

When we think of continuous data, we almost always think of values that have a meaningful zero. There is no question what an amount of money is measured from, we understand the meaning of zero money. The same is true for most other things: length, weight, volume, etc. all have an obvious zero. It doesn’t matter what unit you use, zero meters is zero feet is zero furlongs is zero light-years.

As a consequence, we can think in terms of multiples, without even caring about units. Something being twice as heavy as something else is meaningful regardless of whether you weigh it in pounds or kilograms, and something is twice as expensive whether you pay in euros, dollars, or yen.

Bars: Length Is Just Another Unit

When data gets mapped to visual variables for visualization, we tend to make the same assumptions. A bar that is twice as long represents a value that’s twice as big. But that is only true if the bar starts from zero. If the axis is cut off, that relationship no longer holds.

The following image shows the monthly sales of a fictitious coffee chain over a few months. The left bar chart starts at zero, the right one at $29K. Notice the difference?

In the right-hand chart, the bar for February appears to be roughly twice as high as the one for January. Twice the bar size means twice the value, right? But looking at the chart on the left, it’s obvious that the change is rather small.
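The distortion is simple arithmetic: a bar drawn from a baseline b has a length of value − b, so two bars compare as (v2 − b) / (v1 − b) instead of v2 / v1. A minimal Python sketch, using hypothetical January and February figures (the actual values behind the chart are not given), chosen so that a $29K baseline produces the "roughly twice" effect described above:

```python
def apparent_ratio(a: float, b: float, baseline: float = 0.0) -> float:
    """Ratio of drawn bar lengths when both bars start at `baseline`."""
    return (b - baseline) / (a - baseline)


# Hypothetical monthly sales; the real figures behind the chart are not given.
january, february = 30_000, 31_000

true_ratio = apparent_ratio(january, february)         # axis starts at zero
cut_ratio = apparent_ratio(january, february, 29_000)  # axis starts at $29K

print(f"true ratio:     {true_ratio:.3f}")  # 1.033
print(f"apparent ratio: {cut_ratio:.3f}")   # 2.000
```

A change of about 3% ends up drawn as a doubling, which is exactly the misreading the left-hand chart avoids.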

The first thing to do when looking at a chart, therefore, is to make sure you understand the vertical axis. If it starts at 0, it is much easier to read the chart without being misled.

Lines Don’t Need Baselines?

Some people suggest that in contrast to bar charts, line charts are not sensitive to the baseline problem. However, I disagree. Look at the same data as before, this time shown as a line chart.

Is the change not much more dramatic in the right-hand part of this image? The line chart maps the value to vertical position rather than length, which is less obviously connected to the axis. But when the points are connected, we tend to think in terms of the distance from the axis, not in terms of a few points floating in space.

Line charts with a non-zero baseline are very common. They are still problematic, however, because the apparent change can be deceiving. Having to look at the numbers on the axis to figure out the amount of change requires a lot more mental work and partly defeats the point of the chart.

Mapping Change

So what alternative do we have when we want to create a chart that makes the change visible, but the amount of change is small compared to the absolute values? One way is to plot the change separately. This could be done as a percentage or as an absolute difference; here it is the absolute difference (the same values shown as lines and bars).
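Deriving the change series is straightforward. This sketch, with hypothetical monthly sales figures for the fictitious coffee chain, computes both the absolute and the percent change from one month to the next; either series can then be plotted against its own zero line, independently of the scale of the totals:

```python
def month_over_month(values):
    """Return (absolute change, percent change) for each consecutive pair of values."""
    changes = []
    for prev, cur in zip(values, values[1:]):
        diff = cur - prev
        changes.append((diff, 100 * diff / prev))  # percent relative to previous month
    return changes


# Hypothetical monthly sales for the fictitious coffee chain.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [30_000, 31_000, 30_400, 31_600]

# The first month has no predecessor, so the change series starts in February.
for month, (diff, pct) in zip(months[1:], month_over_month(sales)):
    print(f"{month}: {diff:+,} ({pct:+.1f}%)")
```

Note that the change series is one value shorter than the original data, which is worth keeping in mind when aligning the two charts.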

Now the scale for the amount is independent of the scale for the change. This also makes it easy to see whether the change is positive or negative, because the relation to the zero line is very visually salient (especially when using bars). The rate of change is also much more obvious: it can be seen in the original bar and line charts, but it is much harder to get a good sense of it there.

Showing small changes in large values is a challenge, but it helps to ask: what do we care about here? What do we need to know? That should guide the way the data is shown.

Meet @InfoVis_Ebooks, Your Source for Random InfoVis Paper Snippets

Mon, 04/22/2013 - 04:21

Categories:

Visualization

Are you looking for inspiration while writing a paper or grant? Do you feel that there is a lack of information visualization content on Twitter? Is your timeline too empty and slow? Follow @InfoVis_Ebooks, a Twitter account that posts random pieces of text from infovis papers.

Related Work

Accounts that tweet more or less random snippets of text have become a genre in themselves. If you’ve spent any time on Twitter, you’ve probably seen the one that started it all: Horse ebooks. Despite being a spam account, it has almost 170,000 followers who presumably enjoy its random and often nonsensical tweets. Following in its footsteps are more or less serious accounts, like Bogost ebooks, which tweets pieces of Ian Bogost’s writing.

Materials and Method

InfoVis Ebooks takes a random piece of text from a random paper in its repository and tweets it. It has read all of last year’s InfoVis papers, and is now getting started with the VAST proceedings. After that, it will start reading infovis papers published in last year’s EuroVis and CHI conferences, and then work its way back to previous years.

Each tweet contains a reference to the paper the snippet is from. For InfoVis, VAST, and CHI, these are DOIs rather than links. Links get long and distracting, whereas DOIs are much easier to tune out in a tweet. If you want to see the paper, google the DOI string (keep the “doi:” part). You can also take everything but the “doi:” and append it to http://dx.doi.org/ to be redirected to the paper page. For other sources, I will probably have to use links.
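The manual lookup described above (drop the "doi:" prefix and append the rest to http://dx.doi.org/) can be sketched as a small helper. The function name and the example DOI are hypothetical placeholders, not taken from the bot's code:

```python
DOI_RESOLVER = "http://dx.doi.org/"  # resolver prefix given in the text


def doi_to_url(doi_string: str) -> str:
    """Turn a 'doi:...' string from a tweet into a resolver URL (hypothetical helper)."""
    doi = doi_string.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[len("doi:"):]  # keep everything after the "doi:" prefix
    return DOI_RESOLVER + doi.strip()


# Placeholder DOI for illustration only.
print(doi_to_url("doi:10.1234/example.5678"))
```

This sketch does no percent-encoding, so a DOI containing characters such as "#" or "?" would need escaping before it is used in a URL.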

As the name suggests, InfoVis Ebooks is about infovis papers. If you want to do the same for SciVis, HCI, or anything else, the code is available on github.

Results

InfoVis Ebooks currently tweets roughly once every two hours. The time is randomized, and there can be much more (and less) than two hours between tweets; it all depends on how chatty the bot is feeling.
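The post does not say how the bot randomizes its timing. One common way to get "roughly every two hours, but sometimes much more or much less" is to draw each gap from an exponential distribution with a two-hour mean. This is a hypothetical sketch of that idea, not the bot's actual scheduler:

```python
import random

MEAN_GAP_SECONDS = 2 * 60 * 60  # "roughly once every two hours"


def next_delay(rng: random.Random, mean: float = MEAN_GAP_SECONDS) -> float:
    """Seconds to wait before the next tweet, drawn from an exponential distribution."""
    return rng.expovariate(1.0 / mean)  # expovariate takes the rate, i.e. 1 / mean


rng = random.Random(42)  # fixed seed so the example is reproducible
delays = [next_delay(rng) for _ in range(5)]
print([round(d / 3600, 2) for d in delays])  # gaps in hours, scattered around 2
```

Exponential gaps average out to the chosen mean while still producing occasional bursts and long silences, which matches the "chatty" behavior described above.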

The results are sometimes nonsensical, sometimes funny, and sometimes pieces of code or formulas. Despite the limited set of papers right now, there is a lot of variety in the tweets.

Conclusions and Future Work

This is clearly only the start, and further research is needed. The number of sources needs to be expanded, which is a slow, manual process. The goal is to eventually not only include papers (and maybe posters), but also have the bot follow visualization blogs.

In addition to the text, the document database knows the venue and year a paper was published. The idea is to be able to focus the tweets on papers from a particular venue (e.g., during a conference, only tweet from papers that were published in earlier years at that same place), or restrict to a time period (vintage papers from the early 1990s?).
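Filtering by venue and year before picking a random paper and snippet could look like the following sketch. The `Paper` record and `pick_snippet` function are hypothetical illustrations of the idea, not the structure of the bot's actual document database:

```python
import random
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical record for one paper in the document database."""
    doi: str
    venue: str
    year: int
    snippets: list = field(default_factory=list)


def pick_snippet(papers, rng, venue=None, years=None):
    """Pick a random (snippet, doi) pair, optionally restricted to a venue and year range."""
    pool = [
        p for p in papers
        if p.snippets
        and (venue is None or p.venue == venue)
        and (years is None or years[0] <= p.year <= years[1])
    ]
    if not pool:
        return None  # nothing matches the restriction
    paper = rng.choice(pool)  # random paper first ...
    return rng.choice(paper.snippets), paper.doi  # ... then a random snippet from it


# Placeholder papers and DOIs for illustration only.
papers = [
    Paper("doi:10.1234/example.1", "InfoVis", 2012, ["Snippet A."]),
    Paper("doi:10.1234/example.2", "VAST", 2012, ["Snippet B."]),
]
print(pick_snippet(papers, random.Random(1), venue="VAST"))
```

Choosing the paper first and then the snippet mirrors the description above ("a random piece of text from a random paper"), so long papers do not dominate the feed.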

The bot will continue to be tweaked to create more interesting and entertaining tweets. It is currently based on some very simple heuristics and rules for what makes a snippet acceptable, but I plan on refining those over time. Also, a user study.
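The post does not list the bot's actual rules. As a purely hypothetical illustration of what "simple heuristics" for an acceptable snippet might look like, this sketch checks that the snippet plus its DOI fits in a tweet (140 characters at the time) and that it looks like a complete sentence:

```python
TWEET_LIMIT = 140  # Twitter's character limit at the time of writing


def acceptable(snippet: str, doi: str) -> bool:
    """Hypothetical rules for whether a snippet is worth tweeting."""
    text = snippet.strip()
    if not text:
        return False
    if len(text) + 1 + len(doi) > TWEET_LIMIT:  # snippet + space + DOI must fit
        return False
    if not text[0].isupper():  # avoid snippets that start mid-sentence
        return False
    if text[-1] not in ".!?":  # avoid snippets that end mid-sentence
        return False
    return text.count("(") == text.count(")")  # no dangling parentheses


print(acceptable("Color works well for categories.", "doi:10.1234/example.5678"))
```

Since the real bot occasionally tweets code and formulas, its rules are evidently looser than these; the sketch only shows the general shape of such a filter.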

Data: Continuous vs. Categorical

Thu, 04/18/2013 - 06:03

Categories:

Visualization

Data comes in a number of different types, which determine what kinds of mapping can be used for them. The most basic distinction is that between continuous (or quantitative) and categorical data, which has a profound impact on the types of visualizations that can be used.

The main distinction is quite simple, but it has a lot of important consequences. Quantitative data is data where the values can change continuously, and you cannot count the number of different values. Examples include weight, price, profits, counts, etc. Basically, anything you can measure or count is quantitative.

Categorical data, in contrast, is for those aspects of your data where you make a distinction between different groups, and where you typically can list a small number of categories. This includes product type, gender, age group, etc.
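A crude first pass at telling the two types apart is to look at the values themselves: all numbers suggest quantitative, anything else suggests categorical. The helper and the sample columns below are hypothetical, and the heuristic breaks down for numeric codes (zip codes, years used as labels), which is why the final decision is a modeling choice rather than something a function can settle:

```python
def infer_kind(values):
    """Guess 'quantitative' if every value is a number, otherwise 'categorical'."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)  # bools are not measurements
        for v in values
    )
    return "quantitative" if numeric else "categorical"


# Hypothetical sample columns from the coffee chain example.
columns = {
    "Product Type": ["Coffee", "Espresso", "Herbal Tea"],
    "Profit": [1520.5, 980.0, -120.25],
    "Market": ["East", "West", "Central"],
}
for name, values in columns.items():
    print(name, "->", infer_kind(values))
```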

Both quantitative and categorical data have some finer distinctions, but I will ignore those for this posting. The more important question is: why do these types make a difference for visualization?

Quantitative Data: Values

Most data sets contain both types of data. It’s actually quite difficult to visualize data that is purely quantitative or purely categorical (parallel coordinates are a good way to show the former, parallel sets for the latter).

Let’s take the example of a hypothetical coffee chain and look at their profits. A simple bar chart can show this data broken down by product type.

As simple as this chart is, some decisions had to be made about how to show the data. The quantitative Profit variable is shown well by position or length. The categorical Product Type naturally divides the data into individual items, hence the bars.

What if we picked a different variable for the second axis, one that is continuous? This changes the type of chart we want to a line chart.

Profit is now on the vertical axis, but it is still a continuous variable. We might treat time as categorical, which would give us another bar chart, perhaps with one bar per month (or whatever granularity we want). But I decided to treat time as continuous here, which results in a line chart. Time is a special case that can be either type, depending on the way you want to look at the data. To focus on individual months, treat time as discrete and use bars. To look at trends and the rate of change (and thus, the space in between the data points), use continuous time.

Line and bar charts can appear to be interchangeable, but they are usually not. The encoding is subtly different (length for the bars, position for the line), and there is a clear implication in the line that there is a continuum between the points. Using a line chart for the product type chart above would not make sense, since there is nothing in between Espresso and Herbal Tea. Even if we only have one data point for each month, though, time is still continuous, so we can treat it as such if we want.
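The rule of thumb from the two charts above can be written down as a tiny decision function. This is a hypothetical sketch covering only a quantitative y-axis such as Profit; note that time is whatever kind you decide it is:

```python
def suggest_mark(x_kind: str) -> str:
    """Rule of thumb for a chart with a quantitative y-axis such as Profit."""
    if x_kind == "categorical":
        # Discrete items (product types, or months treated as categories):
        # nothing exists between them, so draw separate bars.
        return "bar"
    if x_kind == "continuous":
        # A continuum such as time: the connecting line implies values in between.
        return "line"
    raise ValueError(f"unknown kind: {x_kind!r}")


print(suggest_mark("categorical"))  # Product Type -> bar
print(suggest_mark("continuous"))   # time treated as continuous -> line
```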

Categorical Data: Breaking Things Down

We often want to see more than two data attributes at the same time. Categorical axes can be used to break data down further. Each category is subdivided by the categories of the additional dimensions. Adding two categorical dimensions, Market and Year, to the initial chart gives us a lot more bars.

Here, time is now categorical, which means we get separate bars for each year. We’ve also broken out the different regions to get individual bars for every combination of market, product type, and year. There are other ways to show the same data: we could stack the bars for the different product groups, for example. Which dimensions are nested, and in what order, is also important. We could decide that we want to see each product type broken down by market instead, rather than the other way around, or maybe break each year down into markets, and look at the products across those combinations.
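Breaking data down by several categorical dimensions produces one bar per combination of categories, and the nesting order determines how those bars are grouped. A short sketch with hypothetical category values (the text names Coffee, Espresso, Herbal Tea, and the West market; the remaining values are assumptions):

```python
from itertools import product

# Hypothetical category values for the coffee chain example.
markets = ["Central", "East", "South", "West"]
product_types = ["Coffee", "Espresso", "Herbal Tea", "Tea"]
years = [2012, 2013]

# One bar per combination; the first list is the outermost grouping.
bars = list(product(markets, product_types, years))
print(len(bars))  # 4 * 4 * 2 = 32
print(bars[:3])

# Same bars, different nesting: product type outermost, then market.
regrouped = list(product(product_types, markets, years))
print(regrouped[:3])
```

Reordering the dimensions never changes how many bars there are, only which ones sit next to each other, and therefore which comparisons are easy to make.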

Which is the right configuration depends on the question you want to ask. But the type of visualization has not changed, we are still looking at bars. Adding categorical dimensions to a visualization usually divides the visualization up rather than changing the type.

The same thing can be done for our line chart. Let’s break that one down by product type.

The axis mappings have not changed, they are still (continuous) time and profit. But adding the product type subdivides the total into four separate lines. We can now see how each of them has done over time, which ones are flat, which are increasing, etc.

Adding color is not strictly necessary here, but it makes following the lines and identifying them much easier. Color works great for categories, at least as long as the number is reasonably small.

More Encodings

These examples are very straightforward. Simple charts tend to work well for a small number of data dimensions. More unusual encodings should only be used when more variables are needed. As an example, let’s look at sales compared to profits in a scatterplot.

The scatterplot shows two numerical values using position along each axis. I’ve added two categorical ones: color and shape. This shows me that the West market had the highest sales in all but the Coffee category (look at the locations of the X marks compared to the other shapes of the same color), though not always the highest profits.

Like color, shape works well for a small number of categories, because we can really only tell a very limited number of them apart (10 is roughly the maximum for both).
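Assigning colors and shapes to categories is a lookup table, and the rough limit of about ten distinguishable values can be enforced when building it. A hypothetical sketch; the hex colors and marker codes are arbitrary choices (the markers follow matplotlib's single-character codes), and the market names are assumptions:

```python
MAX_DISTINGUISHABLE = 10  # rough upper limit for colors or shapes, per the text


def assign_channel(categories, channel_values):
    """Map each distinct category to one value of a visual channel (color, shape, ...)."""
    distinct = list(dict.fromkeys(categories))  # unique values, first-seen order
    limit = min(MAX_DISTINGUISHABLE, len(channel_values))
    if len(distinct) > limit:
        raise ValueError(
            f"{len(distinct)} categories, but only {limit} distinguishable values; "
            "consider another encoding or several charts"
        )
    return dict(zip(distinct, channel_values))


# Arbitrary hex colors and matplotlib-style marker codes (hypothetical choices).
PALETTE = ["#4e79a7", "#f28e2b", "#e15759", "#76b7b2"]
SHAPES = ["o", "s", "^", "x"]

color_of = assign_channel(["Coffee", "Espresso", "Herbal Tea", "Tea"], PALETTE)
shape_of = assign_channel(["Central", "East", "South", "West"], SHAPES)
print(color_of["Espresso"], shape_of["West"])
```

Failing loudly when there are too many categories is deliberate: silently cycling through the palette would give two categories the same color and make the chart ambiguous.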

If we wanted to add another quantitative dimension, we might use size, though that would start to overload the chart. It is usually a better idea to keep the number of visual variables (like color, shape, size, orientation, etc.) small, as they interact and become difficult to read. It is often more effective to create several different charts or rethink the question to make sure all these dimensions are really needed at the same time.

Data types play an important role in visualization because they determine what visualization types can or should be used. That doesn’t mean that there is only one chart for any combination of data types, but it does narrow down the possibilities.

How to Keep Following eagereyes After the End of Google Reader

Mon, 04/08/2013 - 03:14

Categories:

Visualization

With Google Reader shutting down July 1st, now is the time to find alternative ways to follow your favorite blogs. For this one, you can now get new postings on Facebook and through a dedicated Twitter feed, in addition to the RSS feed. See below for some RSS aggregator/reader alternatives to Google Reader.

Facebook and Twitter

I don’t use Facebook and Twitter to follow feeds, that’s what I have my RSS reader for. But for people who like doing that, perhaps using Flipboard or similar, I have now created pure feed accounts. No talking, just links to new postings.

There’s still my personal Twitter account, of course, where I will also retweet the new posting tweets.

Feedburner

Luckily, I never trusted feedburner, so almost 90% of you are subscribed to the feed URL on my website directly. This used to point to feedburner (via some redirect magic), but I switched that a few weeks ago, when Google announced the end of Reader. Feedburner is not on the chopping block yet, but it can’t be far behind.

Some time this or next week, I will completely phase out feedburner. If you’re following this site using the feedburner feed, you will see a posting appear that will tell you where to subscribe to the original feed. As the posting will point out, you will not see any further updates in that feed (other than maybe a few nag postings to remind you to change your subscription) after that.

If you’re among my 40 or so email subscribers, you will soon get an email asking you to resubscribe to the site with the new mechanism powered directly by WordPress. If you don’t want to wait, you can do this now using the subscribe field at the bottom of the navigation bar on the right.

Google Reader Alternatives

It’s important to understand that Google Reader is not just a website for reading feeds, but also the service that virtually all RSS readers currently talk to in order to get feed items and synchronize read status between devices. So even if you’re using a client and don’t remember ever having seen Reader, you’re almost certainly using it and will lose access to your feeds come July 1.

A number of alternatives have sprung up in the last few weeks. The service that has gotten the most attention from the people I talk to is feedly. They have very nice apps for iOS and Android, as well as plugins for Chrome and Safari. At this point, they are still talking directly to Google Reader and keep it in sync when you make changes (or mark things as read). That means you can easily try feedly and if you don’t like it, you can go back to your previous reader and not have to worry about having to wade through hundreds of items you’ve already seen.

Eventually, feedly will let you disconnect from Reader and then host their own feed aggregation service. They also seem to have some ideas that go beyond simple feed aggregation, which is good. It’s not clear whether their aggregation service will be free (like their apps currently are), but I hope that they will charge. That’s the only way they will be around for the long haul.

Feedbin looks like a good service if you’re prepared to pay money ($2/month, $20/year) from the start. Its web interface is quite nice and it will be one of the backends Reeder will talk to at some point in the future (Reeder is a beautiful RSS reader app for the Mac and iOS that acts as a frontend for Reader).

Other services worth mentioning are The Old Reader and Newsblur. I haven’t tried The Old Reader, and I’ve only played with Newsblur briefly, so I don’t have anything intelligent to say about them.

Google Reader is Dead, Long Live RSS!

There is a lot of value in simple feeds and Really Simple Syndication (which is what RSS stands for). I love Twitter, it’s incredibly useful and fun, and I spend way too much time there. But it doesn’t do what RSS does.

The end of Google Reader is unfortunate, but I think once we’ve all figured out what alternatives to use, it will be a good thing. Google pushed all the other RSS aggregators out of existence, and then mostly just sat there doing nothing. There is a good chance that the new crop of feed aggregators that is sprouting now will lead to some real innovation in this area.

The Revolution Will Be Visualized

Thu, 04/04/2013 - 04:11

Categories:

Visualization

In the 1970s, it was the protest songs. In the 1980s, it was the anti-war movies. Today, the protest is no longer happening in songs or movies. Today, it’s online, based on data, and using visualization.

Gun Deaths

It’s a very abstract and yet very clear image: something moves along a trajectory, is suddenly stopped, and drops to the ground. A gun has been fired, somebody has been killed. Periscopic’s U.S. Gun Deaths visualization is visceral and it doesn’t just show data: it makes an argument. People are being robbed of their lives. Hundreds of years are lost every day.

In the deleted slides from his Tapestry talk, Jonathan Corum criticizes the visualization because there are elements that don’t mean anything. The filtered views also don’t work nearly as well as the initial animation. But the point is made there. It’s the impact, the punch in the gut, that makes this work.

Drone Strikes

Pitch Interactive’s Out of Sight, Out of Mind shows U.S. drone strikes in Pakistan. It breaks down the victims into high-profile targets, alleged combatants, civilians, and children. It’s essentially a stacked bar chart.

But the animation of the dropping bombs gives the strikes much more of a reality than a mere monthly number would. And the number of people killed is staggering when you see it as bars like that. These aren’t just bars, either: they have segments, one for each person.

Switch to the Victims view and it gets even more personal. A small figure is drawn for every person killed. Continuous bars don’t give you a sense of individuals, but little figures do.

Guns Again

The Huffington Post’s Mapping the Dead: Gun Deaths Since Sandy Hook shows gun deaths since the elementary school shooting that got so much attention last December. It’s a simple map, but with a twist: it zooms out from Newtown, CT, to reveal the entire U.S. and all the gun deaths over the last few months. It’s breathtaking.

Hovering over the bars also gives you something else: names. These are not just numbers, they were real people. Listing them, similar to the figures for the drone strikes, makes them much more tangible and real.

The New Language of Protest

How do you make people notice an issue? How do you get them to care? What if we’re no longer moved by songs (and the artists too comfy and reluctant to take sides) and no longer want to see movies about real issues (and Hollywood won’t take the risk of offending anybody)?

What if the new way to get us to care is with a visceral, raw display of data?

Glimpses of Data: The CBO’s Snapshots

Sun, 03/24/2013 - 11:00

Categories:

Visualization

“Arguments in data visualization are so fierce because the stakes are so low” is a great zinger that I’ve heard a few times recently. But it’s not always true. Data visualization influences important decisions every day. The Congressional Budget Office’s new snapshots are but one example.

The role of the Congressional Budget Office (CBO) is to provide information to members of the U.S. Congress so they can make better decisions. The usual way of doing this is through reports that are prepared on a variety of topics.

Snapshots are like tweets: they contain a small amount of information, but are crisp, to the point, easy to consume, and link to more in-depth information to be found elsewhere.

The CBO does not make policy recommendations, which makes creating charts with a purpose and message a much bigger challenge. You won’t find any monsters here, but the points are still clear and easy to follow.

Rather than overwhelm the reader with numbers, snapshots are constrained on purpose, with exact numbers largely missing. That’s what the reports are for, after all; they can be found at the URLs at the bottom right of each snapshot.

Eventually, the idea is to print these onto 4x6-inch index cards, which seems to me to be a crucial component of the campaign. Having them pop up on the CBO blog is nice and all, but they will have much more impact when they are clipped to congresspeople’s and senators’ memos, shoved into pockets, and just lying around on tables and desks to be picked up randomly.

It may well be true that many arguments in visualization are pointless and petty. But in some cases, the stakes are high.

I’m honored to have been asked to provide input during the design phase of this effort. Like with all my secret government work, the CBO will neither confirm nor deny my involvement.

Study on Creative Data Visualization

Fri, 03/22/2013 - 11:05

Categories:

Visualization

To explore how we can make it easier to create new visualization designs, we are running a study based on a new approach, called visualization primitives. It lets you map data to the properties of objects like rectangles and ellipses. Build something with data, have fun, and help us figure out if it works!

This being a study, it asks you a few questions before you start, but that takes less than a minute. Then there’s a brief tutorial, after which you’re free to play. Build interesting things with the data (we’re using the OECD Better Life Index data) and submit the ones you like. This is all anonymous, obviously. My student Drew Skau, who is running this study, will analyze the data to see what kinds of things you and others are building and how much of the design space you explore.

Don’t forget to hit Done when you don’t want to play any more. You then get asked a few more questions that also won’t take much time, but that are crucial for the study. Once you’re done with those, you can continue building or just close the window.

Here’s the link to the study: Visualization Creativity Study

A Better Definition of Chart Junk

Mon, 03/18/2013 - 04:32

Categories:

Visualization

Maximizing the data-ink ratio sounds like a good idea, but when actually followed to the letter, it produces terrible and nonsensical results. Here is a more reasonable definition of chart junk that does away with the pretense of a mathematical formula and puts some common sense back into the question of good chart design.

Much has been made of Tufte’s famous data-ink ratio, and many people like to rail, privately and online, against chart junk. In short, the data-ink ratio relates the amount of information your chart conveys to the amount of chart elements (“ink”) used to convey it, with the goal of maximizing that ratio. Since we can assume that the information is constant, this means we need to minimize the amount of ink. Any ink on your chart that does not convey data is considered junk.
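For reference, Tufte states the ratio in The Visual Display of Quantitative Information roughly as follows. With the data-ink held fixed, the only way to increase the ratio is to shrink the denominator, which is the minimization being criticized here:

```latex
\text{data-ink ratio}
  = \frac{\text{data-ink}}{\text{total ink used to print the graphic}}
  = 1 - \frac{\text{ink that can be erased without loss of data-information}}{\text{total ink used to print the graphic}}
```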

While this extremely reduced definition makes for great flame war fuel, it places the emphasis on the wrong question, and when properly followed, it leads to largely nonsensical charts (this example is from Stephen Few’s recreation of Tufte’s argument).

The first issue is the whole notion of ink. What does that even mean? If you live in a world of black ink on white paper, that may be a reasonable criterion. But add color and the whole thing breaks down. Color can be used well and can be terrible. Reducing ink does not tell us anything about that. The same is true for interactions like mouse-overs, sorting, and other conveniences our modern visualization machines afford us.

There is a parallel here with writing. While you might argue that using fewer and simpler words is generally preferable, nobody would argue that writing is merely a question of maximizing the information-to-letters ratio. Good writing needs clarity and simplicity just as it needs variation, voice, explanation, and many other things.

Which brings me to my alternative definition of chart junk:

Chart junk is any element of a chart that does not contribute to clarifying the intended message.

Do you have more bars than necessary? Get rid of them! Are you missing context that would help people understand the values better? Add it in! Is your use of color distracting from the message? Change it! Are people not able to figure out what you are telling them? Use highlights!

Do you see the difference? Instead of minimization at all costs, we are now asking questions about the purpose of this thing you are creating. We are no longer pretending that visualization design is a mathematical optimization, and instead thinking about what we want to achieve.

Chart junk is still chart junk. Don’t add meaningless nonsense to your charts! Don’t clutter them up! Reduce the impact of grid lines, etc. But also think about how you can clarify the message, how you want people to read your data, and what you want them to take away. Perhaps adding things will actually help. What was considered chart junk before might turn out to be useful.

Tableau Desktop Now Free For University Students

Thu, 03/14/2013 - 05:33

Categories:

Visualization

If you are a student at a university, you can now get a free license for the full version of Tableau Desktop. No matter if you use it in class or for research, this is the full version that does not restrict the amount of data or the kind of connectivity (like Tableau Public does). The license is good for one year and can be renewed as long as you are enrolled at university.

This has been in the works for a while, and I’m very happy to finally see it happen. Tableau’s roots are in academia, so it only makes sense to make it available to students and faculty. There was a poorly advertised way of getting a discounted version of Tableau before, but that still cost a bit of money and required you to jump through some hoops. The new program asks you for a minimum of information for verification and is automated, so in most cases you will get your license within minutes. This is not restricted to the U.S. either, though verification can take a bit longer for students elsewhere.

Tableau for Students is different from Tableau for Teaching (TfT). The latter is specific to a particular course where Tableau is used as part of the teaching (I’ve used Tableau to teach Visual Analytics several times, for example). The licenses for TfT are limited to one semester, and you were not supposed to use them for research. With the student licenses, that restriction no longer applies, and in fact we hope that you will find Tableau useful for your data analysis!

Tableau for Teaching is not going away, though. If you are an instructor and are thinking about using Tableau for a course, get in touch! There are real, enthusiastic, awesome people who handle these emails, and they’re happy to give you licenses, answer questions, and help in any way they can. You can also contact me, if you want to talk to a grumpy ex-professor.

So if you’ve ever wanted to try Tableau and you’re a student, now is your chance.

Visualization Makes Things Real

Mon, 03/11/2013 - 04:36

Categories:

Visualization

Vision is the sense we most identify with: it tells us where we are, who we are talking to, what we are doing. It defines our world like no other sense. What we can see is real, for better or worse.

Reproductive Cloning

In 2003, Nigel Holmes was working on an information graphic on stem cells for Stanford Magazine. This was the result of extensive discussions with the scientists, in the course of which the subject of reproductive cloning had come up. Yet the graphic seemingly makes no mention of that topic.

Reading the copy carefully, you might notice an odd paragraph at the bottom that is very strongly worded for no obvious reason.

Stanford researchers adamantly oppose so-called “reproductive cloning” in which a blastocyst would be implanted in a woman’s uterus, perhaps leading to a pregnancy.

What had happened? In the first iteration of the diagram, Holmes drew figures of female and male humans at the bottom of this image, stating that, in theory, the process of cloning shown in the upper part of the diagram could produce a human being. Seeing this, the scientists at Stanford threatened in no uncertain terms to withdraw from the interview and disassociate themselves completely from anything that was written (or drawn).

But why? The “offending” human figures had a caption that clearly stated that the Stanford scientists had no intention of doing this, and rather believed that this should be banned. But they were so uncomfortable with the “reality” of the image, which seemed to overpower the text, that they insisted that it be removed.

Guns

In a similar way, this map of gun permit holders in Westchester County, NY, created a strong response. I am not interested in the criticism (the map ignores population density, permits don’t mean actual guns, lots of guns can be bought without permits, privacy concerns, etc.), but rather the immediate reaction when you see this.

There are a lot of dots on this map, and they are everywhere! We can argue about the size of the dots, etc., but the point remains: lots of (potential) guns. Everywhere.

The information behind this is supposedly public; the total number of permit holders certainly is. But a number by itself doesn’t tell you much; it’s just a number. We read lots of numbers all the time without even registering their order of magnitude. Showing the same information on a choropleth map of census tracts would perhaps create a nice, accurate, colorful map, but it would not have the same effect. The fact that each permit is represented separately makes for a much stronger impact.

Conclusions

Visual representation gives numbers and concepts a reality they don’t otherwise have. Choosing the right one, and even choosing whether or not to create the visualization, makes a difference.

Many thanks to Nigel Holmes for letting me use the cloning image and providing a considerable part of the explanatory text.

Data Storytelling in Video

Thu, 03/07/2013 - 05:22

Categories:

Visualization

I’m not a fan of video. I don’t spend time randomly surfing YouTube, and when given the choice between reading an article and watching a video, I’ll read. The reason is that videos often don’t work well for me: they’re too fast or too slow, they take a long time to get to the point, and they don’t let me skip around and browse easily. I’d rather be in control than have the information pre-packaged for me. But two examples have surfaced in the last few days that show data visualization can tell a very effective story in well-designed, well-paced videos.

Inequality in America

This video called Inequality in America has made the rounds on social media in the last few days, and as of this writing has over three million views. It is a great example of how to walk people through a fairly complex set of data and explain things like quintiles quite clearly. Sure, it’s not high-dimensional or Big Data, but it’s complex enough that many people will struggle to understand it (comparing distributions is hard).

The video is paced well and has a nice dramatic structure: not everything is revealed right away, so it builds the story up nicely and then makes its main point close to the end. A lot more information would benefit from this level of clarity, and it’s a shame that there aren’t many more examples like this.

The Economist: Diminuendo

The Economist just released a “video chart” about music sales. It is much simpler than the one above, but it does a fair bit of storytelling around a simple stacked bar chart. The point is that charts like this are easy to skip over without really looking at what they’re telling you. By walking the viewer through the chart, the video gives you a better sense of what’s in there and why they chose to make that chart.

(This video is not embedded because there doesn’t seem to be a way of turning off the rather obnoxious auto-play. Another reason I dislike videos ;)

Granted, this is really simple, and most people would be able to figure it out. But I’m guessing that The Economist is planning on also doing this with more complex charts and stories. Also, the comparison of downloads at the end could have used some more love, but overall it’s surprisingly effective and well-paced.

Hans Rosling: Human Development Index

Finally, no posting about video would be complete without the grand-daddy of all data-based communication videos, Hans Rosling’s famous talk at TED 2006 using the Gapminder tool. The amount of information he gets across, the steps, the pacing, and the enthusiasm are still unmatched. If you have seen this before, it’s worth watching again. If you haven’t seen it, you have to watch it now.

The ISOTYPE

Mon, 03/04/2013 - 05:26

Categories:

Visualization

Communicating data visually is not only about perception and precision, but also about understanding. ISOTYPE was developed to show data in a way that is easy to read and, at the same time, easier to understand than unadorned bar charts.

The International System of Typographic Picture Education (which is what ISOTYPE stands for) was developed in the 1920s by Otto Neurath, his wife Marie Neurath, and Gerd Arntz. It came out of the philosophy of the Vienna Circle (think Freud, Wittgenstein, Schrödinger, etc.), with the goal of changing the world by educating people about the world around them (see The Changing Goals of Data Visualization).

ISOTYPE is really a larger system, which includes two ideas that nicely complement each other: a visual language for creating icons, and the idea of using multiples to represent quantitative data.

The visual language is the better-known one. If you have seen traffic signs, public restroom signs, or any kind of sign, really, in the last 80 years or so, you have seen either ISOTYPE designs or designs strongly influenced by ISOTYPE. The icon language is based on ideas like strictly limiting the angles and shapes used, using only two colors, symmetry, etc.

What is less known is that these icons were not only used for signage, but also to construct visualizations of data that were meant to educate people about the world. The icons make these images easier to read and remember, because the visualizations themselves contain hints to their subject matter. The following example compares the numbers of cars, phones, and radios in four different countries in 1937. What could be more obvious than to use cars, phones, and radios to represent cars, phones, and radios?

Icons typically represent multiples; in this case, however, each object represents just one, expressed as the number per 50 people.

Multiples are nice and all, but they can be hard to count and compare. Notice, however, how the icons line up to form little bar charts that allow for easy, direct comparison. Even the icons for the different objects are the same width, though comparing across them is of course not necessarily meaningful.

Comparisons, like between countries, are one thing. But ISOTYPE also lends itself to telling little stories, by showing a progression through time. The following example shows the change in employment during the industrial revolution in England.

Here, each figure stands for a multiple, in this case 10,000 workers, and each bale of textiles represents 50 million pounds of product. Notice how the red factories (with the little smokestacks) swallow up the workers over the course of the 19th century. What is interesting is that the total number of workers stays roughly constant, whereas I would have expected more of a decline in employment. The amount of production increased dramatically, though.
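The unit-icon idea is easy to try out in a few lines. The following is a minimal, hypothetical sketch (not Neurath’s method or any existing library): each count is divided by a fixed unit, such as 10,000 workers per figure, and rounded to a whole number of icons, so that the rows line up like little bar charts. The function names, the `#` symbol, and the example numbers are all made up for illustration; the original ISOTYPE charts also used partial icons, which this sketch ignores.

```python
# Hypothetical sketch of ISOTYPE-style unit icons: one symbol per fixed unit.
# Names, symbol, and numbers are illustrative assumptions, not historical data.

def icon_count(count: int, unit: int) -> int:
    """Number of whole icons for a non-negative count, rounding halves up.

    Python's built-in round() rounds halves to even, so we add 0.5 and
    truncate instead to get conventional half-up rounding.
    """
    return int(count / unit + 0.5)


def isotype_chart(data: dict[str, int], unit: int, symbol: str = "#") -> str:
    """Render one left-aligned row of icons per label, like a little bar chart."""
    label_width = max(len(label) for label in data)
    rows = []
    for label, count in data.items():
        icons = symbol * icon_count(count, unit)
        rows.append(f"{label.ljust(label_width)} {icons}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Made-up numbers, NOT the figures from the chart discussed above.
    workers = {"Home workers": 62_000, "Factory workers": 28_000}
    print(isotype_chart(workers, unit=10_000))
```

Because every icon has the same width and the rows share a common left edge, the row lengths can be compared directly, which is the point made about the icons lining up into bar charts.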

The ISOTYPE may be close to 100 years old, but many of the ideas behind it are still very relevant and visualization might benefit from considering some of them.

The images used here, and many others from classic visualization texts, can be found in Michael Stoll’s wonderful Flickr collections.

The Halfway House To Nowhere

Thu, 02/21/2013 - 06:22

Categories:

Visualization

What is visualization for? Is it a tool to help us understand data and the world, and to make better decisions because of that? Or is it just a debugging tool, a stepping stone towards intelligent machines?

As machines are getting smarter and more capable, we expect them to make more decisions for us. While that is natural, it’s easy to walk into a trap: to think that we can hand things over to machines entirely and stop caring and understanding. But things don’t work like that.

This is not a new idea. In their enthusiasm about early successes in the 1960s, computer scientists thought that they would soon be able to build thinking machines, electronic brains, that could perform many human tasks as well as, if not better than, us. And today, we’re no closer to building a thinking machine than we were 50 years ago. Watch this fantastic documentary from 1992, The Machine That Changed The World, and take a critical look, in particular at the guy talking at 18:33 about how translators will soon be out of a job.

I like to think of visualization as putting the human back into the decision-making process. Rather than trusting algorithms to figure things out, I want to see the data and make the decision myself. It’s not just about the decision itself; it’s about knowing why things are done. Understanding the world around us is one of our most fundamental human urges, and one of the things that set us apart from animals and machines.

Michael Driscoll says that Visualization is a Halfway House. What he means is a halfway house on the way to fully automated systems. And look how terrible things are with visualization!

But data visualizations still require human analysts to react and kick off another action, if they are to be useful.

Tim O’Reilly picks up the story (which is based on a comment he made) and describes visualization as debugging and exception-handling:

There will be many areas where visualization interfaces enable exception handling. But more and more, expect services that used to require you to look at a screen and make a decision to just make that decision for you.

Sure, some decisions can be made better by a computer. But a surprising number of them require a lot more thinking than you’d expect. A machine might be able to make a decision, but is it a good one? Also, why did it make that decision? Have you ever wondered why Amazon recommended some nonsensical product to you that you had no use for? Why would you trust a machine to make a decision for you that’s actually important?

I also take issue with the comparison with maps and the assertion that they don’t serve a purpose beyond getting from A to B (which your GPS unit and, eventually, your self-driving car will do much more efficiently). There are studies that show that kids who are driven everywhere instead of riding their bikes or walking lose their sense of direction and distance. People who only drive by GPS directions don’t know the areas they drive in, even if they do so regularly, and can’t reliably give directions.

Machines that make our lives easier are a good thing. Machines that move heavy objects, that let us move faster, that keep our houses at the right temperature. Good machines, useful machines.

But when it comes to thinking, machines aren’t nearly as helpful. Thinking is not a chore, and understanding the world is not a necessary evil. Opaque decisions made by machines that don’t explain themselves remove us from the facts and make us stupid. If we lose our ability to navigate, we are no longer in control of the space. These machines make us lazy, complacent, and dependent. Bad machines.

If visualization is a halfway house, it’s a halfway house to a dystopia. And I’d rather stop halfway, where things work, where I can understand them, and where I give a damn. Giving up our ability to understand the world means giving up our humanity. And a little bit of convenience cannot possibly make up for that.

As an aside, if you haven’t seen The Machine That Changed The World, you owe it to yourself to watch it. It’s not just a fantastic introduction to the history of computing and many of its key players, it’s also a wonderful time capsule in itself (it came out in 1992). All five parts are on YouTube: part 1, part 2, part 3, part 4, part 5.

Review: Scott Christianson, 100 Diagrams That Changed the World

Fri, 02/15/2013 - 04:52

Categories:

Visualization

I recently came across this book that claims to collect the 100 most important diagrams in the history of mankind. It’s a good collection, with many wonderful examples, though it has its flaws.

To get the main issue out of the way: the title is misleading. The selection in the book is based not on the quality of the diagrams, but rather on the invention or cultural shift they are associated with. I didn’t bother to count, but there are many diagrams in the book that are not interesting by themselves, but which are attached to important inventions: the mobile phone, Apple computers, the cotton gin, and the camera obscura, to name just a few.

Having said that, however, there is still a lot of value in Scott Christianson’s 100 Diagrams That Changed the World: From the Earliest Cave Paintings to the Innovation of the iPod. There are many interesting examples where the diagrams themselves are the innovation, or where the innovation would not have happened without the diagram: the flow chart, the periodic table, line and bar charts, ancient “sheet music” (carved into a clay tablet), the Pioneer plaque, etc.

This is a great book to flip through and pick up pieces here and there. The descriptions go into as much depth as they can on a single page, and give a lot of interesting historical information. The print and reproduction of images is mostly excellent, with a few images being of slightly lower quality.

It’s easy to argue about inclusions (the Intel 4004 processor?) and omissions (how could he leave out ISOTYPE?), but overall this is a great collection. It works well as a coffee table book and for browsing, as well as to appreciate the fine detail and resolution of many of the pieces. Yes, this book is only available as a hardcover, and that’s a good thing.

Paper: Storytelling, The Next Step for Visualization

Mon, 02/04/2013 - 05:38

Categories:

Visualization

Visualization is often considered to consist of three phases: exploration, analysis, and presentation. While the first two are covered well in the literature, there has been very little work specifically on presentation. In an upcoming paper, Jock Mackinlay and I argue that presentation, and in particular storytelling and communication of data, are the logical next step for the field, and we provide some research directions.

The paper is titled Storytelling: The Next Step for Visualization and was written for IEEE Computer, in particular the Special Issue on Cutting-Edge Research in Visualization, to be published in May. This is important context to understand what we were trying to achieve and why the paper’s style is a bit unusual.

That entire special issue is likely to be very interesting. The guest editors, Min Chen and Theresa-Marie Rhyne, asked people in the field to write about what they envision visualization research doing 5, 10, or even 15 years from now. While I don’t know what other papers will be in the issue, I’ve heard rumors that there were some high-profile submissions. And the topic clearly lends itself to some interesting thought experiments.

Our paper first explores the history of presentation and storytelling with visualization, though there really isn’t much there. Only a handful of papers really count as presentation, so we had to reach back a bit and include some historical perspective that goes beyond the visualization literature. We then describe how we think storytelling should be approached, which is strongly influenced by journalism. Our approach is not limited to the typical passive consumer, though, so we outline some scenarios in which presentation and storytelling could be used. We illustrate these points with three case studies: Minard, Gapminder, and a story from the New York Times. The final section then provides some ideas about further steps and research directions.

If the formatting looks a bit odd, it is because this is a pre-print, which is still going through its editing phase. IEEE now makes these available very early though, in their rough form. While some of the text is likely going to change (since Computer has a particular style), the meat of the paper will not. Also, the goal of this paper was not a technical contribution or particular depth, but to lay out the general idea and discuss research directions. I hope to have some more in-depth work published later this year.

In related news, I was on a panel on storytelling with Alberto Cairo last week at the Computation and Journalism Symposium. Alberto has posted the slides he used during his opening remarks, with a bit of narration of his thoughts and some questions to go with them. I won’t post my slides or thoughts here for the moment, since I talked about some things that are about to become part of a paper submission, but I will say this: there is plenty of interesting work to be done with regard to presentation, communication, explanation, and storytelling with data.

Robert Kosara, Jock Mackinlay, Storytelling: The Next Step for Visualization, Computer (Special Issue on Cutting-Edge Research in Visualization), May 2013 (to appear).