Visualization

Revealing the Impact of Super Bowl Advertising on Social Media

Information Aesthetics | Fri, 02/03/2012 - 18:28

Categories:

Visualization


The interactive dashboard at Brandwatch Super Bowl [brandwatch.com] shows the true impact of the highly expensive advertising aired during the Super Bowl, in particular on online social media.

Each so-called 'worm' represents a unique sponsor (including brands like Pepsi, Mars, Walt Disney or H&M). The accompanying number stands for the number of tweets that were made about that brand or their products over the last 28 days (and, yes, the 'worm' who has possession of the ball is winning).

The additional display on the right reveals the complete ranking of all tracked brands, complete with time-based sparklines, the positive versus negative sentiment of the daily tweets, and the most popular keywords that were used.

As a result, one can already attempt to estimate what will be the most anticipated ads for this Sunday, in addition to their expected content.

US jobless data: how has unemployment changed under Obama?

Datablog (the Guardian) | Fri, 02/03/2012 - 16:35

Unemployment in America is down but still high. See how it has changed over time

The US jobless figures are out today and show American unemployment getting better.

The US added another 243,000 jobs last month, the biggest gain since April. The jobless figure is the lowest it has been since February 2009.
Employment grew in December and the jobless rate 'dropped' to a near three-year low of 8.5% - although a fall of 0.1%-points is really a static figure, rather than a big fall.

The official data shows:

• 12.8m people are unemployed
• The unemployment rate has declined by 0.8%-points since August
• The unemployment rate for men is down to 7.7% in December
• Rates for other groups are: women, 7.7%; teenagers, 23.2%; white, 7.4%; black, 13.6%; Hispanic, 10.5%; Asian, 6.7%
• The number of long-term unemployed (for 27 weeks or more) was little changed at 5.5m and accounted for 42.9% of unemployed people

No US president since FDR has won an election with unemployment this high. This shows what's happened since January 2009, when President Obama took over:

Data released this month by the Labor Department showed 367,000 Americans claiming unemployment insurance, a decrease of 12,000 from the previous week's figure.

We've mapped the unemployment data by state - you can explore it here:

You can download the full data below. What can you do with it?

Data summary

Download the data

DATA: download the full spreadsheet

More open data

Data journalism and data visualisations from the Guardian

World government data

Search the world's government data with our gateway

Development and aid data

Search the world's global development data with our gateway

Can you do something with this data?

Flickr Please post your visualisations and mash-ups on our Flickr group
• Contact us at data@guardian.co.uk

Get the A-Z of data
More at the Datastore directory

Follow us on Twitter
Like us on Facebook

Simon Rogers
guardian.co.uk © 2012 Guardian News and Media Limited or its affiliated companies. All rights reserved. | Use of this content is subject to our Terms & Conditions | More Feeds

Royal Statistical Society Christmas quiz: get the answers

Datablog (the Guardian) | Fri, 02/03/2012 - 10:55

Just before Christmas we published the Royal Statistical Society's annual - and fiendishly hard - quiz. Here are the answers

Q1. Quotations by British prime ministers were rewritten using only the first and last letters alphabetically in each word.

(a) William Pitt the Younger: 'I could eat one of Bellamy's veal pies'

(b) Harold Wilson: 'All the little gnomes in Zurich'

(c) Benjamin Disraeli: 'I never deny; I never contradict; I sometimes forget'

(d) Margaret Thatcher: 'We have become a grandmother'

Harold Macmillan: 'Never had it so good' would be written as:

ar aan: 'eve ha it so ood'
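
The Macmillan example suggests one reading of the rule: in each word, keep only the letters equal to that word's alphabetically first or last letter, in their original order. The sketch below implements that inferred reading; the function names are made up for illustration, and the rule itself is an assumption drawn from the single worked example rather than from the quiz setters.

```python
def keep_extreme_letters(word):
    """Keep only the letters matching the word's alphabetically
    first or last letter (assumed reading of the quiz rule)."""
    lowered = word.lower()
    letters = [c for c in lowered if c.isalpha()]
    if not letters:
        return word
    lowest, highest = min(letters), max(letters)
    # Punctuation is dropped because it never equals lowest or highest.
    return "".join(c for c in lowered if c in (lowest, highest))


def encode(phrase):
    """Apply the rule to every space-separated word in a phrase."""
    return " ".join(keep_extreme_letters(w) for w in phrase.split())


print(encode("Harold Macmillan"))      # ar aan
print(encode("Never had it so good"))  # eve ha it so ood
```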

Q2. The question referred to people who had a connection with either fruit or nuts.

(a) Tchaikovsky's Sugar Plum Fairy; 'Herbie Goes Bananas'; 'James and the Giant Peach'; Eugene 'Pineapple' Jackson; and 'Strawberry Fields Forever' (The Beatles) all demonstrate that 'Oranges are not the only Fruit' (Jeanette Winterson).

(b) 'The Nutcracker' and Cyrus Chestnut might interest 'Nutty Professors' Jerry Lewis and Eddie Murphy. Marc Almond, Kid Creole and the Coconuts and 'Peanuts' refer to seedpods that botanically are not true nuts.

Q3. Solvers were invited to unmask the silver-snatching criminal Green. After 315 minutes, Holmes enters Green's room, where Watson and Wellington are already present. Wellington is the dog in Mark Haddon's 'The Curious Incident of the Dog in the Night-Time' (suggested by the question title), whose own title was inspired by the Sherlock Holmes story 'Silver Blaze'.

Q4. Solvers were required to spot that the names of seven prominent people all contain a Greek letter: rho, nu, alpha, eta, iota, xi, mu. Delta Goodrem released 'Innocent Eyes', Catherine Zeta Jones appeared in the film 'America's Sweethearts' (Mary Pickford, nicknamed 'America's Sweetheart', was also allowed), and Sir Philip Sidney wrote 'An Apology for Poetry'. The title provides a clue to the answer and contains another Greek letter, kappa.

Q5. The question gave the first names of people depicted on banknotes in various countries: Adam Smith (UK £20), Andrew Jackson (US $20), John Macdonald (Canada $10), Sir Edmund Hillary (New Zealand $5) and Charles Dickens (UK £10), so the corresponding Australian man ($10) is 'Banjo' Paterson. The title refers to Ulysses S Grant (US $50) and a lion, which appears on the South African 50 rand note.

Q6. The linked sequences 10, 9, 60, 90, 70, 66, 96,... and 1, 4, 3, 11, 15, 13, 17,... give, respectively, the largest and smallest integers that have 3, 4, 5, 6,... letters in their names.

Q7. In each of the four phrases given, the initial letters of the words correspond to the musical notes of a piece of patriotic music.

(a) USA: 'The Star Spangled Banner'

(b) Wales: 'Land of my Fathers'

(c) Russia: national anthem

(d) England: 'Land of Hope and Glory'

Q8. (a) The list of numbers and letters denotes films by Charlton Heston whose titles contain numbers: 'The Ten Commandments', 'Three Violent People', etc. The missing elements are '55 Days at Peking' and 'Airport 1975'.

(b) The expression denotes the atomic numbers of elements in Tom Lehrer's elements song, from antimony to sodium.

(c) The list denotes books by Agatha Christie whose titles contain numbers: 'The Big Four', 'The Seven Dials Mystery', etc. The missing element is '4.50 from Paddington'. Noting that '12BMS' refers to 'One, Two, Buckle My Shoe', solvers should have calculated that the sum of the numbers given in the question is 45, an order of magnitude greater than 4.50.

Q9. Solvers were required to pair members of two groups.

(a) Songs from musicals: Fate-Kismet; Married-Cabaret; Memory-Cats; Popular-Wicked; Tomorrow-Annie.

(b) The first three letters of each word in Group 1 are the same as the first three letters of months of the year: Marylebone-3; Juliet-7; Mayfair-5; Separate-9; Janet-1.

(c) People who were born and died in the same years: (George) Eliot-(Emperor) Norton (1819-80); (Henry) Longfellow-(Giuseppe) Garibaldi (1807-82); (Sean) O'Casey-(Douglas) MacArthur (1880-1964); (Ivan) Turgenev-(Karl) Marx (1818-83); (Virginia) Woolf-(James) Joyce (1882-1941).

(d) The only five pairs of players who won exactly two of the four tennis Grand Slam titles each in a single year: Capriati-V Williams (2001); Court-Bueno (1964); Emerson-Newcombe (1967); Navratilova-Evert (1982); S Williams-Henin (2003).

(e) Animals that appear on countries' flags:

Cow-Andorra; Crane-Uganda; Lion-Spain; Snake-Mexico. One commonly-used variant of the flag of Peru features a vicuña, or, according to some sources, a llama.

The winner is John Shrimpton

Simon Rogers

Malaria cases around the world: how many are there?

Datablog (the Guardian) | Fri, 02/03/2012 - 10:24

What we thought we knew about malaria deaths is wrong: the reality is much, much worse. See what the new detailed data says

Malaria causes twice as many deaths as previously believed, according to the latest research out today from the highly respected Institute for Health Metrics and Evaluation (IHME), based in Seattle, and published in the Lancet medical journal.

That figure of 1.2 million deaths for 2010 is nearly double the 655,000 estimated in last year's WHO World Malaria Report, which we detailed on the Datablog here.

How were the figures so wrong? The assumption has always been that the majority of those who die from malaria are children. In fact, the deaths are much more evenly spread than that - adults die too.

Sarah Boseley writes today:

It also raises urgent questions about the future of the troubled Global Fund to Fight Aids, TB and Malaria, which has provided the money for most of the tools to combat the disease in Africa, such as insecticide-impregnated bed nets and new drugs. The fund is in financial crisis and has had to cancel its next grant-making round.

There is some good news in the data. Since the 2004 peak of 1.8 million deaths worldwide, the number has fallen annually, and between 2007 and 2010 the decline in deaths has been more than 7% each year.

The researchers said the key to collecting the new data was the use of verbal autopsy data.

In a verbal autopsy, researchers interview the relatives of someone who has recently died to identify the cause of death. IHME and collaborators around the world published a series of articles in a special edition of Population Health Metrics in August 2011 focused on advancing the science of verbal autopsy. Verbal autopsy data were especially important in India, where malaria deaths have been vastly undercounted in both children and adults. IHME found that more than 37,000 people over the age of 15 in India died from malaria in 2010, and the chances of someone dying from malaria in India have fallen rapidly since 1980.

We've extracted the data for deaths and death rates for all ages below. What can you do with it?

Simon Rogers

Library lending figures: what's the top book?

Datablog (the Guardian) | Fri, 02/03/2012 - 09:43

Which books do people borrow? Which authors are the most popular? Find out

What was the most popular book borrowed from libraries last year?

We have the top 250 books according to book lending data from the Public Lending Right (PLR), which manages payments to authors. We don't have the number of loans but we do have the order of popularity.

The list is dominated by US author Dan Brown this year. Brown has overtaken James Patterson, who topped the lending list for the previous two years and has generally been very successful for a number of years.

John Dugdale notes that what's interesting about the 2011 list is what's missing. In his analysis Dugdale asks:

where is David Nicholls's One Day, Britain's No 1 bestseller in 2011 after making the top five in 2010? Like Dawn French's debut novel A Tiny Bit Marvellous, also a hit in both years, it's nowhere to be found. One inference would be that people are more likely to buy books they expect to read or refer to more than once.

This year PLR have provided a regional breakdown, so we can see the popularity of books in different parts of the UK. This goes along with a slightly more detailed analysis of the authors' popularity.

Download the data to get even more info - and see what you can do with it.

Lisa Evans

Revealing the Energy Consumption of Each Building in New York

Information Aesthetics | Thu, 02/02/2012 - 19:16

The remarkably detailed map [columbia.edu] developed by the Modi Research Group of the Earth Institute at Columbia University reveals the total annual building energy consumption of New York, at both the block and 'taxlot' level (which is nearly at building level).

The map was built using MapBox. The total energy consumption is expressed in kilowatt hours (kWh) per square meter of land area. The data actually was not retrieved from utility companies, but calculated via an elaborate statistical model that is based on current large-scale estimates (e.g. the average energy use by ZIP code) in addition to lower-scale, estimated parameters (like the type and size of the building). Hovering over individual blocks or lots shows more detailed information, such as the type of energy being used, for which purpose (e.g. heating and cooling, electricity or hot water) and in what quantity.

More detailed information is available here. Via NYTimes Green, WSJ Blog and Co.Exist. Thnkx Adam!


Groundhog day 2011: how well can groundhogs predict the weather?

Datablog (the Guardian) | Thu, 02/02/2012 - 18:00

Groundhogs like Punxsutawney Phil and Staten Island Chuck don't have a great track record for accurate weather prediction. See how we figured it out

Groundhog day is celebrated today. On this day the length of the rest of winter is said to be predicted by how a groundhog behaves when it rears its sleepy head from its burrow. If the groundhog leaves the burrow it signifies that winter will end soon. If the groundhog goes back into its burrow then it predicts that winter will continue for another six weeks.

This raises the question: how well have groundhogs predicted the weather in the past? To answer this we delve into history to see how groundhogs have behaved on the 2nd of February.

We have taken the behaviour of groundhogs for each year going back to 1999. We have used this to calculate the modal behaviour of the groundhogs, which is effectively the 'groundhog consensus' on the matter of winter each year.

Now for the tricky part: how do we measure whether the winter ended or continued for six weeks? We have taken snow cover in North America for February of every year back to 1999, but this alone does not tell us if the groundhog was right about winter ending. To work this out we need to know how snowy a given February was relative to an average February. We calculated the mean February snow cover in North America using data from the last 10 years, then subtracted this average from the snow cover for the particular year. This gives us an indication of the severity of the winter for that year relative to the other years.
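
The two steps described above (a modal 'consensus' per year, then each February's snow cover minus the multi-year mean) could be sketched roughly as follows. The function names and all figures are invented for illustration and are not the Datablog's data. The rule that a positive anomaly counts as a long winter is also an assumption, since the post does not state a threshold.

```python
from statistics import mean, mode


def groundhog_consensus(predictions):
    """Most common prediction among the groundhogs in one year."""
    return mode(predictions)


def snow_cover_anomalies(february_cover):
    """Each year's February snow cover minus the mean over all years."""
    baseline = mean(february_cover.values())
    return {year: cover - baseline for year, cover in february_cover.items()}


def consensus_was_right(consensus, anomaly):
    """Assumption: a snowier-than-average February means a long winter."""
    return (consensus == "long winter") == (anomaly > 0)


# Invented figures, for illustration only.
cover = {2009: 10.0, 2010: 14.0, 2011: 12.0}
anomalies = snow_cover_anomalies(cover)
consensus_2010 = groundhog_consensus(
    ["long winter", "early spring", "long winter"]
)
print(anomalies[2010], consensus_2010,
      consensus_was_right(consensus_2010, anomalies[2010]))
```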

The conclusion of our little study is that groundhogs have only predicted the length of winter correctly three times in the last 10 years.

There are lots of details we've skipped over to get this result. For example, the groundhog's predictive skills could be regionally based, so predicting the winter for the whole of North America is just not fair on the little guy.

It is clear that there is scope for further investigation. Here is the full data including the names of all the groundhogs that have taken part.

Lisa Evans

Road accident statistics: how safe are our roads?

Datablog (the Guardian) | Thu, 02/02/2012 - 16:25

And why are deaths of cyclists going up? The latest figures are out today - see what they say

UPDATE, Gwyn Topham: 16:20 We have been sent an interesting update on the cycling figures from Green mayoral candidate Jenny Jones. Although Boris Johnson has been keen to push his image as a champion of cycling in London, the trend for safer cycling has reversed since he came into office. The TfL figures show, I calculate, a rise from one casualty for every 58,000 cycling trips in 2007 to about one for every 49,000 in 2010. Last year looks even worse, with deaths and serious injuries up from 358 to 407 for the first nine months recorded so far in the capital. While we don't have the numbers for the first three quarters of 2011 to make the same per-trip comparison, the bare figures so far in London suggest it is set to be the worst annual toll since 2000.
The data has been added to our spreadsheet

Are our roads really getting safer? The official statistics out today from the Department for Transport show that, for a lot of people, they are. Except for cyclists, that is.

The drop isn't huge - but it is there. The number of people killed or seriously injured on Britain's roads is down by 2% to 6,630 in the third quarter of 2011. If you look at all casualties on the road, that's still 52,490 people hurt or injured - but it's down by 5% from the same period last year when it was 55,105.
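
As a quick check of the arithmetic, the year-on-year change in all casualties can be recomputed from the two totals quoted above. This is only a sketch, and the helper name is made up for illustration.

```python
def percent_change(previous, current):
    """Percentage change from previous to current (negative means a fall)."""
    return (current - previous) / previous * 100


# Totals quoted in the post: 55,105 a year earlier, 52,490 now.
all_casualties_change = percent_change(55_105, 52_490)
print(f"{all_casualties_change:.1f}%")  # -4.7%, reported as a 5% fall
```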

Transport correspondent Gwyn Topham says the government will be relieved.

obviously good news in itself, but after two quarters of worsening safety - which might be dismissed as a statistical blip - a third set of bad results might have had fingers pointing. After steady decline of deaths over many years, the toll had started to climb. Without any overarching reason, this coincided uncomfortably with the accession of the coalition government and a transport secretary in Philip Hammond who pledged in Clarksonian terms to end the "war on motorists". The mood music around road safety seemed to have changed: Hammond mooted the raising of the motorway speed limit and abolition of annual MoTs. The latter plans were dropped yesterday by his successor Justine Greening.

The most notable increase is deaths and serious injuries to cyclists - an 8% year on year rise that, as a new Times campaign today demands, deserves real attention.

The data shows that cyclist deaths and serious injuries have gone up to 970 in the three months to September 2011 - compared to 905 the previous year. Interestingly, overall cyclist casualties have stayed static - up only 0.1%. But at a time when all road casualties are going down, this stands out. And it's still 5,470 people.

We've also extracted the figures for all road casualties by police force - and added in the populations so you can compare each area properly. We don't have the latest cyclist figures by force yet as these are estimates.

The full data is below. What can you do with it?

Simon Rogers

US presidential election fundraising: help us explore the FEC data | INTERACTIVE

Datablog (the Guardian) | Thu, 02/02/2012 - 16:11

Interactive: Which candidate has raised the most cash? Where do the donors live? Find your way around the latest data from the Federal Election Commission with this interactive graphic by Craig Bloodworth at the Information Lab and Andy Cotgreave of Tableau.

What can you find in the data? Let us know in the comments below

How do I use this interactive?

Click on a candidate to see their fundraising data - or use the tabs to look at the figures state by state. Or view the donors themselves by zipcode.

Simon Rogers

US election funding: download the data

Datablog (the Guardian) | Thu, 02/02/2012 - 15:23

How much money have US presidential candidates raised so far? And who got a large donation from a Zombie slayer? Find out here

US elections may involve sums of money which seem astronomical to those more used to the modest spending of European elections, but they also release an unrivalled level of detailed data on contributions and spending.

The Federal Election Commission has now released donation data for the 2012 US presidential wannabes up to the end of December 2011 - figures covering hundreds of millions of dollars in contributions through more than 490,000 separate donations.

The Guardian's liveblog summarised the revelations here. The key findings were:


Barack Obama and Mitt Romney are by far the most successful campaign fundraisers. The president banked $140m for his campaign to the end of last year. His Republican rival managed to raise $56.8m by December 31, compared with $12.7m for Newt Gingrich and $25.5m for Ron Paul. Rick Santorum struggled with $3.3m.

The filings reveal that the Romney campaign is dependent on the donations of a few wealthy individuals and corporations, while the Obama team rely more on many smaller donors. Bob Perry, a Houston developer who was a leading financier of Swift Boat Veterans for Truth which smeared the 2004 Democratic presidential candidate, John Kerry, gave $1m as did William Koch, brother of Charles and David Koch, who fund the Tea Party movement.

The Super Pac set up by the satirist Stephen Colbert last year raised $1m to the end of January. Donors included the West Wing star Bradley Whitford, and the lieutenant governor of California.

We've also dug into the data to build up a profile of the type of person who donates to the different candidates' campaigns. One of the most interesting factors is looking at the average size of donation a candidate receives.

Size of donation

Looking at donation size for each candidate - in particular what portion of their funds comes from small donations (under $50) and what portion comes from big donations (over $2,500) - is telling: 26% of Obama's donations (2% of his cash) are under $50. For Romney, it's 7% – and only 0.2% of his cash.

By contrast, more than 70% of Romney's cash comes from donations larger than $2,500. This figure doesn't top 50% for any other candidate.
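
For readers working with the FEC download, the shares discussed above could be computed along these lines. This is a minimal sketch: the function name, the dictionary keys and the sample amounts are invented for illustration, and a real analysis would first need to load and clean the FEC rows.

```python
def donation_shares(amounts, small=50, large=2500):
    """Share of donations and of cash that is small (under `small`)
    or big (over `large`) for one candidate's list of donation amounts."""
    total_count = len(amounts)
    total_cash = sum(amounts)
    small_donations = [a for a in amounts if a < small]
    large_donations = [a for a in amounts if a > large]
    return {
        "small_share_of_donations": len(small_donations) / total_count,
        "small_share_of_cash": sum(small_donations) / total_cash,
        "large_share_of_cash": sum(large_donations) / total_cash,
    }


# Invented amounts in dollars, for illustration only.
shares = donation_shares([25, 25, 50, 100, 5000])
print(shares)
```

The distinction between share of donations and share of cash matters: many small gifts can make up a large fraction of the donation count while contributing very little of the money.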


Donations month by month

The fundraising for each candidate has also been broken down month-by-month. While it's easy to use this to see 'spikes' in funding around big moments in the election calendar, this chart more than anything shows the sustained fundraising dominance of President Obama and Republican frontrunner Mitt Romney.

Occupation

Other measures are more esoteric. Donors are asked to give details of their occupation when they contribute to a campaign. This measure has several problems - "homemaker", for example, gives little information as a profession, and the donor can often be the partner of a corporate executive or similar. Retirees, who are a huge source of funds, also give no indication of their previous profession.

That said, looking at the top five professions for each candidate is still illuminating: Barack Obama is heavily favoured by lawyers, Ron Paul by engineers (not surprising given his cult internet status), and Mitt Romney by business leaders.

Perhaps the best of all, though - and a Datablog scoop - is a donation of $1,456 to Ron Paul's campaign from a "Zombie slayer", surely a constituency whose support any candidate would seek.

Cities

Finally, we've gathered data on the top five cities donating to each candidate. New York's huge influence shows here: it's the second-largest source of funds for President Obama and, despite not being known as a Republican base, is the largest source of funds for Mitt Romney:

Get more data

We've gathered all of these statistics, and some more detailed tables, into a Google doc here, or, for the more adventurous, you can download the full 490,000-row database from the FEC. Let us know what you find.

James Ball

Comparing the Fundraising Performance of the US Presidential Candidates

Information Aesthetics | Thu, 02/02/2012 - 15:17

The NYTimes released a competitive dashboard of sorts, titled "The 2012 Money Race: Compare the Candidates" [nytimes.com]. Basically, the interactive graphic allows readers to compare the fundraising performance of two presidential candidates side by side. Another recent graphic [nytimes.com] lists the hundreds of organizations and people that fund the so-called Super PACs, which are officially not controlled by those very candidates.

As also explained in the accompanying press article, both infographics reveal how President Obama continues to outraise all of the candidates currently seeking the Republican nomination. It is also remarkable, however, that some checks seem to come from sources obscured from public view, like those with only a post office box for a headquarters, and no known employees.

If you would like to play around with the data yourself, you should be able to find it at the Federal Election Commission website (which, in fact, also publishes simple but interactive infographics of its own).

Graphs Beyond the Hairball

EagerEyes.org | Thu, 02/02/2012 - 05:30

Networks are usually drawn using a technique called node-link diagrams. While that works well for small graphs (the technical name for networks), it breaks down beyond a few dozen nodes. Better techniques exist, though these are currently focused on specific types of graphs or answer particular questions.

Node-Link Diagrams

When you think of a graph, you likely already think of a node-link diagram – unless you’re a mathematician. This technique is incredibly effective in communicating the basic idea of a network: there are nodes, typically shown as little dots or circles, and they’re connected by links, or edges in graph lingo. Even the difference between a directed and an undirected graph is obvious: little arrows mean that there’s a direction, no arrows means no direction. Lane Harrison gives a good overview on the Visual.ly blog. Carlos Scheidegger is also writing an interesting, if mathematical, series on graphs of the node-link variety.

These images are easy to understand even for people who have never seen such a diagram before, which is not something that can be said about many visualization techniques. Most people would also easily be able to figure out how to answer basic questions using such a diagram, like finding the person with the most friends (i.e., the node with the highest degree) or looking for highly connected groups that only have a small number of links between them (so-called cliques).
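
The "person with the most friends" question maps directly onto counting node degrees. Here is a minimal sketch using an edge list; the names and the friendship data are made up for illustration.

```python
from collections import Counter


def degrees(edges):
    """Count how many links touch each node in an undirected graph."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


# Invented friendship network, for illustration only.
friendships = [("Ann", "Bob"), ("Ann", "Cat"), ("Ann", "Dan"), ("Bob", "Cat")]
degree = degrees(friendships)
most_connected, friend_count = degree.most_common(1)[0]
print(most_connected, friend_count)  # Ann 3
```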

Hairballs

There is a catch, of course. The simplicity and beauty of node-link diagrams turns into clutter and confusion when the number of nodes and links gets too high: the dreaded hairball.

Many techniques have been developed to sort out the clutter: edge bundling, node filtering, edge lenses, many, many different layout algorithms, etc. But none of them provide a good, general solution to the underlying problem. The question also needs to be asked if the most obvious visual depiction is also the most effective. It may not be.

Matrix Methods

Matrix visualizations represent a very different approach. These techniques are based on the adjacency matrix, which defines which nodes in a graph are connected to which. Imagine a table with a row and a column for each node. Each cell of the matrix contains a 1 if there is a connection between the node in that row and the node in that column, and a 0 if not.
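
To make the table concrete, here is a small sketch that builds an adjacency matrix from a list of nodes and edges. The function name and the three-node example are invented for illustration, and a plain list of lists stands in for whatever structure a real tool would use.

```python
def adjacency_matrix(nodes, edges, directed=False):
    """Build a 0/1 matrix with one row and one column per node."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        if not directed:
            # An undirected link is recorded in both directions,
            # which makes the matrix symmetrical.
            matrix[index[b]][index[a]] = 1
    return matrix


matrix = adjacency_matrix(["A", "B", "C"], [("A", "B"), ("B", "C")])
for row in matrix:
    print(row)
# [0, 1, 0]
# [1, 0, 1]
# [0, 1, 0]
```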

Matrix visualization techniques display that matrix rather than the node-link version of the graph. No more crossing lines and no more hairball. Seeing structures in such a visualization requires some training and some support from the visualization tool, but the advantage is that there are no more lines cluttering up the view. This illustration from Nathalie Henry and Jean-Daniel Fekete’s InfoVis 2006 paper MatrixExplorer: a Dual-Representation System to Explore Social Networks nicely shows how structures in the node-link diagram translate into the matrix view.

The rows and columns of the matrix can be rearranged, which represents one of the greatest strengths and weaknesses of matrix techniques at the same time: order matters. The patterns that are so obvious in the image above will be easily hidden by jumbling the order of the matrix. But given a good clustering and ordering algorithm, in particular one where the user can specify criteria and weights, a matrix view can show patterns very clearly.
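The effect of ordering can be sketched in a few lines of Python. The example below assumes a group label per node is already known (in practice it would come from a clustering algorithm, which is not shown); the graph and the labels are hypothetical.

```python
def reorder(matrix, order):
    """Return a copy of matrix with rows and columns permuted by order."""
    return [[matrix[i][j] for j in order] for i in order]

# Hypothetical graph: nodes 0, 2, 4 form one group and 1, 3, 5 another,
# but the nodes are listed in interleaved order, which hides the structure.
jumbled = [
    [0, 0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [1, 0, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
]
groups = [0, 1, 0, 1, 0, 1]  # assumed cluster label for each node

# Sort node indices by group so members of a group sit next to each other.
order = sorted(range(len(groups)), key=lambda i: groups[i])
ordered = reorder(jumbled, order)

for row in ordered:
    print(row)
```

After reordering, the ones collect in two square blocks along the diagonal, one per group. The same permutation must be applied to rows and columns; permuting only one of them would change which nodes the cells refer to.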

Directed, Tree-Like Graphs: Node Quilts

A technique I found particularly fascinating at InfoVis 2011 was shown in a paper titled Developing and Evaluating Quilts for the Depiction of Large Layered Graphs by Juhee Bae and Ben Watson. Node quilts are designed specifically for directed, acyclic graphs (DAGs): graphs that have a hierarchical structure, where most links point from one layer to the next. This technique was originally designed for genealogical trees, but the version Bae and Watson studied allows links that point up as well (though they should be rare).

Node quilts cleverly exploit the fact that most of the action is in one half of the matrix by folding it to eliminate the parts that are (mostly) empty. The resulting visualization is much denser and also more informative: links that skip layers or that point back are shown outside the matrix itself.

This technique takes a bit of time and study to appreciate, but it extends the matrix visualization idea in a way that is very clever and useful – for particular tasks and data. But a focus on particular types of questions is clearly a virtue given the issues with node-link diagrams in general. I also wonder how well the technique might work for undirected graphs, where the lower half of the adjacency matrix can be ignored because the matrix is symmetrical. The focus on using quilts only for DAGs so far may be a bit narrower than necessary.

What Are You Asking: PivotGraph

In many cases, it makes little sense to look at all the individual data items, while an appropriately aggregated view can provide much more useful information. This is the same idea as behind Parallel Sets and also almost all of the views in Tableau. In 2006, Martin Wattenberg published a paper on a technique he called PivotGraph that adapted the idea of aggregation for use with graphs. For the aggregation to work, there has to be data attached to the graph nodes, and it has to be partly categorical. This is typically the case when looking at rich data like email traffic, phone conversations, etc.

The PivotGraph has two interesting properties. First, it is very goal-directed: it requires the user to pick dimensions along which to aggregate, and which to use to lay out the graph. Second, it uses space in a very different way than node-link diagrams. While space in node-link diagrams is mostly there to avoid collisions between the nodes and clutter between the lines, it carries information in the case of the PivotGraph.

The example above shows the communication patterns between people in different departments (rows) and locations (columns). The width of the arrows represents the amount of communication going on (emails, etc.). This is aggregated information, not just along the edges but also in the nodes: each department and location consists of multiple people. What would have been a big hairball had all the individual items been shown has been turned into a much simpler image that answers a question.
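The aggregation step itself is easy to sketch. The Python below is not Wattenberg's implementation, only an illustration of the roll-up idea under simple assumptions: every person carries categorical attributes, and every message is a directed edge. All names, attributes and messages are hypothetical.

```python
from collections import Counter

# Hypothetical people with categorical attributes.
people = {
    "ann": {"department": "Sales", "location": "NY"},
    "bob": {"department": "Sales", "location": "LA"},
    "cat": {"department": "Legal", "location": "NY"},
    "dan": {"department": "Legal", "location": "LA"},
}
# Hypothetical directed messages: (sender, receiver).
messages = [("ann", "cat"), ("bob", "cat"), ("ann", "bob"),
            ("cat", "dan"), ("dan", "ann")]

def roll_up(people, messages, dimension):
    """Count people per category and messages between categories."""
    node_sizes = Counter(attrs[dimension] for attrs in people.values())
    edge_weights = Counter(
        (people[src][dimension], people[dst][dimension])
        for src, dst in messages
    )
    return node_sizes, edge_weights

sizes, weights = roll_up(people, messages, "department")
print(sizes)
print(weights)
```

Rolling up on two dimensions at once, as in the department-by-location example, would use a pair such as (department, location) as the category key instead of a single attribute. The edge weights then correspond to the arrow widths, and the node sizes to the number of people behind each aggregated node.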

Many questions that are asked about network data are of the same nature: How many people who have done A also do B? How do potential customers navigate the different elements on a website and where do they give up? What classes of products are bought together or in quick succession? etc.

The Graph Beyond the Graph

For a while now, people in visualization have talked about the graph without the graph, i.e., graph visualization without the hairballs. Networks are clearly important and challenging data, and it seems a bit myopic to only look at node-link visualization. Node quilts and the PivotGraph represent promising steps in a very different direction. While they require more work to understand and are more limited in what they can be used for, they are also much more directed towards a goal than just showing all of the data. I think that this kind of thinking will lead us to much more interesting techniques in the future than trying to teach the old node-link diagram new tricks.

Presidential primary votes 2012: download the data so far

Datablog (the Guardian)Wed, 02/01/2012 - 14:00

Categories:

Visualization

Now Mitt Romney has won Florida, we have four primary results to play with. See the data for yourself for each state and county
Get the data
Live updated results

Four elections down, over 40 to go in the 2012 Republican presidential nomination process. Five Democratic states have also returned their endorsements of Barack Obama.

We've been tracking the results and making the data available in an accessible format for each election as it happens, which for the Republicans have been South Carolina, New Hampshire, Iowa and now Florida. The Democrats have also voted in Nevada.

This is how it looks for the Republicans so far:

You can download the data for each county from the Google spreadsheet below. What can you do with it?

Data summary

Download the data

DATA: download the full spreadsheet

More open data

Data journalism and data visualisations from the Guardian

World government data

Search the world's government data with our gateway

Development and aid data

Search the world's global development data with our gateway

Can you do something with this data?

Flickr Please post your visualisations and mash-ups on our Flickr group
• Contact us at data@guardian.co.uk

Get the A-Z of data
More at the Datastore directory

Follow us on Twitter
Like us on Facebook

Simon Rogers
guardian.co.uk © 2012 Guardian News and Media Limited or its affiliated companies. All rights reserved. | Use of this content is subject to our Terms & Conditions | More Feeds

US Elections 2012: Florida results county by county

Datablog (the Guardian)Wed, 02/01/2012 - 12:12

Categories:

Visualization

Mitt Romney tasted success in Florida, but how did the Republican presidential candidates do county by county?

Get the data

As the Florida primary results were announced last night they read like this:

Mitt Romney took 46.4% of the votes to win, while Newt Gingrich took 31.9% and Rick Santorum 13.4%.

The Florida primary is a very different contest to any of the previous three.
Guardian analysis explains why:

  • The battle [in Florida] is waged over the airwaves and media buying power is often decisive. That explains why Mitt Romney, whose campaign and associated super PAC have spent almost $14m in ads in the state, went into polling day with a double digit poll lead.
  • The high concentration of Latino voters - predominantly Cuban American - for example has made immigration a major issue. The key counties to watch are Miami-Dade and the two big ones on the crucial I-4 corridor: Hillsborough, which covers Tampa, and Orange which takes in Orlando.

The datablog is collecting data for the 2012 US presidential race, county by county, in one big spreadsheet.

In this spreadsheet you can already see the Iowa caucus, New Hampshire and South Carolina primary results.

We're working to keep on top of the recounts so the data is as accurate as it can be.

We have just added the latest Florida results. You can see a summary of the data below.

Some simple analysis of the votes received in each county relative to the total votes for that county shows that Mitt Romney was most strongly supported in the counties of Miami-Dade then Pinellas, and he received the least support in Liberty.
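As a rough sketch of that calculation, the Python below divides a candidate's votes by the total votes in each county and picks out the extremes. The county names and figures are placeholders, not real results; the real numbers are in the spreadsheet.

```python
# Placeholder figures, NOT real results:
# (county, votes for the candidate, total votes cast in the county).
rows = [
    ("County A", 600, 1000),
    ("County B", 450, 1000),
    ("County C", 200, 800),
]

# Share of each county's vote that went to the candidate.
shares = {county: votes / total for county, votes, total in rows}

strongest = max(shares, key=shares.get)
weakest = min(shares, key=shares.get)

for county, share in shares.items():
    print(f"{county}: {share:.1%}")
print("Strongest:", strongest, "Weakest:", weakest)
```

Ranking by share rather than by raw votes keeps large counties from dominating the comparison simply because more people live there.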

We are also interested in the voter turnout. What do you suggest we do with the voter turnout figures? And perhaps more importantly, what can you do with this data?

Data summary

Download the data

DATA: download the full spreadsheet


Lisa Evans

Spot: Visualizing Twitter Dynamics as Particles

Information AestheticsTue, 01/31/2012 - 19:18

Categories:

Visualization


Spot [neoformix.com] by Jeff Clark is a comprehensive real-time Twitter visualization that uses a particle metaphor to represent unique tweets.

Based on a set of user-defined keywords, the visualization gathers and displays the latest 200 tweets. The corresponding particles are then organized in various spatial configurations to visually filter the available information by different parameters, such as commonality of words, time, people, or the tool that was used to send the tweet.

The tool also lets users explore the tweets from a specific list, such as the "Top 100" of the datavis community.

More detailed information is available here.

See also Revisit and Digg Labs Swarm. Alternatively, check out Visual Digg Explorer by the same designer.

How will the cuts affect services preventing violence against women?

Datablog (the Guardian)Tue, 01/31/2012 - 17:45

Categories:

Visualization

Funding cuts are slashing the budgets of local services that help to prevent violence against women. How will the cuts affect the most vulnerable?
Get the data

Cuts in the national budget are leading to significant cuts in local services aimed at preventing gender-based violence and protecting women against it, according to a new report published today.

The report commissioned by the Trust for London and Northern Rock Foundation examines the impact of public expenditure cuts on these services. Alexandra Topping writes today:

On an average day last year 230 women were turned away by Women's Aid, around 9% of those seeking refuge, because of a lack of space, the organisation has revealed.

And as further cuts begin to bite more women are likely to be put in danger, said Nicola Harwin, chief executive of Women's Aid, the largest national organisation for domestic and sexual violence services.

Freedom of information requests released in a major new report revealed that 31% of funding to the sector was cut by local authorities between 2010/11 and 2011/12, a reduction from £7.8m to £5.4m.

Compiled by Professor Sylvia Walby, UNESCO Chair in Gender Research, and Jude Towers of Lancaster University, the report records some key findings and summarises the cuts to violence against women services. The spreadsheet embedded below shows this summary, which can also be downloaded as a spreadsheet.

Data summary

Download the data

DATA: download the full spreadsheet

NEW! Buy our book

• Facts are Sacred: the power of data (on Kindle)




Prince Andrew's meetings listed

Datablog (the Guardian)Tue, 01/31/2012 - 13:26

Categories:

Visualization

A list of Prince Andrew's meetings and engagements with foreign leaders, politicians and diplomats over the past 12 months has caused Labour to ask questions over his role. Get the list of engagements here
Get the data

Prince Andrew is in the news today. It has been revealed that Palace records show the Duke of York undertaking engagements in Saudi Arabia and China for UK Trade and Investment (UKTI) since he stepped down as special representative for trade in July 2011.

According to the Court Circular, the Duke has met four foreign heads of state in the past six months. Robert Booth writes today:


The government has been urged to explain why Prince Andrew has met four foreign heads of state in the past six months and embarked on two full-scale government trade missions despite stepping down as the UK's special representative for trade.

The Duke of York announced last July that he would relinquish the role following criticism of his association with a convicted child sex offender, Jeffrey Epstein, and business connections with dictators including Colonel Gaddafi.

But palace records reveal he has remained at the heart of the UK government's export drive and has carried out 17 engagements in Saudi Arabia and China for UK Trade and Investment, an arm of the Department for Business, Innovation and Skills, since the announcement that he was no longer the special representative for trade.

The tables below show details of Prince Andrew's meetings and engagements over the past 12 months. These are not all the meetings the Prince has undertaken over this period, only those within four categories: meetings for or with representatives of UKTI, UK government ministers, UK government diplomats and foreign governments.

The first table shows the number of engagements undertaken by month within the four categories explained above. The second table details each of the engagements individually along with the date of the event. The engagements, sourced from the Court Circular, run from 28 January 2011 to 27 January 2012.

Data summary

Download the data

DATA: download the full spreadsheet


Simon Rogers, Ami Sedghi

British dead and wounded in Afghanistan, month by month

Datablog (the Guardian)Tue, 01/31/2012 - 12:30

Categories:

Visualization

What is the human cost of the war in Afghanistan for British forces? As British troop deaths reach 397, these are the latest figures - including the most recent wounded and amputation statistics
Get the data
Amputation statistics
Afghanistan civilian casualties
Interactive guide

The total number of British troop deaths in Afghanistan now stands at 397, nearing the 400 mark. According to the latest figures from DASA, 54 UK service personnel suffered amputations in 2011.

2009 was the bloodiest year for British troops in Afghanistan. 2010 nearly caught up. But last year, with the deployment of US troops to Helmand, things quietened down.

The number of British deaths in Afghanistan is now much higher than in Iraq and even the Falklands conflict. These are the numbers of British fatalities for Afghanistan - and Iraq, too - updated as they change.

We've broken Afghanistan down month by month.

Research has found that the rate at which British soldiers have been killed in Afghanistan is almost four times that of their US counterparts, and double the rate which is officially classified as "major combat".

Analysis by the Medical Research Council's biostatistics unit at the University of Cambridge also found that the death rate of UK troops is twice that of 2006, when they were described as being involved in the fiercest fighting since their involvement in Korea 50 years ago.

The researchers said the "UK could expect at least as many military fatalities in 10 weeks in Afghanistan as in 20 weeks in 2006".

The official classification of "major combat" is a killing rate of six per 1,000 personnel years. For the 12 months up to May, the killing rate for British troops in Afghanistan stood at 13.

More complicated are the wounded numbers. Rather than one simple set of statistics, the MoD gives us three - all of which are included as a sheet in the dataset below (and summarised down the page).

• Firstly, you have the Noticas numbers. These are the most seriously wounded cases, where the family has been informed the wounded person has been "listed"
• Then there are the people registered at field hospitals - which go from the seriously to the lightly wounded, from all causes, violent and otherwise
• Lastly there are the personnel who've been evacuated by air, which could be serious combat injuries or illnesses such as dysentery

This is how the MoD defines it:

'"Very Seriously ill/ Injured/wounded" or VSI is the definition we use where the illness or injury is of such severity that life or reason is imminently endangered. "Seriously ill/Injured/Wounded" or SI is the definition we use where the patient's condition is of such severity that there is cause for immediate concern, but there is no imminent danger to life or reason. The VSI and SI categories are defined by Joint Casualty and Compassionate Policy and Procedures. They are not strictly medical categories but are designed to give an indication of the severity of the illness to inform what the individual's next of kin are told.'

What do you think? Can you do anything with the data?

Summary tables

Download the data

DATA: British dead and wounded, month by month as a spreadsheet - including names of dead
DATA: US casualties in Afghanistan and Iraq
DATA: how many troops does each country send to Afghanistan
INTERACTIVE: rollcall of the British dead


Simon Rogers

Where open government data falls down: buying a train ticket

Datablog (the Guardian)Tue, 01/31/2012 - 11:59

Categories:

Visualization

Open data is all very well, but what if you don't release the most useful datasets of all? Paul Clarke on the scandal of UK transport data

Remind me again: what's the purpose of opening up all this public data?

Ah yes, that's it. To create value. And you can't get a much stronger example of real value in the real world than showing people how to save money when buying train tickets.

Fare pricing is a fairly hit-and-miss business, as you've probably noticed. We don't have a straight relationship between distance and price. Far from it.

The many permutations of route, operator and ticket type throw up some strange results. We hear of first class tickets being cheaper than standard, returns cheaper than singles, and you can definitely get a lower overall price by buying your journey in parts, provided that the train stops at the place where the tickets join.

The rules here are a bit weird: although station staff have an obligation to quote the cheapest overall price for a particular route, they aren't allowed to advertise "split-fare" deals, even where they know they exist. Huh?

Why this distinctly paternalistic approach? Well, say the operators: if a connection runs late, your second ticket might not be eligible, and there might be little details of the terms and conditions of component tickets that trip you up, and, and, and … well, it's all just too complicated for you. Better you get a coherent through-price (and we pocket the higher fare).

There's no denying it is complicated. Precisely how to find the "split-fare" deal you need is a tiresome, labour-intensive process of examining every route, terms and price combination, and stitching together some sense out of it all. And, indeed, in taking on a bit of risk if some of those connections don't run to time.

You might be lucky, and have an assistant who will hack through fares tables and separate websites to do that for you. But you'd really be wasting their time (and your money).

Because that sort of task is exactly what technology is good at.

Taking vast arrays of semi-structured data and finding coherent answers. Quickly. And if there's some risk involved, making that clear. We're grown-ups. We can cope.
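As a rough illustration of how mechanical the search is once the fares are available, here is a minimal Python sketch. It assumes a single train calling at a known list of stops and a table of through fares between pairs of stops; the station names and prices are made up, and it ignores ticket restrictions, timings and the connection risk mentioned above.

```python
def cheapest_split(stops, fares):
    """Cheapest way to ticket a journey along `stops`, allowing splits.

    `fares` maps (origin, destination) to a price; pairs with no
    through ticket are simply absent. Returns (total, list of tickets),
    or None if the final stop cannot be reached with the given fares.
    """
    # best[stop] = (cheapest cost to reach stop, tickets used to get there)
    best = {stops[0]: (0, [])}
    for j in range(1, len(stops)):
        options = []
        for i in range(j):
            leg = (stops[i], stops[j])
            if stops[i] in best and leg in fares:
                cost, tickets = best[stops[i]]
                options.append((cost + fares[leg], tickets + [leg]))
        if options:
            best[stops[j]] = min(options, key=lambda option: option[0])
    return best.get(stops[-1])

# Hypothetical station names and prices in pence.
stops = ["Alpha", "Bravo", "Charlie"]
fares = {
    ("Alpha", "Charlie"): 5000,
    ("Alpha", "Bravo"): 1800,
    ("Bravo", "Charlie"): 2200,
}

total, tickets = cheapest_split(stops, fares)
print(total, tickets)
```

With these invented prices the two-ticket combination costs 4000 pence against 5000 for the through fare. Because only calling points of the same train are considered, every split happens at a station where the train actually stops.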

There's no doubt at all that the raw materials--the fares for individual journey segments--are public information. Nobody would ever want, or try, to hide a fare for a specific route.

So when my esteemed colleague Jonathan Raper - doyen of opening up travel-related information and making it useful - in his work at Placr and elsewhere, put his mind to the question of how new services could crunch up the underlying data to drive out better deals for passengers, I don't doubt that some operators started to get very nervous indeed.

Jonathan got wind--after the November 2011 meeting of the Transport Sector Transparency Board--that a most intriguing piece of advice had been given by the Association of Train Operating Companies (ATOC) to the Department for Transport on the "impact of fare-splitting on rail ticket revenues".

Well, you'd sort of expect an association which represents the interests of train operators to have a view on something that might be highly disruptive to their business models, wouldn't you?

So what was that advice? He put in a Freedom of Information request to find out.

And has just had it refused, on grounds of commercial confidentiality.

This is pretty shocking - and will certainly be challenged, with good reason.

Perhaps more than most, I have some sympathy with issues of commercial reality in relation to operational data. We set up forms of "competition" between providers for contracts, and in order to make that real, it's inevitable that some details--perhaps relating to detailed breakdowns of internal costs, or technical logistics data--might make a difference to subsequent market interest (and pricing strategy) were they all to be laid out on the table. I really do understand that.

But a fare is a fare. It's a very public fact. It's not hidden in any way. So what could ATOC have said to DfT that is so sensitive?

The excuse given by DfT that this advice itself is the sort of commercial detail that would prejudice future openness is, frankly, nonsense.

I look forward to the unmasking of this advice. And in due course to the freeing-up of detailed fares data.

And then to people like Jonathan and Money Saving Expert creating smart new business models that allow us to use information like it's supposed to be used: to empower service users, to increase choice, and to deliver real, pound-notes value into the hands of real people.

That's why we're doing all this open data stuff, remember?

Paul Clarke is a photographer and writer, and sits on the Mayor of London's Digital Advisory Board and the Transparency Sector Panel in the Ministry of Justice. He blogs at Honestlyreal.



