Local news and data journalism


Mapping America – A New York Times data driven journalism project

Finding stories in datasets need not be the exclusive preserve of large global news organisations staffed by specialist journalists and developers devoted to producing data-driven articles.

Regional and local news websites and blogs in the UK have a great deal of data available to them – with hopefully more on the way – and the potential to interrogate that data to produce interesting stories and visualisations is certainly there.

Doing data journalism on a local news website

Earlier this month I was web editor of the Hackney Post, a local news website run periodically by journalism students at City University as part of our production weeks. My strategy was to publish one data-driven story a day, accompanied wherever possible by interactives such as maps, graphs or charts.

To plan these stories, we looked back at what data had recently been released specific to Hackney or to London boroughs. Possibly the best website for London-specific datasets is the London Datastore, built by the Greater London Authority (GLA) in a bid to make more of the capital’s data freely available.

The GLA claims to be committed to “influencing and cajoling other public sector organisations into releasing their data” on the site, which is a noble and praiseworthy aim, and there are some really interesting datasets to explore.

I found some recently published statistics on teenage pregnancies across London and some updated data on food hygiene, while the Bureau of Investigative Journalism had released the regional breakdown of the data behind their story about the prevalence of payday lender shops across the UK.

We looked at each dataset, trying to identify potential stories, and we did not come away empty-handed. By analysing the food hygiene data for London, Joe Hall found that half of all Hackney’s food outlets were hit with hygiene warnings in 2013, the highest rate in London. This is the story we published.


How London boroughs rate in terms of food hygiene – chart by Joe Hall

Looking at teenage pregnancy across London over a 15-year period, Sophie Murray Morris found that the rate of under-18 pregnancies in the capital has fallen by 38 per cent since 1998, and that Hackney saw the third-biggest decrease of any London borough, almost 50 per cent. This was the resulting story.

In both cases, the data was freely available in a clean, structured format and needed only some basic analysis in Excel to compare boroughs, providing added context for the figures. The maps, created with Google Fusion Tables, also include charts in the info-window showing how each borough compares with the rest of London and England. (Here is a tutorial I wrote showing how to do that.)
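The teenage pregnancy comparison boils down to a percentage change per borough followed by a ranking. Here is a minimal Python sketch of that arithmetic. The borough names and rates are invented placeholders, not the real figures, and this is not necessarily the exact set of steps used for the story.

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new (negative means a fall)."""
    return (new - old) / old * 100

# Hypothetical rates for (1998, latest year): invented for illustration only.
rates = {
    "Borough A": (80.0, 40.0),
    "Borough B": (60.0, 45.0),
    "Borough C": (70.0, 28.0),
    "Borough D": (50.0, 35.0),
}

changes = {name: percent_change(old, new) for name, (old, new) in rates.items()}

# Rank boroughs from the biggest fall (most negative change) to the smallest.
ranked = sorted(changes.items(), key=lambda item: item[1])

for position, (name, change) in enumerate(ranked, start=1):
    print(f"{position}. {name}: {change:.1f}%")
```

In a spreadsheet, this is the same as adding a column with `=(new-old)/old*100` and sorting on it.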


Hackney Post data journalism interactive on teenage pregnancy rates in London

Proof lies in the analytics

Evidence of how well data-driven stories engage readers lies in the analytics. Between February 24 and March 26, readers spent an average of two minutes and thirty seconds on a page of the Hackney Post.

The stories with interactive data content did much better. All five published in the week I was editor were well above average for dwell time, and four of the five held readers for almost double that, at around five minutes.

These are by no means ground-breaking investigative data-driven stories. They could perhaps have been developed further had we had more time to explore other potentially interesting patterns in the data.

However, they show that with some basic spreadsheet experience and data visualisation skills, regional news outlets can produce interesting stories that go beyond simply citing the headline figures for their area. On their own, those figures generally lack context, as in this example, one of many similar local stories online.

Trinity Mirror’s ‘data unit’ was established last year, with data journalists supplying the group’s regional titles with engaging data-driven stories on a daily basis. It looks like a step in the right direction and, hopefully, a sign of things to come.

Here’s an example of one of the most successful stories they’ve done, a schools database, as featured in the Birmingham Mail. The fact that they are also expanding the team shows that there is a real future in the work they do.

Hopefully, this is just the start when it comes to producing interesting data-driven content to engage regional audiences and not a one-off. As the ability to analyse and build stories using a large volume of data becomes an increasingly important skill for journalists to have, regional outlets need not get left behind.


How to build charts in your Google Fusion Tables info-window


How a chart in a Google Fusion Table map looks

Interactive maps are often the go-to tool for data journalists wanting to provide readers with a visual aid to make sense of a story.

For reasons largely associated with the way it is collected and distributed, data is generally available by region: electoral ward, constituency, local authority, country, and the list goes on. This makes maps an ideal way to visualise it, giving people a sense of how their own area is doing relative to other parts of their town, other cities, regions or countries.

Furthermore, data is often published regularly in a consistent format, as tends to be the case with national government releases such as those from the UK’s Office for National Statistics. When it is, creating maps in a tool like Google Fusion Tables is relatively straightforward once you get the hang of it.

But on its own, a simple choropleth map (there is a tutorial on how to make one here) may lack analysis or context.

One way to introduce more insight into your interactive map is by adding charts in the info-window box, showing another layer of analysis.

During a recent data journalism class, our lecturer John Burn-Murdoch went through an example. As a starting point, he gave us a few lines of ‘code’ from a previous map, the kind you need to put into the custom Fusion Tables info-window.

This allowed us to adapt it to our own examples. I used it to make the map accompanying this story on the levels of young female graduates in the EU.

Clicking on each country brings up a column chart showing how the rate of female graduates compares with that of male graduates.


How to make charts appear in the info-window

Essentially, this works by taking data from your fusion table and passing it to the Google Charts API, which draws a bespoke chart for each region when you click on it. What appears depends on the chart parameters you enter and on the data you ‘ask’ it to display.

To try it out, first merge your KML fusion table with the table containing your data and adjust how you want to visualise your map. Then click on ‘Change info window’ on the left side of the screen, in your map configuration options.


After that, click ‘Custom’ in the box that pops up. This lets you erase the automatically generated content of the info-window and paste in your own. In this case, that means code from a previous map, which you can then adapt to fit yours.

Below is the ‘code’ from my map for you to modify to fit your own. I will then go through each of its elements in turn to show you how to adapt it to your map.

```html
<div class='googft-info-window'>
<div style="font-family: sans-serif">EU Tertiary education attainment, 2012<br>
<font size="+2"><font color="#980043">{Country}</font></font><br>
<p><br><img src="http://chart.apis.google.com/chart?
chxt=x,y
&chxr=1,0,60
&chds=0,60
&chbh=a&chs=310x150
&cht=bvg&chco=980043
&chd=t:{% of women aged 30 to 34 with a degree},{% of men aged 30 to 34 with a degree}
&chtt={difference in rate between women and men}%+difference+between+women+and+men
&chxl=0:|Women|Men|
"></p>
</div>
</div>
```

Title and appearance:

```html
<div class='googft-info-window'>
<div style="font-family: sans-serif">EU Tertiary education attainment, 2012<br>
<font size="+2"><font color="#980043">{Country}</font></font><br>
<p><br><img src="http://chart.apis.google.com/chart?
```

So, this part dictates the appearance of the info-window. It sets the font, the title (“EU Tertiary education attainment, 2012”), the size and colour of the text, and what is displayed at the top.

The thing to remember is that anything in curly brackets is a value from the fusion table and changes according to which country you click on.

So in this case, with {Country}, each time you click on a specific country, it will display the name of that country. The same applies every time the curly brackets are used and it is especially important further down when it is drawing on values in your columns.
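To make the curly-bracket idea concrete, here is a rough Python imitation of what happens when you click a country. Each {Column name} placeholder in the template is swapped for that row’s value. This is only a sketch of the concept, not how Fusion Tables implemented its templates. The function name `fill_template` and the sample row are hypothetical.

```python
import re

TEMPLATE = "<b>{Country}</b><br>Women: {% of women aged 30 to 34 with a degree}"

def fill_template(template: str, row: dict) -> str:
    """Replace each {Column name} placeholder with the matching value from row."""
    def lookup(match):
        column = match.group(1)
        return str(row[column])  # raises KeyError if the column does not exist
    # A placeholder is anything between a '{' and the next '}'.
    return re.sub(r"\{([^}]*)\}", lookup, template)

# A hypothetical row from the merged table (values invented for illustration).
row = {"Country": "Examplestan", "% of women aged 30 to 34 with a degree": 42.5}

print(fill_template(TEMPLATE, row))
# <b>Examplestan</b><br>Women: 42.5
```

This is also why the column names inside the brackets must match your table’s column headings exactly.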

A <br> tag creates a line break, while <p> is a paragraph tag.

After that, the rest of the information will determine the type and features of your chart.

Chart Type

The cht parameter, for example, sets your chart type: whatever comes after cht= determines what kind of chart it will be. bvg gives a simple (grouped) column chart. lc, for example, gives a line chart, if that is the best way of displaying your data.

Here is a list of all the different chart types and their codes.

When you change the chart type, you may need to modify the other parameters to get it to display properly. Plenty of trial and error is the best way of figuring this out.

Chart features – other parameters

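&chco=980043: series colour (a hex colour code, without the #)

&chs=310x150: chart size in pixels, width x height

&chbh=a: bar width and spacing (a sizes the bars automatically)

chxt=x,y: visible axes

&chxr=1,0,60: axis ranges (here axis 1, the y-axis, runs from 0 to 60)

&chds=0,60: data scale, so that values are plotted on a 0 to 60 scale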

&chd=t:{% of women aged 30 to 34 with a degree},{% of men aged 30 to 34 with a degree}

The data you want your chart to display. You can either use curly brackets to pull in values from your table or type in a value directly. In this text format (t:), commas separate the values within a series, and a | marks the start of the next series.

&chtt={difference in rate between women and men}%+difference+between+women+and+men

Chart title. You can either use a column from your fusion table, so that the title changes for each country, or plain text. As you can see, I have used both. Note that you need to use a + sign wherever you want a space in the text, as spaces are not recognised.

&chxl=0:|Women|Men|

Axis labels

Here is the full list of chart parameters. You will need to change these around based on your values and the type of chart you want to display, but having the code already is a good starting point.
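If you prefer to see the pieces assembled programmatically, the sketch below rebuilds a chart URL like the one in my info-window from a dictionary of the parameters described above. It uses Python’s standard `urllib.parse.urlencode`, which turns spaces into + signs for you and percent-encodes characters such as commas, colons and pipes. The function name and the sample values are hypothetical. Google has since retired this image-charts endpoint, so treat the output as a demonstration of how the URL is put together rather than something that will still render.

```python
from urllib.parse import urlencode

BASE_URL = "http://chart.apis.google.com/chart"

def build_chart_url(women: float, men: float, difference: float) -> str:
    """Assemble a grouped column chart URL from the parameters discussed above."""
    params = {
        "cht": "bvg",               # chart type: grouped vertical bars
        "chs": "310x150",           # size in pixels (width x height)
        "chco": "980043",           # series colour (hex, no '#')
        "chbh": "a",                # bar width/spacing: automatic
        "chxt": "x,y",              # visible axes
        "chxr": "1,0,60",           # axis 1 (y) runs from 0 to 60
        "chds": "0,60",             # scale data values to 0-60
        "chd": f"t:{women},{men}",  # text-encoded data, one series of two values
        "chxl": "0:|Women|Men|",    # labels for axis 0 (x)
        "chtt": f"{difference}% difference between women and men",
    }
    # urlencode percent-encodes reserved characters and turns spaces into '+'.
    return BASE_URL + "?" + urlencode(params)

url = build_chart_url(women=42.5, men=35.0, difference=7.5)
print(url)
```

Swapping `"bvg"` for another chart type code is then a one-line change. As noted above, the other parameters may need adjusting to suit.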

Happy mapping!

Data journalism tools – using Quartz’s chart builder


Quartz chart

There appears to be a real appetite for new tools attempting to make life easier for data journalists.

From advanced tools such as OpenRefine, powerful software for cleaning messy data, to the simple interactive chart-maker Datawrapper, a real favourite in the data journalism community, data journalists have an arsenal of tools at their disposal.

I recently found out that the in-house application Quartz, the digitally native global economy news website, uses to make charts is open source and freely available for anyone to use. It really is another tool worth adding to the list.

Quartz’s Chartbuilder is very much like Datawrapper but without the added level of interactivity. That interactivity is not always necessary, nor always possible: wordpress.com accounts, for example, do not allow embedded charts.

To use Chartbuilder, you simply paste your data into the tool and then choose between bar, column, line and scatter, depending on how you want to visualise each data series.


Quartz’s open source chartbuilder is as simple a data journalism tool as they come

There are fields in the workspace for tweaking things like graph axes and labels, while any change you make is visible as you work.

I recently used it for an analysis of an EU survey on data roaming and found it extremely straightforward. The option to export the chart as an image file works like a charm.

It is also worth considering Quartz’s focus on mobile-optimised content. A lot of more interactive content does not really work on mobile, and more ambitious tools may not display well on a smaller screen, whereas a simple static chart does.

Also, for those into their design, you can export the chart as an SVG, then open and edit it in Illustrator.

Quartz initially created the chart builder in order to lower the barrier for what they described as “non-technical and less-designerly journalists” to create charts in their own newsroom.

According to the application’s creator, David Yanofsky, the chart-builder “has helped all of our reporters and editors become more responsible for their own content and less dependent on others with specialized graphics skills”.

Opening it up to the rest of the world will, hopefully, see journalists elsewhere take more responsibility for the data and visual content that accompanies their articles.

Can we work out whether the 50p tax rate raised money?


Photo: Howard Stanbury/Flickr – Creative Commons license

The debate following Labour Shadow Chancellor Ed Balls’s commitment to restore the 50p tax rate for top earners has been emblematic of how politics and electoral campaigning are played out in the media.

First, a party leader comes out with a policy announcement, backed up by figures and often a new study assessing its financial impact or implications.

Not long after, other party leaders try to show why their counterpart is wrong, using a different set of figures or perhaps a very different interpretation of the available data. They argue that in fact it is their own plan that has the taxpayer’s best interests at heart.

This will often continue for months on end, with each side trying to poke holes in the other’s stats. Meanwhile, both sets of figures, cherry-picked to within an inch of their statistical lives, are highly unlikely to show the full picture.

When it comes to the 50p tax rate and its potential benefits or pitfalls, each side of the argument has cited one of two rather different reports, which came out at different times.

Ed Balls/Labour:

“Latest figures from the HMRC show that people earning over £150,000 paid almost £10bn more in tax in the three years when the 50p top rate of tax was in place than was estimated at the time when the government did its assessment back in 2012.”

Chancellor George Osborne:

“The direct cost (of reducing the top rate from 50p to 45p) is only 100 million pounds a year. HMRC calculate the loss of other tax revenues may cancel that out. It raises at most a fraction of what we were told and may raise nothing at all.”

So who should we believe?

The answer is… well, probably no one.

In customary fashion, Tim Harford, the Financial Times’s Undercover Economist and presenter of Radio 4’s “More or Less”, the programme that investigates the numbers in the news, said in the latest edition of the show:

“If only life were so simple…  and all the taxable income in the country was a delicious gigantic cake and all the Chancellor had to do was decide how big a slice to take. However, taxable income is a moveable cake, it’s a cake that shrinks from the taxman’s cake slice and grows again when the taxman is out of the room.”


Can Ed Balls have the tax cake and eat it too?

Photo: Tattooed_Mummy/Flickr – Creative Commons license

Essentially, the main point here is that taxable income changes in response to tax rates. This was especially true of the 50p tax rate, as both its introduction in 2010 by Alistair Darling and George Osborne’s decision to cut it to 45p were announced in advance.

This allowed for major behavioural responses, as people changed when they took income such as bonuses and dividends according to the rate at the time. Some brought it forward (forestalling) ahead of the impending tax rise; others deferred it until the rate was cut.

It was in place for such a brief time that it tells us very little about “how much it might have raised in the long run when everything had settled down”, according to the analysis by More or Less.

Data journalism site Ampp3d has done a good job of explaining the debate here, raising the issue of other factors at play beyond the revenue raised by the 50p tax rate.

They also analyse how many people a reintroduction of the 50p tax rate would potentially affect and compare it with other policies. Of course, they also answer the question on everyone’s lips: how it would affect Wayne Rooney. This is a great way to provide context to the debate.

Drawing conclusions is tough, especially given the uncertainty surrounding income tax revenue as a whole. According to More or Less, studies by the Institute for Fiscal Studies show that the top rate of tax at which the Treasury would raise the most revenue could be anywhere from as low as 30p to as high as 75p. A definitive answer is unlikely to emerge anytime soon.
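Harford’s moveable cake can be illustrated with a deliberately crude toy model. This is an illustrative assumption, not the method used by More or Less, HMRC or the IFS. Suppose the taxable income reported at the top shrinks as the tax rate t rises, in proportion to (1 − t) raised to an assumed responsiveness e (an elasticity). Revenue is then t × (1 − t)^e, and the rate that maximises it works out at 1/(1 + e). The sketch below shows that modest changes in the assumed behaviour move the “best” rate a long way. That is one reason estimates can range so widely.

```python
def revenue(rate: float, elasticity: float) -> float:
    """Toy revenue: tax rate times a taxable base that shrinks as the rate rises."""
    return rate * (1 - rate) ** elasticity

def best_rate(elasticity: float, steps: int = 1000) -> float:
    """Grid-search the rate between 0 and 1 that maximises toy revenue."""
    candidates = [i / steps for i in range(steps + 1)]
    return max(candidates, key=lambda r: revenue(r, elasticity))

for e in (0.5, 1.0, 2.0):
    # Compare the grid-search answer with the closed form 1 / (1 + e).
    print(f"elasticity {e}: best rate {best_rate(e):.2f}, formula {1 / (1 + e):.2f}")
```

The real estimates involve far more (other taxes, the distribution of top incomes, avoidance), but the basic point survives: the answer depends heavily on how strongly people respond.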

In the next year we will undoubtedly have these stats thrown at us time and time again, used to back up the respective arguments.

Therefore, it is important to acknowledge that these figures are most probably being used to reinforce an ideological point of view ahead of the general election. They are unlikely to reflect a position adopted after a clear and thorough analysis of the data.

If you haven’t already, it is well worth listening to the full edition of More or Less, available here.

Visualising Premier League club spending on agents

Last weekend, the Premier League released each club’s annual spending on agents, following a commitment to make this data public every year.

The data shows the total amount each club paid to authorised agents during the period from October 1, 2012, to September 30, 2013. It wasn’t hard to find the data for the previous year, so I decided to put to the test some of the basic data visualisation skills we had learnt with John Burn-Murdoch.

It was a really straightforward dataset (see table below), and therefore quite an easy write-up for sports websites. Possible hooks included the record total of £96 million, or a few interesting points about which club paid the most, the least and so on.

One of the easiest things to do was to compare the two years, as The Times did (£), pointing out that Chelsea and Newcastle spent twice as much this year as they had the previous year.

In the last few weeks we had learnt about a couple of great data visualisation tools, Datawrapper and Tableau, which let you visualise data quickly and with minimal fuss. I thought this was a great chance to try my hand at using them in ‘real time’.

So for the initial year-on-year comparison I used Datawrapper. I included only the clubs that were in the Premier League in both seasons, leaving out those relegated in 2012 and those promoted from the Championship last season.

It was really easy, but I encountered a slight problem as I had initially uploaded the values from the CSV as currency, which Datawrapper had some trouble with. I amended that, transposed the table to get each team side by side and simples, here’s the result.
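The currency problem is a common one. Figures such as £1,234,567 arrive as text, with a pound sign and thousands separators, rather than as numbers a charting tool can plot. Here is a minimal Python sketch of the clean-up, assuming values formatted like that. The function name and the sample strings are made up.

```python
def parse_pounds(value: str) -> float:
    """Convert a string such as '£1,234,567' into the number 1234567.0."""
    cleaned = value.strip().replace("£", "").replace(",", "")
    return float(cleaned)

raw = ["£1,234,567", "£980,000", " £12,500 "]
print([parse_pounds(v) for v in raw])
# [1234567.0, 980000.0, 12500.0]
```

In a spreadsheet, the equivalent fix is to reformat the cells as plain numbers before exporting the CSV.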

If you want to take a look at the interactive version, click on the image.

What is noticeable is that the year-on-year difference for each club is immediately easier to see when the data is presented this way rather than in a list or as text.

You can quickly identify that only two clubs – Arsenal and West Ham – paid less to agents this year than they had the last, while the difference between the amount Chelsea, Man City, Spurs, Liverpool and the rest spent this season is clear.
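Spotting which clubs paid less than the year before comes down to one subtraction per club. Here it is sketched in Python with placeholder club names and invented figures, not the real Premier League numbers.

```python
# Hypothetical agent fees in pounds: (previous year, latest year).
fees = {
    "Club A": (5_000_000, 9_000_000),
    "Club B": (4_000_000, 3_500_000),
    "Club C": (2_000_000, 2_600_000),
    "Club D": (6_000_000, 5_200_000),
}

# Change per club: positive means the club paid agents more this year.
change = {club: latest - previous for club, (previous, latest) in fees.items()}

paid_less = sorted(club for club, diff in change.items() if diff < 0)
print(paid_less)
# ['Club B', 'Club D']
```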

Design-wise, there are more than 15 clubs in the chart, each with two values, so it looks quite cramped. Still, it does what it’s supposed to and took just a few minutes. Rounding the figures to millions with a couple of decimal places might have worked better, though.

Not wanting to stop there while I was in the swing of things and quite eager to practise a little more with Tableau, I went a bit further.

Using just the latest data for 2012-2013, I tried to show how much each club spent as a share of the total. This gives a slightly different visual focus from the initial chart, which looked at the change year on year.

Representing this with a pie chart would be a real no-no, so the best option was probably a tree map. Again, this is a pretty simple dataset, but Tableau really does make it look as if you have spent much more time and effort than you actually have.
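A tree map sizes each rectangle by that club’s share of the overall total. Under the hood, the numbers it needs are just each value divided by the sum. Here is a short sketch with placeholder clubs and invented figures, which also shows the amounts rounded to millions.

```python
# Hypothetical agent fees in pounds for a single season.
fees = {"Club A": 9_000_000, "Club B": 3_500_000, "Club C": 2_500_000}

total = sum(fees.values())

# Each club's percentage share of the total: what a tree map's areas represent.
shares = {club: amount / total * 100 for club, amount in fees.items()}

for club, amount in sorted(fees.items(), key=lambda item: item[1], reverse=True):
    # Express the amount in millions to two decimal places for readability.
    print(f"{club}: £{amount / 1_000_000:.2f}m ({shares[club]:.1f}% of total)")
```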

Click on the image to look at the interactive version.

The tree map works well to show the difference between the clubs in terms of the total spent, and makes this clearer than a bar or column chart would. I think the colour gradient, which distinguishes value ranges across the different parts of the tree map, is a great feature as well.

Making these two visualisations, simple as they are, took only around 15 minutes in total. The great thing, however, is that they look as if they took much more time and effort to create.

Some love for the pivot table


A step too far?

Marriage is probably a bit too big a commitment, but for a data journalist, some love for the Excel pivot table is due.

When Excel has formulas such as this:

=IF(F6=1,IF(VLOOKUP($D$6,$B$9:$E$51,4)="All",SUMIF(Payments!$N$6:Payments!N279,$I$9,Payments!$Q$6:Payments!Q279),SUMIF(Payments!$K$6:Payments!K279,$I$9,Payments!$Q$6:Payments!Q279)),IF(VLOOKUP(Data!$D$6,Data!$B$9:$E$51,4)="All",SUMIF(Payments!$O$6:Payments!Q279,Data!$I$9,Payments!$Q$6:Payments!Q279),SUMIF(Payments!$L$6:Payments!Q279,Data!$I$9,Payments!$Q$6:Payments!Q279)))

(not that I have any idea what that means) then anything that makes life as simple as the pivot table is worth falling in love with.

In his book ‘Data Journalism Heist’, Paul Bradshaw calls pivot tables the ‘crowbar’ of data journalism, as they are the best way to “crack open your data”.

What makes pivot tables such a powerful tool is that they allow you to simply drag and drop data into columns or rows, making it easy to analyse, categorise and summarise the data, giving you the answers you need in no time.

Take a table like this for example.

3,754 rows of European Investment Bank finance contracts from 2008 onwards: a little unmanageable if you want to deduce anything from the dataset.

Creating a pivot table

If you want to try this out during the how-to, here is the original copy of the dataset I used.

Click on any cell within the data, making sure there are no empty rows or columns in between. Find the pivot table option under Data in the ribbon (in some versions it is under Insert instead), select ‘create manual pivot table’ and choose the new worksheet option.


Creating the pivot table

You should now see something similar to the screenshot below. It may look slightly different depending on your version of Excel, but it should work in the same way.


Pivot table – starting point

A pivot table allows you to drag and drop each heading into a row or column to analyse the different elements of your data. It is best to keep things as simple as possible, just dragging the different fields into rows whenever you want to examine them.

And now the fun begins. In our pivot table, we can drag each of the fields in turn into the rows section; in our data, these are sector, region, country and date.

Let’s start with the amount loaned to each sector.


Dragging the sector field to the row labels

Then, by dragging the signed amount into the values field, we can see the total amount loaned by the EIB to each sector.
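What the pivot table is doing here is grouping rows by one field and adding up another. For readers who script, here is the same operation in plain Python. The field names `sector` and `signed_amount`, the function `pivot_sum` and the rows themselves are hypothetical stand-ins, not the real EIB columns or figures.

```python
from collections import defaultdict

# Hypothetical loan records standing in for rows of the spreadsheet.
loans = [
    {"sector": "Energy", "country": "Spain", "year": 2008, "signed_amount": 500},
    {"sector": "Transport", "country": "Italy", "year": 2008, "signed_amount": 300},
    {"sector": "Energy", "country": "Italy", "year": 2009, "signed_amount": 200},
    {"sector": "Health", "country": "Spain", "year": 2009, "signed_amount": 150},
]

def pivot_sum(rows, row_field, value_field):
    """Total value_field for each distinct value of row_field, like a pivot table."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[row_field]] += row[value_field]
    return dict(totals)

print(pivot_sum(loans, "sector", "signed_amount"))
# {'Energy': 700.0, 'Transport': 300.0, 'Health': 150.0}
```

Passing `"year"` or `"country"` as the row field is the scripted equivalent of dragging a different heading into the rows.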


It works just as easily with another field, for example the amount for each country or the total loans for each year.


Comparing values for each country


Comparing values for each year

You can also use the column field to dig deeper into your data, but be very careful as it tends to complicate things.
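Putting one field in rows and another in columns produces a two-way table. The minimal sketch below uses hypothetical records and shows why this quickly gets busier: every combination of the two fields becomes a cell, and combinations with no data are shown as 0.

```python
from collections import defaultdict

# Hypothetical loan records (invented values): (sector, year, amount).
loans = [
    ("Energy", 2008, 500),
    ("Transport", 2008, 300),
    ("Energy", 2009, 200),
    ("Health", 2009, 150),
]

# Sum amounts for each (sector, year) pair: sectors as rows, years as columns.
cells = defaultdict(int)
for sector, year, amount in loans:
    cells[(sector, year)] += amount

sectors = sorted({sector for sector, _, _ in loans})
years = sorted({year for _, year, _ in loans})

print("sector".ljust(10) + "".join(f"{year:>6}" for year in years))
for sector in sectors:
    # .get() shows empty combinations as 0 without adding them to the table.
    print(sector.ljust(10) + "".join(f"{cells.get((sector, year), 0):>6}" for year in years))
```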


Using both the column and rows field

To find a potential starting point for a story, sort the values so you can compare them effectively and identify the largest and smallest. Just select one of the cells in the values column and press the sort button at the top left of the PivotTable ribbon.
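Sorting a pivot table just means putting the totals in order. Here it is sketched in Python with hypothetical regional totals (invented numbers).

```python
# Hypothetical totals per region, as produced by a pivot table.
totals = {"Region A": 1200.0, "Region B": 450.0, "Region C": 90.0, "Region D": 3100.0}

# Largest first; reverse=False would put the smallest first instead.
ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

for region, amount in ranked:
    print(f"{region}: {amount:,.0f}")

print("Largest:", ranked[0][0], "| Smallest:", ranked[-1][0])
```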


Sorting to identify largest/smallest values

So you can quickly see that, of all the regions listed in our data, the European Union has received the most money in loans since 2008, followed by the Enlargement countries. The EFTA countries have received the least, with South Africa second to last. By simply dragging a different field into your row labels, you can just as easily see which country has received the most, or which year had the lowest total amount of money in loans.

Another brilliant feature of pivot tables is the ability to show different calculations, not just the total sum. Click on the i button next to the sum box (this is on a Mac; on a PC you should find the same option by right-clicking). You can then select the different indicators you want the table to display.


Going beyond the sum, looking at other indicators

You can now analyse values other than the sum. COUNT, for example, shows how many loans were signed for each region, sector, country or year. Another useful measure is the AVERAGE amount per loan signed for each field, and there are more.
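Switching a pivot table from SUM to COUNT or AVERAGE just changes what is computed for each group. The sketch below computes all three at once for hypothetical records. The function `summarise` and the field names are illustrative, not an Excel feature.

```python
from collections import defaultdict

# Hypothetical loan records (invented values).
loans = [
    {"region": "Region A", "signed_amount": 500},
    {"region": "Region B", "signed_amount": 300},
    {"region": "Region A", "signed_amount": 200},
    {"region": "Region A", "signed_amount": 110},
]

def summarise(rows, group_field, value_field):
    """Return SUM, COUNT and AVERAGE of value_field for each group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = row[group_field]
        sums[key] += row[value_field]
        counts[key] += 1
    return {
        key: {"sum": sums[key], "count": counts[key], "average": sums[key] / counts[key]}
        for key in sums
    }

for region, stats in summarise(loans, "region", "signed_amount").items():
    print(region, stats)
```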

And there you go, now you can pivot.

But watch out, don’t get too emotionally involved, it looks like Mr. Pivot has been taken.

Football Manager + data viz = Pie Charts are evil

Note: (There is some data journalism involved here. Be a little patient and battle through the Football Manager references if you are not that way inclined; I promise I will get to it. But honestly, you may enjoy it a little more if you are an FM fan.)

Perhaps it is a sad indictment of my personality and my life so far, but few things stir within me the same emotion, excitement and enthusiasm as the mere thought of the world’s best ever football management simulation game, Football Manager.

I am somewhat comforted by the fact that I am not alone in this. There are apparently more than 11 million others, certainly not all like me, but based on this Guardian article a few months ago, there are people who’ve got it worse.

I find it very hard to accept any criticism directed at this glorious game. Football Manager single-handedly ensured that I did not waste my youth. Instead, I led Francesco Totti to Champions League glory with his boyhood club and made Brendan Rodgers and Michael Laudrup look rather average with my title-winning Swansea side of 2018. I amassed many, many more achievements that I had better not list here for fear of never getting to my point.

As part of the campaign to accompany the launch of the new version, out in the next few weeks, the developers have been gradually revealing the various new “items” set to enhance the user experience, via tweets, videos and screenshots.

Taking into account the god-like status I afford the game’s developers, I may be committing an act of heresy by pointing out some possible misdemeanours in their attempt to liven up the squad training dashboard. But in the name of data journalism, I feel I have to.

FM Screenshot - training

The screenshot, which indicates that it is a work in progress and not the final version, depicts levels of squad fitness, happiness and overall general training focus through pie charts.

Now, this is not necessarily all that novel. Football Manager is no stranger to cool and complex data visualisations, which were introduced a number of years ago to compare players and show form.

I adopted the naive approach that the more data viz the better, without too much thought. I had the utmost confidence that anything the Football Manager developers produce is of the highest quality. Overwhelmed with excitement at the attractive-looking visualisations added to the game, I tweeted away.

Just because it looks good, doesn’t mean it is good

Oh, what a fool I was.

John Burn-Murdoch, an experienced data journalist formerly of the Guardian Datablog and currently at the FT, was less than impressed, however. (He later revealed to me that he was a former Football Manager addict.)

Almost in disbelief, he replied to my tweet with a link to an article on Excelcharts.com explaining why pie charts with more than two segments are problematic.

The article simply and clearly explains and demonstrates that you should only use a pie chart when comparing a single proportion to the whole.

It’s very simple, really: you do not compare proportions in a pie chart. Because a pie chart is not a comparison chart, it’s a part-to-whole chart. When you do this

pie-chart 1

what you really want to do is to compare each slice to the whole, like this:

pies-chart

The full article is here.

Pie charts are traditionally one of the most common ways to present data. You see them everywhere: in newspapers, in magazines and online, in any sort of statistical analysis. They are often the go-to tool for a quick and easy data visualisation. But after John’s tweet, I did some reading around the issue and looked a little more critically at examples of pie charts in the media. It is not hard to see that they are almost never useful for actually comparing different values effectively.

My research led me to many websites castigating the widespread use of pie charts. According to Business Insider, they are the “worst way to convey information ever developed in the history of data visualization”. I found writers wanting to report poor uses of pie charts within the UK national media to “the data viz police”. Not entirely sure they exist to be honest.

However, it is nice to find someone who is prepared to accept that my reaction does not *necessarily* portray stupidity and let me off the hook for suggesting that pie charts looked rather cool.

Here is what programmer Steve Fenton had to say about pie charts, making me feel less bad about my error in judgement.

Since 1801, pie charts have been displaying information and statistics about everything from population to profit, market share to margins and, presumably, the most popular pastry-encased dinners. Pie charts are as popular in the statistics world as their namesakes are in the culinary world – but there are some convincing reasons to avoid using them altogether.

Tell someone that pie charts are rubbish and they’ll look at you with a mixture of confusion, surprise and even anger. Humour them though, because even I reacted this way when I was told the same information. Be patient, remain impartial and explain the following important arguments against the use of evil pie charts.

Full article here.

Fooled by an inherent trust in those at Football Manager to always get things right, despite the many examples to the contrary (who can forget Cherno Samba, Freddy Adu and Tonton Zola Moukoko, to name but a few), I fell into the trap.

Don’t worry though, I have been converted. Now, I know.

Pie charts are evil.