Friday, 18 July 2014

Mapping Blight in the Motor City

In my preparations for the launch of our MSc in Applied GIS, I've been putting together lots of case studies of GIS in action. Luckily for me, this has coincided with the launch of the Motor City Mapping project in Detroit; part of a wider attempt by the city to understand and prevent urban blight. One part of this project has produced an amazing survey dataset covering nearly 380,000 land parcels in the city. An overview of this is provided by Motor City Mapping in the following graphic.

Source: www.motorcitymapping.org

This data was generated by survey staff over a short period during winter 2013/14 and is probably the most detailed parcel-level city survey carried out in recent times. For more about the project, take a look at the short video below. One particularly nice feature is that the final dataset contains a link to the photo taken of each land parcel by the survey staff (residents of Detroit surveyed their own neighbourhoods). The entire dataset is pretty big - close to 1GB - but it can be downloaded via this page and used in your GIS. This direct link worked for me.


The image below shows what the data looks like when mapped by land use category.

Link to bigger version

Finally, since they very cleverly included a photo URL for each land parcel in a separate column, I decided to extract a small area and put it in a web map using CartoDB so that you can click on each land parcel and see what it looks like, along with some of its characteristics. I extracted the data for Grand Boulevard since it's an important street in Detroit's history, home to locations such as Lee Plaza, Motown Records and Henry Ford Hospital. Click on the image below to go to the full size version. You'll see that I've coloured the map by building condition - mostly good on Grand Boulevard - and when you click on a land parcel you'll see an image of what's on it plus details of its condition, occupancy and use. I also included the date on which the survey was carried out.

Full screen version
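If you want to do something similar for another part of the city, the extract-and-export step is straightforward in Python with geopandas. This is only a rough sketch of how I'd approach it - the file name, column names and bounding box below are assumptions, so check them against the actual download before relying on any of it.

```python
import geopandas as gpd

# Load the full parcel survey (the file name is a guess - use whatever the
# Motor City Mapping download actually contains, e.g. a shapefile or GeoJSON)
parcels = gpd.read_file("motor_city_mapping_parcels.shp")

# Keep only the columns needed for a small web map; these names are assumptions
# based on the fields described above (condition, occupancy, use, photo, date)
cols = ["parcel_id", "condition", "occupancy", "land_use", "photo_url",
        "survey_date", "geometry"]
parcels = parcels[[c for c in cols if c in parcels.columns]]

# Clip to a small study area with a bounding box (very rough coordinates for
# the Grand Boulevard area - purely illustrative, adjust as needed)
subset = parcels.cx[-83.12:-83.03, 42.33:42.38]

# Write a GeoJSON small enough to upload to CartoDB (or any web mapping tool)
subset.to_file("grand_boulevard_parcels.geojson", driver="GeoJSON")
print(f"{len(subset)} parcels written")
```

CartoDB accepts GeoJSON uploads directly, and the photo URL column can then be dropped into the info window template.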

This is all part of a wider city planning project called 'Time to End Blight', and you can read more about it on their web pages. The report is an impressive piece of work, produced at a really difficult time in Detroit's history, and it's great to see so many people coming together for it. If you have any interest in cities, urban blight, regeneration or revitalisation then I suggest you take a closer look at the report and, in particular, its recommendations.


Friday, 23 May 2014

The Wonderful World of Open Access

I'm one of the editors of an open access journal, but that's not what this post is about. Instead, it's about the wider world of open access, which I've blogged about before, with some charts and stats. The web is full of opinions on open access, with comments from sceptics, advocates and others somewhere in between. I'm really excited by open access publishing, but of course - like any publishing model - it's not perfect. Over the past few years I've been trying to learn a lot more about the world of open access and really understand what the landscape looks like.

In doing so, I've become pretty familiar with where to find information, and for this purpose the Directory of Open Access Journals (DOAJ) is my first port of call. What you find out very quickly is that there are literally thousands of open access journals - about 10,000 - and that they constitute a very diverse, colourful group. One way to demonstrate this is to look at the metadata on the DOAJ website, where you can find, amongst other things, a list of journal URLs. So, I took a screenshot of them all - the results of which you can see here (or by clicking the image below - it takes a while to load).



Why on earth did I do this? Partly as a little spare-time project to see how easily it could be done, but mostly because I wanted a quick way to see what all the websites looked like - i.e. how many are full-blown, fancy websites backed by international publishing houses and how many are more small-scale ventures. It also makes it easier to spot families of open access journals (scroll down and you'll see quite a bit of this). This doesn't necessarily say anything about the quality of the journals (that's for readers to decide), but it does provide a visual overview in a more accessible way; clicking through the full list of 10,000 websites would take a little longer! I used a Firefox extension for this task, and it did take quite a while. The DOAJ spreadsheet I used is from late 2013, so some more recent journals are not included. To finish with, here are some of my favourites...


'Fast Capitalism' - I love the name and the musical intro:

'Studies in Social Justice' - nice cover shot:

'International Journal of Dental Clinics' - so many languages:

'Reading in a Foreign Language' - I just like this idea:

Not sure what caught my eye about this one, but I like it:





Tuesday, 8 April 2014

Mortgage lending data in Great Britain - a step in the right direction

Since late 2013, data on bank lending at postcode sector level for Great Britain have been available via the Council of Mortgage Lenders (mortgages) and the British Bankers' Association (personal loans). This followed an announcement in July 2013 that such data would be made available in order to - among other things - "boost competition" and "identify where action is required to help boost access to finance". It was also said at the time that the data would be "a major step forward in transparency on bank lending". My assessment is that this is only partly true. The new data do represent a major step forward, and organisations such as the CML are to be commended for their work on this, but in relation to mortgage lending at least, things are more opaque than transparent, as I attempt to explain below.

Location quotient lending map for Liverpool - HSBC lending


Demand?
I should begin by saying that I think this newly available data is a fantastic resource and that it does allow us to ask important questions and - to an extent - hold lenders to account. However, since we have no data on local demand - as they do in the United States under the terms of the Home Mortgage Disclosure Act of 1975 - the extent to which we can identify which lenders are rationing finance, excluding areas or 'redlining' is extremely limited. In fact, I would concur with George Benston, who said in 1979 (p. 147) that:

  • “If the focus is on the supply of mortgages, either in terms of numbers or dollars, a demand as well as a supply function must be constructed and specified. When demand is not accounted for, there is no way to determine the reason for any given level of supply.”
So, in the image above, which shows lending location quotients for HSBC in Merseyside, there is no way of knowing whether areas of lower lending receive less finance because fewer people there apply for mortgages or because some other supply-side mechanism is at work (e.g. bank lending policy). All we can really do is compare the lending practices of different institutions and note the differences. The same type of map is shown below for Lloyds Banking Group (the UK's biggest lender), and it would appear to suggest some significant differences in where these two banks lend the most. People with a knowledge of Merseyside will recognise that Lloyds has many higher lending location quotients in poorer areas. Is this 'evidence' of financial exclusion, redlining or sub-prime lending? No, it definitely is not. It does show, however, that banks lend differently at the local level. This is not news, but the new data releases allow us to identify very local patterns and ask questions about them.

Location quotient lending map for Liverpool - Lloyds lending
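For anyone who wants to reproduce this kind of measure, the location quotient itself is simple to calculate once the lender-level figures are in a single table. The sketch below uses one common formulation - a lender's share of lending in each sector relative to that sector's share of all reported lending - which may not be exactly the calculation behind the maps above, and the file and column names are assumptions.

```python
import pandas as pd

# One row per postcode sector, one column of outstanding lending per lender
# (file and column names are assumptions - the real lender spreadsheets differ)
df = pd.read_csv("postcode_sector_lending.csv", index_col="postcode_sector")
lenders = ["barclays", "clydesdale_yorkshire", "hsbc", "lloyds",
           "nationwide", "rbs", "santander"]

df["all_lenders"] = df[lenders].sum(axis=1)

def location_quotient(frame, lender):
    """LQ > 1: the lender is over-represented in that sector relative to its
    overall share of lending; LQ < 1: under-represented."""
    lender_share = frame[lender] / frame[lender].sum()
    overall_share = frame["all_lenders"] / frame["all_lenders"].sum()
    return lender_share / overall_share

df["lq_hsbc"] = location_quotient(df, "hsbc")
print(df["lq_hsbc"].describe())
```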


Interpreting change
Following the first release of data in December 2013 - which included all outstanding mortgage debt up to the end of June 2013 - an updated dataset was released in April 2014, covering the period up to the end of September 2013. Once again, this is a fantastic development in many respects, but it poses new questions of interpretation. We have no idea at all - and please let me know if I'm wrong - how much of the change is down to people paying off mortgages, how much is down to new loans being taken out (none?) and how much is down to the data disclosure mechanisms put in place by the banks. The CML's chief economist, Bob Pannell, said this upon the release of the second iteration of the dataset:

  • "Unsurprisingly, with data covering outstanding lending rather than new flows, there are only small changes since the last quarter. It is likely to take some time before any discernable changes or trends emerge from this quarterly data series."

My calculations indicate that most areas experienced only modest change (c. +/- 2%), but as far as I can tell it's not possible to say exactly what caused the change in each area. There are some big changes for individual postcode sectors, but it's only really possible to speculate about the causes. Nonetheless, here are the five biggest increases (a rough sketch of the calculation follows the list):

  • London NW9 1 - £907,640 (end Q2, 2013) to £1,396,795 (end Q3, 2013) - 53.9% increase
  • Exeter EX5 7 - £16,894,775 to £23,497,142 - 45.0% increase
  • London EC3A 5 - £2,729,724 to £3,677,645 - 34.7% increase
  • Cambridge CB2 9 - £56,992,750 to £75,546,239 - 32.6% increase
  • London EC1V 8 - £23,254,820 to £30,342,713 - 30.5% increase
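The calculation behind that list is nothing fancy; a sketch of it is below. The file and column names are assumptions, but the logic - join the two quarterly releases on postcode sector, compute the percentage change, sort - is the whole of it.

```python
import pandas as pd

# Two quarterly releases of outstanding mortgage lending by postcode sector
# (file and column names are assumptions)
q2 = pd.read_csv("lending_2013_q2.csv")  # columns: postcode_sector, total_lending
q3 = pd.read_csv("lending_2013_q3.csv")

merged = q2.merge(q3, on="postcode_sector", suffixes=("_q2", "_q3"))
merged["pct_change"] = (
    (merged["total_lending_q3"] - merged["total_lending_q2"])
    / merged["total_lending_q2"] * 100
)

# The five biggest quarterly increases
print(merged.nlargest(5, "pct_change")[
    ["postcode_sector", "total_lending_q2", "total_lending_q3", "pct_change"]
])
```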

Without some additional contextual information - as is available in the United States under the provisions of the HMDA - we can only really guess at the causes of such change. That's why various organisations have been campaigning for greater financial transparency for some time - most notably Friends Provident in their report from 2012.


The data
I've done a reasonable amount of analysis with the new mortgage lending data, including writing and submitting an academic paper, producing a series of maps and sharing various other bits and bobs via Twitter. My assessment closely mirrors that of Owen Boswarva, who notes the 'open-ish' nature of the data releases. The data do not, as far as I can see, come with any kind of licence (such as the Open Government Licence), but I - and many others - have simply taken it for granted that the data are 'open'.

The way the data are released is also interesting. The press releases and aggregate lending figures for mortgages are released via the CML and cover about 73% of the mortgage market. This is obviously a significant advance on what was previously in the public domain, but to get your hands on individual bank data in one file you have to scrape and mash the data together, as I did when creating my mortgage lending map site. Since I've worked with the data quite a bit, I thought I'd give a little overview of how I think individual lenders have done in making the data available:

  • Barclays - if you go to the postcode sector data page on the CML website, there are links to data for all banks. When you click on 'Barclays' it will take you to their 'Citizenship' pages (as of 8 April 2014). From there you can link to the new Q3 2013 data release within a news article. It's not the easiest of journeys and could be made much more obvious. The file itself is, rather strangely, called 'Satellite.xlsx'. I think they could do better.
  • Clydesdale & Yorkshire Banks - these institutions are part of the same banking group and so report together as one. The data are pretty easy to find.
  • HSBC - for me, this is the most troublesome data release since I can only find it in PDF. It's not a massive task to convert it into a usable format, but it seems really odd in this day and age that a major financial institution would choose to release 10,000 rows of data in PDFs. If anyone has spotted another format please get in touch. The HSBC approach is at odds with the spirit of the exercise, surely.
  • Lloyds - the UK's biggest lender (following a series of acquisitions) also have a nice data page, which is easy to navigate. They provide some useful information, such as the fact that most buy-to-let mortgages are included in the data, and a direct link to the Excel spreadsheet.
  • Nationwide - my analysis suggests that Nationwide (the only building society to release data) truly live up to their name in terms of the geography of their lending. Their data page is basic but it does the job.
  • Santander - this institution also provides clear and simple access to their lending data. This is now different from the link provided on the CML website. 
  • RBS - as far as I can tell, RBS are the only bank to have produced their own maps of lending patterns across Great Britain and their data page is really quite good. The new data are currently provided via a news page link.

Despite the fact that some of the data are a little hard to find, it's mostly quite a good situation - apart from the HSBC PDFs, of course. It would be much better if all the data were put together in a single spreadsheet by the CML, but perhaps this is something the individual lenders are not too keen on (!), so it's up to people like me to stitch it all together - in which case, it would be great if HSBC started publishing in a more convenient format.
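In case it helps anyone attempting the same stitching, here is roughly what that step looks like in Python with pandas. The file names, sheet layouts and column headers below are all assumptions - in practice each lender's spreadsheet needs its own little bit of cleaning, and the HSBC PDF has to be converted to a table first.

```python
import pandas as pd

# One spreadsheet per lender, downloaded by hand from each data page
# (file names and column headers are assumptions - every lender lays its
# file out slightly differently, so expect to tweak this per file)
files = {
    "barclays": "barclays_q3_2013.xlsx",
    "clydesdale_yorkshire": "cyb_q3_2013.xlsx",
    "lloyds": "lloyds_q3_2013.xlsx",
    "nationwide": "nationwide_q3_2013.xlsx",
    "rbs": "rbs_q3_2013.xlsx",
    "santander": "santander_q3_2013.xlsx",
    # HSBC publish PDFs, so that file has to be converted to a table first
}

frames = []
for lender, path in files.items():
    df = pd.read_excel(path)
    df = df.rename(columns=str.lower)[["postcode_sector", "total_lending"]]
    df["lender"] = lender
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)

# Pivot to one row per postcode sector, one column per lender
wide = combined.pivot_table(index="postcode_sector", columns="lender",
                            values="total_lending")
wide.to_csv("all_lenders_q3_2013.csv")
```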


What's the point of all this?
I can sum up quite simply. If this data were released as part of a drive to increase transparency in the banking sector, then I think a few more things need to happen next:

  1. We need some indication of demand - e.g. number of applications, number of refusals, and so on.
  2. We can't do research into subjects like 'redlining' because we don't have the above information. We can make comparisons between banks in poorer areas but that's about it for now. If we really want to look at transparency, we need more information.
  3. We need more of a breakdown of the data in each new release to say how much is new lending and how much of the change is down to other factors - particularly so at the postcode sector level.
  4. Some additional clarity on the 'open' nature of the data would be very welcome.
  5. We need more banks to follow the example of the first seven and make their data available. With more than 100 lenders in the market it's probably not possible to get all to comply, but more work here would be useful.

All of the above represents really positive progress, but I think more is needed. I do realise of course that the CML are "considering additional features and functionality for future reporting waves", so I look forward to seeing what happens next.

Friday, 4 April 2014

Some thoughts on mapping spatial patterns of deprivation

In my research into the geography of deprivation across the UK, I frequently use maps to illustrate the spatial patterns associated with the areas identified as 'least' or 'most' deprived according to official indices such as the Scottish Index of Multiple Deprivation or the English Indices of Deprivation. Lots of other people do similar kinds of things, including mapping gurus such as Oliver O'Brien from UCL (his work is much nicer). A recent example is shown below, which I also tweeted this week. It's difficult to know exactly how people will interpret such maps, particularly when they only see them on Twitter without much context, so this short post fills in some of the gaps and discusses some wider issues.


In previous academic papers (e.g. Urban Studies, 2009; Regional Studies, 2012; Local Economy, 2012) I've written about deprivation quite a bit, and about the need for the debate to centre not just on 'deprived' areas but on the wider dynamic of socio-spatial inequality. It's a shame that the focus is still very much on 'poor' or 'deprived' areas, so in an attempt to draw attention to the urban inequalities which exist across the UK I try to illustrate the socio-spatial disparities within different cities. I did something similar in a report on Sheffield from 2011, which highlighted these disparities within English cities, as shown below. It's not that concentrated deprivation isn't a problem (far from it) but rather that it's part of something much bigger.


These kinds of maps do draw attention to the general issue, but of course they can lead to all sorts of other conclusions and claims because, as we know, maps are an abstraction from reality and do not represent an absolute 'truth'. These maps simply colour small areas within cities according to how they are classified by a government metric which attempts to say how 'deprived' places are. This may be a dubious practice in many respects, but it is woven into the fabric of how places are understood in a policy context and how problems are defined. It's important that we understand what this kind of mapping allows us to say and what it does not. Some of this is covered on the 'What does it all mean' tab of my Scottish Index of Multiple Deprivation 2012 map site, but I want to make a few more points here...

1. Colours. The colours are not intended to match up with any political party, but some people inevitably make such inferences. The maps say nothing about the causes of the patterns, or who is responsible for them - although that doesn't stop people from talking about it, and that's no bad thing. There is a lot more that could be said about colour choice, but I'm going to leave it there for now.

2. The trouble with choropleth maps. Maps shaded according to some value (such as deprivation rank) present a misleading picture in a number of ways, but two important ones stand out here: a) not all people in an area are 'deprived' or 'non-deprived' - this is the classic 'ecological fallacy' issue at work, and the third paragraph here says more about that; and b) the shaded zones themselves cover much wider areas than people actually live in, so a big blue or red area gives the impression of a lot of something, when in fact the population of the larger spatial units is similar to that of the smaller ones (as it often is with LSOAs or Data Zones in the examples above).

3. The sometimes arbitrary nature of local authority boundaries. Places like Leeds are often said to be 'overbounded', whereas Manchester is 'underbounded'. This means that the local authority boundary either extends well beyond the core urban area or excludes much of it. So, in the cases of Manchester and Liverpool above, if you were to extend the boundary of the map you would see more areas that are not so deprived. However, the point here is that local authorities have to deal with the financial, social and spatial implications of these patterns. What happens beyond the boundary is not part of their remit - even if it does impact upon them. The boundaries may be arbitrary in some respects, but they have very real implications.

4. Why not take a different approach? A good idea, and one that Oliver O'Brien in particular has been very successful with. If we only look at where people actually live, then we get a more realistic (but still not 100% accurate) picture of the spatial patterns associated with deprivation. This can be done using dasymetric mapping, where we assign the attributes of areas to individual features such as buildings. This isn't a perfect definition, and the technique itself can lead people to assume a higher level of accuracy than the underlying data can actually support, but it has advantages over standard choropleth maps when it comes to depicting the places where people actually live or work. See also Neal Hudson's London tenure map in this style. The new OS OpenData VectorMap District buildings layer for Great Britain allows us to do this, so I've produced an example map for Glasgow based on the one in the first image above. This time you can only see the areas where there are buildings (though many are not residential properties).
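For anyone who wants to try the buildings-based approach, the core operation is a spatial join between the building footprints and the small-area deprivation polygons. The sketch below shows how that could be done in Python with geopandas - the file names and the decile column are assumptions, not the exact workflow behind the Glasgow map.

```python
import geopandas as gpd
import matplotlib.pyplot as plt

# OS OpenData VectorMap District buildings and SIMD 2012 Data Zone boundaries
# (file names and the 'simd_decile' column are assumptions)
buildings = gpd.read_file("vectormap_district_buildings.shp")
datazones = gpd.read_file("simd_2012_datazones.shp")[["datazone", "simd_decile", "geometry"]]

# Put both layers on the same coordinate reference system (British National Grid)
buildings = buildings.to_crs(epsg=27700)
datazones = datazones.to_crs(epsg=27700)

# Attach each building to the Data Zone it falls within so the buildings
# inherit that zone's deprivation decile (buildings straddling a boundary
# are dropped here - good enough for a quick map)
joined = gpd.sjoin(buildings, datazones, how="inner", predicate="within")

# Shade the buildings by decile - the dasymetric-style view of the choropleth
joined.plot(column="simd_decile", cmap="RdBu", figsize=(10, 10))
plt.axis("off")
plt.show()
```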


Is the map above more 'truthful' than the normal choropleth? Probably not. But this is all somewhat beside the point if we lose sight of the underlying patterns we're trying to draw attention to in the first place. The point is that in cities like Liverpool, Manchester and Glasgow the high levels of deprivation/poverty/disadvantage sit in stark contrast to areas at the other end of the scale. Also, places that we think of as 'deprived' are often far from it - as Peter Matthews might also argue. It's this kind of inequality which I'm attempting to highlight with my mapping - though I do of course like a nice looking map (I've also produced more than a few stinkers in my time). The point of all this? I hope that these maps can start a conversation about the underlying issues. I'll end with an extract from my 2012 Regional Studies paper on the topic...




Thursday, 30 January 2014

Some tips for charts in Fusion Tables info windows

I've recently been working with some mortgage lending data from banks in Great Britain to produce a new mortgage lending maps site. Once again, I've used Google Fusion Tables to map the data because it's relatively quick and easy - further information can be found here. What's not so easy is getting the info windows to do exactly what you want, particularly when you want to include charts of the type shown below. In this post, I explain a little more about how you can get the info window to display such a chart, what can go wrong when trying to do so, and what the underlying code looks like.


The chart shown above is what you'll see when you click on any polygon in my mortgage lending map website. It takes the data from the underlying Fusion Table and provides a unique info window and chart for each postcode sector - in the case above it is the Cardiff postcode sector of CF24 4, which as you can see had more than £117 million of outstanding mortgage debt owed by 839 households at the end of June 2013. The default info windows created in Fusion Tables maps simply contain a set of data records drawn from columns in the underlying table. With this data, I was keen to show a visual comparison of the lending mix in each area, and I wanted to do this using a horizontal bar chart. I'd done info window charts before (e.g. in my Deprivation in Scotland website), but those were line graphs showing change over time.

There is some general help from Google on putting charts in info windows, which is a very good place to start, and you can also find a lot of explanation of what the different bits of chart code mean in Google's Charts Gallery or in the Chart Feature List. But to help anyone who might be trying something similar to what I've produced, I thought I would provide an annotated code example, along with some general troubleshooting advice. Here's what the code looks like inside the Fusion Tables interface:



And here is a Word document with comments added explaining what each bit of code does. You'll notice that it's a bit messy, but it produces a very nice looking chart. One thing I noticed is that if you format the numerical data to appear in the info window with comma separators - as above, for the £117 million figure - then the values seem to disappear from the chart. This is what happened to me, anyway. My advice would be to keep the number format for your data as 'none' in the Fusion Tables field options, to remember to save a good version of your code, and most of all to experiment with different things and then share the results.
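One way to build this kind of chart is with an image whose URL encodes the data and the styling - the Google Image Charts approach, since retired by Google - with Fusion Tables placeholders (of the {Column Name} form) swapped in for the hard-coded numbers. The short Python sketch below just assembles that kind of URL so you can see which parameters matter; every value in it is an illustrative assumption rather than the code behind my map.

```python
def bar_chart_url(values, labels, size="400x200", colour="3366CC"):
    """Assemble a Google Image Charts URL for a simple horizontal bar chart.

    In an info window the literal numbers would typically be replaced by
    {Column Name} placeholders; everything here is an illustrative assumption.
    """
    data = ",".join(str(v) for v in values)
    axis_labels = "|".join(reversed(labels))  # y-axis labels run bottom-to-top
    params = [
        ("cht", "bhs"),                 # horizontal bars (one segment per bar here)
        ("chs", size),                  # chart size in pixels
        ("chd", "t:" + data),           # the data series, in text format
        ("chds", "a"),                  # auto-scale the axis to the data
        ("chco", colour),               # bar colour
        ("chxt", "x,y"),                # show x and y axes
        ("chxl", "1:|" + axis_labels),  # custom labels on axis index 1 (the y axis)
    ]
    return "https://chart.googleapis.com/chart?" + "&".join(f"{k}={v}" for k, v in params)

# Example: lending mix for one postcode sector (the numbers are made up)
print(bar_chart_url([45, 25, 15, 10, 5],
                    ["Lloyds", "Barclays", "HSBC", "Santander", "Nationwide"]))
```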

I hope someone finds this useful! I know I would have before I got into this. 

Tuesday, 26 November 2013

A World of Open Access

I've recently been appointed as one of the Editors-in-Chief (with Alex Singleton) of a new Taylor and Francis/Regional Studies Association open access journal. For the two years prior to the launch of Regional Studies, Regional Science, I was also heavily involved in the research and development of the journal. As we're about to start publishing papers, I thought I'd blog on the topic of open access more generally and include some interesting data from the Directory of Open Access Journals (DOAJ), the most authoritative source of open access information on the web. Those with an interest in open access will of course know all about DOAJ and the fact that there are now nearly 10,000 open access journals in the world - 9,990 to be exact (as of 26 November 2013).

However, few people have looked closely at the data on open access, probably because most are still debating the merits and pitfalls of open access itself. The simple fact is that open access publishing is having a major impact on academia, and the biggest journal in the world (by volume of papers) is now PLoS ONE, an open access title, as documented by several commentators including Heather Morrison at the University of Ottawa. Now for some data and charts...

The United States has over 1,200 open access journals, and seven countries account for more than 50% of the total. Journals come from 124 nations in all (click charts to enlarge):

by country - full size chart


Open access publishing is currently in a major phase of expansion, but it's not new - 22 new titles were launched in 1985, for example. The peak year to date has been 2011, with 1,099 titles launched:

by year started - full size chart


The majority of open access titles (66%) do not charge a publication fee, but a substantial minority (26%) do, including many of the best known titles:

by publication fee - full size chart

Your knowledge of, and exposure to, open access will vary greatly by discipline. There are over 500 open access titles in Medicine and more than 160 in Political Science but only 3 in Geology (according to DOAJ):

by discipline - full size chart

The majority of open access titles publish only in English (5,538 or 55%), with the next closest language of publication being Spanish (621 or 6%):

by language - full size chart

The data these charts are based on can be downloaded directly in CSV format from the DOAJ FAQ page. Just scroll down and look for the section entitled "How can I get journal metadata from DOAJ?".
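If you want to reproduce the counts behind these charts, the metadata file is easy to work with in Python. The column headings used below are assumptions - check them against the header row of the CSV you download.

```python
import pandas as pd

# DOAJ journal metadata, downloaded as CSV from the DOAJ FAQ page.
# The column names used below ("Country", "Start year", "Publication fee",
# "Language") are assumptions - print the header row and adjust as needed.
doaj = pd.read_csv("doaj_metadata.csv")
print(doaj.columns.tolist())

print(doaj["Country"].value_counts().head(10))           # journals by country
print(doaj["Start year"].value_counts().sort_index())    # titles launched per year
print(doaj["Publication fee"].value_counts(dropna=False))
print(doaj["Language"].value_counts().head(10))
```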

There's a lot of activity in the field of open access but it is highly unequal in terms of its geographic, disciplinary and linguistic distribution. In the subject fields of Regional Studies and Regional Science (the subject areas for our new title) the open access landscape is considerably less crowded - particularly in relation to titles supported by major international learned societies and multinational publishing houses. Given this situation, we expect that Regional Studies, Regional Science (already known more commonly as RSRS) will play an important role in helping improve access to knowledge in regional research across a wide range of disciplines, with a focus on geography, planning and economics.

Look out for our first articles in December, by Andrew Beer (Adelaide, Australia) and Terry Clower (North Texas, United States), Sarah Ayres (Bristol, UK), John Gibney (Birmingham, UK) and Markku Sotarauta (Tampere, Finland).


Tuesday, 10 September 2013

The Age of Buildings in the City of Chicago

Following my last post on the geography of New York City, I've been exploring other building-level datasets to see what more they can tell us about the fabric of the cities we live in. This time, I've focused on Chicago's 'Building Footprints' dataset. It's not nearly as detailed as New York's PLUTO data, but it does include variables on (e.g.) number of floors and year built. As with the NYC data, it is not perfect, but we can still make good use of it to understand the development of the city and its structure. I've mapped the city using number of floors as a proxy for height and shaded it by building age to produce the following overview (blues = older buildings, reds = newer).


Besides looking relatively interesting, the above graphic also reveals something about the phased construction of the City of Chicago and - possibly - something more about the data itself. As with the New York City PLUTO data, I produced a chart of the 'year built' column just to give me some idea of its distribution. It looks better than the New York City chart but I'm still not convinced it is 100% accurate (the year built data run from 1852 to 2010). 


Were there really nearly 15,000 buildings constructed in 2006 and only 78 in 2000? Possibly, but it would be good to know more about the accuracy of the data. In total there are 820,154 building footprints in the dataset, with a range of different columns - which you can read more about in the metadata file. Once again, it's pretty cumbersome to work with in a normal desktop GIS setting, but my machine can just about handle it.
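For anyone who wants to repeat the check, the 'year built' distribution is only a few lines of Python once the footprints are loaded. The sketch below uses geopandas, with the file name and the year-built column name as assumptions (check the metadata file for the real field names).

```python
import geopandas as gpd
import matplotlib.pyplot as plt

# Chicago 'Building Footprints' - a big file, so loading takes a while
# (the file name and the 'year_built' column name are assumptions; check
# the dataset's metadata file for the real field names)
footprints = gpd.read_file("chicago_building_footprints.shp")
print(f"{len(footprints):,} footprints loaded")

# Distribution of recorded construction years (ignoring zero/missing values)
years = footprints["year_built"]
years = years[years > 0]
year_counts = years.value_counts().sort_index()

# Quick bar chart to eyeball suspicious spikes (e.g. the 2006 figure)
year_counts.plot(kind="bar", figsize=(14, 4))
plt.tight_layout()
plt.show()
```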