More maps! and some cautionary words

Jun 10, 2019 in PROJECT • R
r data-visualisation mapping leaflet

I am a woman of my word. When I told you in my last post that I would be back with more maps, I’d already started writing this. You’ll remember that I was using the R package leaflet to visualise data from the Stack Overflow Developer Survey on a world map. I’ve taken what I learnt then, and with just some small tweaks to the data, I’ve got some more maps to show you!

But - this post isn’t all the joy of maps. It’s also about the problems with maps. They look great but are they the right way to visualise data? The answer, as it basically always is, is: sometimes not.

Before we get to that sobering realisation, let’s allow ourselves to enjoy some more leaflet-generated maps.

More maps from the SO developer survey data

Data scientist

After looking at gender in my previous post, this time I wanted to look at how many respondents to the survey said they were data scientists. As a female data scientist, I am not at all narcissistic, you can see. But if you want to look at a different profession or indeed any other survey result I haven’t covered, you can use my code with small adaptations.

Check out Thailand! 16.3% of its respondents are data scientists. In the UK, it’s only 6.8%, and you can see by the colour that it’s a pretty similar story in Canada and Sweden (both 6.7%).
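If you want to adapt this to another profession, the calculation behind a map like this might look roughly like the base R sketch below. It is not the code used for the maps in this post. The toy data frame, the column names `Country` and `DevType`, the semicolon-separated answer format, the helper `share_by_country`, and the choice to drop respondents who skipped the question are all assumptions made for illustration.

```r
# Toy stand-in for the survey data; columns and values are assumptions.
survey <- data.frame(
  Country = c("Thailand", "Thailand", "United Kingdom",
              "United Kingdom", "United Kingdom", "Canada"),
  DevType = c("Data scientist or machine learning specialist;Developer, back-end",
              "Developer, front-end",
              "Developer, full-stack",
              "Data scientist or machine learning specialist",
              NA,
              "Developer, back-end"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: share of respondents per country whose answer in
# `column` contains `pattern`. Respondents with a missing answer are dropped.
share_by_country <- function(data, column, pattern) {
  answered <- data[!is.na(data[[column]]), ]
  matched <- grepl(pattern, answered[[column]], fixed = TRUE)
  out <- aggregate(matched, by = list(Country = answered$Country), FUN = mean)
  names(out)[2] <- "share"
  # Keep the sample size next to the share so tiny samples are easy to spot.
  out$n <- as.vector(table(answered$Country)[as.character(out$Country)])
  out
}

ds_share <- share_by_country(survey, "DevType", "Data scientist")
print(ds_share)
```

Swapping the `column` and `pattern` arguments is the "small adaptation" needed to look at a different answer, for example a degree subject instead of a job role.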

Humanities and Arts graduates

Continuing on my theme of focusing on me, I wondered how countries differed by representation of people with humanities and arts degrees. My BA was in English Literature, and I feel pretty passionate about the industry being more open-minded about the academic background of its workers. There will definitely be a post about that soon!

Let’s have a look…

Well, the USA in bold purple is leading here, with 7.4% of the respondents from these less traditional developer and data science degree paths. The UK also has one of the higher proportions, though that is still only 5.3%. I wonder how this compares to proportions of degree types overall?

Median age

The final thing I explored was age. Unlike the other questions I’ve explored, age is a numerical variable, so instead of looking at proportions, I’ve shown the median age of respondents for each country.

Ooh! Looks like in North America, Northern Europe and Australia, the median age is in the 30s, whereas it’s 5+ years younger in India, parts of SE Asia and in the parts of North/Central Africa for which we have (enough) data. In the UK, it’s 30, making me super average in this area at least, having celebrated my 30th a few months ago.
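For a numerical variable like age, the per-country summary could be sketched in base R as below. Again, this is an illustration and not the code behind the map. The toy data, the column names `Country` and `Age`, and the `min_n` threshold are assumptions. The threshold stands in for whatever cut-off decides that a country has "enough" data.

```r
# Toy stand-in for the survey data; columns and values are assumptions.
survey_age <- data.frame(
  Country = c(rep("India", 3), rep("United Kingdom", 3), "Iceland"),
  Age = c(24, 25, 27, 28, 30, 35, 41)
)

# Hypothetical minimum number of respondents needed to show a country.
min_n <- 3

# Median age per country (the formula interface drops missing ages).
median_age <- aggregate(Age ~ Country, data = survey_age, FUN = median)
names(median_age)[2] <- "median_age"

# Respondent count per country, returned in the same country order.
counts <- aggregate(Age ~ Country, data = survey_age, FUN = length)
median_age$n <- counts$Age

# Blank out countries with too few respondents so they can be greyed out.
median_age$median_age[median_age$n < min_n] <- NA

print(median_age)
```

Setting the value to `NA` instead of dropping the row keeps the country in the table, which makes it easier to draw it in a neutral colour on the map later.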

Be careful what you do with maps

I always really enjoy working with maps because the visuals are so rewarding. But they do present some problems, some of which have been obvious in this project. So here’s a rundown of some of the things to be wary of if you want to map data.

Countries and regions differ in area. This presents a couple of problems. Firstly, if you are looking for a specific place, it can be hard to find (try identifying Singapore on these maps). But also, we ascribe value to area. We’re using this map to get a sense of the data, but the colour of a country like Russia or Canada massively affects our instinctive impression of the data. And often, big areas on a map correlate with low population density. I’ve done maps using UK constituencies, and those can be really misleading because all the London constituencies are tiny and almost impossible to interpret without zooming. Yet London is the most densely populated region in the UK. You can also think of maps of US election results - even when Democrats are leading, maps often look much redder because Republicans tend to lead in large states with sparser populations.
Really, you need complete data, or at least as complete as you can get it. The greyed out countries in my map really detract from the message. But as we saw in my previous post, if I’d used all the data available, the results would have been really skewed because of tiny sample sizes. A similar issue is that if the data isn’t from the same source, you might be dealing with incomparable results. Unlike some other types of data visualisation, your missing or outlying data really stands out on a map.
Matching maps to data can be a pain! This is kind of true for any data work, but everybody who’s worked with regional data knows the frustration of different naming conventions, as well as shifting borders and regional definitions that take a while to feed through to available maps or indeed your data collection.
It’s actually quite inefficient for some things. Yes, you have the data for every! single! region! but you can only show one thing at a time. For example, in my data scientist map, it might have been interesting to also show other fields, or whether there were differences in people selecting multiple options by country. I’m sure there are elegant ways to do that! But on a basic data visualisation product like this… you have to decide what your key focus is.
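On the point about matching maps to data: one way to catch naming mismatches before they turn into grey gaps is to compare the two sets of names up front. The sketch below is base R, and the country spellings and the `recode` lookup are made-up examples, not the actual names from the survey or from any particular map file.

```r
# Hypothetical country names as they might appear in the data and on the map.
survey_countries <- c("United States", "Russian Federation", "Viet Nam", "Canada")
map_countries <- c("United States of America", "Russia", "Vietnam", "Canada")

# Names in the data that have no exact match on the map.
unmatched <- setdiff(survey_countries, map_countries)
print(unmatched)

# Hypothetical lookup from the data's spelling to the map's spelling.
recode <- c("United States" = "United States of America",
            "Russian Federation" = "Russia",
            "Viet Nam" = "Vietnam")

# Replace names that appear in the lookup; leave everything else as it is.
fixed <- unname(ifelse(survey_countries %in% names(recode),
                       recode[survey_countries],
                       survey_countries))

# Anything still listed here needs a new lookup entry or a closer look.
still_unmatched <- setdiff(fixed, map_countries)
print(still_unmatched)
```

Running the `setdiff` check in both directions is also worthwhile, since regions that exist on the map but not in the data are the ones that end up greyed out.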

There are probably more reasons to be cautious. But just because you’re careful doesn’t mean you can’t use maps. Geospatial data visualisation is really engaging, and leaflet makes it that much more impactful in R.