Things I learnt in June 2019

This is the first in what will hopefully become a regular series of posts giving a quick selection of things I’ve learnt in the past month. Plenty of things don’t quite warrant a whole post of their own, but feel worth writing about. There’s every chance this will be a fairly occasional feature, but I’ll start with this one and see how it goes!

Music generated by AI is surprisingly advanced

Earlier in June, I attended CogX 2019, which bills itself as “the festival of AI and emerging technology”. It wasn’t all plain sailing; whoever thought the British summer was reliable enough to put most of the talks in what were effectively marquees is probably regretting that after the non-stop downpour on the first day. However, most of the speakers I saw were excellent.

I was particularly affected by the session on whether AI can be creative. Christine Payne told us about MuseNet and her work using neural networks to create music. She played us several demonstrations of different styles, such as a Beethoven symphony, a ragtime piece, and even combined styles such as a Bon Jovi/Chopin mashup. It was way ahead of other things I had heard, and I wonder how fast it will be developed from here. There followed an interesting panel discussion, both on the uses of music like this and on the question posed by the session itself. Personally, I think creativity is in itself linked to intent and emotional experience, which AI lacks, so in my understanding of the concept, I don’t think AI is creative. But that seems almost secondary to the fact that it clearly can create.

While that talk was probably my favourite, the speakers in the ‘Dirty Data, Bad Decisions’ session were great, in particular Rashida Richardson, who presented her paper on racial bias in predictive policing. I also appreciated Stuart Russell’s talk, which demonstrated really clearly why, when a loss function is unknown, it shouldn’t be assumed to be uniform.

The leaflet package is even more flexible than I demonstrated

Obviously, you already saw that I posted about using leaflet in R to create beautiful choropleths. One thing I forgot to mention is that there is actually a large number of base maps you can use to adapt how you want it to look. It might be that you don’t want to show all the detailed roads, or you want to emphasise certain geographical details, or you just want a pretty palette…

library(leaflet)
leaflet() %>%
  addProviderTiles(providers$Stamen.Watercolor)

It’s as easy as using addProviderTiles() where you would normally use addTiles().

You can see all the options here.
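
The options can also be browsed from within R. The snippet below is a minimal sketch, assuming the installed version of leaflet ships the `providers` list and that it includes `CartoDB.Positron`; the choice of provider and the London coordinates are just for illustration.

```r
library(leaflet)

# `providers` is a named list bundled with leaflet; its names are the
# tile sets that addProviderTiles() accepts.
provider_names <- names(providers)
head(provider_names)

# Narrow the list down to one family, e.g. the CartoDB base maps
carto_names <- grep("^CartoDB", provider_names, value = TRUE)
carto_names

# A plainer base map without the detailed roads, centred roughly on London
# (coordinates chosen purely for illustration)
m <- leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  setView(lng = -0.1276, lat = 51.5072, zoom = 10)

# Printing m in an interactive session displays the map
m
```

Which providers are available depends on the version of the package, so it is worth checking `names(providers)` before relying on a particular one.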

How to iterate string searches faster

Finally, when I was QAing something for a colleague, one of the things he asked was whether I could speed up a section that was running quite slowly. I needed to use a vector of regular expressions to search for matches in a column of a data.table. The original was something like this:

# Flag rows matching any of the patterns, one pattern at a time
for (i in seq_along(word_vector)) {
  dt[word %like% word_vector[i], match := TRUE]
}

There were a couple of ways to make it quicker. Firstly, this answer on Stack Overflow suggests that using the stringi package to do partial matching is faster than using the base grep functions. In the above example, %like% is a wrapper around base R’s grepl(), so it stands to reason that using something like stri_detect_regex() will be faster.

The second thing was to take it out of the for loop, as these can be slow. I figured I should be able to just do this in one line, but was struggling to make this happen until I found this advice to call paste() on the vector, collapsing with |, so that it would be read as one long pattern with lots of OR operators.

This got me something like this, which ended up being about six times faster than the original. In this case, that saved about an hour!

library(stringi)
# Join the patterns with "|" so a single pass checks for any of them
dt[stri_detect_regex(word, paste(word_vector, collapse = "|")), match := TRUE]
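
Before swapping one approach for the other, it is worth checking on a small scale that they flag the same rows. This is only a sketch: the toy data and patterns are made up for illustration, and it assumes the patterns behave the same way in base R’s regex engine and in the ICU engine that stringi uses, which will not hold for every pattern.

```r
library(data.table)
library(stringi)

# Toy data: the column contents and patterns are invented for this example
dt <- data.table(word = c("apple", "banana", "cherry", "grape", "melon"))
word_vector <- c("^app", "an+a", "rry$")

# Approach 1: one pass over the column per pattern
loop_dt <- copy(dt)
for (i in seq_along(word_vector)) {
  loop_dt[word %like% word_vector[i], match := TRUE]
}

# Approach 2: a single pass with the patterns joined by "|"
combined_pattern <- paste(word_vector, collapse = "|")
single_dt <- copy(dt)
single_dt[stri_detect_regex(word, combined_pattern), match := TRUE]

# Both approaches should flag the same rows (unmatched rows stay NA)
same_result <- identical(loop_dt$match, single_dt$match)
same_result
```

Wrapping each call in system.time() on the real data is a simple way to see how much the change actually saves, since the speed-up will depend on the number of rows and patterns.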

There’s plenty more I learnt this month, but I hope these are useful or interesting! I’m posting this a few days early as I’m off to France for a short break. The weather forecast has been saying it will be 40 degrees, so I’m very much hoping this is an instance where predictive analytics turns out to be wrong…

See you in July!
