Feature engineering with dates Feb 27, 2021 Remember my last post? I was talking about my 2020 resolution to eat less meat, and how I’d tracked it in my bullet journal. The data that I have is…well, at first glance there’s not much to it. ...
Page to plot Jan 2, 2021 It’s the second day of January 2021, and a fitting time to be making resolutions. It’s also the time to look back at commitments made at the start of last year and check in on how they went. ...
Reflections on 2020 Dec 19, 2020 It’s the last month of 2020, and what a joy the year has been. Life is now pretty same-y, and whatever I’m doing, I’m doing it from home. That makes it easy to feel like nothing is changing, so I wanted to spend a few minutes actively reflecting on what I’ve done and learnt this year. ...
Hashtag hashtag analysis Nov 15, 2020 We’re back looking at tweets! This is going to be my last post in this series. In my last two posts, I looked at how often curators of @WeAreRLadies tweet. ...
Tweet timing Nov 2, 2020 I explored how often curators of @WeAreRLadies tweet in my last post. In this post, I go a little further with this analysis and explore how activity changes depending on the day of the week and time of day. ...
Analysing Twitter data Oct 24, 2020 Earlier this month, I was the curator of @WeAreRLadies for a week. They have a different person tweeting every seven days; I was in charge in the week commencing 5th October. ...
Making it work from home Sep 26, 2020 It’s been about six months since the UK first went into lockdown. Many of us around the world have drastically changed our lifestyles and ways of doing things. I’ve become much better at working from home and there are lots of things I prefer about it. ...
Using callbacks and logging during training with gensim Aug 24, 2020 How long should you train an LDA model for? This post is less to do with the actual minutes and hours it takes to train a model, which is impacted in several ways, but more to do with the number of opportunities the model has during training to learn from the data, and therefore the ultimate quality of the model. ...
Data cleaning and exploration with data.table Jul 19, 2020 In May, I delivered a few training sessions for R beginners and I’m using those materials for some posts here. Following on from my two posts on ggplot2, this is based on my session of data. ...
Stop using iris Jun 24, 2020 The iris dataset is very widely used in the data science community, whether as a training aid, a tool for trying out new skills, or just a well-known set of numbers that can be used as background while demonstrating something in a blog. ...