Pair programming, or pairing, is a common practice among software engineers, but it has not been uniformly adopted in data science. Some workplaces do no pairing between data scientists as standard, whereas others use it just as they would for other coding roles. I really enjoy pairing, so I wanted to write about the benefits it can bring to data science projects.
What is pairing?
Very broadly, pairing is when two people work on the same code at the same time while communicating. A standard setup is to have one person writing code (the ‘driver’) while the other person talks through what they should do and checks the code as it is written (the ‘navigator’).
What are the benefits for data scientists?
Data science workflows can be quite different to software engineering ones, but a lot of the benefits that pairing brings software engineers hold true for data scientists too. These are the reasons why I like it.
I produce better code. This is partly because I am more likely to do things ‘properly’ with somebody else involved, and partly because they are able to spot things I’ve missed. This is also true when I’m navigating: without having to write the code myself, I am freer to spot little syntax errors or efficiency gains.
I learn a lot. Working with other people almost always exposes me to things I don’t know, whether it’s a different approach to a common problem, a new package, or a process that I’m unfamiliar with. I’ve really enjoyed pairing with data engineers, who come at the problem with a different set of priorities and often have great practices that are useful for my own coding approach.
I get to socialise. Particularly as we’ve been working remotely, it can be easy to become isolated as a data scientist. If you have your own project that you are working on, you can go a long time without speaking to colleagues beyond a stand-up update. Pairing is a great way to work with other people, and really helps you feel connected.
On that last point, I think that pairing remotely is actually better than pairing in person! You can each look at your own screen and do any googling you think might be relevant. With some tools, you can annotate the screen or highlight code so you can be clear about what you’re referring to, and as the navigator you can send over code snippets easily. One of the problems people sometimes have with pairing is feeling self-conscious, but being remote definitely lessens that.
Some common concerns
Two people are doing something one person could do - that’s so inefficient.
I have actually not found this to be the case. I normally code faster when there are two of us. On top of that, by writing better quality code to start with, we avoid the kinds of problems that come later with less polished code and can really slow things down.
I’m new to data science so I can’t contribute.
I definitely understand this worry because I had it when I started out. However, it’s actually super helpful being the driver with a more experienced navigator. You get a lot of practice at writing code and they are there to guide you, so the onus isn’t on you to solve everything. It can also be enlightening to realise that people more experienced than you make mistakes too, and don’t know all the answers without checking!
It’s embarrassing having somebody watch me code.
They shouldn’t just be watching you code. It should be a partnership, so if you are driving they should be saying what you should be doing. Yes, it can be a bit awkward if they tell you to do something and you don’t know how, but in a good partnership you should be able to be open about that and have them explain more clearly or in more detail.
I really enjoy pairing as part of a varied role. I wouldn’t want to pair all day, every day, because it can be quite tiring and sometimes you just need to code on your own for a bit as a break or to crack a problem that is quite specific. In my ideal role, I would probably spend about 60% of my coding time pairing.
I’d definitely recommend trying it out if you haven’t before!