Yesterday, I was at the Women Driven Development Conference, speaking on a panel about unconventional paths into tech. With my BA in English, I have not taken a typical route into data science. I’m planning a post where I’ll look at job ads in more detail, but suffice to say, I don’t think there is one specific and correct way into the profession, and there are certainly ways for organisations and people looking to hire data scientists to be more imaginative. As I’ve noted previously, in the most recent Stack Overflow survey, only 5.3% of respondents in the UK (in any tech role, not just data science) had an arts or humanities degree. So what if you are in the position I was in a couple of years ago? I had worked out that I wanted to be a data scientist, but I knew my formal qualifications absolutely did not match what I was seeing on job ads. I’m going to use this post to expand on the panel discussion with some thoughts on exactly that.
If you’ve never done a STEM subject beyond school and you haven’t done any coding, you’re going to have to do some work to get to a reasonable level. Luckily, the number of free resources out there is quite astounding. I would recommend at a minimum doing some stats courses, learning either R or Python and getting familiar with key ideas and terms in data science. I think you can actually learn a lot on the job; things like SQL and Git are the kinds of skills I think you can pick up on the go, but as they are mentioned in a lot of ads, you might just feel more comfortable having a basic understanding of what they are.
I started off with Basic Statistics and Inferential Statistics, both of which you can take for free on Coursera. I decided I wanted to learn R and worked through Hadley Wickham’s book R for Data Science, and although I now use a lot of different tools and packages, this was a useful foundation. O’Reilly books are often really good resources and plenty are available for free or sometimes on offer at very reasonable prices. As far as data science and machine learning go, there are a huge number of video courses out there, but one of the resources I referred to a lot when I was very new was a guidebook on machine learning basics from Dataiku. It is pretty simple, and on its own it won’t be enough to give you the skills you would need on the job, but it is a good introduction and actually answers some questions that commonly come up in data science interviews, such as how to evaluate or validate models and what the difference between supervised and unsupervised learning is, with examples of each approach.
This GitHub repo provides a lot more resources. The key, I think, is to find stuff that is at your level and communicated in a way that makes sense to you. Sometimes this means watching the same video or going through the same exercise a few times. Sometimes it means ditching a course because you realise it isn’t working. For example, some courses will go into very intense detail and use jargon and obfuscating techniques that can be intimidating. Don’t feel like you aren’t cut out for data science if you don’t understand everything like this - focus on the concepts and being able to follow through the logic, and find trainers and writers who speak to you better.
Learning theory is important, but a lot of stuff only really sticks when you’re using it in a real situation. In addition, potential employers will want some kind of proof that you’re not just talk. Doing some projects is therefore key. You might find in your courses there are projects, which are a good start in a safe environment, but ultimately you’ll want to be defining, designing and working through your own projects to demonstrate you can handle real world data and apply your learning to new scenarios.
For this, it’s a good idea to build on something you already know about. Can you find statistics or data relating to the field you currently work in? Or, if you are a student, relevant to your course? For example, if you are an English grad like me, you might find it interesting to do some text analytics using books available through Project Gutenberg. Alternatively, browse official data available for your country; there are lots of UK datasets available through the Office for National Statistics.
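To give a flavour of what a first text analytics project might involve, here is a minimal Python sketch that counts the most common words in a piece of text. It is only an illustration under some assumptions: the function name `top_words` and the tiny stop-word list are made up for this example, and the sample sentence is pasted in directly rather than downloaded. A real project would load a full book and use a proper stop-word list.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "it", "is", "was", "that", "i"}


def top_words(text, n=5):
    """Return the n most common words in text, ignoring case and stop words."""
    # Lowercase the text and pull out runs of letters (and apostrophes).
    words = re.findall(r"[a-z']+", text.lower())
    # Count every word that is not in the stop-word list.
    counts = Counter(word for word in words if word not in STOP_WORDS)
    return counts.most_common(n)


# A sample sentence standing in for the full text of a book.
sample = (
    "It is a truth universally acknowledged, that a single man in possession "
    "of a good fortune, must be in want of a wife."
)
print(top_words(sample, 3))
```

From here you could compare word counts between authors or chapters, which quickly raises the kind of questions (what counts as a word? which words should be ignored?) that are worth writing up alongside the project.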
Obviously it would be particularly great if you could do this at work - if you don’t have a role that this fits into, you still might be able to make a case for doing a small project as a development exercise, or at least talking with people in your organisation who are involved in analysis or research about the kinds of things they do.
If you’re struggling to think of anything, you might benefit from doing hackathons, where broad briefs and some data are often provided and you work as part of a team to complete a project. This way you can also learn from other people with more experience. Look for beginner-friendly hackathons (such as those run by WDD!).
When doing a project, make sure it’s documented, so employers can see that you’ve done stuff. This might be on a GitHub repo or through blogging or tweeting. And think through how you would answer questions about the project - why did you make the decisions you did? Were there other approaches you dismissed and why?
Finally, I think it’s worth looking at your existing experience and working out how to reframe it so it is relevant. Just because you have never worked as a data scientist doesn’t mean you are completely inexperienced. What kind of problem-solving have you done? Have you worked with people in different roles and communicated between them?
If you are just out of university or school and have no experience yet, you might find a stepping stone job useful, where you can develop your numeracy skills and prepare for your next job in data science. But in most cases I don’t think that would be necessary if you can make sure you have ways of demonstrating your skills, experience and interest.
Know the industry
Data science is a rapidly evolving field and although you are unlikely to be right at the bleeding edge from day one as a new hire, it’s worth being able to show you have an actual interest in and awareness of what is going on. Plus it is genuinely interesting - I assume if you want to get into data science you agree!
I get most of my data science news from Slack groups I’m in and experts I follow on Twitter, as well as conferences where new ideas are presented. Recently, for example, somebody in a group I’m in shared an article about Computer Designed Organisms - organisms designed by computers and built using frog skin and heart tissue. This is pretty incredible stuff, and I can easily see how at an interview you could use this as a springboard to talk about ethics and issues around communicating about AI. There are also some interesting (and I think quite lovely) arts-related developments out there; I find MuseNet fascinating and I love following Painting Variables on Instagram, where Joanne Hastie creates abstract paintings using a robotic arm.
Be part of the community
Attend meetups, conferences and hackathons. Participate in Slack groups or other online communities. Talk to people you already know who have a connection to the world you want to get into.
This one has so many benefits. It helps achieve some of the other goals here - I go to meetups where there are technical workshops, so for example I’ve learnt more about the Python package pandas, building chatbots, web scraping and sentiment analysis in sessions I’ve attended. Beyond that, though, there are other good reasons for creating a network around you. Here are some things I’ve got out of mine:
Coding help when I’ve been stuck on something I couldn’t solve through googling
Honest and uplifting conversations that helped me to realise my potential
Alerts about jobs that might be a good fit and advice on applying
Validation as I’ve been able to help others
A feeling that I belong here
Have some perspective
It can be intimidating trying to get into data science when you don’t have a typical background. Job ads might paint a picture miles away from what it feels like you are capable of. But actually, data science is one of the fields where there is quite a lot of flexibility in what candidates can look like because there is such a great need for us. Don’t look at a job ad and believe you need to match every single item. I have had interviews and offers from companies that listed a master’s or PhD in a STEM subject as their very first requirement. So once you have done some work to be prepared for a career in data science, just start applying to things that you think you would be a good fit for, regardless of the exact description. You might be surprised by how you do.