The value of education in Data Science, 2020 version


A few weeks back, I chaired a roundtable (virtual, in the times of Covid) aimed at discussing the evolution of Data Science as a field and its maturity. The roundtable was organised as part of a series of events that MBN Solutions runs to bring people together here in Scotland and debate topics in tech. At the last in-person event in the series, a side conversation grew around Data Science specifically, its value to the business world, and how its perception has been changing over time (for better or worse), so I thought the subject deserved its own event where we could discuss these themes further. I’m always very interested in hearing what people think about the role of data science and how the situation now compares with a few years back. On this blog you can find a few things I’ve written over the years about meta-data science; they’re best found via the tags page.

I’ve dealt with Data Science for a while now: I work with it, and I approached the profession via my scientific training. I’ve also always been interested in the meta part of it, meaning contributing to making the use of data a proper, fair and useful endeavour.

It was a good event. We had a mix of educators, practitioners and tech leaders, all people involved in data roles in some way. Most people were local to Scotland (and many I know!), while others were based farther away - the power of doing these things on Zoom these days (lockdown time, if you’re reading this in the future!) is that you can easily reach a much wider audience.

The debate was called (by me) “Is Data Science a mature field yet? A debate over the evolution of this field” and the areas touched were, specifically:

  • the tension (or lack thereof) between academic training and industrial work
  • the requirements and expectations that companies have with regard to data scientists, and whether these are generally fulfilled
  • industries and verticals where Data Science has generated some actual change so far

The event was very fruitful and I’ve been reflecting for a while on what was discussed, so I just want to share some thoughts here.

Data Science as a field has received an enormous amount of attention in recent years, within tech circles and outside. Working in data isn’t new though, even though the flourishing of “data x” job specs may suggest otherwise to the uninitiated; people have been helping companies transform raw data into information for a long time, and people have been researching how to make sense of quantitative information for even longer. What is (relatively) new is the scale at which this is now happening, and the push that the profession has consequently been subject to. The establishment of dedicated training programmes in universities (and schools!) is a relatively recent development. When I was studying (a while ago, although not an immensely long while ago!) there were no “Data Science” courses per se; in fact most people of my cohort approached jobs in Data Science from more “traditional” scientific or technical degrees (Physics is a favourite). It’s good to see that the practice of studying data as its own discipline is well established now, and a lot of our chat was about which skills these courses teach to prepare students for a job afterwards.

Ever since this field started to emerge, there’s been a tension between university training and industrial requirements. Given that the discipline first spun out of other sciences, it has been characterised from the start as firmly rooted in an academic setting. On the other side, the industry, which has been witnessing the growth in data availability and has gradually adopted the view of “innovation by data”, has long been baffled about what to do with these new figures, the data scientists. I was interested in hearing whether this friction has started to fade. My take is that some of it has, but there is still a lot of work for us all to do, mainly on communication, on both sides.

The educators in the room reported that courses have been refined to meet criteria, and that new ones keep getting outlined. They also pointed out that changing curricula to cover an ever-growing list of topics is not feasible, and that choices have to be made. In the industry, there is still a certain frustration that new hires come in with somewhat mistaken expectations about what the job really entails, such as the need to work things out yourself, climbing the ladder from what is needed now to build what is going to be useful tomorrow.

My personal take is that universities do a great job of teaching the basics of the discipline. I don’t believe we need many different courses focussed on the different flavours of data science; we need to teach the fundamentals and do it well, and everything else can be built on top of them by individuals themselves. I certainly acknowledge the value of having business courses within a data science degree though, as well as courses on communicating insights. In short, we need to teach the mindset, the critical instruments for dealing with data, and the value of it. This is hardly an innovative thought, but I feel it’s easily overlooked - using data to derive information isn’t about jumping to state-of-the-art techniques (read: AI) unless there is a real need to. It is primarily about using the power of critical thinking and your own brain, and doing lots of calculations. Then it is about learning and bootstrapping yourself, being good at ingesting and analysing huge chunks of information from research papers, tech blogs and textbooks, and adapting the lessons to your specific problem. You need to be a good problem-solver, that’s all there is to it, and the best tool for that is the ability to think critically.

Blindly applying methods here and there is not that, and it creates a false impression of competence, which we need to start defusing; it also risks feeding big egos, making this field less than welcoming, especially for newcomers. All the sciences are (still) affected by issues of systemic, deep-rooted inequality, if not plain discrimination. In creating something new (new courses, new applications) and in scaling it, we have an opportunity to improve on this.

I’d also go further and say that the existing degrees in Statistics and other numerate subjects (all STEM ones, really) suit someone who wants to train to work with data very well. The fundamentals are there, and everything else, especially the sophisticated results, will be learned once those are in place. The other way around simply doesn’t work. These existing degrees could be equipped with, or differentiated slightly into, courses specific to a particular direction, but I still believe the base ones are more than good.

Furthermore, I personally don’t think it’s fair to load all the responsibility for providing students with directly employable skills onto universities; I believe the industry has to play its part as well, by onboarding these new hires well into the team and guiding them towards the requirements of the business, not expecting them to know what they’re supposed to do from the start. Help people get in and help them help you. And do not hire someone just because they seem to know what they’re doing when talking about neural networks. Hire for the thirst to learn and the willingness to work your problems inside out to make improvements.

The large and pushy marketing campaigns of recent years, which, consciously or not, presented this discipline as a kind of modern Holy Grail and created a gold rush towards it, have done some damage to this field, I believe. They have stripped it of its primary calling - the scientific one, the content-focussed one - to the advantage of the quick-results one. The story doesn’t change though: creating something good takes effort, and doesn’t come easy. So let’s all start demolishing the hype from now on, and work towards a better, more inclusive and more genuine Data Science.

Some good reads

  1. M Hutson, Eye-catching advances in some AI fields are not real, Science, May 2020
  2. M Jordan, Artificial Intelligence — The Revolution Hasn’t Happened Yet, Medium, April 2018
  3. V Boykis, Data Science is different now, personal site, February 2019
