13 November 2020

Why do you require a Data Engineer?

Companies need to ask a fundamental question: why do you require a data engineer on your team? The role of a data engineer spans 80% of the data science method. If a data engineer is expected to work alongside data scientists, it is fair to assume that those data scientists are incompetent, since they are fulfilling only 20% of their job responsibilities under the data science method.

The follow-up question arises naturally: why do you require a data engineer if you already have a team of data scientists? A competent data scientist is responsible for the entire end-to-end data science method, which means not just building models but also pipelining and scaling them so that they are of use to the company. They are also responsible for doing their own feature engineering. Otherwise, how can one build a model without doing the feature engineering oneself? If the feature engineering is done by someone other than the person building the model, it is highly likely to lead to an overfitted model that only partially solves the business case.

This means there is no real need to hire a data engineer if there is already a team of data scientists. In fact, companies can significantly reduce their hiring costs by hiring only people who know and understand the data science method. Have either a team of data engineers or a team of data scientists; it makes no sense to hire both for the same team. The next follow-up question could be: if you already have data scientists on your team, is there any point in hiring a machine learning engineer? So many overlapping roles only spell more incompetence and more clueless team members, not to mention the frustration of hiring so many people.
Companies can save significantly on costs by hiring capable people who understand their job functions and have the relevant practical skills. One can then ask the next question: why do you require a data engineer, a data scientist, and even a machine learning engineer, if you could just hire a team of AI engineers? Machine learning is part of AI, data engineering is part of the data science function, and data science is pretty much what AI engineers can also do. And to act as an AI engineer, you have to understand aspects of computer science. This leads to the final question: ultimately, don't all of these roles require a competent software engineer with practical skills who can deliver solutions for the business?

Invariably, what we find in industry is that the weakest link is the data scientist. This is usually because companies are fixated on recruiting PhDs who have no practical skills beyond academic theory, which adds little quantifiable value to a business that needs to deliver products. They come with neither software engineering skills nor the applied mindset to appreciate them. This is precisely why so many data scientists in industry are interested only in building models: in academia they are rarely taught the importance of feature engineering, how to build a scalable pipeline, or the entire data science method. Essentially, the data engineer role becomes a gap filler for all the inadequacies and deficiencies of an incompetent data scientist.