27 June 2021

Incompetent Data Scientists

What does a Machine Learning Engineer do?

Everything a Data Scientist is supposed to be able to do

What does a Data Engineer do?

Everything a Data Scientist is supposed to be able to do

What does a Knowledge Engineer do?

Everything a Data Scientist is supposed to be able to do

What does an NLP Engineer do?

Everything a Data Scientist is supposed to be able to do

What does an AI Engineer do?

Everything a Data Scientist, a Machine Learning Engineer, an NLP Engineer, a Knowledge Engineer, and a Software Engineer are supposed to be able to do

So, why isn't a Data Scientist doing any of this, and why are there so many additional roles? Why not just hire an AI Engineer and skip the mumbo jumbo of roles?

Because the majority of Data Scientists in industry only like to build models, and the models they build are neither suitable for prime-time production, nor evaluated correctly, nor built correctly. Basically, they are incompetent; many hold PhD degrees and were never taught basic Computer Science concepts. In industry, roles are created to fill gaps in particular skills. Shockingly, all of the above roles can be done by anyone with a Computer Science degree, because Machine Learning is part of AI, Knowledge Representation and Reasoning is part of AI, NLP is part of AI, and Data Engineering, Software Engineering, Data Science, and AI are all part of Computer Science. All of these concepts are taught in a Computer Science degree.
The industry has become a hodgepodge of roles because it hires incompetent PhD holders who need help with literally everything in order to do their job. And the job they do amounts to nothing, because everyone else is pretty much doing it for them. Even research requires some artefact to be produced, and they need help to complete it. In fact, even for a peer-reviewed paper they need others to help them with the peer review. The useless PhD researcher is largely a redundant role in industry, and the accounting paperwork would show that hiring one of them leads to hiring a whole list of other people to help them do their work. Badly designed, academically inclined recruitment processes and badly designed academic courses: it is a right mess that PhD holders have created in industry and at academic institutions, with a total inability to convert theory into practice. Such things are only going to get worse as more organizations hire clueless PhD holders under the false pretence that they have practical experience and expertise in their respective domains.
In fairness, they are also likely to be more hypocritical, arrogant, and egotistic, and to come with a huge number of shortcomings in both their approach to work and its practical application. Fundamentally, a PhD holder lacks the mindset to think in terms of abstractions, constraints, and distributed systems, does not take real-world complexity into account, and does not account for the uncertainty that comes with modeling data or the relevant scope for error. There are just too many clueless people in industry who will follow the crowd and never really understand whether any of it makes sense. In fact, management and investors in particular tend to dictate pretentious attitudes to recruitment and to designated work roles. So long as organizations keep desiring PhD holders, they will have to keep recruiting more engineers to support them with everything: what an utter waste of budgets, displaced talent, and resources.
The areas of Machine Learning, NLP, and Knowledge Graph techniques have been around for decades, far longer than the more recent role of Data Scientist. In fact, the role of Data Scientist traditionally did not even incorporate aspects of AI; that is a fairly recent addition to the role. Traditionally, Data Science was mostly about applying a limited subset of Machine Learning approaches in the context of Data Mining, which now overshadows the domain of Data Analysis. Even here, many people in industry assume that AI is all about Machine Learning, or that the Data Science profession is all about applying Machine Learning, which could not be further from the truth.
Organizations need to re-evaluate their hiring requirements and fill such roles with people who have an engineering mindset and are involved in the entire end-to-end data science method, rather than with a Data Scientist whose only interest is in building Machine Learning models that are subjectively evaluated and inherently overfitted to the data, with little to no appreciation of the entire work effort.