Mabble Rabble: Fake Data Engineers

25 February 2020

Fake Data Engineers

How do you spot a fake data engineer?

They prefer drag-and-drop interfaces to defining their pipelines programmatically
They don't use any version control
They have no CI/CD process
They have no tests
They look like they have just spent their entire day coloring in their drawing book, i.e. a GUI
They don't have a clue what MapReduce is but know how to use Spark (to some degree)
They prefer to use static SQL rather than programmatically defining their DAGs
They have no clue what a DAG is
They don't have a clue what a dataframe is
They can't tell the difference between stream and batch processing
They don't know what immutability means
They smirk at the thought of thinking through the folder structure axis for a data lake
They turn an entire folder structure axis into internal vs external data
They transitioned from a SQL/BI background
They don't know what a computational graph is
They can't tell the difference between a static and a dynamic graph
They don't understand loose coupling or separation of concerns even between a pipeline and a model
They have no clue about data lineage
They have weak skills at abstracting out the workflow steps
They can't tell the difference between unstructured and structured data
They have no clear idea about software engineering principles
They class open big data stacks as a very small aspect of data engineering
They think NoSQL literally means no SQL and can therefore only think one-dimensionally
They can't think outside the box to solve business cases
They have more experience with Azure Cloud compared to any other cloud providers
They prefer to use notebooks rather than development lifecycle tools for their work
They have never scaled a machine learning model before
They know the jargon of Docker and Kubernetes but have no clue about containers
They pronounce Kubernetes as kuberneetes
They confuse serial and parallel workflows
They prefer to use C#/GUIs and PowerShell rather than Go, Python, Java, and Scala for their work
They have never attended a conference or summit that relates to their core work
They have more certifications, especially Microsoft ones, on their resume than open source work
Somehow their workflows have always been smooth and perfectly delivered for production
They are unable to break problems down sufficiently into epics and stories
They never take ownership when things go wrong; instead they look to blame others
They never learn from their mistakes, so they repeat the same mistakes over and over again
They act like teachers in the team, but invariably have little practical experience
They are neither very adaptable nor very curious about their work, the data, or the choice of tech stacks
They have little depth and breadth of practical experience
They don't take ownership of the end-to-end delivery of their work
They never acknowledge when they are wrong, and are rarely convinced by others
They don't embrace modern data architecture thinking to tackle business challenges
They stick to what they know and are unwilling to train up on what they don't know
They are neither resourceful nor willing to scope out better open source alternatives to premium options
They build solutions that are of poor working quality
They don't ask for clarifications when they don't understand the abstractions of a data workflow
They are not very multi-disciplined
They make poor problem solvers, especially during critical issues
They don't understand the value of feature engineering
They turn to Gartner for everything the very minute they need to use a fraction of a brain cell
They will try to question or challenge you on the use of best practices, which is usually obvious evidence of their lack of experience
They will ask questions for which they already should know the answers as part of their job
They will spend more time trying to teach you how to do your job, while at the same time not knowing how to do their own
They either over-engineer or under-engineer the solution by spending more time on optimization or on how pretty something looks in their GUI
They generally don't like to follow standards-driven approaches that could make their job easier for integration work
They lack much of the technical experience required for the work, so they try to make up for it with an air of superiority, correcting others at every opportunity
They have no clue how to clean, extract, mine, load, and transform noisy data into an enriched source for consumption
They reject the value of metadata and semantic knowledge in favor of static SQL queries
They don't understand modularity, so they class everything as shellcode or expect "richer code" (who knows what that means?). Or do they expect monolithic repos that grow endlessly from inception into an unmaintainable pile of crap?
Their only source of knowledge is Stack Overflow or asking others; they are inept at googling things themselves, and for everything they will likely want to go on a certification course
When they fix a bug in their code, they end up changing code in other places where they are not supposed to, or they introduce even more bugs
When you refer to any patterns or best practices, they will often ask "who told you that?" or "did you just make it up yourself?", indicating that they have probably never come across those approaches before, that they only learn when someone formally teaches them how to do something, that they have become insecure on realizing how inexperienced they really are, or that they are getting defensive by questioning your credibility
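
For anyone wondering what "programmatically define their pipelines" and "abstracting out the workflow steps" look like in practice, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions: the step functions, the sample rows, and the PIPELINE layout are all hypothetical, and a real project would more likely hand the graph to a proper orchestrator. The point is simply that each step is a small testable function, the dependencies form a DAG, and the run order is derived from that graph rather than drawn in a GUI.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


# Hypothetical workflow steps: each one is a small function that can be
# unit tested and kept under version control on its own.
def extract(_):
    # Stand-in for reading from a source system; the rows are made up.
    return [" 3", "1 ", None, "2"]


def clean(rows):
    # Drop missing values and strip stray whitespace.
    return [r.strip() for r in rows if r is not None]


def transform(rows):
    # Cast to integers and sort, returning a new list each time.
    return sorted(int(r) for r in rows)


def load(rows):
    # Stand-in for writing to a sink; here we just return a summary.
    return {"row_count": len(rows), "rows": rows}


# The pipeline as data: step name -> (function, upstream step name or None).
PIPELINE = {
    "extract": (extract, None),
    "clean": (clean, "extract"),
    "transform": (transform, "clean"),
    "load": (load, "transform"),
}


def run(pipeline):
    # Build the DAG as {step: set of upstream steps} and let the standard
    # library work out a valid execution order.
    graph = {name: ({up} if up else set()) for name, (_, up) in pipeline.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        func, upstream = pipeline[name]
        # A step with no upstream receives None as its input.
        results[name] = func(results.get(upstream))
    return results


if __name__ == "__main__":
    print(run(PIPELINE)["load"])
```

Because the steps are loosely coupled, swapping the hypothetical extract for a real reader or adding a step between clean and transform only touches the PIPELINE mapping, not the runner.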