25 February 2020

Fake Data Engineers

How to spot the fake data engineer?
  • They prefer drag-and-drop interfaces to programmatically defining their pipelines
  • They don't use any source control
  • They have no CI/CD process
  • They have no tests
  • They look like they have just spent their entire day coloring in their coloring book, i.e. a GUI
  • They don't have a clue what MapReduce is but know how to use Spark (to some degree)
  • They prefer to use static SQL instead of programmatically defining their DAGs
  • They have no clue what a DAG is
  • They don't have a clue what a dataframe is
  • They can't tell the difference between stream and batch processing
  • They don't know what immutability means
  • They smirk at the thought of thinking through the folder structure axis for a data lake
  • They turn an entire folder structure axis into internal vs external data
  • They transitioned from a SQL/BI background
  • They don't know what a computational graph is
  • They can't tell the difference between a static and a dynamic graph
  • They don't understand loose coupling or separation of concerns even between a pipeline and a model
  • They have no clue about data lineage
  • They have weak skills at abstracting out the workflow steps
  • They can't tell the difference between unstructured and structured data
  • They have no clear idea about software engineering principles
  • They class open big data stacks as a very small aspect of data engineering
  • They think NoSQL literally means no SQL, and so can only think one-dimensionally
  • They can't think outside the box to solve business cases
  • They have more experience with Azure Cloud than with any other cloud provider
  • They prefer to use notebooks rather than development lifecycle tools for their work
  • They have never scaled a machine learning model before
  • They know the jargon of Docker and Kubernetes but have no clue about containers
  • They pronounce Kubernetes as kuberneetes
  • They confuse serial and parallel workflows
  • They prefer to use C#/GUIs and PowerShell rather than Go, Python, Java, and Scala for their work
  • They have never attended a conference or summit that relates to their core work
  • They have more certifications, especially Microsoft ones, on their resume than open source work
  • Somehow their workflows have always been smooth and perfectly delivered for production
  • They are unable to break problems down sufficiently into epics and stories
  • They never take ownership when things go wrong; instead they look for others to blame
  • They never learn from their mistakes, so they inevitably repeat the same mistakes over and over again
  • They act like teachers in the team, but invariably have little practical experience
  • They are neither very adaptable nor very curious about their work, the data, or the choice of tech stack
  • They have little depth and breadth of practical experience
  • They don't take ownership of the end-to-end delivery of their work
  • They never acknowledge when they are wrong, and are rarely convinced by others
  • They don't embrace modern data architecture thinking to tackle business challenges
  • They stick to what they know and are unwilling to train up on what they don't know
  • They are neither resourceful nor willing to scope out better open source alternatives to premium options
  • They build solutions that are of poor working quality
  • They don't ask for clarifications when they don't understand the abstractions of a data workflow
  • They are not very multi-disciplined
  • They make poor problem solvers, especially during critical issues
  • They don't understand the value of feature engineering
  • They turn to Gartner for everything the very minute they need to use a fraction of a brain cell
  • They will try to question or challenge you on the use of best practices, which is usually obvious evidence of their lack of experience
  • They will ask questions for which they already should know the answers as part of their job
  • They will spend more time trying to teach you how to do your job, while at the same time not knowing how to do their own
  • They either over-engineer or under-engineer the solution by spending more time on optimization or how pretty something looks in their GUI
  • They generally don't like to follow standards-driven approaches that could make their job easier for integration work
  • They lack much of the technical experience required for the work, so they try to make up for it with an air of superiority, correcting others at every opportunity
  • They have no clue how to clean, extract, mine, load, and transform noisy data into an enriched source for consumption
  • They reject the value of metadata and semantic knowledge over the use of static SQL queries
  • They don't understand modularity, so they class everything as shellcode or expect "richer code" (who knows what that means?). Or do they expect monolithic repos that go on endlessly from inception as an unmaintainable pile of crap?
  • Their only source of knowledge is Stack Overflow or asking others; they are inept at googling things for themselves, and in fact they will likely want to go on a certification course for everything
  • When they fix a bug in their code, they end up changing code in other places where they are not supposed to, or inevitably introduce even more bugs
  • When you refer to any patterns or best practices they will often ask "who told you that?" or "did you just make it up yourself?", indicating that they have probably never come across those approaches before, that they only learn when someone formally teaches them how to do something, that they get insecure on realizing how inexperienced they really are, or that they are being defensive by attempting to question your credibility
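
For anyone wondering what "programmatically define their DAGs" might look like in practice, here is a minimal sketch in plain Python. It is only an illustration, not a recommendation of any particular tool: the step names (extract, clean, aggregate, report) and the sample data are hypothetical, and it assumes Python 3.9 or later for the standard library graphlib module. Each step returns a new dict instead of mutating its input, which is one simple way to read the point about immutability.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline steps. Each one takes the current state and
# returns a NEW dict rather than mutating the one it was given.
def extract(state):
    return {**state, "raw": [" 3", "1 ", "2"]}

def clean(state):
    return {**state, "clean": [int(x.strip()) for x in state["raw"]]}

def aggregate(state):
    return {**state, "total": sum(state["clean"])}

def report(state):
    return {**state, "report": f"total={state['total']}"}

# The DAG itself: each step maps to the set of steps it depends on.
PIPELINE = {
    "extract": set(),
    "clean": {"extract"},
    "aggregate": {"clean"},
    "report": {"aggregate"},
}

# Lookup from step name to the function that implements it.
STEPS = {
    "extract": extract,
    "clean": clean,
    "aggregate": aggregate,
    "report": report,
}

def run(pipeline, steps):
    """Run every step once, in an order that respects the dependencies."""
    order = list(TopologicalSorter(pipeline).static_order())
    state = {}
    for name in order:
        state = steps[name](state)
    return order, state

order, result = run(PIPELINE, STEPS)
print(order)
print(result["report"])
```

Because the dependencies live in data rather than in a picture, the pipeline can be diffed, reviewed, tested, and run through CI/CD like any other code, and a cycle in the graph makes TopologicalSorter raise an error instead of quietly running forever.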