31 December 2019

Why DataRobot Sucks

  • Limited selection of machine learning algorithms, and few ways to tune them for a specific use case
  • Limited flexibility in the data features it supports
  • Limited sub-selection of model results, a consequence of the limited choice of models
  • Ugly visualizations for model performance and comparisons
  • The blender option offers only a limited choice of ensemble methods
  • Too limited for anything complex
  • Doesn't replace the value of handcrafted models or of the feature engineering process
  • Why automate feature engineering, especially when that process is what allows one to build a generalizable model and gain a better understanding of the business case?
  • Limited ways to evaluate the choice of model
  • Limited ways to import and export models for build and deployment
  • Tight coupling to a third-party way of doing things
  • Doesn't follow the formal data science method
  • In fact, doesn't even replace 99% of the work
  • GDPR and governance issues when processing data through third-party models
  • Benchmarks on models are not available for public peer review
  • Cost far exceeds the benefit
  • Many of the provided models produce suboptimal outcomes, with low confidence scores
  • Not good for complicated analysis; best to keep use cases as simple as possible
  • Productivity gains appear only in very limited, simple cases
  • Most business cases involve noisy data and require custom models; against that complexity and ambiguity the tool becomes unworkable and useless
  • One needs to learn another third-party tool and be willing to trust the model solution blindly
  • Quick wins and successes are not the answer if they do not solve the business case
  • The reality is that there is no free or easy lunch for most business cases; one has to put in the time and effort
  • No need to cheat one's way through the process
  • Unconvincing autopilot feature: there is no such thing as autopilot in machine learning. In fact, there are no real community standards or even defined patterns, so how can one jump decades ahead in progress?
  • Solution is useful to people who can't code, don't like to code, don't understand the data science method, and prefer simple drag and drop options
  • Data input is treated in most cases as a table, which is not workable for noisy unstructured data
  • Often ends up with overfitted models, when the whole point of machine learning is to build generalizable models
  • No flexible options for transfer learning on unseen data
  • And that isn't even the end of it...
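
The overfitting complaint in the list above is easy to check by hand on any model, automated or not: compare accuracy on the training data with accuracy on data the model has never seen. Below is a minimal Python sketch under stated assumptions. The toy one-dimensional dataset, the 20% label noise, and the 1-nearest-neighbour classifier are all illustrative choices made up for this example; none of it reflects how DataRobot works internally.

```python
import random


def make_noisy_data(n, noise, rng):
    """Toy data (hypothetical): label is 1 when x > 0.5, with a fraction of labels flipped."""
    rows = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # simulate a noisy, mislabelled record
        rows.append((x, y))
    return rows


def one_nn_predict(train, x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def accuracy(train, data):
    """Fraction of rows in `data` that the 1-NN model built on `train` labels correctly."""
    hits = sum(1 for x, y in data if one_nn_predict(train, x) == y)
    return hits / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_noisy_data(200, 0.2, rng)
holdout = make_noisy_data(200, 0.2, rng)

train_acc = accuracy(train, train)      # the model has memorised these points
holdout_acc = accuracy(train, holdout)  # unseen data drawn from the same source
gap = train_acc - holdout_acc           # a large gap is a sign of overfitting

print(f"train accuracy:   {train_acc:.2f}")
print(f"holdout accuracy: {holdout_acc:.2f}")
print(f"gap:              {gap:.2f}")
```

A 1-nearest-neighbour model memorises its training set, noise included, so it scores perfectly on the data it was built from and noticeably worse on the holdout set. Whatever tool produced a model, a train-versus-holdout gap like this is worth measuring before trusting the result.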