- Identify and understand the business case (as a use case or story). In most cases you are not given a translated use case or story, so this step is really about understanding the problem
- Explore and prototype, including background research (exploratory stage)
- Identify cases for reuse
- Identify whether this story even requires a model
- Identify and curate relevant datasets
- Visualize the data (how sparse, dense, or dirty it is; multiple open-source tools are available for refining features; identify any additional effort needed for the model build)
- Identify the relevant sources of variance and bias (will the modeling steps lead to overfitting or underfitting? The objective is to build a generalizable model)
- Feature Selection/Extraction (may use other ML or natural computation techniques here)
- Feature engineering (this may also include curation/enrichment of metadata)
- Feature re-engineering (this may also include curation/enrichment of metadata)
- Identify the simplest possible solution
- Identify the reasoning for using a more complex solution
- Build a custom model to solve the business case (do not just copy a model out of a research paper; that is what the exploratory stage was for)
- Evaluation and benchmarking (formal tests may or may not be used, depending on the business case)
- How well does the model hold up against small data and large data? Identify whether it is sufficient at average-case and worst-case time/cost
- Re-Tune/Rinse/Repeat
- Incrementally improve the model
- Incrementally optimize/scale the model (scale only when necessary)
- Two candidates, one simple and one complex: one that is sub-par, and one that is riskier
- Evaluation and Documentation
- Pipeline the solution in dev mode (identify bottlenecks in the model with an end-to-end dry run for integration; at this stage DS/DE may use a repeatable build/test/deploy/evaluate cycle)
- A/B/N/bandit testing in stage (generally covered by the product team alongside automated acceptance tests, if they know the techniques; otherwise DS/DE may be involved)
- Release/integrate for production (depends on whether this is a B2B or B2C case, or beta mode)
- Storytelling, 'through the looking glass' (how well does the model answer or solve the question or problem statement? This applies to both dev and stage/prod cases)
- User/Stakeholder/Client Feedback (Rinse & Repeat, depending on B2B or B2C cases)
- Incremental Analysis and Review of Models
- Rinse & Repeat (some of the steps above are repeated multiple times before the production release)
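For the data visualization step, a quick numeric profile often comes before any plotting. The sketch below is a minimal, stdlib-only illustration, assuming the data fits in memory as a list of dicts; `profile_columns` and `is_missing` are hypothetical helpers, not part of any library. It treats `None`/NaN as missing, measures sparsity as the share of zeros among present values, and counts exact duplicate rows as one signal of dirty data.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def profile_columns(rows):
    """Summarize how sparse, dense, or dirty each column of `rows` is.

    `rows` is assumed to be a list of dicts (one dict per record).
    Returns (per-column report, number of exact duplicate rows).
    """
    n = len(rows)
    columns = sorted({key for row in rows for key in row})
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # absent keys count as missing
        present = [v for v in values if not is_missing(v)]
        zeros = sum(1 for v in present if v == 0)
        report[col] = {
            "missing_frac": (n - len(present)) / n if n else 0.0,
            # Sparsity is measured over present values only.
            "zero_frac": zeros / len(present) if present else 0.0,
            "distinct": len(set(present)),
        }
    # Exact duplicate records are a common sign of dirty data.
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return report, duplicates
```

A profile like this helps decide where plots and cleanup effort should go before any model build starts.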
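The variance and bias step can be made concrete by comparing training error with held-out error. The sketch below is an illustration on synthetic data (the quadratic target and the noise level are assumptions): a 1-D k-nearest-neighbours regressor with k=1 reproduces every training point (zero training error when all x values are distinct, but higher validation error: overfitting), while k equal to the training-set size always predicts the global mean (high error on both sets: underfitting).

```python
import random


def make_data(n, rng, noise=0.2):
    """Synthetic 1-D data: y = x^2 plus Gaussian noise (an assumed ground truth)."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        data.append((x, x * x + rng.gauss(0.0, noise)))
    return data


def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, data, k):
    """Mean squared error of the k-NN model built on `train`, scored on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
train, val = make_data(200, rng), make_data(400, rng)

for k in (1, 15, 200):
    train_err, val_err = mse(train, train, k), mse(train, val, k)
    print(f"k={k:>3}  train={train_err:.4f}  val={val_err:.4f}  gap={val_err - train_err:.4f}")
```

A large train/validation gap points to overfitting, high error on both sets points to underfitting, and the intermediate k is the more generalizable choice here.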
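Feature selection can start with something as simple as a univariate filter. The sketch below ranks columns by the absolute Pearson correlation with the target and keeps the top k. It is one common filter technique, not the only option (the step above also allows other ML or natural computation techniques), and it only captures linear, one-feature-at-a-time relationships. The column names and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def select_top_k(features, target, k):
    """Rank feature columns by |Pearson r| with the target and keep the top k names."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


features = {
    "price": [1, 2, 3, 4, 5],
    "noise": [2, 1, 2, 1, 2],
    "constant": [7, 7, 7, 7, 7],
}
target = [2, 4, 6, 8, 10]
print(select_top_k(features, target, 1))  # ['price']
```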
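For evaluation and benchmarking, and for checking whether a more complex solution is justified over the simplest one, one option is k-fold cross-validation against a trivial baseline. The sketch below assumes a single numeric feature and compares a mean predictor with an ordinary-least-squares line; the per-fold scores also expose the worst fold, not just the average. `fit_mean`, `fit_linear`, and `kfold_mse` are hypothetical helpers, and the data is synthetic.

```python
import random


def fit_mean(train):
    """Baseline: always predict the training mean."""
    m = sum(y for _, y in train) / len(train)
    return lambda x: m


def fit_linear(train):
    """Ordinary least squares for y = a + b*x with a single feature."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def kfold_mse(data, fit, k=5, seed=0):
    """Return (mean MSE over k folds, list of per-fold MSEs)."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = fit(train)
        scores.append(sum((model(x) - y) ** 2 for x, y in held_out) / len(held_out))
    return sum(scores) / k, scores


rng = random.Random(1)
data = []
for _ in range(100):
    x = rng.uniform(0.0, 10.0)
    data.append((x, 3.0 * x + 1.0 + rng.gauss(0.0, 1.0)))  # assumed linear ground truth

for name, fit in (("mean baseline", fit_mean), ("linear", fit_linear)):
    avg, per_fold = kfold_mse(data, fit)
    print(f"{name:>13}: avg MSE={avg:.3f}  worst fold={max(per_fold):.3f}")
```

If the more complex model does not clearly beat the baseline across folds, that is a signal the extra complexity is not yet justified.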
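To check how the model holds up on small versus large data, one rough approach is to time the same step at increasing input sizes and record both the average and the worst run. The sketch below uses wall-clock timing with `time.perf_counter`; `time_at_sizes` is a hypothetical helper, and Python's built-in `sorted` merely stands in for a real training or scoring function. Wall-clock numbers are noisy, so treat them as relative, not absolute.

```python
import random
import statistics
import time


def time_at_sizes(fn, make_input, sizes, repeats=5):
    """Time fn on inputs of each size; report average and worst wall-clock seconds."""
    results = {}
    for n in sizes:
        timings = []
        for _ in range(repeats):
            data = make_input(n)  # input generation is excluded from the timing
            start = time.perf_counter()
            fn(data)
            timings.append(time.perf_counter() - start)
        results[n] = {"avg_s": statistics.mean(timings), "worst_s": max(timings)}
    return results


rng = random.Random(0)
# `sorted` is a stand-in workload for model training or scoring.
report = time_at_sizes(sorted, lambda n: [rng.random() for _ in range(n)], [1_000, 10_000, 100_000])
for n, r in report.items():
    print(f"n={n:>7}  avg={r['avg_s']:.5f}s  worst={r['worst_s']:.5f}s")
```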
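For the dev-mode pipeline step, a dry run that pushes a payload through every stage end to end and times each one is a cheap way to spot the bottleneck before integration. The stage names below (`load`, `featurize`, `train`, `evaluate`) are hypothetical placeholders, and `time.sleep` simulates a slow step; a real pipeline would plug in its own functions.

```python
import time


def dry_run(stages, payload):
    """Push `payload` through each (name, function) stage in order, timing each."""
    timings = []
    for name, stage in stages:
        start = time.perf_counter()
        payload = stage(payload)
        timings.append((name, time.perf_counter() - start))
    bottleneck = max(timings, key=lambda item: item[1])
    return payload, timings, bottleneck


# Hypothetical placeholder stages; swap in the real steps.
def load(_):
    return list(range(1_000))


def featurize(xs):
    return [(x, x % 7) for x in xs]


def train(rows):
    time.sleep(0.05)  # simulate the slow step
    return {"n_rows": len(rows)}


def evaluate(model):
    return {**model, "evaluated": True}


STAGES = [("load", load), ("featurize", featurize), ("train", train), ("evaluate", evaluate)]

result, timings, bottleneck = dry_run(STAGES, None)
for name, seconds in timings:
    print(f"{name:>10}: {seconds:.4f}s")
print("bottleneck:", bottleneck[0])
```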
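For the A/B/N/bandit step, the difference from a fixed traffic split is that a bandit shifts traffic towards the better-performing variant while it is still learning. The sketch below simulates epsilon-greedy, one of the simplest bandit strategies (Thompson sampling and UCB are common alternatives); the conversion rates are invented for the simulation and are unknown to the algorithm.

```python
import random


def epsilon_greedy(true_rates, rounds=10_000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over N variants (arms).

    `true_rates` are hypothetical conversion probabilities used only to
    simulate user responses; each pull yields 1 (convert) or 0.
    Returns (pulls per arm, conversions per arm).
    """
    rng = random.Random(seed)
    n = len(true_rates)
    pulls = [0] * n
    wins = [0] * n
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: any variant, uniformly
        else:
            # exploit: the variant with the best observed conversion rate so far
            arm = max(range(n), key=lambda i: wins[i] / pulls[i] if pulls[i] else 0.0)
        pulls[arm] += 1
        wins[arm] += 1 if rng.random() < true_rates[arm] else 0
    return pulls, wins


pulls, wins = epsilon_greedy([0.02, 0.05, 0.10])
print("pulls per variant:", pulls)
```

The `epsilon` parameter trades off exploration (keep measuring all variants) against exploitation (send most traffic to the current leader).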
Process Flow:
R&D → Dev/UI/UX → Prod
Generally, with a heavily R&D/backend-focused team, the features and functionality tend to be dictated by the forward flow (a bottom-up approach); most AI projects at startups tend to be built that way. The frontend then becomes a thin client, a view onto the world through which the backend efforts are assimilated; the typical pattern is an informational dashboard for storytelling, 'through the looking glass'. This is because in a top-down approach many of the backend efforts would get lost in translation (equally, in some business cases a top-down approach may work better).
Data → Information → Knowledge
State-of-the-art may not mean state-of-the-art for your business case, and may in fact lead to sub-optimal results and more effort. It is all very subjective and depends on the data, the features available for training a model, and the business case you are trying to solve. Work towards the least-effort, most efficient, or most sensible outcome.
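One way to put "least effort" into practice is to pick the simplest candidate whose score is within a tolerance of the best, rather than the best score outright. The sketch below assumes each candidate comes with a score (higher is better) and some complexity or cost proxy; `pick_least_effort` is a hypothetical helper, and the candidate names, scores, and tolerance are made up for illustration, not benchmark results.

```python
def pick_least_effort(candidates, tolerance=0.01):
    """Choose the lowest-complexity candidate scoring within `tolerance` of the best.

    candidates: list of (name, complexity, score); higher score is better and
    complexity is any cost proxy (parameters, training hours, maintenance, ...).
    """
    best = max(score for _, _, score in candidates)
    good_enough = [c for c in candidates if best - c[2] <= tolerance]
    return min(good_enough, key=lambda c: c[1])


# Hypothetical candidates and scores, for illustration only.
candidates = [
    ("logistic_regression", 1, 0.861),
    ("gradient_boosting", 10, 0.868),
    ("sota_transformer", 1000, 0.872),
]
print(pick_least_effort(candidates))
print(pick_least_effort(candidates, tolerance=0.02))
```

The tolerance encodes how much score the business case is willing to give up for a simpler, cheaper model; it is a judgment call, not a fixed rule.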