Edited, from a transcription and summary of a recent talk, in collaboration with Joel Milag, Wow AI and Team
In this insightful session, Dr. Ali Arsanjani, Director of Cloud Partner Engineering at Google Cloud, discussed with the audience how to leverage AI/ML throughout the lifecycle to increase organizational maturity in the use of ML and how to add business value as a result.
Check out his keynote on our website and YouTube channel.
6 levels of organizational maturity of AI/ML
Ali suggested answering a few questions to measure the achievement of higher organizational maturity levels in the adoption, usage and activation of AI/ML for achieving business value across the ML life cycle:
Level 1: Have the data science and machine learning goals for explicitly clarified use cases been defined in terms of clustering, classification, regression, forecasting, and recommendation AI?
Level 2: Is your team consistently applying the entire machine learning life cycle: data cleansing, populating a feature store, selecting an algorithm, training, evaluating the results, explainability, hyperparameter optimization, deploying to an endpoint, monitoring the endpoint, and running it all as a pipeline that can be repeated?
Level 3: Do you have a documented reference for activities within the machine learning lifecycle?
Level 4: Have you been able to automate some, if not most tasks?
Level 5: Are you measuring, tracking activities, and retrospectively recognizing bottlenecks?
Level 6: Is the team continuously using data and new learnings to improve the machine learning process?
These are questions you need to ask yourself, your vendors, and your teams periodically to measure and assess the process.
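One way to make this periodic assessment repeatable is to record a yes/no answer per level and report the highest level reached without a gap. The sketch below assumes the levels are cumulative (a team cannot sit at Level 4 without satisfying Levels 1 to 3), which the questions imply but the talk summary does not state outright. The name `maturity_level` and the short question labels are illustrative.

```python
from typing import Sequence

# Short labels for the six questions above (paraphrased for illustration).
LEVEL_QUESTIONS = (
    "Goals defined for explicit use cases",
    "Full ML life cycle applied consistently",
    "Life cycle activities documented",
    "Some or most tasks automated",
    "Activities measured and bottlenecks identified",
    "Process continuously improved from data and learnings",
)


def maturity_level(answers: Sequence[bool]) -> int:
    """Return the highest level reached with no earlier 'no' (0 if none)."""
    if len(answers) != len(LEVEL_QUESTIONS):
        raise ValueError("expected one answer per level")
    level = 0
    for answered_yes in answers:
        if not answered_yes:
            break  # Assumption: a gap stops the count, later answers are ignored.
        level += 1
    return level
```

Under this cumulative assumption, a team that automates tasks (Level 4) but has no documented reference (Level 3) would still be assessed at Level 2.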
Addressing 4 hidden technical debts in ML systems
Dr. Arsanjani went on to reference a 2015 paper published by his colleagues called “Hidden Technical Debt in Machine Learning Systems”. He specifically emphasized the following areas of technical debt, which are often overlooked in favor of more familiar kinds.
Data Testing Debt
If data replaces code in ML and code should be tested, then some amount of input data testing is critical to a well-functioning system.
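A minimal sketch of what input data testing can look like, assuming each batch arrives as a list of dictionaries. The column names and validity rules are hypothetical and exist only for illustration; in practice they would come from your own data contract.

```python
from typing import Any, Callable, Dict, List

# Hypothetical expectations for an illustrative dataset; none come from the talk.
EXPECTATIONS: Dict[str, Callable[[Any], bool]] = {
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
    "country": lambda v: isinstance(v, str) and len(v) == 2,
    "income": lambda v: isinstance(v, (int, float)) and v >= 0,
}


def validate_rows(rows: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable problems; an empty list means the batch passed."""
    problems: List[str] = []
    for i, row in enumerate(rows):
        for column, is_valid in EXPECTATIONS.items():
            if column not in row:
                problems.append(f"row {i}: missing column '{column}'")
            elif not is_valid(row[column]):
                problems.append(
                    f"row {i}: bad value for '{column}': {row[column]!r}"
                )
    return problems
```

A check like this can run before training and before serving, so that a batch with missing or out-of-range values is stopped the same way a failing unit test stops a code change.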
Reproducibility Debt
Usually, once good results are obtained, teams move on. Re-running experiments with different types of data and obtaining similar results is nevertheless extremely important.
Process Management Debt
Updating configurations for many similar models safely and automatically is a problem, as is managing and assigning resources among models that have different business priorities. Pipelines help here: they let you visualize and detect blockers in the operations performed across the MLOps lifecycle.
Cultural Debt
One of the most important factors is creating team cultures that reward reducing features and overall complexity, and improving reproducibility, scalability, and monitoring, to the same degree that they reward improvements in accuracy.
Ali believed that these less visible, less commonly discussed elements of hidden technical debt are vital to enhancing and accelerating your maturity across the AI/ML lifecycle.
6 Components of ML adoption
He then summed up the six components that are key to the adoption of ML.
- Data Activation requires data lake ingestion and data prep capability via a data prep pipeline
- Data for AI calls for dataprep, labeling and featurization
- Models entail considering experiments and explainability as first class constructs
- Models demand training and optimization, and ultimately depositing the metadata of the experiments into a registry
- Models are served but need to be monitored for various types of drift or skew
- The entire ML life cycle should be supported by data, training, and inference (deployment and monitoring) pipelines
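The monitoring component mentions drift and skew without naming a metric. As one possible illustration, the sketch below computes the Population Stability Index (PSI), a commonly used drift score that compares how a feature is distributed at training time versus serving time. PSI is a choice made for this example, not a method attributed to the talk, and the equal-width bucketing is a simplification.

```python
import math
from typing import List, Sequence


def _bucket_shares(values: Sequence[float], edges: List[float]) -> List[float]:
    """Share of values per bucket; a small floor avoids log(0)."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        idx = sum(1 for e in edges if v > e)  # number of edges strictly below v
        counts[idx] += 1
    total = len(values)
    return [max(c / total, 1e-6) for c in counts]


def population_stability_index(
    baseline: Sequence[float], current: Sequence[float], n_buckets: int = 10
) -> float:
    """PSI between a baseline sample and a current sample of one feature."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / n_buckets
    # Equal-width bucket edges derived from the baseline range only.
    edges = [lo + width * i for i in range(1, n_buckets)]
    expected = _bucket_shares(baseline, edges)
    actual = _bucket_shares(current, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))
```

Identical samples score zero, and the score grows as the serving distribution moves away from the baseline. The alert threshold is a judgment call; values around 0.1 and 0.25 are often quoted as rules of thumb, but they should be tuned per feature.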
Key components of the ML journey in more detail
Component 1: Data & The Data Lake
BigLake is a secure, governed data lake that gathers data from different clouds, offers self-serve analytics, and provides interoperability between AI and those analytics. It essentially extends BigQuery and lets you bring in Spark, TensorFlow, Presto, or other tools.
A few new BigLake capabilities also let you build a differentiated data platform, such as Analytics Hub, BigQuery ML support, BigQuery Omni, and the upcoming Cloud Data Loss Prevention profiling support and data masking & audit logging.
Component 2: The Feature Store
Features are shared across different use cases. To make a feature reusable, insert it into a feature store once it has been cleaned and prepared so everyone can use it. The Vertex AI Feature Store, recommended by Dr. Arsanjani, comes in handy in this situation: it supports continuous training, experimentation, integration with CI/CD, serving, and monitoring.
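The idea of writing a prepared feature once and reading it from several use cases can be sketched with a toy in-memory store. This is not the Vertex AI Feature Store API; the class `InMemoryFeatureStore`, its methods, and the entity and feature names are all hypothetical.

```python
from typing import Any, Dict, List, Tuple


class InMemoryFeatureStore:
    """Toy store keyed by (entity_id, feature_name). Not the Vertex AI API."""

    def __init__(self) -> None:
        self._values: Dict[Tuple[str, str], Any] = {}

    def write(self, entity_id: str, feature_name: str, value: Any) -> None:
        """Store the latest cleaned value of a feature for one entity."""
        self._values[(entity_id, feature_name)] = value

    def read(self, entity_id: str, feature_names: List[str]) -> Dict[str, Any]:
        """Return requested features for one entity; missing ones map to None."""
        return {name: self._values.get((entity_id, name)) for name in feature_names}


# One team prepares the feature once...
store = InMemoryFeatureStore()
store.write("customer_42", "avg_basket_value", 31.5)
store.write("customer_42", "days_since_last_order", 12)

# ...and two hypothetical use cases read it without recomputing it.
churn_inputs = store.read("customer_42", ["avg_basket_value", "days_since_last_order"])
recommender_inputs = store.read("customer_42", ["avg_basket_value"])
```

A managed feature store adds what this toy version leaves out, such as versioning, point-in-time lookups, and low-latency online serving.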
Component 3 & 4: Experiments and the Model Registry
According to Ali, companies want to be able to manage the data lineage, training details, and hyperparameters used in each experiment by compiling this metadata and depositing it in a model registry, such as Vertex AI Model Registry.
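To make the kind of metadata involved concrete, here is a minimal sketch of an experiment record and a registry that stores such records. It is a stand-in written with the standard library only, not the Vertex AI SDK; the class names, fields, and the example data URI are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class ExperimentRecord:
    model_name: str
    version: int
    training_data_uri: str  # data lineage: where the training data came from
    hyperparameters: Dict[str, float] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)


class ModelRegistry:
    """Toy registry: keeps every experiment so runs can be compared later."""

    def __init__(self) -> None:
        self._records: List[ExperimentRecord] = []

    def register(self, record: ExperimentRecord) -> None:
        self._records.append(record)

    def best(self, model_name: str, metric: str) -> ExperimentRecord:
        """Return the run with the highest value of `metric` for a model."""
        candidates = [
            r for r in self._records
            if r.model_name == model_name and metric in r.metrics
        ]
        if not candidates:
            raise LookupError(f"no runs of {model_name!r} report {metric!r}")
        return max(candidates, key=lambda r: r.metrics[metric])

    def export(self) -> str:
        """Serialize all records to JSON, e.g. for audits or hand-off."""
        return json.dumps([asdict(r) for r in self._records], indent=2)


registry = ModelRegistry()
registry.register(ExperimentRecord(
    "churn", 1, "bq://example-project.sales.training",  # hypothetical URI
    {"learning_rate": 0.1}, {"auc": 0.81},
))
registry.register(ExperimentRecord(
    "churn", 2, "bq://example-project.sales.training",
    {"learning_rate": 0.05}, {"auc": 0.84},
))
```

Keeping lineage, hyperparameters, and metrics in one record is what makes it possible to answer later which data and settings produced the model that is currently deployed.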
If data gravity is a key requirement and you do not want to move tabular and/or unstructured data that may already reside in, say, BigQuery, you can use BigQuery ML to run a diverse array of algorithms for classification, regression, model operations, and time series directly in the BigQuery data warehouse. The results can be integrated into Vertex AI Model Registry.
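Training where the data lives comes down to issuing a `CREATE MODEL` statement in BigQuery. The helper below only builds such a statement as a string and does not contact BigQuery; the function name and the dataset, table, and column names in the usage line are hypothetical. The `model_type` and `input_label_cols` options follow BigQuery ML's documented syntax as understood here, so check the current reference before relying on them.

```python
import re

_SAFE_NAME = re.compile(r"[A-Za-z0-9_.\-]+")


def build_create_model_sql(
    model: str,
    source_table: str,
    label_column: str,
    model_type: str = "logistic_reg",
) -> str:
    """Build a BigQuery ML CREATE MODEL statement (string only, not executed)."""
    for name in (model, source_table, label_column, model_type):
        # Reject anything that is not a plain identifier to avoid SQL injection.
        if not _SAFE_NAME.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = '{model_type}', "
        f"input_label_cols = ['{label_column}']) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


sql = build_create_model_sql("demo.churn_model", "demo.customers", "churned")
```

The resulting statement would be run from the BigQuery console or a client library, so the training data never leaves the warehouse.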
If you are willing to move data, you can use the full power of the pre-trained, AutoML, or custom training options that Vertex AI offers. You can also use Vertex AI Workbench to run data science experiments in Python that help you enrich the data and put it where you can use it to train, optimize, and deploy models.
Component 5: The MLOps Pipeline
Experimentation, retraining, model deployment, and continuous monitoring are integrated in a governance framework to achieve an end-to-end MLOps capability with Vertex AI Pipelines. Ali also emphasized the importance of explainable AI (XAI): having explainability integrated into the data, model building, and inference portions of the overall AI/ML lifecycle.
Screenshot taken from Ali Arsanjani’s presentation
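The end-to-end flow described under Component 5 can be pictured as an ordered list of steps that pass a shared context along, with a quality gate before deployment. The sketch below uses plain Python functions and toy data; it does not use the Vertex AI Pipelines SDK, and the step names, the toy model, and the 0.1 error threshold are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Context = Dict[str, Any]
Step = Tuple[str, Callable[[Context], Context]]


def prepare_data(ctx: Context) -> Context:
    ctx["rows"] = [(x, 2 * x) for x in range(10)]  # toy data following y = 2x
    return ctx


def train(ctx: Context) -> Context:
    rows = ctx["rows"]
    # Least-squares slope for a line through the origin.
    ctx["slope"] = sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)
    return ctx


def evaluate(ctx: Context) -> Context:
    rows, slope = ctx["rows"], ctx["slope"]
    ctx["mae"] = sum(abs(y - slope * x) for x, y in rows) / len(rows)
    return ctx


def deploy(ctx: Context) -> Context:
    # Hypothetical governance gate: deploy only if the error is small enough.
    ctx["deployed"] = ctx["mae"] <= 0.1
    return ctx


PIPELINE: List[Step] = [
    ("prepare_data", prepare_data),
    ("train", train),
    ("evaluate", evaluate),
    ("deploy", deploy),
]


def run_pipeline(steps: List[Step], ctx: Optional[Context] = None) -> Context:
    """Run steps in order and log each completed step name for traceability."""
    ctx = dict(ctx or {})
    ctx["log"] = []
    for name, step in steps:
        ctx = step(ctx)
        ctx["log"].append(name)
    return ctx


result = run_pipeline(PIPELINE)
```

Because every run goes through the same ordered steps and leaves a log, retraining on new data is a matter of running the pipeline again rather than repeating manual work.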
Dr. Arsanjani ended his keynote by leaving the audience with the idea that maturity, the ability to actualize the business benefits of AI/ML, grows with the organization’s ability to leverage each portion of the lifecycle and to traverse the AI/ML digital innovation spectrum. By picking the best-fit tool for the part of the lifecycle your organization is currently focused on, you can nurture, develop, scale, and operationalize AI/ML capabilities. You can then integrate these capabilities not as silos but as a continuous digital innovation spectrum that different teams in your organization can traverse and derive business benefit from.