PMLE - Serving and Scaling Models - Section 4.1

Serve models for batch and online inference using Agent Platform, Model Garden, Cloud Run, and GKE, packaging models from frameworks such as PyTorch and XGBoost with prebuilt and custom containers, versioning in the Model Registry, implementing rollout strategies such as A/B testing and canary deployments, and handling inference pre- and postprocessing.

Deploy models for online and batch inference on Agent Platform, Cloud Run, and GKE, packaging models built with frameworks such as PyTorch and XGBoost in prebuilt or custom containers and versioning them in the Model Registry. Design rollout strategies such as canary deployments and A/B testing to limit the impact of a faulty release, and implement inference pre- and postprocessing to transform inputs and format outputs.

Online and batch inference | Custom containers | Model Registry | Canary deployments
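
The canary rollout and the pre- and postprocessing steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the API of Agent Platform, Cloud Run, or GKE: the names pick_version, preprocess, postprocess, serve, and FEATURE_ORDER are hypothetical, the two "models" are stand-in functions, and a managed endpoint would typically handle the traffic split as configuration rather than application code.

```python
import hashlib
from typing import Callable, Dict, List

# Hypothetical feature order the stand-in models expect.
FEATURE_ORDER = ("age", "income")


def pick_version(request_id: str, traffic_split: Dict[str, int]) -> str:
    """Map a request to a model version according to percentage weights.

    Hashing the request ID keeps routing deterministic: the same ID
    always lands on the same version for a given split.
    """
    if sum(traffic_split.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    # Stable bucket in [0, 100) derived from the request ID.
    digest = hashlib.sha256(request_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    cumulative = 0
    for version, percent in traffic_split.items():
        cumulative += percent
        if bucket < cumulative:
            return version
    raise AssertionError("unreachable when percentages sum to 100")


def preprocess(instance: Dict[str, float]) -> List[float]:
    """Input transformation: fixed feature order, missing values become 0.0."""
    return [float(instance.get(name, 0.0)) for name in FEATURE_ORDER]


def postprocess(score: float, version: str) -> Dict[str, object]:
    """Output formatting: attach a label and the version that served the request."""
    return {
        "label": "positive" if score >= 0.5 else "negative",
        "score": round(score, 4),
        "model_version": version,
    }


def serve(
    request_id: str,
    instance: Dict[str, float],
    models: Dict[str, Callable[[List[float]], float]],
    traffic_split: Dict[str, int],
) -> Dict[str, object]:
    """Route the request, then run preprocess -> predict -> postprocess."""
    version = pick_version(request_id, traffic_split)
    features = preprocess(instance)
    score = models[version](features)
    return postprocess(score, version)


if __name__ == "__main__":
    # Stand-in models: "v1" is the stable version, "v2" the canary.
    models = {"v1": lambda f: 0.8, "v2": lambda f: 0.2}
    split = {"v1": 90, "v2": 10}  # send roughly 10% of traffic to the canary
    print(serve("request-001", {"age": 42, "income": 55000}, models, split))
```

Recording the serving version in each response makes it possible to compare the canary's metrics against the stable version before shifting more traffic to it. The same routing function could serve an A/B test by setting the split to, for example, 50/50.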
