PMLE - Serving and Scaling Models - Section 4.2

Scale online model serving by managing and serving features with the Feature Store, deploying to public and private endpoints, choosing CPU, GPU, TPU, and edge hardware, scaling the serving backend for throughput, and tuning models for production training and serving.

Serve and manage features at prediction time using the Agent Platform Feature Store, deploy models to public or private endpoints based on network and security requirements, and select CPU, GPU, TPU, or edge hardware to match latency and throughput targets. Configure serving backend scaling to handle variable traffic without over-provisioning.

Agent Platform Feature Store, Private endpoints, Serving backend scaling, Edge serving
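
To make the "handle variable traffic without over-provisioning" trade-off concrete, here is a minimal sizing sketch in plain Python. It is not an Agent Platform API. The function name `replicas_needed`, its parameters, and the default values are all hypothetical, and the per-replica throughput is assumed to come from your own load test on the chosen hardware.

```python
import math


def replicas_needed(peak_qps, qps_per_replica, target_utilization=0.6,
                    min_replicas=1, max_replicas=10):
    """Estimate how many serving replicas a traffic level calls for.

    All names and defaults here are illustrative assumptions:
    - peak_qps: expected requests per second at the level being sized for
    - qps_per_replica: measured throughput of one replica on the chosen hardware
    - target_utilization: fraction of that throughput to plan for (headroom)
    - min_replicas / max_replicas: bounds that cap cost and keep a warm floor
    """
    if qps_per_replica <= 0:
        raise ValueError("qps_per_replica must be positive")
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")

    # Usable throughput per replica once headroom is reserved.
    usable_qps = qps_per_replica * target_utilization
    raw = math.ceil(peak_qps / usable_qps)

    # Clamp to the configured bounds: the floor avoids scaling to zero,
    # the ceiling avoids unbounded spend during traffic spikes.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # 100 QPS, 50 QPS per replica, planning for 50% utilization.
    print(replicas_needed(peak_qps=100, qps_per_replica=50, target_utilization=0.5))
```

A lower `target_utilization` leaves more headroom for bursts at the cost of more idle capacity. A higher value does the reverse, which is the over-provisioning trade-off the objective describes.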

More in this domain

Section 4.1: Serve models for batch and online inference using Agent Platform, Model Garden, Cloud Run, and GKE, packaging models from frameworks such as PyTorch and XGBoost with prebuilt and custom containers, versioning in the Model Registry, implementing rollout strategies such as A/B testing and canary deployments, and handling inference pre- and postprocessing.

Examworthy is not affiliated with or endorsed by Google Cloud. Original, blueprint-aligned practice material only.