PMLE - Scaling Prototypes Into ML Models - Section 3.3

Choose appropriate hardware for training, evaluating CPU, GPU, and TPU options and understanding distributed training across GPUs and TPUs using data and model parallelism strategies.
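One input to the hardware choice is whether a model's training state fits in a single accelerator's memory. The sketch below is a rough back-of-the-envelope estimate, not a sizing tool. It assumes 32-bit values and an Adam-style optimizer that keeps two extra values per parameter. The function names, the 30% headroom reserved for activations, and the 16 GiB device size are all hypothetical choices made for illustration.

```python
GIB = 1024 ** 3


def training_state_bytes(num_params, bytes_per_value=4, optimizer_slots=2):
    """Bytes for weights + gradients + optimizer slots (activations excluded)."""
    copies = 1 + 1 + optimizer_slots  # weights, gradients, optimizer state
    return num_params * bytes_per_value * copies


def fits_on_one_device(num_params, device_memory_bytes, activation_headroom=0.3):
    """Rough check: reserve a fraction of memory for activations (assumed 30%)."""
    usable = device_memory_bytes * (1 - activation_headroom)
    return training_state_bytes(num_params) <= usable


# Hypothetical 16 GiB accelerator: a 100M-parameter model fits comfortably,
# while a 7B-parameter model does not and would need its state split across devices.
small_fits = fits_on_one_device(100_000_000, 16 * GIB)
large_fits = fits_on_one_device(7_000_000_000, 16 * GIB)
```

Under these assumptions each parameter costs 16 bytes of training state, so model size alone can rule out single-device training before batch size and activations are considered.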

Compare CPU, GPU, and TPU options for ML training workloads, recognising the throughput and memory trade-offs that make each accelerator appropriate for a given model size and batch size. Distinguish data parallelism from model parallelism and understand how each strategy distributes computation across multiple devices during distributed training.
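The difference between the two strategies can be sketched in plain Python with no real devices involved. In the data-parallel half, every simulated device holds a full copy of the weight and an equal shard of the batch, and the per-device gradients are averaged as a stand-in for an all-reduce. In the model-parallel half, each layer lives on a different simulated device and only activations move between them. All function names and the toy data are hypothetical. Real frameworks handle the communication that is simulated here with ordinary Python lists.

```python
import math


def grad_mse(w, xs, ys):
    """Gradient of mean((w*x - y)^2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_grad(w, xs, ys, num_devices):
    """Each device holds a full copy of w and an equal shard of the batch."""
    shard = len(xs) // num_devices
    assert shard * num_devices == len(xs), "batch must split evenly"
    local_grads = [
        grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_devices)
    ]
    # Stand-in for an all-reduce: average the per-device gradients.
    return sum(local_grads) / num_devices


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def model_parallel_forward(x, w1_on_device0, w2_on_device1):
    """Layer 1 lives on one device, layer 2 on another; only activations move."""
    hidden = [max(0.0, h) for h in matvec(w1_on_device0, x)]  # device 0: linear + ReLU
    return matvec(w2_on_device1, hidden)  # device 1: receives `hidden`


# Data parallelism: the sharded gradient matches the full-batch gradient.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
full_grad = grad_mse(0.5, xs, ys)
sharded_grad = data_parallel_grad(0.5, xs, ys, num_devices=2)
assert math.isclose(full_grad, sharded_grad)

# Model parallelism: neither simulated device holds both weight matrices.
w1 = [[1.0, -1.0], [0.5, 0.5]]
w2 = [[2.0, 1.0]]
output = model_parallel_forward([3.0, 1.0], w1, w2)
```

The data-parallel path replicates the model and splits the batch, so it helps when the model fits on one device but training is throughput-bound. The model-parallel path splits the model itself, which is the option left when the weights do not fit on a single device.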

CPU, GPU, and TPU; Distributed training; Data and model parallelism

More in this domain

Section 3.1: Choose the model type and Google Cloud product given cost, complexity, latency, and scalability, selecting among options such as ARIMA, DNN, and LLM and products such as Agent Platform AutoML, BigQuery ML, and Agent Platform Pipelines, and deciding deployment and interpretability strategies.
Section 3.2: Train models by organising structured and unstructured data on Cloud Storage and BigQuery, ingesting from various sources, using SDKs such as Agent Platform custom training, Kubeflow on GKE, AutoML, and Tabular Workflows, troubleshooting training failures, tuning hyperparameters, and fine-tuning foundation models.


Examworthy is not affiliated with or endorsed by Google Cloud. Original, blueprint-aligned practice material only.