PMLE - Collaborating Within and Across Teams to Manage Data and Models - Section 2.1

Explore and preprocess data for ML across tabular, text, and image types: choose the right tool for the data's scale (BigQuery, Dataflow, Apache Spark, or in-memory Python frameworks), consolidate features in the Agent Platform Feature Store, and protect personally identifiable information (PII).

Choose between Dataflow, Apache Spark, and in-memory Python frameworks based on data volume and type, and consolidate reusable features in the Agent Platform Feature Store to avoid training-serving inconsistency. Recognise which PII-handling techniques - such as tokenisation and data masking - are appropriate when preprocessing sensitive tabular, text, or image data.
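The two PII techniques named above can be illustrated with a minimal in-memory Python sketch. It is not a Google Cloud API and not a production design: the key, the regular expression, the field names, and the helper functions (`tokenise`, `mask_emails`, `mask_card`, `preprocess`) are all hypothetical and chosen only for illustration. Tokenisation is shown here as a keyed hash, which is one common way to do it, and masking as simple string replacement.

```python
import hashlib
import hmac
import re

# Hypothetical key for illustration only; a real pipeline would load it
# from a secret manager rather than hard-coding it.
SECRET_KEY = b"example-key-not-for-production"

# Deliberately simple email pattern; an assumption, not a full validator.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def tokenise(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a deterministic keyed token for a value.

    The same input always maps to the same token, so the column can still
    be used as a join key or categorical feature without exposing the raw
    identifier.
    """
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def mask_emails(text: str) -> str:
    """Replace every email address in free text with a fixed placeholder."""
    return EMAIL_RE.sub("[EMAIL]", text)


def mask_card(number: str) -> str:
    """Mask all but the last four digits of a card-like number."""
    digits = re.sub(r"\D", "", number)
    return "*" * max(0, len(digits) - 4) + digits[-4:]


def preprocess(row: dict) -> dict:
    """Apply tokenisation and masking to one tabular record."""
    return {
        "customer_token": tokenise(row["customer_id"]),
        "note": mask_emails(row["note"]),
        "card": mask_card(row["card"]),
    }


# Made-up sample records.
rows = [
    {
        "customer_id": "C-1001",
        "note": "Contact jane.doe@example.com about refund",
        "card": "4111 1111 1111 1111",
    },
    {
        "customer_id": "C-1002",
        "note": "No contact details on file",
        "card": "5500-0000-0000-0004",
    },
]

cleaned = [preprocess(r) for r in rows]

for record in cleaned:
    print(record)
```

Tokenisation keeps a stable stand-in for the identifier, which suits columns that must remain joinable, while masking discards the sensitive part entirely, which suits free text or values that are only displayed. At larger scale the same per-record functions could in principle be applied inside a Dataflow or Spark transform instead of a Python list comprehension.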

Dataflow, Apache Spark, Agent Platform Feature Store, PII handling

Examworthy is not affiliated with or endorsed by Google Cloud. Original, blueprint-aligned practice material only.