AIF-C01 - Applications of foundation models - Section 3.4

Describe methods to evaluate foundation model performance.

Use human evaluation, benchmark datasets, and task-specific metrics to judge a model, and understand the limits of any single metric. Recognise why evaluation must reflect the real task, not a proxy.
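The limit of a single metric can be shown with a small sketch. The two scoring functions below, exact match and token-overlap F1, are simplified illustrations written for this example rather than any particular library's implementation, and the reference and prediction strings are made up.

```python
from collections import Counter


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the strings match after trimming and lowercasing, else 0.0."""
    return float(prediction.strip().lower() == reference.strip().lower())


def token_f1(prediction: str, reference: str) -> float:
    """Harmonic mean of token precision and recall over whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # Multiset intersection counts each shared token at most as often as it
    # appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Case A: a correct answer phrased differently from the reference.
# Exact match rejects it outright.
em_a = exact_match("The capital is Paris", "Paris")
f1_a = token_f1("The capital is Paris", "Paris")

# Case B: a wrong answer that shares most of its words with the reference.
# Token F1 rewards it heavily.
em_b = exact_match("The capital of France is Lyon", "The capital of France is Paris")
f1_b = token_f1("The capital of France is Lyon", "The capital of France is Paris")

print(f"Case A (correct, verbose): EM={em_a:.2f}  F1={f1_a:.2f}")
print(f"Case B (wrong, similar):   EM={em_b:.2f}  F1={f1_b:.2f}")
```

In this sketch the correct but verbose answer scores 0 on exact match and 0.4 on token F1, while the wrong answer scores roughly 0.83 on token F1. Neither metric alone tracks correctness here, which is one reason to pair automated metrics with human evaluation and with data drawn from the real task.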

Benchmarks | Human evaluation | Task metrics

Examworthy is not affiliated with or endorsed by Amazon Web Services. Original, blueprint-aligned practice material only.