Certification exam glossary

The cloud, security, AI, privacy, data, and DevOps terms that recur across exam objectives, each defined once in plain English and linked to the certifications that test it. Browse the full certification catalogue for practice.

AI & Machine Learning

Large Language Model (LLM)

A neural network trained on vast amounts of text to predict the next token, which lets it generate and reason over natural language. LLMs are the engine behind chat assistants, summarisation, and retrieval systems.

Tested in: AIF-C01, NCA-GENL, PMLE

See also: Foundation Model, Token, Retrieval-Augmented Generation (RAG)

Foundation Model

A large model pre-trained on broad data that can be adapted to many downstream tasks through prompting or fine-tuning, rather than being built for a single purpose.

Tested in: AIF-C01, NCA-GENL

See also: Large Language Model (LLM), Fine-Tuning

Retrieval-Augmented Generation (RAG)

A technique that grounds a language model's output in documents fetched at query time from an external knowledge source. It reduces hallucination and lets the model use current or private data it was never trained on.
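The retrieve-then-prompt flow described above can be sketched in a few lines. This is a minimal illustration, not a production pattern: the documents are invented, the word-overlap ranking is a stand-in for the embedding search a real system would normally use, and `retrieve` and `build_prompt` are hypothetical helper names. No model is called; the sketch stops at the grounded prompt.

```python
# Toy knowledge source (invented for illustration).
documents = [
    "Refunds are processed within 14 days of the return request.",
    "Support is available on weekdays from 09:00 to 17:00.",
    "Orders ship from the warehouse within two working days.",
]

def tokenize(text):
    """Lower-case the words and strip simple punctuation."""
    return {word.strip(".,?!").lower() for word in text.split()}

def retrieve(question, docs, k=1):
    """Rank documents by shared words with the question (stand-in for vector search)."""
    question_words = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(question_words & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, docs):
    """Place the retrieved text in the prompt so the model answers from it."""
    context = "\n".join("- " + d for d in retrieve(question, docs))
    return (
        "Answer using only the context below.\n\n"
        "Context:\n" + context + "\n\n"
        "Question: " + question
    )

print(build_prompt("How long do refunds take?", documents))
```

The resulting prompt would then be sent to a language model, which is the step that lets it answer from data it was never trained on.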

Tested in: AIF-C01, NCA-GENL, PMLE

See also: Embedding, Hallucination, Large Language Model (LLM)

Prompt Engineering

The practice of structuring the input to a generative model - instructions, context, and examples - to steer it toward accurate, well-formatted output, without changing the model's weights.

Tested in: AIF-C01, NCA-GENL

See also: Fine-Tuning, Large Language Model (LLM)

Fine-Tuning

Continuing the training of a pre-trained model on a smaller, task-specific dataset so it adapts to a domain or style. It changes the model's weights, unlike prompting or retrieval.

Tested in: AIF-C01, NCA-GENL, PMLE

See also: Foundation Model, Prompt Engineering

Embedding

A numeric vector that represents the meaning of text, an image, or other data, so that similar items sit close together in vector space. Embeddings power semantic search and retrieval-augmented generation.
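"Close together in vector space" is usually measured with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings come from a model and typically have hundreds or thousands of dimensions. `most_similar` is a hypothetical helper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Invented toy embeddings, not output from any real model.
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.8, 0.2, 0.1],
    "invoice": [0.0, 0.1, 0.9],
}

def most_similar(word):
    """Return the other word whose vector is closest to the given word's."""
    query = embeddings[word]
    others = [w for w in embeddings if w != word]
    return max(others, key=lambda w: cosine_similarity(query, embeddings[w]))

print(most_similar("cat"))  # kitten
```

Semantic search works the same way at scale: embed the query, then return the stored items with the highest similarity.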

Tested in: AIF-C01, NCA-GENL, PMLE

See also: Retrieval-Augmented Generation (RAG)

Hallucination

Output from a generative model that is fluent and confident but factually wrong or unsupported by its source. Grounding techniques such as retrieval-augmented generation and human review reduce it.

Tested in: AIF-C01, NCA-GENL, AIGP

See also: Retrieval-Augmented Generation (RAG), Responsible AI

Inference

Running a trained model on new input to produce a prediction or generation, as opposed to training, which produces the model. Inference cost and latency are the main operational concerns once a model ships.

Tested in: AIF-C01, NCA-AIIO, NCA-ADS, PMLE

Token

The unit of text a language model reads and generates, typically a word, a word fragment, or a punctuation mark. Context windows, pricing, and rate limits are all measured in tokens.

Tested in: AIF-C01, NCA-GENL

See also: Large Language Model (LLM)

Responsible AI

The set of practices that make AI systems fair, transparent, accountable, and safe, covering bias mitigation, explainability, privacy, and human oversight. It is a named domain in most AI certifications.

Tested in: AIF-C01, AIGP, PMLE

See also: AI Governance, Hallucination

Cloud

Shared Responsibility Model

The division of security duties between the cloud provider, who secures the infrastructure, and the customer, who secures their data, identities, and configuration. Where the line falls depends on the service model (IaaS, PaaS, SaaS).

Tested in: CLF-C02, AZ-900, SC-900, SAA-C03

See also: Identity and Access Management (IAM)

Identity and Access Management (IAM)

The control plane that governs who can authenticate and what they are authorised to do. It is how least-privilege access is enforced in every cloud platform.

Tested in: AZ-900, AZ-104, CLF-C02, SAA-C03, SC-300

See also: Principle of Least Privilege, Multi-Factor Authentication (MFA)

Serverless

A model where code runs in response to events without the user provisioning or managing servers; the provider scales capacity automatically and bills per execution. AWS Lambda and Azure Functions are the canonical examples.

Tested in: SAA-C03, DVA-C02, AZ-900, CLF-C02

See also: Autoscaling

Autoscaling

Automatically adding or removing compute capacity in response to demand, so an application meets load without over-provisioning. It is central to cloud cost-efficiency and elasticity.
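One common way to decide how much capacity to add or remove is target tracking: scale the instance count in proportion to how far a metric sits from its target. The sketch below is an assumption-laden illustration rather than any provider's exact algorithm; `desired_capacity`, the CPU figures, and the minimum and maximum bounds are all hypothetical.

```python
import math

def desired_capacity(current, metric, target, minimum=1, maximum=10):
    """Scale instance count in proportion to metric / target, within bounds."""
    wanted = math.ceil(current * metric / target)
    return max(minimum, min(maximum, wanted))

# 4 instances at 90% average CPU against a 60% target: scale out.
print(desired_capacity(4, 90, 60))  # 6

# 4 instances at 15% average CPU: scale in.
print(desired_capacity(4, 15, 60))  # 1
```

The minimum and maximum bounds matter as much as the formula: they stop a traffic spike or a faulty metric from scaling the fleet without limit.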

Tested in: SAA-C03, AZ-104, CLF-C02

See also: Serverless

Region and Availability Zone

A region is a geographic area containing multiple isolated availability zones, each made up of one or more data centres. Spreading workloads across availability zones gives high availability within a region; spreading them across regions supports disaster recovery and lower latency for users in different geographies.

Tested in: SAA-C03, AZ-104, CLF-C02

Infrastructure as Code (IaC)

Defining and provisioning cloud resources through machine-readable templates rather than manual configuration, making environments versioned, repeatable, and reviewable. Terraform, CloudFormation, and Bicep are common tools.

Tested in: DOP-C02, AZ-400, SAA-C03

See also: CI/CD

Object Storage

Storage that keeps data as discrete objects with metadata in a flat namespace, accessed over HTTP, scaling to virtually unlimited capacity. Amazon S3 and Azure Blob Storage are the standard services for backups, media, and data lakes.

Tested in: SAA-C03, AZ-900, DP-900

See also: Data Warehouse vs Data Lake

Security

Zero Trust

A security model that treats no user, device, or network as inherently trustworthy; every request is authenticated, authorised, and encrypted regardless of where it originates. 'Never trust, always verify' is the guiding principle.

Tested in: SC-900, SC-100, SY0-701

See also: Principle of Least Privilege, Multi-Factor Authentication (MFA)

Principle of Least Privilege

Granting each user, service, or process only the permissions it needs to do its job, and no more. It limits the blast radius of a compromised credential.
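In practice this usually means deny by default: an action is permitted only if it has been granted explicitly. The sketch below is a simplified illustration; the role names, the permission strings, and `is_allowed` are hypothetical and not taken from any real IAM system.

```python
# Each role lists only the actions it needs (hypothetical names).
ROLE_PERMISSIONS = {
    "report-reader": {"reports:read"},
    "report-editor": {"reports:read", "reports:write"},
}

def is_allowed(role, action):
    """Deny by default: unknown roles and unlisted actions are refused."""
    return action in ROLE_PERMISSIONS.get(role, set())

print(is_allowed("report-reader", "reports:read"))   # True
print(is_allowed("report-reader", "reports:write"))  # False
print(is_allowed("unknown-role", "reports:read"))    # False
```

If the report-reader credential is stolen, the attacker can read reports but cannot change them, which is the reduced blast radius the definition refers to.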

Tested in: SY0-701, SC-300, CISSP, CISM, SC-900

See also: Zero Trust, Identity and Access Management (IAM)

Multi-Factor Authentication (MFA)

Requiring two or more independent proofs of identity - something you know, have, or are - before granting access. It is one of the most effective controls against attacks that rely on stolen credentials.
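A common "something you have" factor is an authenticator app that generates time-based one-time passwords (TOTP, RFC 6238). The sketch below shows the idea using only the Python standard library; it is illustrative, not a vetted implementation, and the secret is the published RFC test key rather than anything a real system should use.

```python
import hashlib
import hmac
import struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based variant: the counter is the number of elapsed time steps."""
    return hotp(secret, int(unix_time) // step, digits)

# Test key from the RFCs; never reuse it in a real deployment.
secret = b"12345678901234567890"
print(hotp(secret, 0))   # 755224
print(totp(secret, 59))  # 287082
```

The server and the device share the secret, so both can compute the same code for the current time window; a stolen password alone is not enough to log in.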

Tested in: SC-900, SY0-701, SC-300, CISSP

See also: Zero Trust, Phishing

Security Information and Event Management (SIEM)

A system that collects and correlates log and event data from across an environment to detect, alert on, and investigate security incidents. It is the analyst's single pane of glass in a security operations centre.

Tested in: SC-200, SY0-701, CISM

Defence in Depth

Layering multiple independent security controls so that the failure of any one does not expose the asset. Physical, network, host, application, and data controls each add a layer.

Tested in: SY0-701, CISSP, SC-900

Encryption at Rest and in Transit

Encryption at rest protects stored data; encryption in transit protects data moving over a network. Together they keep data unreadable to anyone without the keys, whether it is sitting in a database or crossing the internet.

Tested in: SC-900, SY0-701, SAA-C03, CISSP

Phishing

A social-engineering attack that tricks a victim into revealing credentials or running malware, usually through a deceptive email or message. It remains one of the most common initial-access vectors in breaches.

Tested in: SY0-701, CISM, CISSP

See also: Multi-Factor Authentication (MFA)

Threat, Vulnerability, and Risk

A threat is a potential cause of harm, a vulnerability is a weakness it could exploit, and risk is the likelihood and impact of that happening. Security programmes manage risk by reducing vulnerabilities and countering threats.
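A simple qualitative way to combine likelihood and impact is to rate each on a scale and multiply them. The sketch below assumes a 1 to 5 scale and invented thresholds for the low, medium, and high bands; real risk frameworks define their own scales, and `risk_score` and `risk_level` are hypothetical helpers.

```python
def risk_score(likelihood, impact):
    """Multiply two ratings, each on an assumed 1-5 scale."""
    for rating in (likelihood, impact):
        if not 1 <= rating <= 5:
            raise ValueError("ratings must be between 1 and 5")
    return likelihood * impact

def risk_level(score):
    """Map a score to a band; the thresholds are illustrative only."""
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"

# An unpatched internet-facing server: likely to be exploited, severe impact.
print(risk_level(risk_score(4, 5)))  # high

# The same flaw on an isolated test machine: unlikely, minor impact.
print(risk_level(risk_score(1, 2)))  # low
```

Patching the server removes the vulnerability and lowers the likelihood rating, which is how reducing vulnerabilities reduces risk even though the threat itself has not gone away.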

Tested in: CISM, CRISC, CISA, CISSP, SY0-701

Privacy & Governance

General Data Protection Regulation (GDPR)

The European Union law governing how the personal data of individuals in the EU is collected, processed, and protected, with significant fines for non-compliance. It applies to organisations established in the EU and to those based elsewhere that offer goods or services to, or monitor the behaviour of, people in the EU.

Tested in: CIPP-E, AIGP

See also: Personally Identifiable Information (PII), Data Controller vs Processor

Personally Identifiable Information (PII)

Any data that can identify a specific individual, directly or in combination with other data, such as a name, email, or device identifier. Protecting it is the core obligation of most privacy laws.

Tested in: CIPP-US, CIPP-E, AIGP

See also: Data Minimisation

Data Controller vs Processor

A data controller decides why and how personal data is processed; a data processor acts on the controller's instructions. The distinction sets who is accountable under the GDPR and what each party must contract for.

Tested in: CIPP-E

See also: General Data Protection Regulation (GDPR)

Data Minimisation

The principle of collecting and keeping only the personal data necessary for a stated purpose, and no more. It reduces both privacy risk and breach exposure.
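In code, minimisation often comes down to an allow-list applied before anything is stored. The sketch below assumes a hypothetical order form whose stated purpose is delivery; the field names and the `minimise` helper are invented for illustration.

```python
# Only the fields needed for the stated purpose (delivering an order).
ALLOWED_FIELDS = {"email", "delivery_address"}

def minimise(record, allowed=ALLOWED_FIELDS):
    """Keep only allow-listed fields; everything else is dropped before storage."""
    return {key: value for key, value in record.items() if key in allowed}

submitted = {
    "email": "sam@example.com",
    "delivery_address": "1 High Street",
    "date_of_birth": "1990-01-01",
    "device_id": "abc-123",
}

print(minimise(submitted))
# {'email': 'sam@example.com', 'delivery_address': '1 High Street'}
```

Data that is never stored cannot be exposed in a breach, which is why the principle reduces breach exposure as well as privacy risk.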

Tested in: CIPP-E, CIPP-US, AIGP

See also: Personally Identifiable Information (PII)

AI Governance

The framework of policies, roles, and controls that ensures AI systems are developed and used lawfully, ethically, and safely across their lifecycle. It is the subject of the IAPP AIGP certification.

Tested in: AIGP, AIF-C01

See also: Responsible AI

Data

ETL vs ELT

ETL extracts data, transforms it, then loads it into a warehouse; ELT loads raw data first and transforms it inside the warehouse. ELT suits modern cloud warehouses with cheap, scalable compute.
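The difference is easiest to see side by side. In the sketch below an in-memory SQLite database stands in for the warehouse, and the rows, table names, and helper functions are invented for illustration. The ETL path cleans the data in Python before loading it; the ELT path loads the raw strings and cleans them with SQL inside the database.

```python
import sqlite3

# Raw extract: names in lower case, amounts as text (invented sample data).
raw_rows = [("alice", "12.50"), ("bob", "7.25"), ("alice", "3.00")]

def etl(rows):
    """Transform in application code, then load the cleaned rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    cleaned = [(name.title(), float(amount)) for name, amount in rows]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)
    return conn

def elt(rows):
    """Load the raw rows first, then transform with SQL inside the database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)
    conn.execute(
        "CREATE TABLE sales AS "
        "SELECT upper(substr(customer, 1, 1)) || substr(customer, 2) AS customer, "
        "CAST(amount AS REAL) AS amount FROM raw_sales"
    )
    return conn

def totals(conn):
    """Total sales per customer from the cleaned table."""
    query = "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    return conn.execute(query).fetchall()

print(totals(etl(raw_rows)))  # [('Alice', 15.5), ('Bob', 7.25)]
print(totals(elt(raw_rows)))  # [('Alice', 15.5), ('Bob', 7.25)]
```

Both paths end with the same cleaned table. The ELT path also keeps `raw_sales`, so the transformation can be rerun or changed later without extracting the data again.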

Tested in: DP-900, DP-700, COF-C03, PDE

See also: Data Warehouse vs Data Lake

Data Warehouse vs Data Lake

A data warehouse stores structured, modelled data optimised for analytics queries; a data lake stores raw data of any format at low cost. A 'lakehouse' combines both, adding warehouse-style management over lake storage.

Tested in: DP-900, DP-600, COF-C03, PDE

See also: Object Storage, OLTP vs OLAP

OLTP vs OLAP

OLTP systems handle many small, fast transactions for running a business, such as orders; OLAP systems run large analytical queries over historical data for insight. They are optimised for opposite workloads.

Tested in: DP-900, DP-600

Star Schema

A way of modelling analytical data as a central fact table of measurements surrounded by dimension tables of descriptive attributes. Its simplicity makes warehouse queries fast and easy to write.
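A small example makes the shape concrete. The sketch below builds one fact table and two dimension tables in an in-memory SQLite database; the table names, columns, and figures are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);

    -- Fact table: measurements, keyed to the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        revenue REAL
    );

    INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
    INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
    INSERT INTO fact_sales VALUES (1, 10, 20.0), (1, 11, 30.0), (2, 11, 50.0);
""")

def revenue_by_category(year):
    """Join the fact table to its dimensions, then filter and group by attributes."""
    query = """
        SELECT p.category, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        WHERE d.year = ?
        GROUP BY p.category
        ORDER BY p.category
    """
    return conn.execute(query, (year,)).fetchall()

print(revenue_by_category(2024))  # [('Books', 30.0), ('Games', 50.0)]
```

Every analytical query follows the same pattern: one join from the fact table to each dimension it needs, which is what keeps such queries easy to write.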

Tested in: DP-600, PL-300, PDE

See also: Data Warehouse vs Data Lake

DevOps

CI/CD

Continuous integration merges and tests code changes frequently; continuous delivery keeps every change ready to release, and continuous deployment goes a step further by releasing each passing change to production automatically. Together they shorten the path from commit to live with fewer manual steps.

Tested in: AZ-400, DOP-C02, GH-200, GH-500

See also: Blue-Green Deployment, Infrastructure as Code (IaC)

Blue-Green Deployment

Running two identical production environments and switching traffic from the current one (blue) to the new one (green) once the new release is verified. It enables near-zero-downtime releases and fast rollback, because traffic can simply be switched back to blue.
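The switch itself is a small piece of logic. The sketch below models it with a hypothetical `Router` class and a caller-supplied health check; in a real platform the same role is typically played by a load balancer or DNS change rather than application code.

```python
class Router:
    """Hypothetical traffic switch between two identical environments."""

    def __init__(self):
        self.live = "blue"   # currently serving traffic
        self.idle = "green"  # holds the new release

    def cut_over(self, is_healthy):
        """Send traffic to the idle environment only if it passes verification."""
        if not is_healthy(self.idle):
            return False
        self.live, self.idle = self.idle, self.live
        return True

    def roll_back(self):
        """Swap back; the previous environment is still running unchanged."""
        self.live, self.idle = self.idle, self.live

router = Router()
router.cut_over(lambda env: True)  # the new release passed its checks
print(router.live)  # green
router.roll_back()
print(router.live)  # blue
```

Rollback is quick because nothing is redeployed: the old environment stays up until the new one has proved itself.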

Tested in: AZ-400, DOP-C02, SAA-C03

See also: CI/CD

Container

A portable, isolated unit that packages an application with its dependencies so it runs the same way across environments. Containers are lighter than virtual machines because they share the host operating system kernel.

Tested in: AZ-400, DOP-C02, GH-200

See also: Container Orchestration

Container Orchestration

Automating the deployment, scaling, networking, and healing of containers across a cluster. Kubernetes is the de facto standard.

Tested in: DOP-C02, AZ-400, SAA-C03

See also: Container