Certification exam glossary
The cloud, security, AI, privacy, data, and DevOps terms that recur across exam objectives, each defined once in plain English and linked to the certifications that test it. Browse the full certification catalogue for practice.
AI & Machine Learning
- Large Language Model (LLM)
A neural network trained on vast amounts of text to predict the next token, which lets it generate and reason over natural language. LLMs are the engine behind chat assistants, summarisation, and retrieval systems.
- Foundation Model
A large model pre-trained on broad data that can be adapted to many downstream tasks through prompting or fine-tuning, rather than being built for a single purpose.
- Retrieval-Augmented Generation (RAG)
A technique that grounds a language model's output in documents fetched at query time from an external knowledge source. It reduces hallucination and lets the model use current or private data it was never trained on.
- Prompt Engineering
The practice of structuring the input to a generative model - instructions, context, and examples - to steer it toward accurate, well-formatted output, without changing the model's weights.
- Fine-Tuning
Continuing the training of a pre-trained model on a smaller, task-specific dataset so it adapts to a domain or style. It changes the model's weights, unlike prompting or retrieval.
- Embedding
A numeric vector that represents the meaning of text, an image, or other data, so that similar items sit close together in vector space. Embeddings power semantic search and retrieval-augmented generation.
- Hallucination
Output from a generative model that is fluent and confident but factually wrong or unsupported by its source. Grounding techniques such as retrieval-augmented generation reduce it, and human review catches what remains.
- Inference
Running a trained model on new input to produce a prediction or generation, as opposed to training, which produces the model. Inference cost and latency are the main operational concerns once a model ships.
- Token
The unit of text a language model reads and generates: typically a short word, a word fragment, or a punctuation mark. Context windows, pricing, and rate limits are all measured in tokens.
- Responsible AI
The set of practices that make AI systems fair, transparent, accountable, and safe, covering bias mitigation, explainability, privacy, and human oversight. It appears as a named domain in many AI certifications.
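The Embedding and Retrieval-Augmented Generation entries above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the word-count `embed` function stands in for a trained embedding model, and `DOCUMENTS`, `retrieve`, and `build_prompt` are hypothetical names. It shows the shape of the idea only: embed the query, find the closest document, and place that document in the prompt.

```python
import math
from collections import Counter

# Hypothetical knowledge base; a real system would index many documents.
DOCUMENTS = [
    "Blue-green deployment switches traffic between two environments.",
    "Multi-factor authentication requires two or more proofs of identity.",
    "A star schema has a central fact table and dimension tables.",
]


def embed(text):
    """Toy embedding: word counts stand in for a learned vector."""
    words = text.lower().replace(".", " ").replace("?", " ").split()
    return Counter(words)


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors (1.0 = same direction)."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, documents):
    """Return the document whose embedding is closest to the query's."""
    query_vector = embed(query)
    return max(documents, key=lambda d: cosine_similarity(query_vector, embed(d)))


def build_prompt(query, documents):
    """Ground the question in the retrieved document before generation."""
    context = retrieve(query, documents)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

Calling `build_prompt("What is a star schema?", DOCUMENTS)` places the star-schema sentence in the prompt, because it shares the most words with the question. A production pipeline would replace the word counts with dense vectors and the linear scan with a vector index.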
Cloud
- Identity and Access Management (IAM)
The control plane that governs who can authenticate and what they are authorised to do. It is how least-privilege access is enforced in every cloud platform.
- Serverless
A model where code runs in response to events without the user provisioning or managing servers; the provider scales capacity automatically and bills for actual usage, typically per request and execution time. AWS Lambda and Azure Functions are the canonical examples.
- Autoscaling
Automatically adding or removing compute capacity in response to demand, so an application meets load without over-provisioning. It is central to cloud cost-efficiency and elasticity.
- Region and Availability Zone
A region is a geographic area containing multiple isolated locations called availability zones, each made up of one or more data centres. Spreading workloads across availability zones gives high availability within a region; spreading across regions gives disaster recovery and lower latency for users near each region.
- Infrastructure as Code (IaC)
Defining and provisioning cloud resources through machine-readable templates rather than manual configuration, making environments versioned, repeatable, and reviewable. Terraform, CloudFormation, and Bicep are common tools.
- Object Storage
Storage that keeps data as discrete objects with metadata in a flat namespace, accessed over HTTP, scaling to virtually unlimited capacity. Amazon S3 and Azure Blob Storage are the standard services for backups, media, and data lakes.
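The Autoscaling entry above can be made concrete with a proportional scaling rule: scale the instance count by the ratio of the observed metric to its target, then clamp the result. This mirrors the formula documented for the Kubernetes Horizontal Pod Autoscaler, but the function name, parameter names, and default bounds below are assumptions, and real autoscalers add cooldowns and tolerances that this sketch omits.

```python
import math


def desired_capacity(current_instances, observed_metric, target_metric,
                     min_instances=1, max_instances=10):
    """Proportional scaling: keep the per-instance metric near its target."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Round up so the group never sits just under the capacity it needs.
    raw = math.ceil(current_instances * observed_metric / target_metric)
    # Clamp to the configured floor and ceiling (assumed defaults).
    return max(min_instances, min(max_instances, raw))
```

For example, four instances running at 90% CPU against a 60% target give `desired_capacity(4, 90, 60)`, which asks for six instances; the same group at 30% CPU would shrink to two.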
Security
- Zero Trust
A security model that treats no user, device, or network as inherently trustworthy; every request is authenticated, authorised, and encrypted regardless of where it originates. 'Never trust, always verify' is the guiding principle.
- Principle of Least Privilege
Granting each user, service, or process only the permissions it needs to do its job, and no more. It limits the blast radius of a compromised credential.
- Multi-Factor Authentication (MFA)
Requiring two or more independent proofs of identity - something you know, have, or are - before granting access. It is one of the most effective controls against account takeover, because a stolen password alone is no longer enough to sign in.
- Security Information and Event Management (SIEM)
A system that collects and correlates log and event data from across an environment to detect, alert on, and investigate security incidents. It is the analyst's single pane of glass in a security operations centre.
- Defence in Depth
Layering multiple independent security controls so that the failure of any one does not expose the asset. Physical, network, host, application, and data controls each add a layer.
- Encryption at Rest and in Transit
Encryption at rest protects stored data; encryption in transit protects data moving over a network. Together they keep data unreadable to anyone without the keys, whether it is sitting in a database or crossing the internet.
- Phishing
A social-engineering attack that tricks a victim into revealing credentials or running malware, usually through a deceptive email or message. It remains one of the most common initial-access vectors in breaches.
- Threat, Vulnerability, and Risk
A threat is a potential cause of harm, a vulnerability is a weakness it could exploit, and risk combines the likelihood that the threat exploits the vulnerability with the impact if it does. Security programmes manage risk by reducing vulnerabilities and countering threats.
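The Threat, Vulnerability, and Risk entry above is often put into practice with a qualitative risk register that scores each item as likelihood multiplied by impact. The sketch below assumes 1-to-5 scales and illustrative rating thresholds; the `RiskItem` class, the thresholds, and the two register entries are hypothetical, and real frameworks define their own scales.

```python
from dataclasses import dataclass


@dataclass
class RiskItem:
    threat: str
    vulnerability: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    def score(self):
        """Risk score as likelihood multiplied by impact."""
        return self.likelihood * self.impact


def rating(score):
    """Map a score to a band; the thresholds are illustrative only."""
    if score >= 15:
        return "high"
    if score >= 8:
        return "medium"
    return "low"


# Hypothetical register entries.
REGISTER = [
    RiskItem("Data centre flood", "Single availability zone", 1, 5),
    RiskItem("Phishing email", "No MFA on mailboxes", 4, 4),
]


def prioritise(register):
    """Order risks so the highest score is treated first."""
    return sorted(register, key=lambda item: item.score(), reverse=True)
```

Here the phishing item scores 16 and is rated high, so `prioritise(REGISTER)` lists it ahead of the flood item, which scores 5 and is rated low.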
Privacy & Governance
- General Data Protection Regulation (GDPR)
The European Union law governing how the personal data of individuals in the EU is collected, processed, and protected, with significant fines for non-compliance. It applies to organisations established in the EU and to those outside it that offer goods or services to, or monitor the behaviour of, people in the EU.
- Personally Identifiable Information (PII)
Any data that can identify a specific individual, directly or in combination with other data, such as a name, email, or device identifier. Protecting it is the core obligation of most privacy laws.
- Data Controller vs Processor
A data controller decides why and how personal data is processed; a data processor acts on the controller's instructions. The distinction sets who is accountable under the GDPR and what each party must contract for.
- Data Minimisation
The principle of collecting and keeping only the personal data necessary for a stated purpose, and no more. It reduces both privacy risk and breach exposure.
- AI Governance
The framework of policies, roles, and controls that ensures AI systems are developed and used lawfully, ethically, and safely across their lifecycle. It is the subject of the IAPP AIGP certification.
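The Data Minimisation entry above can be sketched as a filter that keeps only the fields a stated purpose requires. The purposes, field names, and `minimise` helper are hypothetical; which fields are genuinely necessary is a legal and business judgement that code cannot make.

```python
# Hypothetical mapping of each stated purpose to the fields it needs.
ALLOWED_FIELDS = {
    "order_fulfilment": {"name", "shipping_address", "email"},
    "newsletter": {"email"},
}


def minimise(record, purpose):
    """Drop every field that the stated purpose does not require."""
    allowed = ALLOWED_FIELDS.get(purpose)
    if allowed is None:
        # No stated purpose means no basis for keeping the data.
        raise ValueError(f"No stated purpose named {purpose!r}")
    return {key: value for key, value in record.items() if key in allowed}
```

Given a record that also holds a date of birth and a device identifier, `minimise(record, "newsletter")` returns only the email address, so the extra fields are never stored for that purpose.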
Data
- ETL vs ELT
ETL extracts data, transforms it, then loads it into a warehouse; ELT loads raw data first and transforms it inside the warehouse. ELT suits modern cloud warehouses with cheap, scalable compute.
- Data Warehouse vs Data Lake
A data warehouse stores structured, modelled data optimised for analytics queries; a data lake stores raw data of any format at low cost. A 'lakehouse' combines both, adding warehouse-style management over lake storage.
- OLTP vs OLAP
OLTP systems handle many small, fast transactions for running a business, such as orders; OLAP systems run large analytical queries over historical data for insight. They are optimised for opposite workloads.
- Star Schema
A way of modelling analytical data as a central fact table of measurements surrounded by dimension tables of descriptive attributes. Its simplicity makes warehouse queries fast and easy to write.
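The Star Schema entry above can be sketched with Python's built-in `sqlite3` module: one fact table of sales amounts joined to two dimension tables. The table names, column names, and sample rows are assumptions chosen for illustration.

```python
import sqlite3


def build_star_schema():
    """Create an in-memory star schema with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
        CREATE TABLE fact_sales (
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            amount REAL
        );
        INSERT INTO dim_product VALUES (1, 'Books'), (2, 'Games');
        INSERT INTO dim_date VALUES (10, 2023), (11, 2024);
        INSERT INTO fact_sales VALUES (1, 10, 20.0), (1, 11, 30.0), (2, 11, 50.0);
    """)
    return conn


def sales_by_category(conn, year):
    """Aggregate the fact table, sliced and labelled by its dimensions."""
    return conn.execute(
        """
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_date AS d ON d.date_id = f.date_id
        WHERE d.year = ?
        GROUP BY p.category
        ORDER BY p.category
        """,
        (year,),
    ).fetchall()
```

Every analytical query follows the same pattern: start from the fact table, join outward to the dimensions, filter on dimension attributes, and aggregate the measurements.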
DevOps
- CI/CD
Continuous integration merges and tests code changes frequently; continuous delivery keeps every passing change ready to release on demand, and continuous deployment goes one step further by releasing each passing change to production automatically. Together they shorten the path from commit to live with fewer manual steps.
- Blue-Green Deployment
Running two identical production environments and switching traffic from the old (blue) to the new (green) once the new one is verified. It enables near-zero-downtime releases and fast rollback, because the old environment stays available to switch back to.
- Container
A portable, isolated unit that packages an application with its dependencies so it runs the same way across environments. Containers are lighter than virtual machines because they share the host operating system kernel.
- Container Orchestration
Automating the deployment, scaling, networking, and healing of containers across a cluster. Kubernetes is the de facto standard.
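The Blue-Green Deployment entry above comes down to one guarded pointer swap. The sketch below models that with a hypothetical `BlueGreenRouter` class; in practice the swap is performed by a load balancer, DNS record, or service mesh, and the `health_check` callable is an assumption standing in for whatever verification the team runs.

```python
class BlueGreenRouter:
    """Tracks which environment receives traffic and which is idle."""

    def __init__(self, live="blue", idle="green"):
        self.live = live
        self.idle = idle

    def cut_over(self, health_check):
        """Send traffic to the idle environment only if it passes the check."""
        if not health_check(self.idle):
            return False  # Verification failed: traffic stays where it is.
        self.live, self.idle = self.idle, self.live
        return True

    def roll_back(self):
        """Switch traffic back to the previous environment."""
        self.live, self.idle = self.idle, self.live
```

A failed check leaves `live` on blue; a passing check moves it to green, and `roll_back()` returns it to blue without redeploying anything.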