Amazon Web Services study guide

How to pass AWS Certified CloudOps Engineer - Associate (SOA-C03)


The AWS Certified CloudOps Engineer - Associate (SOA-C03) tests one thing above feature recall: whether you can operate a running AWS workload correctly. Amazon hands you an operational situation (a metric that will not appear, a remediation that must happen without a human, a recovery time and recovery point you have to meet, a permission that must be the minimum) and asks which AWS service or setting does the job. The hard part is rarely knowing what a service is. It is knowing which option meets the stated operational requirement with the least manual effort when three of the four answers look plausible.

It suits people who already run workloads on AWS: systems administrators, operations and CloudOps engineers, and SRE-minded practitioners who monitor, automate, secure and troubleshoot day to day. The exam draws across five weighted domains. Monitoring and remediation, reliability and business continuity, and deployment and automation are tied for the largest share at 22 percent each, followed by networking at 18 percent and security at 16 percent. SOA-C03 replaces the old SOA-C02 SysOps exam: the name changed from SysOps Administrator to CloudOps Engineer, the structure dropped from six domains to five, the standalone cost and performance domain was removed (performance optimisation now sits in the monitoring domain), and the hands-on exam labs were removed, so every question is now multiple choice or multiple response.

The exam rewards operational decision rules, not memorised limits. Most questions are short scenarios where two or three answers are technically capable and only one is the best fit once you weigh the constraint that was named: automate with no operator action, meet a five-minute recovery point, grant the minimum permission, or fix it with the least operational overhead. Practising on scenario questions with a worked explanation, and a reason every wrong option is wrong, beats reading service overviews because the skill under test is choosing correctly under that pressure.

SOA-C03 is an operate-it-correctly exam: almost every question is a scenario with a monitoring, remediation, recovery, security or networking constraint, and the right answer is the AWS operations service or setting that meets it with the least manual effort.

Difficulty

Intermediate

Best for

Working AWS operators: systems administrators, operations and CloudOps engineers, and SRE-minded practitioners who deploy, monitor, secure, automate and troubleshoot workloads on AWS and need to prove they can keep them running under real constraints.

Prerequisites

None enforced. AWS recommends around one year of hands-on experience operating workloads on AWS plus experience in a related operations role. Practical exposure to CloudWatch, Systems Manager, CloudFormation, IAM, VPC networking and the managed databases is what actually carries you through the scenarios.

Questions: 65
Time allowed: 130 min
Pass mark: 720 / 1000
Exam cost (USD): $150
Practice questions: 320

How this exam thinks

One habit decides this exam: read the scenario for its operational constraint, then pick the service or setting built to meet it. Almost every question is a short situation with a stated limit (automate without a human, meet a recovery time or point, grant least privilege, surface a signal that is missing, or fix it with the least operational overhead), and the answer is the AWS operations service that meets that limit. Several options will be technically capable. Only one is the best fit once you weigh what the scenario actually asked for.

The default tie-breaker is the managed, automated, least-operational-overhead option. AWS builds the exam around its own preference for managed automation, so when two answers both work, the one with less to run and operate usually wins: an EventBridge rule invoking a Systems Manager Automation runbook over an SNS email to on-call, AWS Backup over a hand-rolled snapshot script, Session Manager over a bastion host, a gateway VPC endpoint over a NAT gateway for S3. Reach for the manual or heavier option only when the scenario names a reason. That reason is the signal that the obvious automated answer is the trap.

The rest is a handful of operational discriminations the exam leans on. For monitoring, a metric filter turns a log pattern into a metric, an alarm watches a metric and acts, a composite alarm combines alarm states, EventBridge routes an event, and a runbook performs the fix; remember too that EC2 publishes no memory or disk metric without the CloudWatch agent. For continuity, Multi-AZ survives a zone fault while cross-Region is for disaster recovery, and the DR tier (backup and restore, pilot light, warm standby, multi-site) is chosen by the named RTO and RPO. For automation, drift detection reports out-of-band changes, a change set previews an update, and StackSets deploy across accounts and Regions. For security, least privilege through a scoped IAM role beats stored keys, and an SCP caps what a whole account can ever do. For networking, security groups are stateful and network ACLs are stateless, and Reachability Analyzer names a blocking component without sending traffic. Name the constraint, then choose the service built for it.

What each domain tests and how to study it

The SOA-C03 blueprint is split across 5 domains. Weights are the official share of the exam; see the official exam guide for the authoritative breakdown.

  1. Monitoring, Logging, Analysis, Remediation, and Performance Optimization

    22% of exam

    What you must be able to do. Given a monitoring, logging, remediation or performance scenario, choose the CloudWatch, EventBridge, Systems Manager or tuning approach that surfaces the right signal and remediates or optimises with the least manual effort.

    In one sentence. The largest domain, tied at the top: instrument with CloudWatch and CloudTrail, alarm on the right metric, automate the fix with EventBridge and Systems Manager, and tune compute, storage and database performance.

    Recall check: answer these from memory first
    • An EC2 fleet must alarm on memory and disk usage but the default monitoring is unchanged. What collects those signals, and why are they absent from the AWS/EC2 namespace?
    • Name the component for each role: turn a log pattern into a metric; watch a metric and act; combine several alarm states; route an event to a remediation target; perform the multi-step fix itself.
    • A breached health metric must trigger an automatic fix with no operator action. What invokes the remediation, and why is an SNS email to on-call the wrong answer?

    What it tests. Operating the observability and remediation stack. Configuring CloudWatch metrics, alarms and metric filters and the CloudWatch agent to collect in-guest signals from EC2, Amazon ECS and Amazon EKS (memory and disk are not collected by default); centralising logs and traces with CloudWatch Logs, AWS CloudTrail, Amazon Managed Service for Prometheus, Amazon Managed Grafana and AWS X-Ray; building cross-account and cross-Region dashboards and routing alarm notifications through Amazon SNS; automating remediation with Amazon EventBridge, AWS Lambda, AWS Systems Manager Automation runbooks and auto scaling; and optimising performance across EC2 and Amazon EBS (volume types, placement groups), Amazon S3 and shared storage (Transfer Acceleration, multipart upload, AWS DataSync, Amazon EFS and Amazon FSx), and Amazon RDS and caching (Performance Insights, RDS Proxy, ElastiCache, DynamoDB Accelerator).
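    The routing half of that stack can be illustrated with a simplified model of how an EventBridge rule's event pattern selects events. This is a sketch under stated assumptions, not the service's own matcher: it models only exact-value matching (real patterns also support prefix, numeric, anything-but and exists filters), and the instance ID is a made-up placeholder. The `source`, `detail-type` and `detail.state` fields follow the shape of the EC2 instance state-change event.

```python
from typing import Any

# Simplified sketch of EventBridge-style event pattern matching.
# Assumption: only exact-value matching is modelled here.


def matches(pattern: dict[str, Any], event: dict[str, Any]) -> bool:
    """Return True if every field named in the pattern is satisfied by the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            # Nested pattern: recurse into the matching event object.
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        else:
            # Leaf: pattern values are lists; the event value must equal one of them.
            # If the event value is itself a list, any overlap counts as a match.
            values = actual if isinstance(actual, list) else [actual]
            if not any(v in expected for v in values):
                return False
    return True


# Rule pattern for an EC2 instance entering the "stopped" state.
rule_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["stopped"]},
}

stopped_event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "stopped"},  # placeholder ID
}
running_event = {**stopped_event, "detail": {**stopped_event["detail"], "state": "running"}}

print(matches(rule_pattern, stopped_event))  # True: routed to the remediation target
print(matches(rule_pattern, running_event))  # False: the rule ignores it
```

    Note that the rule never evaluates a number against a threshold; it only checks the event's shape, which is the line between an EventBridge rule and a CloudWatch alarm.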

    How to study it. Make "what signal, what fix" your reflex. Learn that EC2 publishes no memory or disk metric to the AWS/EC2 namespace by default, so in-guest signals require the CloudWatch agent, never detailed monitoring. Fix the role split until it is automatic: a metric filter turns a log pattern into a metric, an alarm watches a metric and acts (it can never watch a log group directly), a composite alarm combines child alarm states, an EventBridge rule routes an event to a target, and a Systems Manager Automation runbook performs the multi-step fix. When remediation must happen with no operator in the loop, the answer is an alarm or EventBridge rule invoking a runbook or Lambda, not an SNS email to on-call. For performance, match the lever to the bottleneck: gp3 to set IOPS and throughput independently at lower cost, Provisioned IOPS io2 for the highest consistent IOPS, placement groups for low-latency cluster networking, RDS Proxy to pool database connections, DynamoDB Accelerator for microsecond key-value reads, and ElastiCache for hot relational reads.
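    As a concrete illustration of the agent-versus-detailed-monitoring point, here is a minimal sketch that renders a CloudWatch agent configuration collecting memory and disk. The key names (`metrics_collected`, `mem_used_percent`, `used_percent`, `free`, `append_dimensions`) are intended to follow the agent's documented JSON schema, so check them against the agent reference before use; the 60-second interval and the choice to collect every mounted filesystem are assumptions to adapt. The agent publishes these as `mem_used_percent`, `disk_used_percent` and `disk_free` in the chosen namespace, which is where an alarm would then point.

```python
import json

# Minimal sketch of a CloudWatch agent configuration for in-guest memory and disk.
agent_config = {
    "agent": {"metrics_collection_interval": 60},  # seconds; an assumed choice
    "metrics": {
        # CWAgent is the agent's default namespace; set explicitly here for clarity.
        "namespace": "CWAgent",
        # Tag each datapoint with the instance ID so an alarm can target one host.
        "append_dimensions": {"InstanceId": "${aws:InstanceId}"},
        "metrics_collected": {
            "mem": {"measurement": ["mem_used_percent"]},
            # "*" collects every mounted filesystem; narrow to ["/"] if preferred.
            "disk": {"measurement": ["used_percent", "free"], "resources": ["*"]},
        },
    },
}

rendered = json.dumps(agent_config, indent=2)
print(rendered)
```

    A common pattern is to store this JSON in Systems Manager Parameter Store so the same configuration can be pushed to the whole fleet rather than edited host by host.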

    Easy to confuse

    • CloudWatch alarm versus metric filter. A metric filter scans incoming CloudWatch Logs events and increments a metric whenever a pattern matches; an alarm evaluates a metric over a period and triggers an action. You need a filter first to alarm on a log pattern, because an alarm cannot watch a log group directly.
    • EventBridge rule versus CloudWatch alarm. A CloudWatch alarm watches a numeric metric against a threshold over time; an EventBridge rule matches the shape of an event and routes it to targets. Use an alarm for metric thresholds and an EventBridge rule for event-driven routing such as a resource state change or an API call captured by CloudTrail.
    • Systems Manager Automation runbook versus Run Command. Run Command runs a single command or script across instances now; an Automation runbook orchestrates a multi-step workflow across AWS APIs and resources such as stop, snapshot, replace and notify. Use a runbook for multi-step remediation and Run Command for a one-off in-guest command.
    • CloudWatch agent versus detailed monitoring. Detailed monitoring only raises the publish frequency of existing hypervisor metrics to one minute; the CloudWatch agent adds in-guest memory, disk and custom metrics the hypervisor cannot see. Detailed monitoring never adds memory or disk, so the agent is the answer when those signals are required.
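    The alarm-versus-metric-filter split is easier to remember once the chain is written out. The sketch below is a simplified model, not CloudWatch itself: a Python regex stands in for CloudWatch's filter-pattern syntax, each inner list is one assumed one-minute period of invented log lines, and the alarm uses an "M out of N datapoints" rule comparable to the `DatapointsToAlarm` and `EvaluationPeriods` settings with a greater-than comparison. Real alarms also have an INSUFFICIENT_DATA state, which is not modelled.

```python
import re

# Simplified model of the log -> metric filter -> metric -> alarm chain.


def metric_filter(log_periods: list[list[str]], pattern: str) -> list[int]:
    """Turn each period's log events into one datapoint: the number of matches."""
    regex = re.compile(pattern)
    return [sum(1 for line in period if regex.search(line)) for period in log_periods]


def alarm_state(datapoints: list[int], threshold: int,
                evaluation_periods: int, datapoints_to_alarm: int) -> str:
    """Look at the latest N datapoints; ALARM if at least M exceed the threshold."""
    window = datapoints[-evaluation_periods:]
    breaching = sum(1 for value in window if value > threshold)
    return "ALARM" if breaching >= datapoints_to_alarm else "OK"


logs = [
    ["GET /health 200", "GET /api 500"],
    ["GET /api 500", "GET /api 500", "GET /api 500"],
    ["GET /api 500", "GET /api 500", "GET /health 200", "GET /api 500"],
]

errors_per_minute = metric_filter(logs, r" 5\d\d$")
print(errors_per_minute)  # [1, 3, 3]
print(alarm_state(errors_per_minute, threshold=2,
                  evaluation_periods=3, datapoints_to_alarm=2))  # ALARM
```

    The alarm function never sees a log line; it only receives the numbers the filter produced, which mirrors why an alarm cannot watch a log group directly.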

    Worked example from the SOA-C03 bank

    Free sample | Monitoring, Logging, Analysis, Remediation, and Performance Optimization | Difficulty: medium

    An operations team must alarm on the percentage of used memory and free disk space on a fleet of EC2 instances running a standard Amazon Linux AMI. They have not changed the default monitoring configuration. What is the most operationally efficient way to make these signals available as CloudWatch metrics?

    • A. Install and configure the CloudWatch agent on the instances to collect memory and disk metrics and publish them to a custom namespace. (Correct)
    • B. Enable detailed monitoring on each EC2 instance so that memory and disk metrics are published at a one-minute frequency.
    • C. Read the existing MemoryUtilization and DiskSpaceUtilization metrics that EC2 publishes by default to the AWS/EC2 namespace.
    • D. Create a CloudWatch metric filter over the instance system logs to extract memory and disk values into custom metrics.
    Recognise that EC2 publishes no memory or disk metrics by default and the CloudWatch agent is needed to collect in-guest signals. CloudWatch receives EC2 metrics from the hypervisor, which can see CPU, network and EBS activity but cannot see inside the guest operating system. Memory utilisation and free disk space are in-guest values, so they require the CloudWatch agent to read system counters and publish them as custom metrics before any alarm can use them.

    Why A is correct: The CloudWatch agent reads in-guest counters such as mem_used_percent and disk_free, publishing them as custom metrics that alarms can then evaluate, which is exactly the supported pattern.

    Why B is wrong: Detailed monitoring only raises the publishing frequency of the existing hypervisor metrics to one minute; it never adds in-guest memory or disk usage, so the required signals are still missing.

    Why C is wrong: It sounds right because CPU is there by default, but EC2 publishes no memory or disk-space metric to AWS/EC2; those values live inside the guest and are never collected automatically.

    Why D is wrong: Metric filters only run against log events already in CloudWatch Logs, and the default AMI does not log memory or disk usage, so there is nothing for the filter to match.

  2. Reliability and Business Continuity

    22% of exam

    What you must be able to do. Given availability and recovery requirements with a stated RTO and RPO, choose the scaling, load-balancing, Multi-AZ, backup and disaster-recovery design that meets the target with the least operational effort.

    In one sentence. The continuity domain, tied at the top: scale to demand, spread across Availability Zones, automate backups with AWS Backup, and match the restore and disaster-recovery design to the named RTO and RPO.

    Recall check: answer these from memory first
    • Order the disaster-recovery strategies from cheapest and slowest to dearest and fastest, and say which one the exam picks for an RTO of a few minutes at the lowest steady cost.
    • An RPO of a few minutes is required for an Amazon RDS database. What restore capability meets it, and why does a nightly snapshot fail?
    • Which Auto Scaling health-check type replaces instances the load balancer reports unhealthy, and why is the default EC2 status check not enough?

    What it tests. Keeping the workload up and losing no data. Configuring EC2 Auto Scaling groups and scaling policies; scaling managed databases and adding caching with Amazon RDS and Aurora scaling, DynamoDB capacity modes, Amazon ElastiCache and Amazon CloudFront; configuring and troubleshooting Elastic Load Balancing and Amazon Route 53 health checks; designing fault-tolerant systems with Multi-AZ deployments; automating snapshots and backups for EC2, RDS, EBS, S3 and DynamoDB with AWS Backup, backup plans and vaults; restoring to meet recovery time and recovery point objectives with point-in-time restore and versioning; and following disaster-recovery procedures across Regions from backup and restore through pilot light and warm standby.

    How to study it. Anchor on matching the design to the named RTO and RPO. Learn the disaster-recovery ladder by cost and recovery time: backup and restore is cheapest and slowest, pilot light keeps only the data layer running while the application tier stays off, warm standby runs a scaled-down but live full stack, and multi-site is active-active and fastest. Know that AWS Backup centralises scheduled, policy-driven backups across services through backup plans and vaults, and that point-in-time restore replays continuous backups to meet a recovery point of minutes where a nightly snapshot cannot. Fix the Auto Scaling health-check distinction: set the type to ELB so the group replaces instances the load balancer reports unhealthy, not just ones failing the EC2 status check. Keep Multi-AZ (in-Region high availability with a synchronous standby and automatic failover) separate from cross-Region replication (disaster recovery and lower global latency).
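    The ladder can be expressed as a small selection rule: walk from cheapest to dearest and stop at the first strategy whose recovery times meet the target. The RTO and RPO minutes below are hypothetical round numbers chosen only to make the ordering concrete; real values depend on the workload, data size and how much of the failover is automated, so treat this as a way to internalise the rule rather than as reference figures.

```python
# Illustrative DR ladder, ordered cheapest to dearest.
# (strategy, typical RTO minutes, typical RPO minutes): hypothetical numbers, not AWS figures.
LADDER = [
    ("backup and restore", 24 * 60, 24 * 60),
    ("pilot light", 30, 5),
    ("warm standby", 5, 1),
    ("multi-site active-active", 0, 0),
]


def choose_dr_strategy(rto_minutes: float, rpo_minutes: float) -> str:
    """Return the cheapest strategy whose typical RTO and RPO both meet the target."""
    for name, typical_rto, typical_rpo in LADDER:
        if typical_rto <= rto_minutes and typical_rpo <= rpo_minutes:
            return name
    raise ValueError("no strategy meets the target")


print(choose_dr_strategy(rto_minutes=48 * 60, rpo_minutes=24 * 60))  # backup and restore
print(choose_dr_strategy(rto_minutes=60, rpo_minutes=15))            # pilot light
print(choose_dr_strategy(rto_minutes=5, rpo_minutes=5))              # warm standby
```

    Walking the list in cost order is the point: the exam wants the cheapest design that still meets both targets, not the fastest one available.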

    Easy to confuse

    • Multi-AZ versus cross-Region. Multi-AZ is in-Region high availability with a synchronous standby and automatic failover, surviving an Availability Zone fault; cross-Region replication is for disaster recovery and lower global latency, surviving a Region fault. Match the scope of the failure the scenario names.
    • Pilot light versus warm standby. Pilot light keeps only the core data replicated and running while the application tier is switched off until failover; warm standby runs a scaled-down but fully functional copy that just needs scaling up. Warm standby has a lower RTO at higher steady cost; pilot light is cheaper but slower to recover.
    • AWS Backup versus manual snapshots. AWS Backup centralises scheduled, policy-driven backups across many services with backup plans, vaults and retention rules; manual snapshots are per-resource and ad hoc. Choose AWS Backup when central scheduling, cross-service coverage or compliance retention is required.
    • Point-in-time restore versus snapshot restore. Point-in-time restore replays continuous backups to any second within the retention window, meeting a recovery point of minutes; a snapshot restore only returns the state captured at the snapshot time. Use point-in-time restore when the RPO is minutes rather than a day.
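    To see why a nightly snapshot misses a minutes-level RPO, compare how much data each restore method loses for the same incident. This sketch assumes the goal is to restore to the last second before a bad write at 17:42, with a daily snapshot at 02:00 and a latest restorable time a few minutes behind the present (RDS typically keeps it within about the last five minutes); all dates and times are invented.

```python
from datetime import datetime, timedelta

# Simplified comparison of snapshot restore and point-in-time restore (PITR).


def snapshot_restore_point(snapshots: list[datetime], target: datetime) -> datetime:
    """A snapshot restore can only return the newest snapshot taken at or before the target."""
    return max(s for s in snapshots if s <= target)


def pitr_restore_point(target: datetime, earliest: datetime,
                       latest_restorable: datetime) -> datetime:
    """PITR can land on any second inside the restorable window."""
    if not earliest <= target <= latest_restorable:
        raise ValueError("target is outside the restorable window")
    return target


bad_write = datetime(2025, 3, 14, 17, 42, 0)
target = bad_write - timedelta(seconds=1)
snapshots = [datetime(2025, 3, 13, 2, 0), datetime(2025, 3, 14, 2, 0)]

snap_loss = target - snapshot_restore_point(snapshots, target)
pitr_loss = target - pitr_restore_point(target,
                                        earliest=datetime(2025, 3, 7, 17, 45),
                                        latest_restorable=datetime(2025, 3, 14, 17, 45))
print(snap_loss)  # 15:41:59
print(pitr_loss)  # 0:00:00
```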

    Worked example from the SOA-C03 bank

    Free sample | Reliability and Business Continuity | Difficulty: medium

    A web tier runs in an EC2 Auto Scaling group, and operators want the group to keep average CPU utilisation across the fleet close to 50 percent, adding or removing instances automatically as traffic rises and falls throughout the day. They want the simplest policy that maintains this set point without them defining individual thresholds for each capacity step. Which scaling policy meets this requirement with the least ongoing tuning?

    • A. A simple scaling policy that adds two instances whenever a CPU alarm breaches and then waits for a cooldown before evaluating again.
    • B. A step scaling policy with several CPU alarm bands that each add a different number of instances as utilisation climbs higher.
    • C. A scheduled scaling action that sets desired capacity higher during the day and lower at night based on the usual traffic curve.
    • D. A target tracking scaling policy on the average CPU utilisation metric with the target value set to 50 percent. (Correct)
    Use a target tracking scaling policy to hold a metric at a chosen set point with the least manual threshold tuning. Target tracking works like a thermostat: you name a metric and a target value, and Auto Scaling provisions and manages the underlying CloudWatch alarms, computing the capacity changes needed to keep the metric near the target. This removes the per-band alarm and step design that simple and step scaling require, which is why it is the lowest-maintenance fit for a stable CPU set point.

    Why A is wrong: Simple scaling reacts to one alarm with a fixed change and a blocking cooldown, so it cannot hold a continuous set point and needs the team to hand-tune the threshold and step.

    Why B is wrong: Step scaling reacts faster than simple scaling but still forces operators to design and maintain every alarm band and step, which is exactly the per-threshold tuning they want to avoid.

    Why C is wrong: Scheduled scaling changes capacity on a clock and is blind to the live CPU metric, so it cannot track an actual utilisation set point as traffic varies unpredictably.

    Why D is correct: Target tracking creates and manages the CloudWatch alarms for you and adjusts capacity to hold the metric at the chosen set point, which is the lowest-effort way to keep CPU near 50 percent.
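    The thermostat behaviour can be approximated with a proportional rule: scale capacity by the ratio of the observed metric to the target, then clamp to the group's minimum and maximum. This is an approximation for intuition only; AWS does not publish target tracking as this exact formula, and the real policy also accounts for instance warm-up and scales in more gradually than it scales out. The function name and numbers are illustrative.

```python
import math


def target_tracking_desired(current: int, metric: float, target: float,
                            min_size: int, max_size: int) -> int:
    """Approximate the capacity needed to bring the metric back to the target."""
    # If 4 instances average 75% CPU, the same load at 50% needs about 4 * 75 / 50 = 6.
    desired = math.ceil(current * metric / target)
    # The group never goes outside its configured bounds.
    return max(min_size, min(max_size, desired))


print(target_tracking_desired(4, metric=75.0, target=50.0, min_size=2, max_size=10))   # 6
print(target_tracking_desired(6, metric=25.0, target=50.0, min_size=2, max_size=10))   # 3
print(target_tracking_desired(10, metric=90.0, target=50.0, min_size=2, max_size=12))  # 12
```

    The operator supplies only the metric and the target; nothing in the rule asks for alarm bands or step sizes, which is the tuning effort simple and step scaling would require.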

  3. Deployment, Provisioning, and Automation

    22% of exam

    What you must be able to do. Given a provisioning or automation scenario, choose the infrastructure-as-code, image, multi-account or Systems Manager approach that deploys and maintains resources repeatably with the least manual effort.

    In one sentence. The automation domain, tied at the top: build images and stacks as code, detect and fix CloudFormation drift, share across accounts and Regions, and automate operations with Systems Manager and event-driven workflows.

    Recall check: answer these from memory first
    • A resource was changed in the console outside CloudFormation. Which feature reports exactly which managed resources no longer match the template, and why not a change set?
    • You must apply one template across fifty accounts in three Regions from a management account. What deploys it, and why is a plain stack insufficient?
    • Operators need shell access to private instances with no SSH keys, no bastion and no inbound ports. Which Systems Manager capability provides it, and what does it remove?

    What it tests. Provisioning and maintaining resources as code. Creating and maintaining AMIs and container images with Amazon EC2 Image Builder and Amazon ECR; building stacks with AWS CloudFormation and the AWS CDK using templates and change sets; identifying and remediating CloudFormation drift and deployment issues such as subnet sizing and permissions errors; sharing and provisioning across accounts and Regions with CloudFormation StackSets, AWS Resource Access Manager and AWS Service Catalog; implementing deployment strategies and using third-party tools such as Terraform and Git; automating operations with AWS Systems Manager Run Command, Patch Manager, State Manager, Parameter Store and Session Manager; and building event-driven automation with AWS Lambda, Amazon S3 Event Notifications, Amazon EventBridge and AWS Step Functions.

    How to study it. Learn the CloudFormation feature split cold, because the exam leans on it: drift detection reports resources changed outside CloudFormation, a change set previews how a submitted template would alter the stack before you execute it, StackSets deploy one template to many accounts and Regions from a management account, and a stack policy restricts which resources an update may modify. Map the Systems Manager toolset to its job: Patch Manager with maintenance windows for patching at scale, State Manager to enforce a desired state on a schedule, Run Command for an ad hoc command, Parameter Store for configuration, and Session Manager for an audited shell to private instances with no SSH keys, no bastion and no inbound ports. For event-driven automation, S3 Event Notifications and EventBridge invoke Lambda, and Step Functions orchestrates a multi-step workflow with built-in retry and error handling.
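    The recall question about fifty accounts in three Regions comes down to stack instances: StackSets creates one stack instance per account and Region pair and rolls them out in batches governed by operation preferences such as a maximum number of concurrent accounts. The sketch below models that fan-out under assumptions: placeholder 12-digit account IDs, three example Regions processed one after another (sequential Region order), and a hypothetical concurrency of ten accounts per batch.

```python
# Hypothetical placeholders: 50 twelve-digit account IDs and three example Regions.
accounts = [str(111111110000 + i) for i in range(50)]
regions = ["us-east-1", "eu-west-1", "ap-southeast-2"]


def stack_instances(accounts: list[str], regions: list[str]) -> list[tuple[str, str]]:
    """One stack instance per (account, Region) pair."""
    return [(account, region) for region in regions for account in accounts]


def deployment_batches(accounts: list[str], regions: list[str],
                       max_concurrent_accounts: int) -> list[list[tuple[str, str]]]:
    """Sequential Region order: finish one Region, in batches of accounts, before the next."""
    batches = []
    for region in regions:
        for start in range(0, len(accounts), max_concurrent_accounts):
            chunk = accounts[start:start + max_concurrent_accounts]
            batches.append([(account, region) for account in chunk])
    return batches


instances = stack_instances(accounts, regions)
batches = deployment_batches(accounts, regions, max_concurrent_accounts=10)
print(len(instances))  # 150
print(len(batches))    # 15
```

    A plain stack covers one account in one Region, so reaching the same 150 targets without StackSets means 150 separate deployments to launch and track.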

    Easy to confuse

    • CloudFormation drift detection versus a change set. Drift detection compares live resource state against the deployed template and reports out-of-band changes made outside CloudFormation; a change set previews how a newly submitted template would alter the stack before you execute it. Drift for manual console edits, change set for a planned update.
    • CloudFormation StackSets versus nested stacks. StackSets deploy a single template to many accounts and Regions from one operation; nested stacks compose reusable child templates within a single stack in one account. Use StackSets for multi-account and multi-Region rollout, nested stacks for modular composition.
    • Systems Manager Session Manager versus a bastion host. Session Manager opens an audited shell through the SSM agent and IAM with no open inbound ports and no SSH keys; a bastion host needs a public instance, key management and an open port. Session Manager removes that attack surface and the key sprawl.
    • State Manager versus Run Command. State Manager continuously enforces a defined desired state through associations on a schedule; Run Command runs a command once on demand. Use State Manager for ongoing configuration compliance and Run Command for a one-off action.
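    Drift detection and a change set are both comparisons; what differs is which two things are compared. In this simplified sketch plain dictionaries stand in for resource properties (`InstanceType` and `Monitoring` are `AWS::EC2::Instance` property names, but the values are invented): drift compares what the deployed template expects with the live resource, while a change set compares the deployed template with a newly submitted one. Real drift detection evaluates only the properties the template declares and runs inside CloudFormation, which this sketch does not model.

```python
def compare(before: dict, after: dict) -> dict:
    """Return {property: (before_value, after_value)} for every property that differs."""
    keys = sorted(before.keys() | after.keys())
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}


# What the currently deployed template declares for the instance.
deployed_template = {"InstanceType": "t3.micro", "Monitoring": True}
# What the live resource looks like after someone edited it in the console.
live_resource = {"InstanceType": "t3.large", "Monitoring": True}
# A new template version the team intends to submit.
proposed_template = {"InstanceType": "t3.micro", "Monitoring": False}

drift = compare(deployed_template, live_resource)           # template vs reality
change_set = compare(deployed_template, proposed_template)  # template vs new template
print(drift)       # {'InstanceType': ('t3.micro', 't3.large')}
print(change_set)  # {'Monitoring': (True, False)}
```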

    Worked example from the SOA-C03 bank

    Free sample | Deployment, Provisioning, and Automation | Difficulty: medium

    A platform team rebuilds a hardened Amazon Linux golden AMI every time the upstream base image receives security patches, and they want the rebuild, the hardening steps, a smoke test and the production of a new AMI to run automatically on a schedule with no instance kept running between builds. Which service produces the new golden AMI in the most managed, repeatable way?

    • A. Keep a long-running EC2 builder instance and use a Systems Manager State Manager association to reapply the hardening configuration to it whenever new patches are released.
    • B. Launch an instance from the base AMI on a schedule with Run Command, apply the hardening commands by hand-written scripts, then call CreateImage and terminate the instance manually each cycle.
    • C. Define an EC2 Image Builder pipeline with the base image, hardening and test components and a schedule, so each run builds, tests and outputs a new versioned AMI then tears the build instance down. (Correct)
    • D. Store the hardening steps in a CloudFormation template and deploy a new stack on each patch release so the stack update bakes the configuration into a fresh machine image for the fleet.
    Use an EC2 Image Builder pipeline to build, test and output versioned golden AMIs automatically on a schedule with transient build instances. EC2 Image Builder runs a pipeline that launches a temporary build instance from a chosen base image, applies ordered build and test components, validates the result, registers a new versioned AMI and then terminates the build and test instances. Because the pipeline can run on a schedule or on a source-image change, the hardened golden AMI is rebuilt automatically with nothing left running between cycles, which manual Run Command scripting and State Manager enforcement cannot deliver as one managed flow.

    Why A is wrong: State Manager enforces configuration on a running instance and never outputs an AMI, so it keeps a builder alive between rebuilds and does not produce the new golden image the team needs.

    Why B is wrong: Run Command can drive the steps but the team must stitch together scheduling, image creation and cleanup themselves, which is the manual orchestration that a managed image pipeline removes.

    Why C is correct: Image Builder pipelines orchestrate build and test components on a transient instance, output a versioned AMI and clean up automatically, which matches the scheduled hands-off golden image rebuild exactly.

    Why D is wrong: CloudFormation provisions resources from a template but does not build or register an AMI from a base image, so it cannot bake a hardened golden image the way an image pipeline does.

  4. Security and Compliance

    16% of exam

    What you must be able to do. Given an access, governance or data-protection requirement, choose the IAM model, multi-account guardrail, encryption service and detection service that enforce least privilege and meet the compliance requirement.

    In one sentence. The security-operations domain: least-privilege IAM, multi-account governance with Organizations and service control policies, encryption with KMS and ACM, and remediating findings from the detection services.

    Recall check: answer these from memory first
    • An EC2 application must read one S3 bucket and nothing else with no stored credentials. What grants it access, and why is a broad policy or an embedded access key wrong?
    • You must stop any account in an Organization from disabling CloudTrail regardless of its own IAM. What enforces that, and why can an IAM policy alone not?
    • Name the detection service for each: anomalous account activity, unpatched CVEs on instances, sensitive data sitting in S3, and resource configuration that drifts from a rule.

    What it tests. Operating security and compliance controls. Implementing IAM users, roles, policies, multi-factor authentication, federation and policy conditions for least privilege; troubleshooting and auditing access with AWS CloudTrail, IAM Access Analyzer and the IAM policy simulator; governing multiple accounts with AWS Organizations, service control policies and AWS Control Tower; enforcing compliance such as Region and service restrictions and remediating AWS Trusted Advisor security findings; protecting data with AWS KMS encryption at rest, AWS Certificate Manager for encryption in transit and AWS Secrets Manager for secret storage; and configuring and remediating findings from AWS Security Hub, Amazon GuardDuty, AWS Config, Amazon Inspector and Amazon Macie, with a data classification scheme.

    How to study it. Make least privilege the reflex: when a workload on EC2 or Lambda must reach another service, the answer is an IAM role scoped to the exact action and resource, never stored access keys and never a broad policy. Learn the guardrail split: a service control policy sets the maximum permissions available to accounts in an Organization but grants nothing on its own, AWS Control Tower stands up a governed landing zone, and a permissions boundary caps the maximum permissions an individual IAM user or role can have. Map each detection service to the threat it answers: GuardDuty for threat detection from account and network logs, Inspector for software vulnerability and CVE scanning, Macie for finding sensitive data in S3, Config for resource-configuration compliance against rules, Security Hub for aggregating findings, and Trusted Advisor for best-practice checks. For data, KMS encrypts at rest, ACM provisions and manages TLS certificates for encryption in transit, and Secrets Manager stores and rotates secrets.
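    Least privilege is easiest to see in the policy document itself. This sketch builds a permissions policy that lets a role list one bucket and read its objects and nothing else; the bucket name is a hypothetical placeholder. Note the two resource forms: `s3:ListBucket` applies to the bucket ARN, while `s3:GetObject` applies to the object ARNs under `/*`.

```python
import json

BUCKET = "example-app-data"  # hypothetical bucket name


def bucket_read_policy(bucket: str) -> dict:
    """Permissions policy scoped to listing and reading a single bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListOneBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                "Sid": "ReadObjectsInOneBucket",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


print(json.dumps(bucket_read_policy(BUCKET), indent=2))
```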

    Easy to confuse

    • Service control policy versus IAM policy. An SCP sets the maximum permissions available to accounts in an Organization but grants nothing on its own; an IAM policy grants permissions to a principal within those limits. An action must be allowed by both, so use an SCP to cap what a whole account can ever do regardless of its own IAM.
    • GuardDuty versus Inspector versus Macie. GuardDuty detects threats from account and network logs, Inspector scans workloads for software vulnerabilities and CVEs, and Macie discovers and classifies sensitive data in S3. One watches behaviour, one scans for vulnerabilities, and one finds sensitive data.
    • AWS KMS versus AWS Secrets Manager. KMS manages encryption keys and performs cryptographic operations; Secrets Manager stores secret values such as database credentials, encrypts them with KMS and rotates them. Use Secrets Manager for the secret itself with rotation, and KMS for the keys that protect data.
    • IAM Access Analyzer versus the policy simulator. Access Analyzer uses automated reasoning to flag resources shared outside the account or organisation through resource policies; the policy simulator tests whether a given policy would allow or deny a specific request. One finds external exposure, the other evaluates a hypothetical request.
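    The SCP-versus-IAM rule (an action must be allowed by both, and any explicit deny wins) can be captured in a few lines. This is a deliberately reduced model of policy evaluation: it ignores resource policies, permissions boundaries, session policies, conditions and the fact that SCPs do not restrict the management account, and it treats action names case-insensitively with simple `*` wildcard matching.

```python
from fnmatch import fnmatchcase


def _matches(action: str, patterns) -> bool:
    # IAM action names are case-insensitive; "*" acts as a wildcard.
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def is_allowed(action: str, scp_allows, iam_allows, explicit_denies=()) -> bool:
    """Reduced model: an explicit deny wins; otherwise the SCP and IAM must both allow."""
    if _matches(action, explicit_denies):
        return False
    return _matches(action, scp_allows) and _matches(action, iam_allows)


# A deny-list style SCP: allow everything except stopping CloudTrail.
print(is_allowed("cloudtrail:StopLogging", scp_allows=["*"], iam_allows=["*"],
                 explicit_denies=["cloudtrail:StopLogging"]))  # False, even for an admin
# An SCP that only permits S3 caps an IAM policy that allows everything.
print(is_allowed("ec2:RunInstances", scp_allows=["s3:*"], iam_allows=["*"]))  # False
# The SCP permits the action but grants nothing: IAM must allow it too.
print(is_allowed("s3:GetObject", scp_allows=["*"], iam_allows=[]))  # False
print(is_allowed("s3:GetObject", scp_allows=["*"], iam_allows=["s3:GetObject"]))  # True
```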

    Worked example from the SOA-C03 bank

    Free sample | Security and Compliance | Difficulty: medium

    An application running on Amazon EC2 instances must read objects from a specific Amazon S3 bucket. The current code uses long-lived IAM user access keys stored in a configuration file on each instance. The operations team wants to remove the static keys and grant the access in the most secure, least-privilege way. Which change meets this requirement?

    • A. Generate a new IAM user with an S3 read-only managed policy and distribute its access keys to every instance through a shared encrypted configuration file.
    • B. Create an IAM role with a policy scoped to that bucket and attach it to the instances through an instance profile so they receive temporary credentials automatically. (Correct)
    • C. Embed the IAM user access keys in the application source code so they are version controlled and can be rotated whenever the build pipeline runs a new deployment.
    • D. Attach the AdministratorAccess managed policy to the existing IAM user so the instances can reach the bucket without any further policy changes being required.
    Use an IAM role attached through an instance profile to give EC2 workloads scoped, automatically rotated temporary credentials instead of static access keys. An IAM role assumed through an instance profile lets EC2 obtain short-lived temporary credentials from the instance metadata service, so no long-lived keys live on the host and the credentials rotate automatically. Scoping the role policy to the single bucket satisfies least privilege, whereas distributing IAM user keys, hardcoding them, or granting administrator access all retain static credentials or excessive permissions.

    Why A is wrong: This keeps long-lived static keys on the hosts, which is the very risk the team wants to remove, and a broad read-only policy on all buckets is wider than the single bucket needs.

    Why B is correct: An IAM role delivered through an instance profile supplies automatically rotated temporary credentials and the scoped policy grants only the one bucket, removing static keys and meeting least privilege.

    Why C is wrong: Hardcoding access keys in source code exposes them in the repository and build artifacts and still relies on the long-lived keys the team is trying to eliminate.

    Why D is wrong: Administrator access grants far more than reading one bucket, violating least privilege, and it still depends on the static IAM user keys the team wants to remove.
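    A minimal sketch of the two policy documents behind option B, written as Python dictionaries and printed as JSON. It assumes the application only needs to list the bucket and read its objects, and the bucket name is hypothetical. The permissions policy goes on the role and the trust policy lets EC2 assume it. The role is then added to an instance profile attached to the instances, and the AWS SDKs fetch the temporary credentials from the instance metadata service, so no keys appear in any configuration file.

```python
import json


def bucket_read_policy(bucket: str) -> dict:
    """Identity-based policy that allows reading one bucket and nothing else."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is evaluated against the bucket ARN itself.
                "Sid": "ListTheBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # GetObject is evaluated against object ARNs inside the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


def ec2_trust_policy() -> dict:
    """Trust policy that lets the EC2 service assume the role for the instance."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    # "example-app-bucket" is a hypothetical bucket name.
    print(json.dumps(bucket_read_policy("example-app-bucket"), indent=2))
    print(json.dumps(ec2_trust_policy(), indent=2))
```

    Neither statement uses a wildcard action, and the only wildcard in a resource is the object path inside the one bucket. That narrow scope is the difference between option B and the AmazonS3ReadOnlyAccess or AdministratorAccess options.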

  5. Networking and Content Delivery

    18% of exam

    What you must be able to do. Given a connectivity, DNS, content-delivery or troubleshooting requirement, choose the VPC, endpoint, Route 53, CloudFront or analysis approach that connects or diagnoses the workload correctly at the lowest cost.

    In one sentence. The networking domain: build VPC connectivity, add private endpoints, configure Route 53 and CloudFront, and troubleshoot reachability from the logs and the path analysers.

    Recall check: answer these from memory first
    • Private-subnet instances must call the S3 and DynamoDB APIs with no NAT or internet gateway at the lowest per-request cost. Which endpoint type fits, and why not interface endpoints?
    • After a route-table and security-group change you must find which component blocks TCP 3306 without sending traffic. Which tool names the blocking hop, and why not flow logs?
    • A security group allows inbound 443 but the replies are not returning. Is the security group the cause, and how does that differ from a network ACL here?

    What it tests. Operating and troubleshooting the network. Configuring a VPC with subnets, route tables, network ACLs, security groups, NAT gateways and internet gateways; implementing private connectivity with AWS PrivateLink, VPC endpoints and AWS Transit Gateway, and auditing network protection services such as AWS WAF, AWS Shield and AWS Network Firewall; configuring DNS and Amazon Route 53 routing policies, query logging and Route 53 Resolver; configuring content and service distribution with Amazon CloudFront and AWS Global Accelerator; troubleshooting VPC connectivity and interpreting VPC Flow Logs, Elastic Load Balancing access logs, AWS WAF logs and CloudFront logs; and remediating CloudFront caching issues and hybrid connectivity over Site-to-Site VPN and Transit Gateway.

    How to study it. Fix the stateful versus stateless split first: security groups are stateful and allow-only, attached to an elastic network interface, so return traffic is allowed automatically; network ACLs are stateless and ordered, attached to a subnet, and evaluate inbound and outbound traffic separately, so a missing outbound rule blocks replies. Learn the endpoint split: a gateway VPC endpoint is free and serves only S3 and DynamoDB through a route-table prefix list, while an interface endpoint (PrivateLink) places an elastic network interface in your subnet, is billed per hour and per gigabyte, and serves most other services. Keep the three gateways straight: a NAT gateway gives private subnets outbound-only internet access, an internet gateway gives a public subnet two-way access, and an egress-only internet gateway gives IPv6 traffic outbound-only access, making it the IPv6 counterpart of a NAT gateway. Know the Route 53 routing policies (simple, weighted, latency-based, failover, geolocation, geoproximity, IP-based and multivalue answer), and for troubleshooting reach for VPC Reachability Analyzer, which statically traces the path and names the blocking security group, network ACL or route without sending live traffic.
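    The stateful versus stateless split can be sketched as a toy model of one HTTPS request and its reply. This is an illustration under simplifying assumptions, not how AWS evaluates rules: protocols, CIDR ranges and the security group's outbound rules are omitted, and the client's ephemeral port 49152 is hypothetical. The network ACL is checked in each direction on its own, lowest rule number first, while the security group only needs the inbound rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AclRule:
    number: int     # rules are evaluated in ascending rule-number order
    port_from: int
    port_to: int
    allow: bool     # network ACLs support explicit deny as well as allow


def nacl_allows(rules, port):
    """Stateless check: the lowest-numbered matching rule wins; no match means deny."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.allow
    return False  # the catch-all '*' rule denies anything unmatched


def https_round_trip(sg_inbound_ports, nacl_inbound, nacl_outbound, client_port=49152):
    """Model a client request to port 443 and the server's reply."""
    # Security group: stateful and allow-only, so only the inbound rule is
    # checked; the reply to an allowed request is permitted automatically.
    sg_ok = 443 in sg_inbound_ports
    # Network ACL, inbound direction: the request arrives on port 443.
    request_ok = nacl_allows(nacl_inbound, 443)
    # Network ACL, outbound direction: evaluated separately, so the reply to
    # the client's ephemeral port needs its own outbound rule.
    reply_ok = nacl_allows(nacl_outbound, client_port)
    return sg_ok and request_ok and reply_ok


if __name__ == "__main__":
    inbound = [AclRule(100, 443, 443, True)]
    print(https_round_trip({443}, inbound, []))                                 # False
    print(https_round_trip({443}, inbound, [AclRule(100, 1024, 65535, True)]))  # True
```

    The first call fails even though the security group and the inbound ACL rule both allow 443. Only the missing outbound ACL rule for the ephemeral range blocks the reply, which is the scenario behind the third recall prompt.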

    Easy to confuse

    • Security group versus network ACL. A security group is stateful, allow-only and attached to an interface, so return traffic is allowed automatically; a network ACL is stateless, ordered, supports explicit deny and is attached to a subnet, evaluating inbound and outbound separately. A missing NACL outbound rule for the client's ephemeral ports (typically 1024 to 65535) blocks replies; a security group never needs a return rule.
    • Gateway endpoint versus interface endpoint. A gateway VPC endpoint adds a route-table prefix list for Amazon S3 and DynamoDB at no charge; an interface endpoint built on AWS PrivateLink places an elastic network interface in your subnet, is billed per hour and per gigabyte, and covers most other services. For S3 or DynamoDB at the lowest cost, the gateway endpoint is the answer.
    • NAT gateway versus internet gateway. A NAT gateway lets private-subnet instances reach the internet outbound only while staying unreachable inbound; an internet gateway gives a public subnet two-way internet through public IP addresses. Use a NAT gateway for private outbound access and an internet gateway for public-facing resources.
    • VPC Reachability Analyzer versus VPC Flow Logs. Reachability Analyzer statically analyses configuration and names the blocking security group, network ACL or route entry with no live traffic; flow logs only record traffic that actually occurs and never name a blocking rule. Use the analyser to test a path before sending any traffic.
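    The idea behind Reachability Analyzer, checking configuration hop by hop and naming the first component that would drop the flow, can be sketched as a toy model. It is not the real tool, which you run from the console or through the EC2 API actions CreateNetworkInsightsPath and StartNetworkInsightsAnalysis and which evaluates far more than a port number. The path and component IDs below are hypothetical and mirror the TCP 3306 recall prompt.

```python
from typing import Callable, List, Optional, Tuple

# A hop pairs a component name with a predicate saying whether its
# configuration permits TCP traffic to a given destination port.
Hop = Tuple[str, Callable[[int], bool]]


def first_blocking_hop(path: List[Hop], port: int) -> Optional[str]:
    """Check configuration only, hop by hop, and name the first blocker."""
    for name, permits in path:
        if not permits(port):
            return name
    return None  # every component on the path permits the flow


# Hypothetical path from an app instance to a database instance.
EXAMPLE_PATH: List[Hop] = [
    ("security group sg-app (outbound)", lambda port: True),
    ("route table rtb-app", lambda port: True),           # local route covers the VPC CIDR
    ("network ACL acl-db (inbound)", lambda port: True),  # default network ACL allows all
    ("security group sg-db (inbound)", lambda port: port == 5432),  # only 5432 opened
]

if __name__ == "__main__":
    print(first_blocking_hop(EXAMPLE_PATH, 3306))  # security group sg-db (inbound)
    print(first_blocking_hop(EXAMPLE_PATH, 5432))  # None
```

    No packet is sent at any point. Flow logs, by contrast, would only show rejected records after live traffic had been attempted, and they would not say which rule rejected it.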

    Worked example from the SOA-C03 bank

    Free sample | Networking and Content Delivery | medium

    Application instances run in a private subnet and must download operating system patches from public package repositories on the internet. The instances must not be reachable from the internet, and inbound connections initiated from the internet must remain impossible. Which configuration provides the required outbound internet access while keeping the instances unreachable from outside?

    • A. Attach an internet gateway to the VPC and add a route from the private subnet to the internet gateway so the instances can reach the package repositories directly.
    • B. Add a route in the private subnet pointing to a virtual private gateway so outbound package traffic leaves the VPC and returns through the same gateway path.
    • C. Place a NAT gateway in a public subnet and route the private subnet's internet-bound traffic to that NAT gateway, which forwards it out through the internet gateway. (Correct)
    • D. Create an interface VPC endpoint in the private subnet so outbound requests to the public package repositories travel over the AWS private network instead.
    Use a NAT gateway in a public subnet to give private subnet instances outbound internet access while blocking inbound connections from the internet. A NAT gateway performs source network address translation for traffic leaving a private subnet, so instances can initiate outbound connections to the internet through the internet gateway while remaining unaddressable from outside. Routing a private subnet directly to an internet gateway would expose the instances, a virtual private gateway only reaches private networks, and an interface endpoint reaches only specific AWS services, so none of those satisfies general outbound internet access with no inbound exposure.

    Why A is wrong: Routing a private subnet straight to an internet gateway turns it into a public subnet, so any instance there with a public IP address becomes reachable from the internet, breaking the requirement that the instances stay unreachable from outside.

    Why B is wrong: A virtual private gateway connects a VPC to on-premises networks over VPN or Direct Connect, not to public internet repositories, so it does not give the instances the outbound internet path they need.

    Why C is correct: A NAT gateway in a public subnet lets private instances start outbound connections to the internet while preventing any inbound connections initiated from outside, which exactly meets the stated requirement.

    Why D is wrong: Interface endpoints reach specific AWS services privately, not arbitrary public package repositories on the internet, so they cannot provide the general outbound internet access the patch downloads require.
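    The routing behind option C can be sketched with Python's ipaddress module. VPC route tables send each packet to the target of the most specific matching route, so in-VPC traffic stays local and everything else from the private subnet goes to the NAT gateway. The NAT gateway's own public subnet then sends it to the internet gateway. The VPC CIDR and the nat-/igw- IDs are hypothetical, and the model shows only route selection, not the address translation itself.

```python
import ipaddress

# Hypothetical route tables for a VPC with CIDR 10.0.0.0/16 (IDs are made up).
PRIVATE_ROUTES = {
    "10.0.0.0/16": "local",          # traffic inside the VPC
    "0.0.0.0/0": "nat-0example",     # everything else goes to the NAT gateway
}
PUBLIC_ROUTES = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "igw-0example",     # the NAT gateway's subnet routes out via the internet gateway
}


def next_hop(routes, destination):
    """Return the target of the most specific route containing the destination."""
    addr = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(cidr), target)
        for cidr, target in routes.items()
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # Longest prefix wins, so the /16 local route beats the 0.0.0.0/0 default.
    return max(matches, key=lambda match: match[0].prefixlen)[1]


if __name__ == "__main__":
    print(next_hop(PRIVATE_ROUTES, "10.0.2.15"))     # local
    print(next_hop(PRIVATE_ROUTES, "203.0.113.10"))  # nat-0example
    print(next_hop(PUBLIC_ROUTES, "203.0.113.10"))   # igw-0example
```

    Nothing in the private route table points at the internet gateway. The instances keep only private addresses, and the NAT gateway forwards only replies to connections they opened, so inbound connections from the internet have no path in.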

A study plan that works

  1. Map the blueprint and book a date

    Day 1

    Read the official AWS exam guide and the five domains with their weights. Book a provisional date now: a fixed date turns open-ended study into a plan and makes it much more likely that you actually sit the exam. Note that the three top domains (Monitoring and Remediation, Reliability and Business Continuity, and Deployment and Automation) are tied at twenty-two percent each and together make up about two-thirds of the exam, so plan the heaviest study there.

  2. Build the operations decision maps

    Week 1

    Before drilling any domain, build the decision trees the whole exam rests on: the monitoring role split (metric filter, alarm, composite alarm, EventBridge, runbook), the disaster-recovery ladder against RTO and RPO, the CloudFormation feature split (drift detection, change set, StackSets), least privilege and the multi-account guardrails, and the networking stateful-versus-stateless and endpoint splits. Use the recall prompts in this guide: cover the answer, choose the service from the constraint, then reveal. If you cannot pick from the requirement alone, you do not own it yet.

  3. Go deep on monitoring, remediation and continuity (Domains 1 and 2)

    Weeks 1 to 3

    These two are tied at the top, so they get the most time. Drill the CloudWatch agent and metric-filter details, the alarm-to-runbook remediation chain with no human in the loop, and the disaster-recovery tiers matched to a named RTO and RPO. Practise on scenario questions and read the worked explanation on every one, including the ones you got right, watching for the named operational constraint that picks the answer.

  4. Lock provisioning and automation (Domain 3)

    Weeks 3 to 4

    Deployment and automation yields reliable marks if you drill it as decision trees. Fix the CloudFormation feature split (drift detection, change set, StackSets, stack policy), the Systems Manager toolset (Patch Manager, State Manager, Run Command, Parameter Store, Session Manager) and event-driven automation with Lambda, EventBridge and Step Functions. Do the drift-versus-change-set and StackSets-versus-nested-stacks calls by hand until the constraint alone decides them.

  5. Cover security operations and networking (Domains 4 and 5)

    Week 4

    Security at sixteen percent and Networking at eighteen percent are smaller domains, but their marks are learnable and dependable. Drill least privilege and the SCP guardrail, the detection-service map (GuardDuty, Inspector, Macie, Config, Security Hub), and on the network side the stateful-versus-stateless split, the gateway-versus-interface endpoint decision and Reachability Analyzer for path troubleshooting. Tie every choice back to the constraint named.

  6. Drill weak domains, then space the review

    Week 5

    Use your per-domain accuracy to attack the two domains dragging you down, not to re-read what you already know. Then space it: revisit each domain's recall prompts after a few days and again a week later. Spaced review retains far more than cramming, and it is the cheapest gain available before the exam.

  7. Sit a timed mock and calibrate

    Weeks 5 to 6

    Take at least one full timed mock under exam conditions to rehearse pacing and the flag-and-return habit across the full question set in the time allowed. Treat the score as a per-domain readiness signal, not a single number, and review every missed question, naming the constraint you misread, before you book or sit.

Know when you're ready

Readiness for the AWS Certified CloudOps Engineer - Associate is a score on scenario questions you have not seen before, not a feeling that the services are familiar. Those are different things, and the gap between them is where people fail. Re-reading the docs builds fluency, and fluency feels like knowledge, so confidence rises while real recall does not. The fix is to test yourself: if you can read a fresh operational scenario, name the constraint, and pick the right service or setting while explaining why each other option is wrong, you know it; if you can only nod along to an explanation, you do not yet.

Be especially wary of early confidence on the service map. Knowing what CloudWatch, Systems Manager, CloudFormation and the managed databases each do is the easy half; choosing between them under an automation, recovery, security or cost constraint, when two of them would work, is the half the exam actually tests. Trust your measured per-domain accuracy over your gut, and set the bar at clearing every domain comfortably on unseen questions across more than one session, not scraping past the published passing score once.

This guide gives you the map. The practice bank is where you find out whether you can navigate it, with a worked explanation and a reason every distractor is wrong on every question. Readiness scoring tells you when you are there. Not before.

Exam-day tips

  • Read the scenario for its operational constraint first. The automation, recovery, security or cost limit named in the question is what picks the answer, so find it before you judge the options.
  • When two services both work, default to the managed, automated, least-overhead one. AWS prefers managed automation; reach for the manual option only when the scenario names a reason such as an engine or tool to preserve.
  • For remediation that needs no human, choose the automated chain. An alarm or EventBridge rule invoking a Systems Manager runbook or Lambda beats an SNS email to on-call whenever the requirement is to fix it automatically.
  • Remember EC2 publishes no memory or disk-space metric by default. When a question needs in-guest signals such as memory or disk utilisation, the CloudWatch agent is the answer, not detailed monitoring, which only raises the metric frequency from five minutes to one minute.
  • Match the disaster-recovery tier to the stated RTO and RPO. An RTO of a few minutes points to warm standby; tens of minutes at lower steady cost points to pilot light; relaxed targets measured in hours allow backup and restore; a few-minute RPO needs point-in-time restore, not a nightly snapshot.
  • Treat broad permissions as a wrong answer. Any option granting a wildcard policy or embedding long-lived keys is the trap; least privilege through a scoped IAM role almost always wins the security questions.
  • Flag and move on. Answer every question once before you spend extra time on a hard one; banking the clear marks first protects the ones you know before time runs out.
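The disaster-recovery tip can be turned into a small decision helper: walk the strategies from cheapest to most expensive and stop at the first whose typical recovery meets both targets. AWS usually describes the tiers as recovering in hours (backup and restore), tens of minutes (pilot light), minutes (warm standby) and near real time (multi-site active/active). The exact durations below are illustrative assumptions, not official cutoffs.

```python
from datetime import timedelta

# Representative RTO/RPO each strategy can usually meet, cheapest first.
# The exact values are illustrative assumptions, not official cutoffs.
TIERS = [
    ("backup and restore", timedelta(hours=4)),
    ("pilot light", timedelta(minutes=30)),
    ("warm standby", timedelta(minutes=5)),
    ("multi-site active/active", timedelta(0)),
]


def cheapest_tier(rto: timedelta, rpo: timedelta) -> str:
    """Return the cheapest strategy whose typical recovery meets both targets."""
    for name, typical in TIERS:
        if typical <= rto and typical <= rpo:
            return name
    raise ValueError("RTO and RPO must not be negative")


if __name__ == "__main__":
    print(cheapest_tier(timedelta(hours=24), timedelta(hours=12)))    # backup and restore
    print(cheapest_tier(timedelta(hours=1), timedelta(minutes=45)))   # pilot light
    print(cheapest_tier(timedelta(minutes=5), timedelta(minutes=5)))  # warm standby
```

Because both targets must be met, the tighter of RTO and RPO decides the tier. That is why a relaxed RTO paired with a few-minute RPO still rules out a nightly snapshot.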

Frequently asked questions

Is the AWS Certified CloudOps Engineer - Associate the same as the old SysOps Administrator exam?

It is the successor. AWS renamed the certification from SysOps Administrator - Associate to CloudOps Engineer - Associate and re-versioned it from SOA-C02 to SOA-C03. The structure changed from six domains to five, the standalone cost-optimisation domain was folded away, and the hands-on exam labs were removed, so SOA-C03 is multiple choice and multiple response only.

Is the SOA-C03 hard?

It is an associate-level exam, and the difficulty is operational judgement rather than recall. Most questions are scenarios where several AWS services could work and only one fits the stated monitoring, remediation, recovery, security or networking constraint. Scenario practice with worked explanations matters far more than memorising what each service does.

How long should I study for the SOA-C03?

Most candidates with around a year of hands-on AWS operations experience are ready in five to seven weeks of steady study. Less hands-on exposure means more time on the three top domains, Monitoring and Remediation, Reliability and Business Continuity, and Deployment and Automation, which together are about two-thirds of the exam.

Which domains should I focus on?

The three domains tied at the top, each weighted twenty-two percent, are Monitoring, Logging, Analysis, Remediation, and Performance Optimization; Reliability and Business Continuity; and Deployment, Provisioning, and Automation. They deserve the most time. Networking and Content Delivery at eighteen percent and Security and Compliance at sixteen percent are smaller but still dependable marks.

How should I think about automated remediation questions?

When a scenario says a problem must be fixed with no operator action, the answer is an automated chain: a CloudWatch alarm or an EventBridge rule invoking a Systems Manager Automation runbook or an AWS Lambda function. An option that only sends an SNS notification to on-call still needs a human, so it is the trap when the requirement is automatic remediation.
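As a sketch of the first link in that chain, here is the event pattern an EventBridge rule could use to catch any CloudWatch alarm entering the ALARM state. The rule's target (an Automation runbook or a Lambda function) is configured separately and not shown. The matcher is a deliberately simplified stand-in for EventBridge's own matching, which also supports prefix, numeric, anything-but and other operators. The alarm name in the sample events is hypothetical.

```python
# Event pattern for CloudWatch alarms entering the ALARM state.
ALARM_PATTERN = {
    "source": ["aws.cloudwatch"],
    "detail-type": ["CloudWatch Alarm State Change"],
    "detail": {"state": {"value": ["ALARM"]}},
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified matcher: each pattern key must be present in the event and
    the event's value must equal one of the listed values."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            # Nested pattern: recurse into the matching part of the event.
            if not isinstance(event[key], dict) or not matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True


# Trimmed-down example events; "app-high-cpu" is a hypothetical alarm name.
ALARM_EVENT = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {"alarmName": "app-high-cpu", "state": {"value": "ALARM"}},
}
RECOVERY_EVENT = {
    "source": "aws.cloudwatch",
    "detail-type": "CloudWatch Alarm State Change",
    "detail": {"alarmName": "app-high-cpu", "state": {"value": "OK"}},
}

if __name__ == "__main__":
    print(matches(ALARM_PATTERN, ALARM_EVENT))     # True
    print(matches(ALARM_PATTERN, RECOVERY_EVENT))  # False
```

Only the transition into ALARM triggers the target, so the runbook or function starts without anyone reading a notification.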

What is the difference between CloudFormation drift detection and a change set?

Drift detection compares the live state of a stack's resources against the deployed template and reports anything changed outside CloudFormation, such as a console edit during an incident. A change set previews how a newly submitted template would alter the stack before you execute it. Drift answers what changed out of band; a change set answers what a planned update would do.
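A conceptual sketch of the difference, modelling one resource's properties as Python dictionaries. This is not how CloudFormation computes either result. It only shows which two views each feature compares. The security group ingress values are hypothetical.

```python
def diff(old: dict, new: dict) -> dict:
    """Return {property: (old_value, new_value)} for every property that differs."""
    keys = sorted(old.keys() | new.keys())
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# Hypothetical security group ingress rule seen three ways.
deployed_template = {"FromPort": 443, "ToPort": 443, "CidrIp": "10.0.0.0/16"}
live_resource = {"FromPort": 443, "ToPort": 443, "CidrIp": "0.0.0.0/0"}  # console edit
proposed_template = {"FromPort": 8443, "ToPort": 8443, "CidrIp": "10.0.0.0/16"}

# Drift detection: what the deployed template expects versus what is live now.
drift = diff(deployed_template, live_resource)
# Change set: the deployed template versus the newly submitted one.
# In this model the live resource never enters the comparison.
change_set = diff(deployed_template, proposed_template)

if __name__ == "__main__":
    print(drift)       # {'CidrIp': ('10.0.0.0/16', '0.0.0.0/0')}
    print(change_set)  # {'FromPort': (443, 8443), 'ToPort': (443, 8443)}
```

The out-of-band CIDR change appears only in the drift result. The change set reports only the planned port change, which is why a change set is the wrong answer when a question asks what was edited outside CloudFormation.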

How many practice questions should I do before booking?

Enough that every domain clears comfortably on questions you have not seen and your pacing holds up on a full timed mock. Quality of review beats raw volume: on every question, read the explanation and name the constraint that picked the answer, including on the ones you got right.

Is the AWS CloudOps Engineer Associate certification worth it?

SOA-C03 is worth it for cloud operations engineers, SysOps administrators, and DevOps practitioners whose daily work involves monitoring, patching, automating, and maintaining AWS workloads. The certification goes deep on the operational side of AWS that the SAA-C03 architect exam covers only lightly (Systems Manager, CloudWatch, patch baselines, deployment automation and networking troubleshooting), so it complements rather than duplicates the architect track. Those already holding SAA-C03 and moving into an operations-focused role will find the preparation directly relevant to the tasks they handle day to day.

Examworthy is not affiliated with or endorsed by Amazon Web Services. This guide is original study material based on the public exam blueprint. We never reproduce live exam items. SOA-C03 and related marks belong to their respective owners.