7 Tips For Passing the AWS Certified Data Analytics Specialty Exam in 2021

I recently passed the AWS Certified Data Analytics Specialty exam in July 2021. This is essentially Amazon’s data engineering certification, their closest equivalent to the Google Cloud Professional Data Engineer Certification Exam. This is a great exam for those who work in data engineering on AWS or who are interested in becoming data engineers.

The exam is tricky because there are relatively few resources when compared to more popular certification exams, such as the two AWS solutions architect certification exams. Here’s what I did to tackle it.

Tip #1 - The AWS Maniac Guide

I would recommend beginning your studies here:

The unofficial guide to AWS Certified Data Analytics Specialty Exam
Thanks to Wojciech Gawroński for doing an excellent job of summarizing the exam content and collecting resources. If anything, the guide encourages you to over-prepare somewhat. This isn’t necessarily a bad thing, because the knowledge you take away from your preparation may ultimately be as valuable as the certification itself.

Tip #2 - Do You Need to Know Hadoop?

It is important to note that the AWS Certified Data Analytics Specialty is an evolution of the previous AWS Certified Big Data Specialty. One major change, noted by Gawroński and others, is that the new exam places much less emphasis on detailed engineering in Hadoop. The previous exam confronted test takers with questions on hdfs CLI commands, among other low-level details. These questions no longer make an appearance. Test takers still need to understand the Hadoop ecosystem, especially as it relates to Amazon EMR (Elastic MapReduce). You should know the difference between HDFS and EMRFS, and how to decide between them for a specific scenario. You should also know about things like the Hive Metastore, and how it compares with the AWS Glue Data Catalog.
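One concrete way to keep HDFS and EMRFS apart: on an EMR cluster, paths with the s3:// scheme go through EMRFS (the data lives in S3 and outlives the cluster), while hdfs:// paths point at storage on the cluster itself. Here is a minimal sketch of that distinction. The function name storage_layer and the example bucket and paths are made up for illustration; this is not an AWS API.

```python
from urllib.parse import urlparse


def storage_layer(uri: str) -> str:
    """Classify a path by URI scheme (illustrative helper, not an AWS API)."""
    scheme = urlparse(uri).scheme.lower()
    if scheme == "s3":
        # On EMR, s3:// paths are served by EMRFS; data persists in S3.
        return "EMRFS"
    if scheme == "hdfs":
        # hdfs:// paths live on the cluster's own storage.
        return "HDFS"
    # No scheme or an unrecognized one: this sketch does not try to resolve
    # the cluster's default filesystem.
    return "other"


# Hypothetical example paths.
print(storage_layer("s3://example-bucket/raw/events/"))
print(storage_layer("hdfs:///user/hadoop/tmp/shuffle/"))
```

When a practice question asks where to keep data, the scheme is a quick reminder of the trade-off: S3 through EMRFS for data that must survive cluster termination, HDFS for data that only matters while the cluster is running.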

More broadly, I did not see any questions about Apache Impala or Pig. Knowledge of Hive and its capabilities will help you, but you don’t need to understand low-level configuration and architecture details. Unsurprisingly, there is an emphasis on Spark. As you will see in official AWS practice materials, it is useful to know about fundamental Spark primitives, such as DataFrames.

Gawroński mentions a Udemy course for the older big data exam (AWS Certified Big Data Specialty Certification Course). This now redirects to a course on the new exam (AWS Certified Data Analytics Certification Course). I did not use this course, but I would advise readers to be wary of wasting time on low-level Hadoop details if the authors have not yet removed them.

Tip #3 - AWS Official Practice Questions

I found the official AWS practice questions and practice exam to be absolute gold. The basic set of practice questions is available on the AWS exam site. You can buy a practice exam for $40 through your AWS CertMetrics account, where you register for the certification exam. You may also have a coupon for a free practice exam if you previously passed an AWS certification exam. The practice exam is worth the price of admission - do not skimp here. The questions are very similar to the free ones, but more is better, especially given that these questions are the closest you’ll get to the live exam.

Work through the free questions and the practice exam under a time limit, then go back and understand each question carefully. I would suggest reading about everything mentioned in each question and its answers. These questions are fairly representative of what is on the actual exam, both in content and style. Repeatedly work through the logical process of eliminating answers and choosing a final answer.

Be cautious in using non-official practice questions found in Udemy courses or other prep materials. These questions often don’t match the style of the official exam, and they sometimes emphasize information that simply isn’t that important. I also find some of the answers highly suspect. Unofficial questions are good for motivating your study, but emphasize the official questions even though they are very limited in number.

Tip #4 - How to Choose Answers

Regarding question style, here are some techniques I use to look for the correct answer. Rather than simply reading the multiple choice responses sequentially, learn to jump around and look for parallels. Often, two responses will have a large chunk of word-for-word identical text. Try quickly jumping between responses to identify these parallels, then study the differences. Is there something technically incorrect in one of the responses? Is it a matter of best practices? Read the requirements in the question statement extremely carefully. Does the question tell you to prioritize low cost? Low operational overhead? Latency?

Tip #5 - Know Cloud Best Practices Specific to AWS

Also, be aware that while the questions may seem subjective, they are subjective in a very particular way. Familiarity with AWS culture will help you to spot the preferred answer. For instance, given the choice between AWS Glue and Spark on EMR, Glue is nearly always the right answer barring some specific requirement that excludes it; in general, favor managed services. Understanding other aspects of AWS best practices and culture from blog posts, the AWS Well-Architected Framework, white papers and general AWS exposure will serve you well.

Incidentally, you should be aware that best practices and culture can be quite different between clouds. Following GCP best practices could cause you to get questions wrong on the AWS exam. This presents challenges if you frequently work across multiple clouds; make sure that you’re immersed in the right headspace before your exam attempt.

Let me also add that you don’t necessarily need to have another AWS cert before taking this exam, but it could help you. Passing the AWS Certified Solutions Architect - Associate exam means having a reasonable knowledge of core AWS services used in data engineering, such as S3. You also need to have a good grounding in basic cloud concepts, such as Availability Zones and Regions. On the other hand, I found the exam to be very light on many core architecture concepts such as networking.

Tip #6 - Read the FAQs

This bit of advice is again drawn from the AWS Maniac guide, but I’m calling it out separately because it was so valuable: Read the AWS service FAQs cited in the guide. These are a wealth of focused technical information in a highly distilled format; they pull out the details that aid in solving architecture problems, exactly the kinds of problems that you face on the exam. The FAQs also contain specification details that you should memorize. (For example, the Amazon Kinesis Data Streams FAQs tell you that records are limited to 1 MB. Can you identify a practice question where this is useful?)
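To make the Kinesis limit stick, it can help to see how it would shape a design. Below is a rough sketch that takes the 1 MB figure at face value. The constant, the function names, and the reading of "1 MB" as 1,048,576 bytes are my assumptions for illustration, not something taken from an AWS SDK; check the current FAQ for the exact limit and for what counts toward it.

```python
from typing import List

# Assumption: the "1 MB" limit from the FAQ, read here as 1 MiB.
MAX_RECORD_BYTES = 1024 * 1024


def fits_in_record(payload: bytes, limit: int = MAX_RECORD_BYTES) -> bool:
    """Return True if the payload is within the assumed per-record limit."""
    return len(payload) <= limit


def chunk_payload(payload: bytes, limit: int = MAX_RECORD_BYTES) -> List[bytes]:
    """Split an oversized payload into pieces no larger than the limit."""
    return [payload[i:i + limit] for i in range(0, len(payload), limit)]


# Hypothetical 2.5 MiB message: too large for a single record.
message = b"x" * (2 * MAX_RECORD_BYTES + MAX_RECORD_BYTES // 2)
print(fits_in_record(message))
print([len(chunk) for chunk in chunk_payload(message)])
```

Chunking is only one possible response to an oversized message. A scenario question might instead steer you toward storing the large object in S3 and sending a pointer to it through the stream, so read the stated requirements before choosing.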

One word of caution: these FAQs appear to be almost append-only. New information about recent developments (EMR Studio, EMR on EKS) is constantly being added, but some of the content is quite dated. (Twitter Storm is an emerging framework? Really?) The answers won’t necessarily mislead you, but you can skip information that is less relevant to the current ecosystem.

Tip #7 - Advice for Remote Testing

Finally, here are a few remarks about remote testing, something that Gawroński also discusses. I’ve found that the environment at testing centers can be awful. The air conditioning failed on a hot summer day when I was taking an AWS certification exam. Not fun.

I appreciate the flexibility of at-home, remote testing, and the ability to test in a comfortable environment. Having said that, I had to constantly worry about noise from my hallway triggering an exam failure. The Pearson app also showed a poor internet connection a few times, and connection problems are one of the most common causes of exam revocation. In addition, I tend to silently talk to myself when trying to think through problems, and the proctor threatened me with failure. I have never received a similar censure at a testing center.

In Closing - You Got This!

Okay, thanks for reading! You can do this! If you take the exam, drop me a line - I’m curious to hear about your experience, thoughts on the content and its utility, etc.


Matt Housley