  • Published:
    January 3, 2017
  • Languages:
  • Audiences:
    Data scientists
  • Technology:
    Microsoft R Server, SQL R Services
  • Credit toward certification:

Analyzing Big Data with Microsoft R

* Pricing does not reflect any promotional offers or reduced pricing for Microsoft Imagine Academy program members, Microsoft Certified Trainers, and Microsoft Partner Network program members. Pricing is subject to change without notice. Pricing does not include applicable taxes. Please confirm exact pricing with the exam provider before registering to take an exam.

Effective May 1, 2017, the existing cancellation policy will be replaced in its entirety with the following policy: Cancelling or rescheduling your exam within 5 business days of your registered exam time is subject to a fee. Failing to show up for your exam appointment or not rescheduling or cancelling your appointment at least 24 hours prior to your scheduled appointment forfeits your entire exam fee.

Skills measured

This exam measures your ability to accomplish the technical tasks listed below. View video tutorials about the variety of question types on Microsoft exams.

Please note that the questions may test on, but will not be limited to, the topics described in the bulleted text.

Do you have feedback about the relevance of the skills measured on this exam? Please send Microsoft your comments. All feedback will be reviewed and incorporated as appropriate while still maintaining the validity and reliability of the certification process. Note that Microsoft will not respond directly to your feedback. We appreciate your input in ensuring the quality of the Microsoft Certification program.

If you have concerns about specific questions on this exam, please submit an exam challenge.

If you have other questions or feedback about Microsoft Certification exams or about the certification program, registration, or promotions, please contact your Regional Service Center.

Read and explore big data
  • Read data with R Server
    • Read supported data file formats, such as text files, SAS, and SPSS; convert data to XDF format; identify trade-offs between XDF and flat text files; read data through Open Database Connectivity (ODBC) data sources; read in files from other file systems; use an internal data frame as a data source; process data from sources that cannot be read natively by R Server
  • Summarize data
    • Compute crosstabs and univariate statistics, choose when to use rxCrossTabs versus rxCube, integrate with open source technologies by using packages such as dplyrXdf, use group by functionality, create complex formulas to perform multiple tasks in one pass through the data, extract quantiles by using rxQuantile
  • Visualize data
    • Visualize in-memory data with base plotting functions and ggplot2; create custom visualizations with rxSummary and rxCube; visualize data with rxHistogram and rxLinePlot, including faceted plots
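The summarization skills above rest on one idea: a crosstab can be built from per-chunk partial counts, so the full data set never has to fit in memory. The following is a minimal sketch of that idea in plain Python, not R Server code. The `chunked` and `crosstab` helpers and the sample rows are hypothetical and only illustrate the arithmetic; they are not the rxCrossTabs or rxCube API.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Row = Tuple[str, str]  # (factor_a, factor_b); a hypothetical pair of categorical columns


def chunked(rows: List[Row], size: int) -> Iterable[List[Row]]:
    """Yield successive fixed-size chunks, standing in for blocks read from disk."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def crosstab(rows: List[Row], chunk_size: int) -> Counter:
    """Count each (factor_a, factor_b) combination in one pass over the chunks."""
    totals: Counter = Counter()
    for chunk in chunked(rows, chunk_size):
        # Partial counts for this chunk are merged into the running totals.
        totals.update(Counter(chunk))
    return totals


# Made-up sample data, used only to exercise the sketch.
rows = [("mon", "cash"), ("mon", "card"), ("tue", "cash"), ("mon", "cash")]
table = crosstab(rows, chunk_size=2)
print(table[("mon", "cash")])  # count for one cell of the crosstab
```

Because the per-chunk counts are simply added together, the result does not depend on the chunk size.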
Process big data
  • Process data with rxDataStep
    • Subset rows of data, modify and create columns by using the transforms argument, weigh the trade-offs between on-the-fly and in-data transformations, handle missing values through filtering or replacement, generate a data frame or an XDF file, process dates (POSIXct, POSIXlt)
  • Perform complex transforms that use transform functions
    • Define a transform function; reshape data by using a transform function; use open source packages, such as lubridate; pass in values by using transformVars and transformEnvir; use internal .rx variables and functions for tasks, including cross-chunk communication
  • Manage data sets
    • Sort data in various orders, such as ascending and descending; use rxSort deduplication to remove duplicate values; merge data sources using rxMerge(); choose among merge options and types; identify when alternatives to rxSort and rxMerge should be used
  • Process text using RML packages
    • Create features using RML functions, such as featurizeText(); create indicator variables and arrays using RML functions, such as categorical() and categoricalHash(); perform feature selection using RML functions
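The transform-function skills above include cross-chunk communication: a transform sees one chunk at a time, so any value that spans chunks has to be carried in shared state. The following is a minimal sketch of that pattern in plain Python, not R Server code. The `state` dictionary, the column names `x` and `x_cumsum`, and all function names are hypothetical stand-ins; R Server's own internal .rx variables and functions are not shown here.

```python
from typing import Dict, Iterator, List

Chunk = Dict[str, List[float]]  # column name -> values for one chunk


def iter_chunks(values: List[float], chunk_size: int) -> Iterator[Chunk]:
    """Split a column into chunks, standing in for blocks read from a file."""
    for start in range(0, len(values), chunk_size):
        yield {"x": values[start:start + chunk_size]}


def running_total_transform(chunk: Chunk, state: Dict[str, float]) -> Chunk:
    """Add a cumulative-sum column, carrying the running total across chunks."""
    total = state.get("total", 0.0)
    cumulative = []
    for value in chunk["x"]:
        total += value
        cumulative.append(total)
    state["total"] = total  # remembered for the next chunk
    return {**chunk, "x_cumsum": cumulative}


def process(values: List[float], chunk_size: int) -> List[float]:
    """Apply the transform chunk by chunk and collect the new column."""
    state: Dict[str, float] = {}
    result: List[float] = []
    for chunk in iter_chunks(values, chunk_size):
        result.extend(running_total_transform(chunk, state)["x_cumsum"])
    return result


print(process([1, 2, 3, 4, 5], chunk_size=2))
```

Without the carried state, each chunk would restart its total at zero and the new column would be wrong at every chunk boundary.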
Build predictive models with ScaleR
  • Estimate linear models
    • Use rxLinMod, rxGlm, and rxLogit to estimate linear models; set the family for a generalized linear model by using functions such as rxTweedie; process data on the fly by using the appropriate arguments and functions, such as the F function and transforms argument; weight observations through frequency or probability weights; choose between different types of automatic variable selection, such as greedy searches, repeated scoring, and byproduct of training; identify the impact of missing values during automatic variable selection
  • Build and use partitioning models
    • Use rxDTree, rxDForest, and rxBTrees to build partitioning models; adjust the weighting of false positives and misses by using loss; select parameters that affect bias and variance, such as pruning, learning rate, and tree depth; use as.rpart to interact with open source ecosystems
  • Generate predictions and residuals
    • Use rxPredict to generate predictions; perform parallel scoring using rxExec; generate different types of predictions, such as link and response scores for GLM, and response, prob, and vote for rxDForest; generate different types of residuals, such as Usual, Pearson, and DBM
  • Evaluate models and tuning parameters
    • Summarize estimated models; run arbitrary code out of process, such as parallel parameter tuning by using rxExec; evaluate tree models by using RevoTreeView and rxVarImpPlot; calculate model evaluation metrics by using built-in functions; calculate model evaluation metrics and visualizations by using custom code, such as mean absolute percentage error and precision recall curves
  • Create additional models using RML packages
    • Build and use a One-Class Support Vector Machine, build and use linear and logistic regressions that use L1 and L2 regularization, build and use a decision tree by using FastTree, use FastTree as a recommender with ranking loss (NDCG), build and use a simple three-layer feed-forward neural network
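The evaluation skills above mention computing metrics with custom code, such as mean absolute percentage error and precision-recall curves. The following is a minimal sketch of both calculations in plain Python, not R Server code. The function names are hypothetical, and two conventions are assumptions of this sketch: rows with a zero actual value are skipped in MAPE, and precision is reported as 1.0 when nothing is predicted positive.

```python
from typing import List, Sequence, Tuple


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Rows where the actual value is zero are skipped (an assumption of this
    sketch), because the percentage error is undefined for them.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def precision_recall_points(
    labels: Sequence[int],
    scores: Sequence[float],
    thresholds: Sequence[float],
) -> List[Tuple[float, float, float]]:
    """Return (threshold, precision, recall) for each threshold.

    labels are 0/1; a row is predicted positive when its score >= threshold.
    """
    total_positive = sum(labels)
    points = []
    for threshold in thresholds:
        flagged = [score >= threshold for score in scores]
        tp = sum(1 for y, f in zip(labels, flagged) if f and y == 1)
        fp = sum(1 for y, f in zip(labels, flagged) if f and y == 0)
        precision = tp / (tp + fp) if (tp + fp) else 1.0
        recall = tp / total_positive if total_positive else 0.0
        points.append((threshold, precision, recall))
    return points


# Made-up values, used only to exercise the sketch.
print(mape([100, 200], [110, 180]))
print(precision_recall_points([1, 0, 1, 1], [0.9, 0.8, 0.4, 0.3], [0.5]))
```

Plotting recall against precision over a sweep of thresholds gives the precision-recall curve.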
Use R Server in different environments
  • Use different compute contexts to run R Server effectively
    • Change the compute context (RxHadoopMR, RxSpark, RxLocalSeq, and RxLocalParallel); identify which compute context to use for different tasks; use different data source objects, depending on the context (RxOdbcData and RxTextData); identify and use appropriate data source objects for different storage systems and compute contexts (HDFS and SQL Server); debug processes across different compute contexts; identify use cases for RevoPemaR
  • Optimize tasks by using local compute contexts
    • Identify and execute tasks that can be run only in the local compute context, identify tasks that are more efficient to run in the local compute context, choose between RxLocalSeq and RxLocalParallel, profile across different compute contexts
  • Perform in-database analytics by using SQL Server
    • Choose when to perform in-database versus out-of-database computations, identify limitations of in-database computations, use in-database versus out-of-database compute contexts appropriately, use stored procedures for data processing steps, serialize objects and write back to binary fields in a table, write tables, configure R to optimize SQL Server (chunksize, numtasks, and computecontext), effectively communicate performance properties to SQL administrators and architects (SQL Server Profiler)
  • Implement analysis workflows in the Hadoop ecosystem and Spark
    • Use appropriate R Server functions in Spark; integrate with Hive, Pig, and Hadoop MapReduce; integrate with the Spark ecosystem of tools, such as sparklyr and SparkR; profile and tune across different compute contexts; use doRSR for parallelizing code that was written using open source foreach
  • Deploy predictive models to SQL Server and Azure Machine Learning
    • Deploy predictive models to SQL Server as a stored procedure, deploy an arbitrary function to Azure Machine Learning by using the AzureML R package, identify when to use DeployR
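One of the in-database skills above is serializing an object and writing it to a binary field in a table, so a trained model can be stored and later reloaded for scoring. The following is a minimal sketch of that round trip in plain Python. It swaps in an in-memory SQLite database and `pickle` in place of SQL Server and R serialization. The `models` table, its columns, and the dictionary standing in for a model are all hypothetical.

```python
import pickle
import sqlite3

# A plain dictionary stands in for a fitted model object (hypothetical values).
model = {"intercept": 1.5, "coefficients": {"x1": 0.25, "x2": -0.75}}

# In-memory SQLite is used here only as a stand-in for a SQL Server database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE models (name TEXT PRIMARY KEY, payload BLOB)")

# Serialize the object to bytes and store it in the binary column.
payload = pickle.dumps(model)
conn.execute("INSERT INTO models (name, payload) VALUES (?, ?)", ("demo", payload))
conn.commit()

# Read the bytes back and deserialize them into an equivalent object.
row = conn.execute(
    "SELECT payload FROM models WHERE name = ?", ("demo",)
).fetchone()
restored = pickle.loads(row[0])

print(restored == model)
conn.close()
```

Storing the model as bytes keeps scoring close to the data: a later procedure only needs to read one row and deserialize it.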

Who should take this exam?

Candidates for this exam are data scientists or analysts who process and analyze data sets larger than memory using R. Candidates should have experience with R, familiarity with data structures, familiarity with basic programming concepts (such as control flow and scope), and familiarity with writing and debugging R functions.

Candidates should be familiar with common statistical methods and data analysis best practices. Candidates should also have a high-level understanding of data platforms, such as the Hadoop ecosystem, SQL Server, and core T-SQL capabilities.

More information about exams

Preparing for an exam

We recommend that you review this exam preparation guide in its entirety and familiarize yourself with the resources on this website before you schedule your exam. See the Microsoft Certification exam overview for information about registration, videos of typical exam question formats, and other preparation resources. For information on exam policies and scoring, see the Microsoft Certification exam policies and FAQs.


This preparation guide is subject to change at any time without prior notice and at the sole discretion of Microsoft. Microsoft exams might include adaptive testing technology and simulation items. Microsoft does not identify the format in which exams are presented. Please use this preparation guide to prepare for the exam, regardless of its format. To help you prepare for this exam, Microsoft recommends that you have hands-on experience with the product and that you use the specified training resources. These training resources do not necessarily cover all topics listed in the "Skills measured" section.