Microsoft Certified: Azure Data Scientist Associate

Azure Data Scientists apply Azure’s machine learning techniques to train, evaluate, and deploy models that solve business problems.

Required exams

Exam DP-100: Designing and Implementing a Data Science Solution on Azure


Skills and knowledge

Candidates who earn the Azure Data Scientist Associate certification are verified by Microsoft to have the following skills and knowledge.

Select development environment
  • assess the deployment environment constraints
  • analyze and recommend tools that meet system requirements
  • select the development environment
Set up development environment
  • create an Azure data science environment
  • configure data science work environments
Quantify the business problem
  • define technical success metrics
  • quantify risks
Transform data into usable datasets
  • develop data structures
  • design a data sampling strategy
  • design the data preparation flow
Perform exploratory data analysis (EDA)
  • review visual analytics data to discover patterns and determine next steps
  • identify anomalies, outliers, and other data inconsistencies
  • create descriptive statistics for a dataset
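The EDA tasks above (descriptive statistics and outlier detection) can be sketched in plain Python with the standard-library `statistics` module. The 1.5 × IQR rule (Tukey's fences) is one common convention assumed here, not something the exam prescribes, and `describe` and `iqr_outliers` are hypothetical helper names.

```python
import statistics

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
    }

def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

data = [12, 14, 13, 15, 14, 13, 12, 95]  # illustrative values with one obvious anomaly
print(describe(data))
print(iqr_outliers(data))
```

Whether a flagged value is an error to remove or a rare but genuine observation is a judgment the rule itself cannot make.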
Cleanse and transform data
  • resolve anomalies, outliers, and other data inconsistencies
  • standardize data formats
  • set the granularity for data
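Standardizing formats, handling inconsistent records, and setting granularity often happen in one pass. The sketch below assumes raw timestamps arrive in three hypothetical formats (listed in `DATE_FORMATS`), drops rows that match none of them, and aggregates amounts to a daily granularity. Formats such as `01/03/2023` are inherently ambiguous, so the order of `DATE_FORMATS` encodes an assumption about the source data.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical input formats; the real list depends on the source systems.
DATE_FORMATS = ("%Y-%m-%d %H:%M", "%d/%m/%Y %H:%M", "%m-%d-%Y %H:%M")

def parse_timestamp(text):
    """Try each known format in order; return a datetime, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None

def daily_totals(records):
    """Standardize timestamps, skip unparseable rows, and sum amounts per calendar day."""
    totals = defaultdict(float)
    for raw_ts, amount in records:
        ts = parse_timestamp(raw_ts)
        if ts is None:
            continue  # unresolvable inconsistency: skip (or log) the row
        totals[ts.date().isoformat()] += amount
    return dict(sorted(totals.items()))

raw = [
    ("2023-03-01 09:15", 10.0),
    ("01/03/2023 17:40", 5.5),   # read as day/month/year under the assumed order
    ("03-02-2023 08:00", 7.0),   # read as month-day-year
    ("not a date", 99.0),
]
print(daily_totals(raw))
```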
Perform feature extraction
  • apply feature extraction algorithms to numerical data
  • apply feature extraction algorithms to non-numerical data
  • scale features
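Scaling and encoding are typical extraction steps for numerical and non-numerical columns respectively. Below is a plain-Python sketch of z-score standardization, min-max scaling, and one-hot encoding; the function names are illustrative. In practice, scaling parameters (mean, standard deviation, min, max) should be computed on the training split only and then reused for validation and test data.

```python
import statistics

def standardize(values):
    """Z-score scaling: (v - mean) / population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)  # constant feature: nothing to scale
    return [(v - mu) / sigma for v in values]

def min_max(values):
    """Rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # avoid division by zero for constant features
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(categories):
    """Encode a non-numerical column as binary indicator vectors, one per sorted level."""
    levels = sorted(set(categories))
    return levels, [[1 if c == level else 0 for level in levels] for c in categories]

print(standardize([2.0, 4.0, 6.0]))
print(min_max([10, 15, 20]))
print(one_hot(["red", "blue", "red"]))
```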
Perform feature selection
  • define the optimality criteria
  • apply feature selection algorithms
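One way to make an optimality criterion concrete is a filter method: score each feature by the absolute Pearson correlation with the target and keep the top k. This is only one possible criterion (wrapper and embedded methods are common alternatives), and `top_k_by_correlation` is a hypothetical helper, not a library function. Correlation also only captures linear relationships.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)

def top_k_by_correlation(features, target, k):
    """Rank features by |correlation| with the target and keep the k best."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

features = {                      # hypothetical feature columns
    "x1": [1, 2, 3, 4, 5],
    "x2": [5, 3, 4, 1, 2],
    "x3": [2, 2, 2, 2, 2],
}
target = [1.1, 2.0, 2.9, 4.2, 5.0]
print(top_k_by_correlation(features, target, k=2))
```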
Select an algorithmic approach
  • determine appropriate performance metrics
  • implement appropriate algorithms
  • consider data preparation steps that are specific to the selected algorithms
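Choosing a performance metric matters most when classes are skewed. The sketch below computes accuracy, precision, recall, and F1 from raw binary predictions, and shows how a model that always predicts the majority class can reach 90% accuracy while recalling none of the positive cases. `binary_metrics` is an illustrative name.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary task, from confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "f1": f1}

# 9 negatives and 1 positive: always predicting 0 looks accurate but finds nothing.
y_true = [0] * 9 + [1]
always_zero = [0] * 10
print(binary_metrics(y_true, always_zero))
```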
Split datasets
  • determine the ideal split based on the nature of the data
  • determine the number of splits
  • determine the relative size of splits
  • ensure splits are balanced
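When class proportions matter, a stratified split keeps each class's share roughly equal across splits. The sketch below assumes independent, identically distributed rows. For time-ordered data, splitting by time (train on the past, test on the future) is usually more appropriate than any random split. `stratified_split` is a hypothetical helper.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return sorted (train_idx, test_idx) so that each class contributes
    roughly `test_fraction` of its rows to the test split."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels = ["a"] * 80 + ["b"] * 20  # a hypothetical, imbalanced label column
train_idx, test_idx = stratified_split(labels, test_fraction=0.25, seed=42)
print(len(train_idx), len(test_idx))
print(sum(labels[i] == "b" for i in test_idx))
```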
Identify data imbalances
  • resample a dataset to impose balance
  • adjust the performance metric to account for imbalances
  • implement penalization
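Two of the remedies listed above, resampling and penalization, can be sketched as random oversampling of minority classes and per-class loss weights. The weighting formula `n_samples / (n_classes * class_count)` is the same one scikit-learn uses for `class_weight="balanced"`; the helper names here are illustrative. Resampling should be applied to the training split only, so duplicated rows never leak into evaluation data.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every class
    matches the majority-class count."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count), so errors on
    rare classes are penalized more heavily in a weighted loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}

rows = list(range(10))            # stand-ins for feature rows
labels = [0] * 8 + [1] * 2
_, balanced = random_oversample(rows, labels, seed=1)
print(Counter(balanced))
print(balanced_class_weights(labels))
```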
Train the model
  • select early stopping criteria
  • tune hyperparameters
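Early stopping and hyperparameter tuning interact: each candidate setting is trained until validation loss stops improving, and the setting with the best validation loss wins. The sketch below uses a toy linear model trained by gradient descent, a patience-based stopping rule, and an exhaustive grid over learning rates. `EarlyStopping`, `train_linear`, and `grid_search` are hypothetical names, and the noise-free data is an illustrative assumption. A production version would also restore the weights from the best epoch rather than only reporting its loss.

```python
class EarlyStopping:
    """Signal a stop once the monitored loss has not improved by more than
    `min_delta` for `patience` consecutive checks."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_checks = 0

    def step(self, loss):
        """Record one validation loss; return True when training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


def train_linear(train, val, lr, max_epochs=200, patience=10):
    """Fit y = w*x + b by batch gradient descent on MSE; return the best validation MSE seen."""
    w = b = 0.0
    stopper = EarlyStopping(patience=patience)
    n = len(train)
    for _ in range(max_epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / n
        w, b = w - lr * grad_w, b - lr * grad_b
        val_mse = sum((w * x + b - y) ** 2 for x, y in val) / len(val)
        if stopper.step(val_mse):
            break  # validation loss has stalled: stop early
    return stopper.best


def grid_search(train, val, learning_rates):
    """Exhaustive hyperparameter search: keep the learning rate with the lowest validation MSE."""
    scores = {lr: train_linear(train, val, lr) for lr in learning_rates}
    return min(scores, key=scores.get), scores


# Noise-free toy data drawn from y = 2x + 1.
train = [(i / 10, 2 * (i / 10) + 1) for i in range(10)]
val = [(0.05 + i / 10, 2 * (0.05 + i / 10) + 1) for i in range(5)]
best_lr, scores = grid_search(train, val, [0.01, 0.1, 0.5])
print(best_lr, scores)
```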
Evaluate model performance
  • score models against evaluation metrics
  • implement cross-validation
  • identify and address overfitting
  • identify the root cause of performance results
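Cross-validation and overfitting checks can be sketched together: k-fold cross-validation reports both training and validation error, and a large gap between the two is a typical overfitting signal. Here a 1-nearest-neighbour regressor, which memorizes its training data, is compared with an ordinary least-squares line on synthetic noisy linear data. Both models, the data, and the helper names are illustrative assumptions.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_linear(points):
    """Ordinary least squares for y = w*x + b (closed form)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    w = sxy / sxx
    return lambda x: w * x + (my - w * mx)

def fit_1nn(points):
    """1-nearest-neighbour regressor: memorizes the training set, so it tends to overfit noise."""
    return lambda x: min(points, key=lambda p: abs(p[0] - x))[1]

def mse(model, points):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)

def cross_validate(fit, points, k=5, seed=0):
    """Return (mean train MSE, mean validation MSE) across k folds."""
    train_scores, val_scores = [], []
    for fold in kfold_indices(len(points), k, seed):
        held_out = set(fold)
        train = [p for i, p in enumerate(points) if i not in held_out]
        val = [points[i] for i in fold]
        model = fit(train)
        train_scores.append(mse(model, train))
        val_scores.append(mse(model, val))
    return sum(train_scores) / k, sum(val_scores) / k

rng = random.Random(1)
data = [(x, 2 * x + 1 + rng.gauss(0, 1)) for x in range(50)]  # y = 2x + 1 plus noise
for name, fit in [("linear", fit_linear), ("1-NN", fit_1nn)]:
    train_mse, val_mse = cross_validate(fit, data)
    print(f"{name}: train MSE {train_mse:.2f}, validation MSE {val_mse:.2f}")
```

The 1-NN model's training error is exactly zero while its validation error is not, which is the train/validation gap the checklist item on overfitting refers to.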