Microsoft Certified: Azure Data Scientist Associate

Azure Data Scientists apply machine learning techniques on Azure to train, evaluate, and deploy models that solve business problems.

Required exams

Exam DP-100: Designing and Implementing a Data Science Solution on Azure

Skills and knowledge

Candidates who earn the Azure Data Scientist Associate certification are verified by Microsoft to have the following skills and knowledge.

Select development environment
  • assess the deployment environment constraints
  • analyze and recommend tools that meet system requirements
  • select the development environment
Set up development environment
  • create an Azure data science environment
  • configure data science work environments
Quantify the business problem
  • define technical success metrics
  • quantify risks
Transform data into usable datasets
  • develop data structures
  • design a data sampling strategy
  • design the data preparation flow
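As an illustration of a data sampling strategy, the following pure-Python sketch draws a stratified sample so that each group keeps roughly its original share of rows. The function `stratified_sample`, the `region` field, and the 10% fraction are hypothetical choices for this example, not part of any Azure Machine Learning API.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction, seed=0):
    """Draw about `fraction` of the rows from each stratum defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for group in strata.values():
        # Keep at least one row per stratum so rare groups are not dropped.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical data: 80 rows from "east", 20 rows from "west".
rows = [{"region": "east", "sales": i} for i in range(80)]
rows += [{"region": "west", "sales": i} for i in range(20)]
sample = stratified_sample(rows, key=lambda r: r["region"], fraction=0.1)
print(len(sample))  # 10 (8 "east" rows and 2 "west" rows)
```

Plain random sampling could, by chance, under-represent the smaller "west" group; sampling within each stratum keeps the 80:20 proportion.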
Perform Exploratory Data Analysis (EDA)
  • review visual analytics data to discover patterns and determine next steps
  • identify anomalies, outliers, and other data inconsistencies
  • create descriptive statistics for a dataset
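A minimal sketch of two EDA steps listed above: computing descriptive statistics and flagging outliers with the interquartile-range (IQR) rule. It uses only Python's `statistics` module (`quantiles` needs Python 3.8+); the data values and the conventional factor `k=1.5` are illustrative assumptions.

```python
import statistics


def describe(values):
    """Summary statistics similar to a typical 'describe' table."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # Python 3.8+
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


data = [12, 14, 13, 15, 14, 13, 12, 95]  # hypothetical readings
print(describe(data))
print(iqr_outliers(data))  # [95]
```

Note how the single extreme value pulls the mean (23.5) far above the median (13.5), which is itself a hint that the data contains an anomaly worth investigating.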
Cleanse and transform data
  • resolve anomalies, outliers, and other data inconsistencies
  • standardize data formats
  • set the granularity for data
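To make "standardize data formats" and "set the granularity for data" concrete, here is a sketch that parses timestamps arriving in two different formats into one representation and then aggregates event-level records to daily totals. The two input formats (including the day-first `%d/%m/%Y` form) and the helper names are assumptions for illustration only; real data may need other formats or explicit time-zone handling.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical input formats; the second one is assumed to be day-first.
FORMATS = ("%Y-%m-%d %H:%M", "%d/%m/%Y %H:%M")


def parse_timestamp(text):
    """Try each known format in turn and return a datetime."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {text!r}")


def to_daily_totals(records):
    """Coarsen (timestamp, amount) records to one total per calendar day."""
    totals = defaultdict(float)
    for ts_text, amount in records:
        day = parse_timestamp(ts_text).date().isoformat()
        totals[day] += amount
    return dict(sorted(totals.items()))


records = [
    ("2023-03-01 09:15", 10.0),
    ("01/03/2023 17:40", 5.5),   # same day, different format
    ("2023-03-02 08:00", 7.0),
]
print(to_daily_totals(records))  # {'2023-03-01': 15.5, '2023-03-02': 7.0}
```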
Perform feature extraction
  • apply feature extraction algorithms to numerical data
  • apply feature extraction algorithms to non-numerical data
  • scale features
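The following sketch shows one common transformation for non-numerical data (one-hot encoding) and two common ways to scale numerical features (min-max scaling and z-score standardization). The helper names are hypothetical; in practice the scaling parameters (min, max, mean, standard deviation) should be computed on the training split only and then reused on validation and test data.

```python
import statistics


def one_hot(values):
    """Encode a categorical column as one 0/1 indicator column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


def min_max_scale(values):
    """Rescale to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def standardize(values):
    """Z-score: subtract the mean, divide by the population standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [0.0 if sigma == 0 else (v - mu) / sigma for v in values]


print(one_hot(["red", "blue", "red"]))  # (['blue', 'red'], [[0, 1], [1, 0], [0, 1]])
print(min_max_scale([10, 20, 30]))      # [0.0, 0.5, 1.0]
print(standardize([1, 2, 3]))
```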
Perform feature selection
  • define the optimality criteria
  • apply feature selection algorithms
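As one example of an optimality criterion and a simple filter-style selection algorithm, the sketch below ranks features by the absolute Pearson correlation with the target and keeps the top `k`. The criterion, the feature names, and `select_k_best` are illustrative assumptions; correlation only captures linear relationships, so other criteria (mutual information, model-based importance) may suit other problems better.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)


def select_k_best(features, target, k):
    """Score each feature by |correlation with target|, keep the k highest."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


features = {  # hypothetical feature columns
    "size": [1, 2, 3, 4, 5],
    "noise": [3, 1, 4, 1, 5],
    "constant": [7, 7, 7, 7, 7],
}
target = [2, 4, 6, 8, 10]
selected, scores = select_k_best(features, target, k=1)
print(selected)  # ['size']
```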
Select an algorithmic approach
  • determine appropriate performance metrics
  • implement appropriate algorithms
  • consider data preparation steps that are specific to the selected algorithms
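Choosing an appropriate performance metric matters because a single number can hide poor behavior. The sketch below computes accuracy, precision, recall, and F1 for a binary classifier from scratch; `binary_metrics` is a hypothetical helper and the labels are made up. On this imbalanced example, a model that always predicts the majority class looks strong on accuracy but never finds a positive case.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(pairs), "precision": precision,
            "recall": recall, "f1": f1}


# 95 negatives, 5 positives; a model that always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.95, 'precision': 0.0, 'recall': 0.0, 'f1': 0.0}
```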
Split datasets
  • determine the ideal split based on the nature of the data
  • determine the number of splits
  • determine the relative size of splits
  • ensure splits are balanced
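One way to ensure that splits are balanced is a stratified hold-out split, where each class is divided separately so that the train and test sets keep the original class ratio. The sketch below assumes independent rows; for time-ordered data a chronological split is usually more appropriate, because a random split would let the model see the future. `stratified_split` and its defaults are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_indices, test_indices) with per-class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)  # random order within each class
        n_test = round(len(idx) * test_fraction)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


labels = [0] * 90 + [1] * 10  # hypothetical 90:10 class ratio
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 80 20
```

Both resulting sets keep the 90:10 ratio (72:8 in train, 18:2 in test).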
Identify data imbalances
  • resample a dataset to impose balance
  • adjust the performance metric to account for imbalances
  • implement penalization
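The sketch below illustrates two of the imbalance techniques above: random oversampling of minority classes, and penalization through class weights applied to the loss. The weighting formula `n_samples / (n_classes * class_count)` is one common heuristic, not the only choice, and all helper names are hypothetical rather than part of an Azure or other library API.

```python
import math
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def weighted_log_loss(y_true, p_pred, weights, eps=1e-12):
    """Binary log loss where each example's loss is scaled by its class weight."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        loss = -math.log(p) if y == 1 else -math.log(1 - p)
        total += weights[y] * loss
    return total / len(y_true)


rows = list(range(10))       # stand-ins for feature rows
labels = [0] * 8 + [1] * 2   # 8:2 imbalance
bal_rows, bal_labels = random_oversample(rows, labels)
weights = balanced_class_weights(labels)
print(Counter(bal_labels))
print(weights)  # {0: 0.625, 1: 2.5}
```

With these weights, a mistake on the rare class costs four times as much as a mistake on the common class, which pushes training away from simply predicting the majority.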
Train the model
  • select early stopping criteria
  • tune hyperparameters
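A minimal sketch of both training skills, assuming the training loop reports one validation loss per epoch: an early-stopping rule with a `patience` window and a `min_delta` improvement threshold, and an exhaustive grid search over hyperparameters. The loss values, `toy_score`, and the parameter names `learning_rate` and `depth` are hypothetical; in a real project each candidate would be scored on validation data or with cross-validation.

```python
from itertools import product


class EarlyStopping:
    """Stop when validation loss has not improved by min_delta for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.best_epoch, self.bad_epochs = val_loss, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def grid_search(score_fn, grid):
    """Try every combination in `grid` and keep the highest-scoring one."""
    keys = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical per-epoch validation losses.
val_losses = [0.90, 0.70, 0.60, 0.58, 0.59, 0.61, 0.60, 0.62]
stopper = EarlyStopping(patience=3, min_delta=0.005)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best_epoch)  # 6 3


def toy_score(learning_rate, depth):
    """Hypothetical validation score that peaks at learning_rate=0.1, depth=4."""
    return -((learning_rate - 0.1) ** 2) - (depth - 4) ** 2


best, _ = grid_search(toy_score, {"learning_rate": [0.01, 0.1, 0.3], "depth": [2, 4, 6]})
print(best)  # {'learning_rate': 0.1, 'depth': 4}
```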
Evaluate model performance
  • score models against evaluation metrics
  • implement cross-validation
  • identify and address overfitting
  • identify the root cause of performance results
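The following sketch shows k-fold cross-validation with a deliberately simple model (a one-variable least-squares line) and records training and validation error for every fold. Comparing the two is a basic overfitting check: validation error that is consistently much higher than training error suggests the model is memorizing the training folds. The folds here are contiguous and unshuffled, which assumes the rows are in no meaningful order; all names are hypothetical.

```python
def kfold_indices(n, k):
    """Yield (train, validation) index lists for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def mse(model, xs, ys):
    """Mean squared error of a fitted line."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_validate(xs, ys, k=5):
    """Return (train_mse, validation_mse) for each fold."""
    results = []
    for train, val in kfold_indices(len(xs), k):
        model = fit_line([xs[i] for i in train], [ys[i] for i in train])
        train_err = mse(model, [xs[i] for i in train], [ys[i] for i in train])
        val_err = mse(model, [xs[i] for i in val], [ys[i] for i in val])
        results.append((train_err, val_err))
    return results


xs = [float(i) for i in range(10)]
ys = [2 * x + 1 for x in xs]  # hypothetical noise-free data
for train_err, val_err in cross_validate(xs, ys, k=5):
    print(f"train={train_err:.3g}  validation={val_err:.3g}")
```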