You get what you measure: New NLU benchmarks for few-shot learning and robustness evaluation
Recent progress in natural language understanding (NLU) has been driven in part by the availability of large-scale benchmarks that give researchers an environment in which to test and measure the performance of AI models. Most of these benchmarks are designed for academic settings: typically, datasets that feature…