Rocket-fast embedded TypeScript for MakeCode Arcade
When we began developing Microsoft MakeCode, a computing education platform, it was all about making programming easier, more engaging, and just plain friendlier. After all, if we were going to inspire the next generation of coders, easier entry into the world of computer science would be important. To accomplish this, we designed a high-level programming language with text and graphical input modalities along with a simple yet powerful set of programming libraries. Today, the language, Static TypeScript (STS), is used in all seven MakeCode editors, and its ability to meet the demands of programs requiring higher levels of performance is on full display in the latest addition to the platform, MakeCode Arcade.
With Arcade, STS lets developers of all skill levels easily write cool retro-style pixelated games—think Super Mario Bros., but even better because it’s designed by the user—to be run either inside a virtual game console in the browser or on inexpensive microcontroller-based handhelds.
Throwing back with MakeCode Arcade
While memory usage requirements initially guided the design of the MakeCode runtime system and slow execution time isn’t generally a significant problem for most MakeCode editors, execution performance became paramount with Arcade. Arcade sports a 160 × 120 pixel screen with a palette of 16 colors, a four-channel retro-appropriate sound synthesizer, and six game buttons plus menu and reset. Arcade hardware runs on MCUs with around 100 KB of RAM running at around 100 megahertz.
While the console specification—save for the speed—harks back to the 1980s, Arcade provides programmers with decidedly 2010s high-level, easy-to-use APIs. Simple games can be put together in just a few lines of code or a few blocks snapped together depending on whether you’re using the text or block editor. From there, games can be made progressively more involved—all the way to custom AI-run opponents.
To provide enough performance while still allowing a high-level of abstraction, MakeCode compiles a subset of TypeScript directly to Arm Thumb machine code, all locally in the MakeCode web application, so no installation is required. After the first load, the MakeCode web app runs fully offline, as it doesn’t rely on cloud service for compilation. Programs are then transferred using a familiar USB flash drive mechanism or directly with the new WebUSB standard.
Compilation and runtime system
The compiler is quite simple as far as compilers go. It uses a simple stack-based model for temporary values, and there’s no register allocation. In fact, there are no optimizations to speak of except for a simple peep-hole pass on the resulting assembly code. All values are represented uniformly as 32-bit words.
Most other values are pointers to objects either in the garbage-collected heap or read-only flash (MCUs usually have about four times as much flash as RAM, so it’s important not to copy read-only data to RAM). The first word of such objects points to a virtual table, and the subsequent words contain user-defined data fields. The v-table contains runtime type information and pointers to various methods of the object, laid out based on class hierarchy, as well as a list of properties of the object and the memory addresses where they reside.
Following regular TypeScript, the compiler doesn’t generate any runtime type checks on casts. Instead, checks are generated upon usage. A property access is compiled depending on the static type of the accessed object: For classes, it’s a simple memory lookup. For interfaces, we consult the hashed list of properties in the v-table, which is twice as slow, and for dynamic key-value mappings, we use a linear lookup, which is around six times slower (typically, this is the only technique used in embedded interpreters).
Objects—including classes, dynamic maps, binary buffers, and function closures—are allocated on a garbage-collected heap. The garbage collection (GC) algorithm is a simple, nonmoving—yet precise—mark-and-sweep. We previously used a reference counting allocator; however, given the complexity of control flow, including between concurrent threads of execution; the simplicity of the compiler; and the fact that all values are potentially reference counted, including numbers, we had to update reference counts on every function call. Switching to GC improved speed by a factor of two.
The generated code uses a C++ runtime, which is compiled once in the cloud, and the generated code is linked to it. C++ functions can declare typical signatures with integers, bools, and pointers, and appropriate conversion code is inserted by the compiler at calls. If the functions don’t store them, the runtime objects coming in as arguments require no special treatment. If the pointers to them are stored, they need to be registered with the GC.
The compiler automatically generates TypeScript function declarations from the C++ function prototypes with a specific annotation. Such annotations can also include instructions about how these are to be exposed as blocks.
Performance and what it means moving forward
Our compiler exhibits quite good performance. On benchmarks involving heavy object access, it’s at least 20 times faster than MicroPython or Duktape and less than two times slower than V8. On other benchmarks, such as floating point or heavy array access, it’s still two to eight times faster than interpreters. We’ve also implemented a custom byte-code interpreter for potential use on Xbox and iOS, which don’t allow dynamic code generation. It uses the same memory layout as the Arm back end and is about five times slower, though still much faster than other interpreters. This points to both the compiled nature and the memory layouts as causes of good performance. Complex MakeCode Arcade games run at about 30 frames per second on Arm Cortex-M4 MCUs. Extrapolating from our benchmarks, interpreters running the same code would run at an unplayable few frames per second.
We see statically typed languages such as STS playing an important role in the future of IoT, allowing embedded devices to run more efficiently—faster and, as a result, with reduced energy requirements—as well as providing programmers who aren’t embedded developers with an easier, higher-level alternative to the lower-level languages generally used to program these devices. But for now, we’re excited for kids just learning to code and advanced developers alike to have some fun creating games they want to play and share.