ACM TOMM 2017 Best Paper: Let AI take care of the complicated and professional work of graphic layout design for you
The volume of rich media content today is unprecedented. Every second, people create and share enormous amounts of information, much of it combining complex images and text, and this mix of text and images has become mainstream. A major challenge in content creation is designing an eye-catching layout for varied textual and graphic material, such as magazine covers, posters, or PowerPoint presentations. The problem matters for commercial printing, online journals and magazines, and user-generated content alike. Graphic layout draws on a large body of professional knowledge, including visual communication, information art design, color and aesthetics, spatial planning, and geometric composition. Until now, layout design has required designers with extensive expertise as well as a great deal of manual labor, and enabling computers to create layouts automatically from the graphic content is a very difficult problem.
Beginning in late 2013, researchers from Microsoft Research Asia and art design specialists from Tsinghua University’s Academy of Arts and Design have collaborated in this field where science meets art. They combined the aesthetic principles of design with computable image features and proposed a prototype of a computable, automatic composition framework. By optimizing a series of key factors (for example, the weight of text in a photo, the weight of visual space, color harmony as understood in color psychology, and the importance of information in visual cognition and semantic understanding), the prototype integrates expert knowledge of visual presentation, textual semantics, design principles, and cognitive comprehension into a single multimedia computing framework. In doing so, it opened a new research direction: automatic design of visual-textual layout.
Figure 1: A graphic layout generated automatically by the algorithm. Note: The input was a pure image (without any text) and plain text (such as a title or subtitle); the output combines the two, with the text embedded in the image.
This research expresses general aesthetic perception mathematically, builds a topic-related library of graphic layout design templates, and proposes a prototype framework for computable graphic composition that merges top-down aesthetic perception at the macro level with bottom-up graphic features at the micro level. By integrating face detection, text detection, and visual saliency detection, this research is the first to propose a visual attention detection algorithm that forms an importance map and an attention map for the entire image. For text layout, the paper formulates the interaction between the shape of the text block and the dominant image as an energy minimization problem over a layout L:
$$E(L) = E_{\text{intrusion}}(L) + E_{\text{space}}(L) + E_{\text{mismatch}}(L)$$
Here, $E_{\text{intrusion}}(L)$ is the cost of text intruding into the dominant image (as in Figure 1); minimizing it reduces the overlap between text and important visual objects. $E_{\text{space}}(L)$ represents wasted free visual space; minimizing it makes full use of the available space in the image to maximize the impact of the text. $E_{\text{mismatch}}(L)$ represents the mismatch between the perceptual importance and the semantic importance of the information; minimizing it places the most important text in the most important visual areas of the image, so that readers reach critical information quickly. Because the optimization is guided by the aesthetic principles of the design template, the final solution meets visual aesthetic requirements rather than being merely the computer’s numerical optimum.
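To make the intrusion, space-waste, and mismatch terms concrete, here is a minimal sketch for a single rectangular text box on a coarse importance grid. It is not the paper’s implementation: the grid representation, the `free_thresh` cutoff, the equal term weights, the exhaustive search over candidate boxes, and the assumption that positions nearer the top of the canvas are more prominent are all illustrative choices, and `energy` and `best_layout` are hypothetical names.

```python
from itertools import product


def energy(importance, box, text_weight, free_thresh=0.2, weights=(1.0, 1.0, 1.0)):
    """Hypothetical E(L) for one text box (x, y, w, h) on an importance grid."""
    x, y, w, h = box
    rows, cols = len(importance), len(importance[0])
    cells = [(r, c) for r in range(y, y + h) for c in range(x, x + w)]

    # Intrusion: mean visual importance of the cells the text would cover.
    intrusion = sum(importance[r][c] for r, c in cells) / len(cells)

    # Space waste: share of the free (low-importance) area left unused by the text.
    free = {(r, c) for r in range(rows) for c in range(cols)
            if importance[r][c] < free_thresh}
    used = sum(1 for cell in cells if cell in free)
    space_waste = 1.0 - used / len(free) if free else 0.0

    # Mismatch (assumption): rows nearer the top are more prominent, so
    # semantically important text (weight near 1) should sit high up.
    prominence = 1.0 - y / rows
    mismatch = (text_weight - prominence) ** 2

    w_intr, w_space, w_mis = weights
    return w_intr * intrusion + w_space * space_waste + w_mis * mismatch


def best_layout(importance, sizes, text_weight):
    """Exhaustively search box sizes and positions; return the lowest-energy box."""
    rows, cols = len(importance), len(importance[0])
    candidates = [(x, y, w, h) for w, h in sizes
                  for x, y in product(range(cols - w + 1), range(rows - h + 1))]
    return min(candidates, key=lambda box: energy(importance, box, text_weight))


# A 6 x 8 grid whose three rightmost columns hold a salient object.
grid = [[0.0] * 5 + [0.9] * 3 for _ in range(6)]
print(best_layout(grid, sizes=[(3, 1), (4, 2)], text_weight=1.0))
```

With the salient object on the right, the search picks the larger box at the top of the free region: it avoids the object (zero intrusion), covers more free space than the small box, and puts a high-weight title in the most prominent rows. A real system would replace the exhaustive search with something that scales to full-resolution images and many text blocks.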
After the text is laid out, a color harmony optimization framework analyzes the foreground and background colors of the image. It preserves the overall color harmony of the original image in the final layout while maximizing the difference between text and background colors, so that the text stays readable throughout. The overall color harmony is computed with the psychological color model proposed in the well-known “Color Harmonization” paper, combined with this paper’s model of foreground and background theme color preferences under different themes, to find the most suitable theme color for the whole layout. To maximize local visual contrast, the paper proposes Hue/Tone Golden Ratio Sampling: it maps the dominant color of the background region covered by the text into hue and tone space, and locates the golden-ratio point between the local background tone and the farthest opposite saturation and value. With this framework, the entire graphic design process can be automated under the guidance of aesthetic perception.
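One possible reading of Hue/Tone Golden Ratio Sampling is sketched below with Python’s standard `colorsys` module. It assumes that “farthest opposite saturation and value” means the corner of the unit saturation/value square farthest from the background color, that the text color sits at the golden-ratio point (about 0.618) on the way from the background toward that corner, and that the hue comes from the chosen theme color (or the background hue if none is given). `golden_text_color` is a hypothetical name, and the paper’s exact procedure may differ.

```python
import colorsys

GOLDEN = (5 ** 0.5 - 1) / 2  # inverse golden ratio, about 0.618


def farthest_corner(s, v):
    """Corner of the unit saturation/value square farthest from (s, v)."""
    return (0.0 if s >= 0.5 else 1.0, 0.0 if v >= 0.5 else 1.0)


def golden_text_color(bg_rgb, theme_hue=None):
    """Hypothetical golden-ratio sampling of a text color for a background RGB in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(*bg_rgb)
    far_s, far_v = farthest_corner(s, v)
    # Move from the background tone toward the opposite corner by the golden ratio.
    text_s = s + GOLDEN * (far_s - s)
    text_v = v + GOLDEN * (far_v - v)
    # Hue comes from the chosen theme color if given, else from the background.
    text_h = h if theme_hue is None else theme_hue
    return colorsys.hsv_to_rgb(text_h, text_s, text_v)


print(golden_text_color((0.1, 0.1, 0.3)))
```

For a dark navy background such as `(0.1, 0.1, 0.3)`, the sampled text color is a light, desaturated blue: value rises from 0.3 to about 0.73 and saturation drops from about 0.67 to about 0.25, giving strong tonal contrast while staying in the same hue family. Stopping at the golden-ratio point rather than jumping all the way to the corner (pure white or black) is what keeps the text related to the background tone.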
Figure 2: System framework diagram
The system presented in this paper lets users upload a background image on a specific topic together with a few text phrases. In the second stage, the original image is processed to obtain a visual perception map by combining the saliency values and maps from face detection, text detection, and gaze attention; the image is then resized to fit the target layout while the important areas identified by this map are preserved. The resized image is used to arrange the spatial distribution of the layout template. In the third stage, the phrases are superimposed on the background image according to the spatial layout through the energy optimization process. In the fourth stage, text coloring, the color palette of the cropped image is analyzed and, at the same time, a theme color is chosen based on the theme properties. Using a specific hue model together with the palette, text color, and content characteristics, the system recolors the text while maintaining color harmony and readability.
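The second stage can be pictured as two steps: fuse the face, text, and gaze maps into one importance map, then choose the crop window that keeps the most importance. The sketch below is a hypothetical illustration: the weighted per-pixel maximum used for fusion and the summed-area-table window search are assumptions made for this example, not details given in the paper, and the maps are plain nested lists of values in [0, 1]. `fuse_maps` and `best_crop` are hypothetical names.

```python
def fuse_maps(face, text, gaze, weights=(1.0, 1.0, 1.0)):
    """Hypothetical fusion: per-pixel max of the weighted face, text, and gaze maps."""
    wf, wt, wg = weights
    return [[max(wf * f, wt * t, wg * g) for f, t, g in zip(fr, tr, gr)]
            for fr, tr, gr in zip(face, text, gaze)]


def best_crop(importance, crop_w, crop_h):
    """Return the top-left (x, y) of the crop_w x crop_h window keeping the most importance."""
    rows, cols = len(importance), len(importance[0])
    # Summed-area table: sat[r][c] holds the sum of importance[0:r][0:c].
    sat = [[0.0] * (cols + 1) for _ in range(rows + 1)]
    for r in range(rows):
        for c in range(cols):
            sat[r + 1][c + 1] = (importance[r][c] + sat[r][c + 1]
                                 + sat[r + 1][c] - sat[r][c])

    def window_sum(x, y):
        # Inclusion-exclusion over the table gives each window sum in O(1).
        return (sat[y + crop_h][x + crop_w] - sat[y][x + crop_w]
                - sat[y + crop_h][x] + sat[y][x])

    positions = [(x, y) for y in range(rows - crop_h + 1)
                 for x in range(cols - crop_w + 1)]
    return max(positions, key=lambda p: window_sum(*p))


face = [[0.0] * 6 for _ in range(4)]
text = [[0.0] * 6 for _ in range(4)]
gaze = [[0.0] * 6 for _ in range(4)]
face[1][4] = 1.0  # a detected face
text[2][5] = 0.8  # text already present in the photo
print(best_crop(fuse_maps(face, text, gaze), crop_w=3, crop_h=3))
```

On this 4 by 6 map, the 3 by 3 crop starts at column 3 so that it keeps both the face and the existing text. Taking the maximum rather than a sum means a pixel flagged by several detectors is not counted twice; a weighted sum would be an equally reasonable assumption.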
Figure 3: The typography procedure: (a) the visual importance map (in gray) with gaze attention (in yellow); (b) the template selected from the top-5 ranked templates; (c) the input texts and their weights; (d) details of the typography procedure, where the energy E(L) is minimized to a sub-optimal solution by iteratively adjusting the font height (e.g., “Coverlines”); (e) the typography result, combining bottom-up image features with top-down spatial layout constraints.
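Panel (d) mentions iteratively controlling the font height. A minimal sketch of one piece of such a loop follows: shrink the font height step by step until the word-wrapped text fits a template region. It assumes every glyph is `glyph_ratio * font_h` wide and that lines are spaced `line_gap * font_h` apart; `wrap` and `fit_font_height` are hypothetical names. In the paper the font height is adjusted to minimize E(L), whereas this sketch only checks that the text fits.

```python
def wrap(text, font_h, box_w, glyph_ratio=0.5):
    """Greedy word wrap, assuming each glyph is glyph_ratio * font_h wide."""
    max_chars = max(1, int(box_w // (glyph_ratio * font_h)))
    lines, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


def fit_font_height(text, box_w, box_h, start_h, min_h=8, step=2,
                    line_gap=1.2, glyph_ratio=0.5):
    """Decrease the font height until the wrapped text fits box_w x box_h."""
    if not text.split():
        return start_h, []
    font_h = start_h
    while font_h >= min_h:
        lines = wrap(text, font_h, box_w, glyph_ratio)
        widest = max(len(line) for line in lines) * glyph_ratio * font_h
        height = len(lines) * font_h * line_gap
        if widest <= box_w and height <= box_h:
            return font_h, lines
        font_h -= step
    return None  # nothing fits at or above min_h


print(fit_font_height("Automatic layout design", box_w=200, box_h=60, start_h=40))
```

For `"Automatic layout design"` in a 200 by 60 region starting at height 40, the loop settles at height 24 with the lines `Automatic layout` and `design`. Searching from large to small returns the largest height that fits, which suits headlines such as coverlines that should be as prominent as the space allows.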
Figure 4: Color analysis and optimization diagram
The paper has drawn wide attention in the academic community since its publication and has been downloaded more than 260 times from the ACM database since 2016. Beyond its theoretical significance, the research has broad practical value. For example, the image content-based color detection algorithm proposed in this paper has already been applied in Office Sway, a shipping product. More than 400,000 users in over 60 countries currently use the new Office Sway for their design work.
This paper deeply integrates several disciplines, including multimedia, art design, and color psychology, and demonstrates the use of artificial intelligence in art design. Arguably, the color psychology model has opened a new window for multimedia design, and aesthetic design thinking has given wings to imagination in multimedia analysis.
Visit this website to read the paper: https://www.microsoft.com/en-us/research/publication/automatic-generation-of-visual-textual-presentation-layout/
Authors of the research paper
- Xuyong Yang, joint Ph.D. student at Microsoft Research Asia and the University of Science and Technology of China, and founder of Weicheche
- Tao Mei, senior researcher at Microsoft Research Asia, ACM Distinguished Member, and IAPR Fellow
- Yingqing Xu, Microsoft Research Asia alumnus and head of the Department of Information Art and Design at Tsinghua University’s Academy of Arts and Design
- Yong Rui, Microsoft Research Asia alumnus and CTO of Lenovo
- Shipeng Li, Microsoft Research Asia alumnus and CTO of IngDan
In addition, we give special thanks to the collaborators on this paper: Ph.D. candidate Yue Wu from the University of Science and Technology of China, and Junjie Yu, a graduate student at Tsinghua University’s Academy of Arts and Design.