We introduce a framework for action-driven evolution of 3D indoor scenes, where the goal is to simulate how scenes are altered by human actions, and specifically, by object placements necessitated by the actions. To this end, we develop an action model with each type of action combining information about one or more human poses, one or more object categories, and spatial configurations of objects belonging to these categories which summarize the object-object and object-human relations for the action. Importantly, all these pieces of information are learned from annotated photos. Correlations between the learned actions are analyzed to guide the construction of an action graph. Starting with an initial 3D scene, we probabilistically sample a sequence of actions from the action graph to drive progressive scene evolution. Each action triggers appropriate object placements, based on object co-occurrences and spatial configurations learned for the action model. We show results of our scene evolution that lead to realistic and messy 3D scenes, as well as quantitative evaluations by user studies which compare our method to manual scene creation and state-of-the-art, data-driven methods, in terms of scene plausibility and naturalness.