Compression is a fundamental goal of both human language and digital communication, yet natural language is very different from compression schemes employed by modern computers. We partly explain this difference using the fact that information theory generally assumes a common prior probability distribution shared by the encoder and decoder, whereas human communication has to be robust to the fact that a speaker and listener may have different prior beliefs about what a speaker may say. We model this information-theoretically using the following question: what type of compression scheme would be effective when the encoder and decoder have (boundedly) different prior probability distributions. The resulting compression scheme resembles natural language to a far greater extent than existing digital communication protocols. We also use information theory to justify why ambiguity is necessary for the purpose of compression.