This paper describes an approach to improving the reliability of a crowdsourced labeling task for which there is no objectively correct answer. Our approach focuses on three elements of the labeling task on which reliability is contingent: data quality, worker reliability, and task design. We describe how we developed and applied this framework to the task of labeling tweets according to their interestingness. We use in-task CAPTCHAs to identify unreliable workers, and we measure inter-rater agreement to decide whether subtasks have objective or merely subjective answers.
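As a minimal sketch of the agreement-measurement step: the paper does not name a specific agreement statistic here, so the choice of Fleiss' kappa, the function name, and the example data below are illustrative assumptions rather than the authors' actual method.

```python
# Sketch: Fleiss' kappa over a table of label counts, where
# counts[i][j] = number of raters who assigned category j to item i.
def fleiss_kappa(counts):
    N = len(counts)        # number of labeled items
    k = len(counts[0])     # number of label categories
    n = sum(counts[0])     # raters per item (assumed constant)
    # Observed agreement for each item, then averaged.
    P_i = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P_i) / N
    # Chance agreement from the marginal category proportions.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(p * p for p in p_j)
    return (P_bar - P_e) / (1 - P_e)

# Hypothetical example: 3 tweets, 4 raters, binary interesting/not-interesting labels.
ratings = [[4, 0],   # unanimous -> looks like an objective subtask
           [2, 2],   # evenly split -> looks subjective
           [3, 1]]
print(fleiss_kappa(ratings))
```

Under this reading, a subtask whose agreement is well above chance would be treated as having an objective answer, while agreement near or below chance would mark it as merely subjective; the specific cutoff is left open here.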