Personalization techniques are now commonly used to structure online interactions with customers. Because a large share of customer interactions takes place this way, systematic evaluation of these methods is critical. In this paper we study the problem of evaluating online personalization and point out the difficulties of evaluating it scientifically from the automatically tracked data that most online firms gather. One factor contributing to the difficulty is that conducting true experiments is often not possible because of the potential costs of doing so. For cases in which true experimentation is not possible, we present a systematic approach to evaluation that is based on bringing domain knowledge explicitly into the process. The advantage of our approach is that it makes the domain knowledge underlying the evaluation explicit. There is indeed no free lunch: the disadvantage is that the approach relies on the accuracy of that domain knowledge. More generally, this paper suggests that there may be problems in how online personalization systems are evaluated, and argues for systematic approaches to this important problem.