Domain Adaptation for Sentiment Classification in Light of Multiple Sources
Abstract
Sentiment classification is one of the most extensively studied problems in sentiment analysis, and supervised learning methods, which require labeled data for training, have proven quite effective. However, supervised methods assume that the training domain and the testing domain share the same distribution; when this assumption is violated, accuracy drops dramatically. This poses no problem when labeled training data is readily available, but in some circumstances labeled data is expensive to acquire. For instance, if we want to detect sentiment in tweets or Facebook comments, the only way to acquire labeled data is to annotate it manually, which is prohibitively burdensome and time-consuming. In this paper, we propose a hybrid approach that addresses this problem by integrating the sentiment information from source-domain labeled data with a set of preselected sentiment words. The experimental results suggest that our method outperforms the state of the art with statistical significance and, in some cases, even surpasses the in-domain gold standard.

