I am interested in analyzing how information embedded in unstructured data affects consumer decisions and firm strategies. I believe that mining unstructured data (such as natural language and visual images) can provide valuable insights for business research. Methodologically, I use machine learning and deep learning techniques to construct structural and sentiment measures from large-scale data, econometric methods to estimate their impact, and marketing and behavioral theories to interpret the results.

I have received extensive theoretical and methodological training across several disciplines. My doctoral major in marketing provides strong conceptual and empirical foundations, and my doctoral minor in economics provides rigorous training in econometrics, microeconomics, and industrial organization. I have also taken courses in, and worked with faculty from, Management Information Systems (MIS) and Sociology at the University of Arizona. These experiences, especially my collaborations with MIS faculty, have given my research valuable breadth and depth. For instance, I am currently working with Professor Junming Yin in MIS on developing statistical learning models.

My dissertation examines how information in different formats (text vs. image) affects product performance and consumer behavior across different platforms (crowdfunding, online reviews, and video games). The techniques I use in this work include Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), transfer learning, Support Vector Machines (SVM), and point processes. My current projects are summarized below.


“Differentiation in Online Product Reviews: A Machine Learning Based Analysis” (under review at Marketing Science)

On online review platforms, numerical star ratings tend to converge as the number of reviews increases, limiting the room for new reviews and their reviewers to stand out. Review content could provide an important space for differentiation. However, analyzing review content at scale is challenging and requires advanced machine learning and deep learning methods. To address this challenge, I apply three approaches, Support Vector Machine (SVM), Convolutional Neural Network (CNN), and Recurrent Neural Network with Long Short-Term Memory (RNN-LSTM), and compare their performance on the Yelp review data set. Among the three, CNN performs best at classifying online reviews. Using the measures obtained from the CNN, I test for differentiation in review content and find strong evidence of it. Further, the results suggest that review differentiation is greater when more reviews have already been posted, or when the review's star rating differs from the aggregate star rating of previous reviews. Notably, these findings on review content run counter to the convergence observed in star ratings. Conceptually, the results suggest two motivations for differentiating review content: standing out from the crowd and providing support for one's star rating.


“Effects of Text and Image on Reward-Based Crowdfunding Performance”

While practical guides for entrepreneurs have widely discussed how textual and visual elements affect the performance of reward-based crowdfunding projects, academic research on these effects remains inconclusive. In this paper, I use a deep learning model, a convolutional neural network (CNN), to extract numerical characteristics of the text blurbs and cover images in reward-based crowdfunding project descriptions. However, building a neural network from scratch requires large training data sets and a significant amount of computational power. To overcome this challenge, I use transfer learning, which allows me to take a neural network pre-trained on a related task and fine-tune it on my data set. As a result, my model achieves high performance despite a limited training set and limited computational power. I then include the extracted characteristics in an empirical model to estimate their effects on the final amount of funding pledged to reward-based crowdfunding projects. The results indicate that textual characteristics (readability and linguistic features, namely objectivity and positivity), image characteristics (in this paper, color styles), and joint text-image effects all significantly affect the performance of crowdfunding projects. Research on color in the behavioral literature provides a theoretical explanation for these results.


“Capturing Virtual Business Opportunities from Real-World Events”

Firms, especially video game companies and social network platforms, are selling more and more virtual goods to consumers. While prior literature examines the motivations and behavior patterns behind virtual good purchases within the virtual world alone, many Internet-enabled platforms are closely tied to real-world contexts. It is therefore valuable to study the virtual world and the real world jointly. In this paper, I study how events in a real-world sports league affect consumers’ virtual good purchases in a leading sports video game that simulates the league. I collect statistics from real-world league games and investigate their impact on virtual good purchases in the video game. I find that these game statistics have significant and immediate effects on consumers’ likelihood of purchasing virtual goods in the video game, and that these effects are moderated by consumer characteristics. Conceptually, the results suggest two motivations behind virtual good purchases in sports video games: challenge-based and enjoyment-based. In addition to the econometric model, I propose a Recurrent Multivariate Marked Hawkes Process (RMMHP) model as a cross-validation check. Combining and extending Recurrent Neural Networks and Hawkes point processes, RMMHP predicts not only the occurrence but also the volume of virtual good purchases.
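To illustrate the self-exciting idea that Hawkes processes build on, the sketch below computes the conditional intensity of a standard multivariate Hawkes process with an exponential kernel, in which each past event temporarily raises the rate of future events. This is not the RMMHP model: it omits the recurrent neural component and the marks (such as purchase volume), and the function name hawkes_intensity, the two-dimension setup (real-world game events and in-game purchases), and all parameter values are hypothetical choices for illustration.

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of each event dimension at time t.

    Hypothetical illustration of a standard (non-recurrent, unmarked)
    multivariate Hawkes process with an exponential kernel.
    events: list of (time, dim) pairs for past events
    mu:     baseline rate for each dimension
    alpha:  alpha[k][j] is the jump in dimension k's rate caused by an event in dimension j
    beta:   shared exponential decay rate of the excitation
    """
    lam = list(mu)  # start from the baseline rates
    for t_i, j in events:
        if t_i < t:  # only strictly earlier events excite the process
            decay = math.exp(-beta * (t - t_i))
            for k in range(len(mu)):
                lam[k] += alpha[k][j] * decay
    return lam

# Hypothetical setup: dimension 0 = real-world game events, 1 = in-game purchases.
# A real-world event at t = 1.0 raises the purchase rate, which then decays.
mu = [0.1, 0.2]
alpha = [[0.0, 0.0], [0.5, 0.3]]
print(hawkes_intensity(2.0, [(1.0, 0)], mu, alpha, beta=1.0))
```

Here alpha[1][0] captures how strongly a real-world event excites purchases; in a richer model, this excitation could also depend on event marks or on a learned hidden state.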


“How to Write Research Papers That Have a Larger Impact”

This research, conducted with Nooshin Warren, Matt Farmer, and Caleb Warren, examines why some consumer research papers lack impact. Theoretically, the paper proposes that the curse of knowledge (the tendency of experts to forget or ignore that their readers know less about a topic than they do) leads to a series of writing practices that make papers difficult to understand and thus less cited. To empirically study how these writing practices affect the impact of consumer research papers, I use several text-mining techniques, including regular expressions (regex) and pattern matching, to quantify the use of writing practices associated with the curse of knowledge, such as abstract language, passive voice, excessive focus on the literature, and reliance on jargon. I analyze papers published in the Journal of Consumer Research between 2000 and 2010 and find that papers that use more passive voice, provide fewer examples, and rely on less common and more abstract words are cited less. Based on both the theoretical discussion and the empirical results, the paper proposes six strategies that consumer researchers can use to make their writing easier to understand.
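As a rough illustration of how a writing practice such as passive voice can be measured with regular expressions, the sketch below flags sentences containing a form of "to be" followed by a likely past participle and reports the share of flagged sentences. The pattern, the short list of irregular participles, and the helper names split_sentences and passive_share are hypothetical choices for illustration, not the measures used in the paper; a heuristic like this produces some false positives and misses.

```python
import re

# Hypothetical heuristic: a form of "to be", an optional "-ly" adverb,
# then a word that looks like a past participle ("-ed" or a short irregular list).
BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"
IRREGULAR = r"(?:shown|found|given|made|done|seen|known|taken|written|chosen|built|held|led|paid|said|taught|thought)"
PASSIVE_RE = re.compile(
    rf"\b{BE_FORMS}\s+(?:\w+ly\s+)?(?:\w+ed|{IRREGULAR})\b",
    re.IGNORECASE,
)

def split_sentences(text):
    """Naive sentence splitter: break after ., !, or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def passive_share(text):
    """Fraction of sentences that contain at least one passive-voice match."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    passive = sum(1 for s in sentences if PASSIVE_RE.search(s))
    return passive / len(sentences)

print(passive_share("The model was estimated. We estimate the model."))
```

A per-paper share like this could then enter a citation regression as one covariate; production measures would typically rely on a part-of-speech tagger rather than suffix matching.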

Moving forward, I plan to further integrate big data analytics, customer insights, and marketing strategy to address substantive business problems. I expect the resulting insights to contribute to both academic research and business practice.