Essays on Forecasting Online Shopping Searches
Abstract
My dissertation consists of two essays, both of which use data from Google Shopping Insights to develop models for forecasting online shopping searches, a critical precursor of sales. The first essay presents a novel Bass-type diffusion model for forecasting online shopping searches for rapid life cycle (RLC) products. The most important innovation of my model is that it allows imitation/contagion effects to operate through dual channels: a ‘local’ channel of influence, mainly through direct, in-person interactions (e.g., with schoolmates, colleagues, neighbors), and a ‘national’ channel of influence, mainly through social media (e.g., YouTube, Instagram, Facebook). To separate the effects of these two channels, I leverage the fact that data from Google Shopping Insights is available at the city level, which allows me to model consumer shopping searches in a particular month and city as a function of not only past searches in that city but also past searches in the rest of the country. The former is treated as a proxy for the amount of local influence, and the latter as a proxy for the amount of national influence. Empirical estimates suggest that imitation/contagion can indeed take place through both local and national channels, with their relative importance varying across products. When influence from the national channel dominates, the diffusion curve tends to be steeper; when influence from the local channel dominates, it tends to be flatter. This supports the idea that imitation/contagion through social media, compared to in-person interactions, shortens the product life cycle. Another important feature of my model is that it allows for rapid decay in the influence of prior adopters. Empirical estimates show that, for most RLC products, the influence of prior adopters drops drastically after just one month.
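The dual-channel idea described above can be illustrated with a minimal simulation sketch. It is not the dissertation's actual specification: the functional form, the parameter names (`p`, `q_local`, `q_national`, `decay`), and the per-city market potentials `m` are all assumptions made here for illustration. Each city's searches in a month depend on a baseline rate, on a decayed stock of that city's own past searches (local channel), and on the same stock pooled over all other cities (national channel).

```python
import numpy as np

def simulate_dual_channel(m, p, q_local, q_national, decay, periods):
    """Simulate monthly searches per city under a hypothetical
    dual-channel Bass-type model (illustrative form only).

    m          : market potential per city (needs at least two cities)
    p          : baseline (non-imitative) search rate
    q_local    : weight on past searches in the same city
    q_national : weight on past searches in the rest of the country
    decay      : share of prior influence carried over each month
    """
    m = np.asarray(m, dtype=float)
    cumulative = np.zeros_like(m)   # searches to date in each city
    influence = np.zeros_like(m)    # decayed stock of past searches
    history = []
    for _ in range(periods):
        # Local channel: the city's own influence stock, scaled by its potential.
        local = influence / m
        # National channel: influence stock of all other cities, scaled likewise.
        national = (influence.sum() - influence) / (m.sum() - m)
        # Per-period search probability, kept inside [0, 1].
        hazard = np.clip(p + q_local * local + q_national * national, 0.0, 1.0)
        new = hazard * (m - cumulative)
        cumulative += new
        # A small decay means prior adopters lose influence quickly.
        influence = new + decay * influence
        history.append(new)
    return np.array(history)        # shape: (periods, cities)
```

Calling `simulate_dual_channel([1000, 500, 2000], p=0.01, q_local=0.3, q_national=0.5, decay=0.1, periods=24)` returns a 24-by-3 array of monthly searches. Setting `decay` close to zero mimics the case in which only the previous month's searches matter.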
The second essay develops a “Big Data” solution to the so-called ‘cold start’ problem in forecasting, where insufficient longitudinal information prevents one from extrapolating historical patterns into the future with standard time series methods. My solution mitigates the cold start problem by compensating for the lack of longitudinal data with the abundance of data from a large number of cities and products in Google Shopping Insights, which can serve as training samples. It uses a fusion of multiple methods to identify similar products, and then leverages the spatiotemporal patterns of those similar products in the holdout period to forecast city-level growth in online shopping searches for the focal product. Extensive empirical comparisons suggest that my solution outperforms the benchmarks. Furthermore, I find that a bigger training sample is not always better: gradually increasing the size of the training sample first improves and then, counterintuitively, reduces forecasting performance. I attribute this finding to the fact that the incremental predictive value of additional training data diminishes as the sample size increases, while the proportion of noise does not. As a result, methods commonly used for variable/feature selection fail to remove the added noise, resulting in over-fitting and thus deteriorating forecasting performance. This finding cautions that, even in the “Big Data” era, a bigger training sample is, all else being equal, not necessarily better when it comes to forecasting demand growth for new products.
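The general idea of borrowing patterns from similar products can be sketched as follows. This is a simplified stand-in, not the method used in the essay: it replaces the fusion of multiple similarity methods with a single cosine similarity, and the function name, the feature vectors, and the growth matrix are hypothetical. The forecast for the focal product is a similarity-weighted average of the city-level growth observed for its `k` most similar products.

```python
import numpy as np

def forecast_from_similar(focal_features, product_features, product_growth, k=3):
    """Forecast city-level growth for a new product from its k most
    similar existing products (illustrative nearest-neighbor sketch).

    focal_features   : feature vector of the focal product (nonzero)
    product_features : matrix of shape (products, features), nonzero rows
    product_growth   : matrix of shape (products, cities) of observed growth
    """
    focal = np.asarray(focal_features, dtype=float)
    feats = np.asarray(product_features, dtype=float)
    growth = np.asarray(product_growth, dtype=float)
    # Cosine similarity between the focal product and every existing product.
    sims = feats @ focal / (np.linalg.norm(feats, axis=1) * np.linalg.norm(focal))
    # Indices of the k most similar products, most similar first.
    top = np.argsort(sims)[::-1][:k]
    # Non-negative weights; fall back to equal weights if all are zero.
    weights = np.clip(sims[top], 0.0, None)
    if weights.sum() == 0:
        weights = np.ones(len(top))
    weights = weights / weights.sum()
    # Weighted average of the neighbors' city-level growth.
    return weights @ growth[top]
```

Here `k` plays the role of the training-sample size: raising it pulls in less similar products, which gives one simple way to examine how forecasts change as more, and noisier, training products are added.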