Identifying Foodborne Illness and Sanitation Frequencies from Customer-Generated Reviews Using Business Analytics


Abstract

Yelp is an industry-leading crowd-sourced review forum where customers, nicknamed “Yelpers,” post reviews of businesses and rate their satisfaction on a scale of 1 to 5. This project used a Yelp dataset containing 231,381 individual consumer reviews of 969 restaurants in the Greater Houston area. The objectives were to (1) determine the frequency of terms relating to foodborne disease and restaurant cleanliness issues in customer reviews and (2) relate the frequency of these terms to individual customer satisfaction ratings. It was hypothesized that the frequency of dictionary terms within a review would be inversely related to that review's satisfaction rating. Two dictionaries were developed for text mining: one pertaining to foodborne disease and one pertaining to restaurant filthiness. Foodborne disease terms were chosen from an exploratory frequency measurement of terms in the reviews and from known foodborne disease symptoms; filthiness terms were chosen based on typical cleanliness issues, including pest presence and unclean facilities. A substantial number of reviews contained dictionary keywords, and an inverse relationship between term frequency and customer satisfaction was observed. Going forward, the recorded frequencies of cleanliness and foodborne disease terms will be analyzed alongside several additional parameters in the Yelp dataset. This would allow further regression analyses that may link elevated food poisoning and filth risks to lower-income neighborhoods, particular times of year, and different types of restaurants.
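The dictionary-based approach described above can be sketched in a few lines of Python: count how many tokens in each review fall into each dictionary, then check whether those counts move opposite to the star ratings. Everything here is a hypothetical illustration, not the study's method: the term lists, the four toy reviews, and the helper names (`term_frequency`, `pearson`) are invented for the example. The abstract does not name the statistic used, so a plain Pearson correlation is swapped in as one simple way to test for an inverse relationship.

```python
import math
import re

# Hypothetical example dictionaries; NOT the study's actual term lists.
FOODBORNE_TERMS = {"sick", "vomiting", "diarrhea", "nausea", "poisoning"}
FILTH_TERMS = {"dirty", "filthy", "roach", "roaches", "mouse", "sticky"}

# Lowercase word tokens; apostrophes kept so "didn't" stays one token.
TOKEN_RE = re.compile(r"[a-z']+")


def term_frequency(text, dictionary):
    """Count tokens in `text` that appear in `dictionary` (case-insensitive)."""
    tokens = TOKEN_RE.findall(text.lower())
    return sum(1 for tok in tokens if tok in dictionary)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy reviews as (text, star rating); not drawn from the Yelp data.
reviews = [
    ("Great tacos, friendly staff, spotless tables.", 5),
    ("Food was fine but the floor was sticky and dirty.", 3),
    ("Saw a roach near the counter. Got sick that night, vomiting.", 1),
    ("Decent burger, a bit slow.", 4),
]

food_counts = [term_frequency(text, FOODBORNE_TERMS) for text, _ in reviews]
filth_counts = [term_frequency(text, FILTH_TERMS) for text, _ in reviews]
totals = [f + c for f, c in zip(food_counts, filth_counts)]
stars = [rating for _, rating in reviews]

# A negative r indicates that more dictionary hits go with lower ratings.
r = pearson(totals, stars)
print("foodborne:", food_counts, "filth:", filth_counts, "r =", round(r, 3))
```

On real reviews one would likely also need to handle multi-word terms (for example, "food poisoning"), negation ("did not get sick"), and spelling variants, and a rank-based statistic such as Spearman's may suit ordinal 1-to-5 ratings better than Pearson's r.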