Browsing the Aisles or Browsing the App? How Online Grocery Shopping is Changing What We Buy
Abstract
This paper investigates the systematic differences between online and offline grocery shopping baskets using data from approximately two million brick-and-mortar and Instacart trips. We apply unsupervised machine learning algorithms agnostic to the shopping channel to identify what constitutes a typical food shopping trip for each household. We find that food shopping basket variety is significantly lower for online shopping trips as measured by the number of unique food categories and items purchased. Within a given household, the Instacart baskets are more similar to each other as compared with offline baskets with twice as many overlapping items between successive trips to the same retailer. These results suggest a potential link between online grocery shopping environments and heightened consumer inertia, which may lead to stronger brand loyalty and pose challenges for new entrants in establishing a customer base. Furthermore, Instacart baskets have 13% fewer fresh vegetables and 5%–7% fewer impulse purchases, such as candy, bakery desserts, and savory snacks, which are not compensated for by alternative or additional shopping trips. We discuss the implications of these systematic shopping basket differences for competition, product management, retailers, consumers, and online platforms.
History: Catherine Tucker served as the senior editor.
Supplemental Material: The online appendix and data are available at https://doi.org/10.1287/mksc.2022.0292.
1. Introduction
Online shopping has significantly transformed the retail landscape with the grocery sector being among the most impacted in recent years. The COVID-19 pandemic further accelerated this shift, leading to a sharp rise in demand for online grocery services. In the United States, around 54% of households placed an online grocery order in March 2021, which was a 328% increase from August 2019 (Mercatus 2020, 2021). Instacart, the leading online grocery delivery service and the focus of this paper, also saw multifold year-on-year growth after the onset of the pandemic (Sorvino 2021). Experts predict that this trend is likely to continue in the postpandemic era as consumers become more accustomed to the convenience of online grocery shopping (Hussey 2021). Indeed, according to a recent post-COVID industry survey, consumers expressed sustained interest in purchasing groceries online (McKinsey 2022), and Instacart sales further increased by 39% in 2022 (Kang et al. 2023).
Over the past few decades, conventional brick-and-mortar (BM) grocery stores have optimized their physical space, including store layouts, shelf displays, and endcaps, all with the primary objective of maximizing revenue. In the digital sphere, the grocery shopping experience predominantly revolves around a mobile app–based interface, resulting in reduced reliance on physical store arrangements and increasing the importance of “digital” shelves. As the shift toward online grocery shopping continues, it is important to understand how this channel shapes consumer behavior. Therefore, our paper aims to investigate the systematic differences between online and brick-and-mortar shopping baskets.
To address this question, we leverage a large-scale panel data set that captures a household’s entire online and offline food shopping activity across multiple retailers. This comprehensive 360° view addresses a formidable data challenge in omnichannel research (Cui et al. 2021). We begin by examining the overall grocery shopping patterns across both online and offline channels for those households that use both. We find that the households that adopted online grocery shopping make grocery purchases from three distinct online retailers on average. Furthermore, households use Instacart services from one to two preferred retailers, typically coinciding with the retailers they frequent the most for their regular brick-and-mortar grocery shopping. Online shopping baskets generally consist of a higher proportion of bulk purchases and exhibit less diversity in their contents, suggesting that the choice of online channels may be primarily driven by convenience factors. Additionally, both Instacart and other online baskets show higher spending and a larger quantity of items bought in comparison with the average brick-and-mortar grocery baskets. Although these overall shopping patterns provide valuable insights, they mask considerable heterogeneity in trip characteristics within individual households. Our data consist of various types of shopping trips, including routine restocking of commonly used grocery items and intermittent visits to convenience stores or gas stations. To address trip heterogeneity, we utilize machine learning techniques to analyze the shopping patterns of each household with the goal of identifying their regular restocking grocery trips, which we refer to as “characteristic” trips. Our analysis demonstrates that our algorithm effectively distinguishes a plausible substitution of a brick-and-mortar characteristic trip with an Instacart trip.
When we restrict the comparison of Instacart and offline baskets, focusing solely on the characteristic trips, we document four important results. First, we find that, on average, online baskets exhibit significantly lower variety than brick-and-mortar baskets as measured by the number of unique food categories and items purchased. Specifically, we find that online basket variety is 9.6% lower at the category level and 14.1% lower at the item level. Second, we find that these Instacart trips are 27% more similar to each other than offline trips within the same household when comparing categories. A more granular item-level similarity analysis indicates that Instacart shopping trips have twice as much item overlap between any two successive trips within the same retailer. Additionally, we find that, although households with more experience using the online channel tend to show less pronounced differences in variety and similarity, the gap in these measures persists even among the most experienced users. One potential explanation for these differences in similarity may be online shopping platforms’ adoption of user-friendly interfaces that simplify the shopping process for consumers by recommending their previous purchases to them as they assemble their online baskets (IRI Worldwide 2020). Third, we document the systematic basket composition differences between characteristic online and brick-and-mortar baskets. Specifically, we find that Instacart shopping baskets have 13.6% fewer fresh vegetables compared with the brick-and-mortar shopping baskets. At the same time, we see fewer impulse purchase categories, such as candy (7.1% fewer), bakery desserts (5.9%), and savory snacks such as chips (4.7%). Fourth, we investigate whether households compensate for the fewer fresh vegetable and impulse purchases by modifying their shopping behavior via alternative or additional trips. 
We find no evidence of compensatory shopping behavior or any adjustments in eating out or using food delivery services, such as GrubHub or UberEats.
Our research contributions overlap with the proposed future research directions for offline–online retail research (Ratchford et al. 2022). Specifically, our work contributes to the stream of literature about omnichannel retail and the impact of digitization on shopping behaviors as well as the literature on inertial and habitual behavior.
1.1. Omnichannel Retail
Our research offers new insights about omnichannel strategies (Neslin 2022). Much of the past research in this area is focused on determining whether offline and online shopping help or hurt each other (Wang and Goldfarb 2017, Bell et al. 2018, Narang and Shankar 2019, Li 2020), differences in price sensitivity, and customer loyalty or search (Degeratu et al. 2000, Danaher et al. 2003, Chu et al. 2008) as well as strategies for omnichannel retail (Luo and Sun 2016, Ertekin et al. 2021). Furthermore, other research shows a positive effect of offline shopping on online channels (Wang and Goldfarb 2017, Bell et al. 2018, Li 2020). Our research, on the other hand, highlights the overall differences between offline and online shopping behaviors as well as differences in the shopping behavior when the primary goal for the shopper remains the same: to complete a routine grocery shopping trip.
1.2. Digitization and Shopping Behavior
The impact of digitization on shopping behavior yields mixed findings about its eventual effect on variety as some research finds that online channels (Brynjolfsson et al. 2011, Choi and Bell 2011, Zentner et al. 2013, Datta et al. 2018, Nagaraj and Reimers 2021, Donnelly et al. 2023), popularity information (Tucker and Zhang 2011), and recommendation systems (Fleder and Hosanagar 2009, Oestreicher-Singer and Sundararajan 2012, Li et al. 2022) increase consumption variety or consumption of niche products (and, consequently, welfare), although others find the opposite effect (Pozzi 2012, Holtz et al. 2020). Our findings are more consistent with the latter as we find that Instacart grocery purchases exhibit lower variety, and the past purchase shortcuts might be contributing toward creating filter bubbles and echo chambers in consumption patterns (Ge et al. 2020). Most related to us are studies that investigate the impact of online channels on grocery shopping behavior (Milkman et al. 2010, Huyghe et al. 2017, Harris-Lagoudakis 2022) in single retailer settings. We significantly depart from this past literature in terms of both our research design and key findings. Unlike the aforementioned studies that focus on one retailer in a prepandemic setting, our sample covers a time period with significantly higher online grocery delivery service penetration as well as households across all 50 states and 98,851 unique store locations (6,760 unique stores with Instacart delivery). Importantly, our empirical setting provides us a comprehensive picture of household grocery shopping behavior across multiple retailers, allowing us to evaluate potential compensatory behavior across other trips to fully understand a wider timeline of consumption behavior.
1.3. Inertial Purchasing Behavior
Past research on state dependence in consumption behavior is mature and provides many important insights about inertial behavior—stickiness in customer purchase or usage choices—stemming from customer loyalty or habit formation (Chintagunta 1998, Erdem and Sun 2001, Dubé et al. 2010, Bronnenberg et al. 2012). Understanding whether online channels might accelerate inertial behavior is particularly important as inertial shopping behavior is often used to justify various marketing interventions, including advertising (Freimer and Horsky 2012), free sample and freemium design (Bawa and Shoemaker 2004), product line decisions (Chintagunta 1998), and pricing promotions (Gupta et al. 1997). Indeed, Danaher et al. (2003) and Guo and Wang (2023) provide evidence that customers demonstrate higher brand loyalty in the online channel compared with offline. Although establishing the causal link between online grocery shopping and accelerated inertial behavior remains a topic for future research, our results indicate a noteworthy pattern: online baskets tend to exhibit greater similarity than offline ones within the same household. This suggests the possibility of distinct inertial patterns across trips across different channels and may have significant implications for competition as inertia can impede new entrants from establishing a customer base (Pozzi 2012, Bornstein 2020).
2. Empirical Setting
2.1. Data and Sample Construction
Our data come from Numerator, a market research company with a representative U.S. consumer panel. The company uses a popular mobile app to capture the images of receipts from BM stores uploaded by the panelists and proprietary methods to access online grocery purchases of the same panelists. Our data set includes information on each panelist and the panelist’s shopping trips, including the date of the trip, name of the retailer, channel type, total amount spent on an item, total quantity purchased, and category and department to which the item belongs. Numerator extracts this information via algorithms that classify items from receipts into standardized categories across all retailers. These categories are highly granular product groups such as “packaged cookies,” “yogurt & yogurt drinks,” and “stocks & broths.” One limitation of our data is that Numerator provided us with a subset of data limited to grocery sector (i.e., food) items. As a result, although we can observe all the food items and the overall basket totals, we do not observe nonfood items purchased during the same shopping trips.1
As Numerator relies on an algorithm to encode individual items in an uploaded receipt, there are additional limitations to the information that we can obtain at an item level. Although we can identify certain details, such as item ID, brand name, parent brand name, and category name, we are unable to observe other specifics such as package size or flavor, which traditional universal product codes in scanner data would typically provide. Additionally, an item ID assigned to BM receipts can only be linked to an item ID in Instacart receipts in 2.5% of cases. Because of these limitations, we are unable to calculate item-level price differences between Instacart and BM channels directly. Therefore, we conduct most of our analysis at the category level. Nevertheless, we carried out robustness tests that indicate pricing differences between the channels are unlikely to be the main explanation for our findings.
In addition to the shopping data, we also have some demographic information about the panelists. This information includes the income bracket of the household, age of the panelist, ethnic/racial background, education level, and household zip code. The descriptive statistics for these demographics are presented in Online Table A1.
Instacart is the dominant grocery delivery service in the online space with a market share of approximately 50% (Damiani 2020). Unlike retailer-specific online delivery services, such as Walmart.com and Kroger.com, Instacart is a third-party service that partners with more than 700 different retail chains. The grocery retail market is highly localized with prominent chains such as Stop & Shop in the Northeast, Publix in the South, and Safeway in the West. Focusing on Instacart purchases enables us to capture households’ typical shopping behavior for their regular grocery needs because most popular local retail chains do not typically offer regional-level alternative delivery services. Our data sample covers a comprehensive national selection of significant regional and local grocery retailers, allowing for greater generalizability of our findings regarding omnichannel basket composition and variety.
Figure 1 presents the Instacart sales index data from Numerator, in which January 2019 sales are used as the normalization benchmark at 100. This sales trend displays the distinctive e-commerce “COVID bump” pattern.2 Postbump sales are notably higher than prepandemic levels, indicating the continued increased utilization of grocery delivery services among U.S. households (Kang et al. 2023).
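The normalization behind such an index is simple to state. The following is a minimal sketch, assuming monthly sales are stored in a mapping keyed by a `YYYY-MM` string; the function name and the sales figures are hypothetical and are not taken from Numerator's data.

```python
def sales_index(monthly_sales, base_period="2019-01", base_value=100.0):
    """Rescale a {period: sales} mapping so the base period equals base_value."""
    base = monthly_sales[base_period]
    if base <= 0:
        raise ValueError("base-period sales must be positive")
    return {
        period: base_value * sales / base
        for period, sales in monthly_sales.items()
    }


# Hypothetical sales figures, used only to illustrate the rescaling.
example_sales = {"2019-01": 250.0, "2020-04": 1000.0, "2021-06": 625.0}
index = sales_index(example_sales)
```

Under this convention, an index value of 400 means sales four times the January 2019 level.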
Our data set includes shopping trip information from 86,684 households between 2019 and 2021. However, not all panelists are active in all three years (see Online Appendix A for details). We are only able to observe Instacart sales for panelists who gave permission for Numerator to track their Instacart purchases, resulting in a sample of n = 4,388 households with at least one Instacart purchase. These households had a total of n = 1,968,392 distinct online and offline shopping trips during 2019–2021 and form the primary sample for our study. Although the permission sample limitation is a drawback of our data, we find that the sales patterns of this sample closely mirror overall Instacart sales patterns from other proprietary data sources, such as Earnest Research (see Online Figure A2). Additionally, our primary sample’s demographic composition is similar to that of the overall managed panel sample, which is designed to be representative of U.S. consumers (see Online Appendix A.3).
2.2. Road Map of Empirical Analysis
Broadly, our empirical analysis proceeds in two main steps as summarized in Table 1. In Section 3, we start by examining the overall grocery shopping patterns across both online and offline channels. The majority of the empirical investigation in this paper, covered in Section 4, is dedicated to understanding shopping patterns when households are likely substituting their regular BM trip with an online Instacart trip (as opposed to using online channels as supplementary shopping). We first introduce the method to identify such trips and then proceed to compare the variety, similarity, and composition of online versus offline characteristic baskets.
Table 1. Road Map of Empirical Analysis
Section | Scope |
---|---|
3. All online versus all offline trips | Describe the general patterns of shopping behavior for online versus offline groceries |
4. Characteristic Instacart versus characteristic offline trips | |
4.1. Identifying characteristic trips | Introduce an algorithm that detects a likely substitution of an offline trip with an Instacart trip at a household level |
4.2. Basket variety | Show that the variety of Instacart baskets is systematically lower than that of offline baskets |
4.3. Basket similarities | Show that Instacart baskets within the same household exhibit higher similarity than offline baskets |
4.4. Basket composition | Show systematic differences in basket composition, such as lower produce and impulse purchases on Instacart compared with offline trips |
4.5. Compensation and adjustment | Demonstrate that households do not compensate for the differences in their online characteristic trips by altering their behavior in other shopping trips |
3. Comparison of All Online and Offline Purchases
Table 2 presents a summary of the shopping trip data for 4,388 households across BM, Instacart, and other online channels. The BM channel accounts for 87.2% of all expenditure-weighted trips, whereas Instacart and other online channels account for 3.9% and 8.9%, respectively. The other online channels include Amazon.com (47.7% of all other online trips), Walmart.com (22.4%), Target.com (12.7%), Kroger.com (5.2%), and a long tail of other retailers, such as Sheetz.com (0.1%) and Wine.com (0.1%). On average, households spend less on food per trip in the BM channel ($29.89) than on Instacart ($50.46) or in other online channels ($34.53), and they make significantly more trips using the BM channel than the online channels. The BM channel trips include visits to various types and sizes of retailers, ranging from mass merchandisers, such as Sam’s Club, to traditional grocery stores, such as Kroger, as well as small convenience stores and gas stations. Overall, our sample covers 8,117 BM retailers with 98,209 unique store locations across 3,350 zip codes. One hundred thirty-four retailers with 6,760 unique store locations have serviced Instacart purchases.
Table 2. Summary Statistics of All Shopping Trips Across Both Channels

| | Brick and mortar: mean | Brick and mortar: standard deviation | Instacart: mean | Instacart: standard deviation | Other online: mean | Other online: standard deviation |
|---|---|---|---|---|---|---|
| Grocery amount per trip, $ | 29.89 | 39.73 | 50.46 | 40.99 | 34.53 | 40.24 |
| Number of unique grocery items bought per trip | 6.99 | 8.57 | 11.23 | 8.70 | 6.75 | 10.16 |
| Number of grocery categories bought per trip | 5.06 | 5.41 | 8.17 | 5.52 | 4.80 | 6.36 |
| Number of trips made by a household | 404.89 | 226.07 | 10.16 | 20.55 | 35.94 | 51.04 |
| Average expenditure share by channel, % | 87.23 | | 3.87 | | 8.90 | |
| Percentage of trips to food, mass, club retailers with BM presence | 76.07 | | 97.25 | | 44.19 | |
| N retailer chains | 8,117 | | 134 | | 118 | |
| N unique store locations | 98,209 | | 6,760 | | 11,349 | |
Notes. Food and mass retailers are typical retailers, such as Wegmans, Walmart, and Kroger, at which consumers usually shop for groceries. Club retailers include retailers such as Costco and Sam’s Club. The percentage of trips to food, mass, and club retailers does not include trips to dollar stores, drug stores, and gas stations.
Next, we examine the descriptive statistics associated with household shopping patterns across different retailers. Figure 2 illustrates the grocery spending patterns of two households in our sample, in which the size of each bubble represents the amount spent on food at a specific retailer. The blue bubbles indicate retailers from which the household made purchases using both BM and Instacart channels. In many cases, the household’s top primary retailer is used via both BM and Instacart channels. Table 3 indicates that, during our sample period, an average household frequented approximately 25 distinct BM retailers, used Instacart with 1.7 retailers, and shopped with about three other online retailers. We designate as primary retailers the top two retailers for each household that command the highest grocery expenditure shares in each channel. We find that households, on average, spend 45.38% of their grocery expenditures at their primary BM retailer and 65.36% of their expenditures at their top two primary BM retailers. Instacart spending is even more concentrated, with an average of about 92% of Instacart purchases coming from the top two primary retailers.
Table 3. Summary Statistics of Shopping Trips to the Primary Retailers by Channel Type

| | Brick & mortar: mean | Brick & mortar: standard deviation | Instacart: mean | Instacart: standard deviation | Other online: mean | Other online: standard deviation |
|---|---|---|---|---|---|---|
| Number of retailers per household | 25.26 | 11.02 | 1.70 | 1.32 | 3.08 | 2.03 |
| Basket size, $: #1 retailer | 41.77 | 47.54 | 52.42 | 42.20 | 39.66 | 43.62 |
| Basket size, $: #2 retailer | 34.27 | 41.17 | 46.80 | 37.85 | 26.45 | 31.23 |
| Number of trips: #1 retailer | 131.48 | 102.27 | 7.55 | 14.69 | 24.23 | 37.09 |
| Number of trips: #2 retailer | 70.57 | 61.09 | 4.64 | 7.25 | 10.47 | 22.43 |
| Basket expenditure share, %: #1 retailer | 45.38 | | 77.14 | | 77.40 | |
| Basket expenditure share, %: #2 retailer | 19.98 | | 14.66 | | 22.31 | |
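The primary-retailer shares can be computed by ranking retailers by spend within each household and channel. The sketch below is illustrative only: the function name, the tuple layout, and the sample trips are hypothetical, and the paper's own computation (for example, how shares are averaged across households) may differ.

```python
from collections import defaultdict


def primary_retailer_shares(trips, top_n=2):
    """Top-n retailers by spend, with expenditure shares (%), per household and channel.

    trips: iterable of (household, channel, retailer, amount) tuples.
    """
    spend = defaultdict(lambda: defaultdict(float))
    for household, channel, retailer, amount in trips:
        spend[(household, channel)][retailer] += amount

    shares = {}
    for key, by_retailer in spend.items():
        total = sum(by_retailer.values())
        if total <= 0:
            continue  # skip household-channel pairs with no positive spend
        ranked = sorted(by_retailer.items(), key=lambda kv: kv[1], reverse=True)
        shares[key] = [(name, 100.0 * value / total) for name, value in ranked[:top_n]]
    return shares


# Hypothetical trips for one household; amounts are invented for illustration.
trips = [
    ("h1", "BM", "Kroger", 60.0),
    ("h1", "BM", "Walmart", 30.0),
    ("h1", "BM", "Sheetz", 10.0),
    ("h1", "Instacart", "Kroger", 80.0),
    ("h1", "Instacart", "Publix", 20.0),
]
shares = primary_retailer_shares(trips)
```

In this invented example, the household's top two BM retailers account for 90% of its BM spending, and its Instacart spending is split across only two retailers.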
Next, we compare the variety of grocery items purchased online versus offline within a given time frame. We estimate the following specification:

$$y_{it} = \beta \cdot \mathbb{1}\{\text{Online}\}_{it} + \alpha_{it} + \varepsilon_{it}$$

The outcome variable, $y_{it}$, represents the cumulative variety outcome for household $i$ in time period $t$. To measure variety, we use two primary variables: (i) log(Unique Number of Categories), which is a metric also used in previous research (Haws et al. 2017), and (ii) log(Unique Number of Items). We calculate the number of unique categories and items purchased by a household within a given month, quarter, or year. On the right-hand side, the primary variable of interest is a dummy indicating whether the cumulative variety measure is for an online channel (or the Instacart channel), represented by the indicator variable $\mathbb{1}\{\text{Online}\}$ (or $\mathbb{1}\{\text{Instacart}\}$). The term $\alpha_{it}$ denotes the Household × TimePeriod or Household × TimePeriod × Retailer fixed effects, and $\varepsilon_{it}$ is the error term. Therefore, the coefficient $\beta$ measures the average difference in cumulative variety between online and offline purchases across all households and all time periods.
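With a single regressor and one set of group fixed effects, the coefficient on the channel dummy can be obtained by demeaning the outcome and the dummy within each group and running OLS on the deviations (the within estimator). The sketch below illustrates this under stated assumptions: the function name and the data are hypothetical, each group stands for one Household × TimePeriod cell, and standard errors and clustering are not computed.

```python
import math
from collections import defaultdict


def within_estimator(rows):
    """OLS slope on a channel dummy after demeaning within each fixed-effect group.

    rows: iterable of (group_id, online_flag, outcome) tuples, where group_id
    identifies the fixed-effect cell (for example, household and month).
    """
    groups = defaultdict(list)
    for group_id, online, outcome in rows:
        groups[group_id].append((float(online), float(outcome)))

    sxy = 0.0
    sxx = 0.0
    for observations in groups.values():
        x_bar = sum(x for x, _ in observations) / len(observations)
        y_bar = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            sxy += (x - x_bar) * (y - y_bar)
            sxx += (x - x_bar) ** 2
    if sxx == 0:
        raise ValueError("no within-group variation in the channel indicator")
    return sxy / sxx


# Hypothetical data: log(number of unique categories) by household-month and channel.
rows = [
    (("h1", "2020-03"), 0, math.log(20)),
    (("h1", "2020-03"), 1, math.log(10)),
    (("h2", "2020-03"), 0, math.log(8)),
    (("h2", "2020-03"), 1, math.log(4)),
    (("h2", "2020-04"), 0, math.log(12)),  # offline only: contributes nothing to beta
]
beta = within_estimator(rows)
```

In this invented example, online variety is half of offline variety in every cell that contains both channels, so the estimate equals log(0.5). Cells observed in only one channel carry no within-group variation and do not affect the estimate.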
Panel A of Table 4 presents the variety differences across BM and online baskets using Household × TimePeriod fixed effects. In panel B of the same table, we also report a specification that includes a triple interaction Household × Retailer × TimePeriod fixed effects to compare the broader variety measures of online versus BM shopping patterns within any given retailer, conditional on a retailer having presence in both online and BM channels. Note that this specification excludes purchases from Amazon.com (because there is no Amazon BM counterpart in our data), which account for the majority of online purchases. In panels C and D, we focus specifically on Instacart purchases and analyze the variety differences between all BM and Instacart trips.
Table 4. Variety Effects Including All Online, All BM, and All Instacart Trips

| | (1) Number of categories: monthly | (1) Number of categories: quarterly | (1) Number of categories: yearly | (2) Number of items: monthly | (2) Number of items: quarterly | (2) Number of items: yearly |
|---|---|---|---|---|---|---|
| Panel A. All online and all BM trips | | | | | | |
| Online | −1.580*** | −1.918*** | −1.814*** | −1.960*** | −2.635*** | −3.050*** |
| | (0.018) | (0.016) | (0.015) | (0.023) | (0.021) | (0.022) |
| Number of observations | 189,796 | 72,081 | 20,716 | 189,796 | 72,081 | 20,716 |
| R2 | 0.506 | 0.632 | 0.656 | 0.518 | 0.666 | 0.739 |
| Panel B. All online and all BM trips with retailer fixed effects | | | | | | |
| Online | −0.203*** | −0.561*** | −1.046*** | −0.207*** | −0.664*** | −1.392*** |
| | (0.015) | (0.013) | (0.012) | (0.017) | (0.016) | (0.016) |
| Number of observations | 791,010 | 449,912 | 208,724 | 791,010 | 449,912 | 208,724 |
| R2 | 0.022 | 0.145 | 0.389 | 0.017 | 0.136 | 0.391 |
| Panel C. All Instacart and all BM trips | | | | | | |
| Instacart | −0.914*** | −1.516*** | −2.027*** | −1.195*** | −2.178*** | −3.383*** |
| | (0.020) | (0.017) | (0.016) | (0.025) | (0.024) | (0.022) |
| Number of observations | 150,041 | 54,712 | 16,917 | 150,041 | 54,712 | 16,917 |
| R2 | 0.380 | 0.660 | 0.777 | 0.415 | 0.696 | 0.837 |
| Panel D. All Instacart and all BM trips with retailer fixed effects | | | | | | |
| Instacart | −0.239*** | −0.562*** | −1.070*** | −0.299*** | −0.728*** | −1.505*** |
| | (0.016) | (0.015) | (0.014) | (0.020) | (0.020) | (0.020) |
| Number of observations | 730,389 | 414,890 | 192,417 | 730,389 | 414,890 | 192,417 |
| R2 | 0.044 | 0.184 | 0.440 | 0.049 | 0.195 | 0.459 |
Notes. This table reports the cumulative basket variety results for all BM and all online trips (panels A and B) as well as all BM and all Instacart trips (panels C and D)—not just the characteristic ones in a given time period. The outcome variable is log(Number of Categories) for column (1) and log(Number of Items) for column (2). For panels A and C, outcome variables are computed at the time period (Month/Quarter/Year) level, and for panels B and D, outcome variables are computed at the time period (month/quarter/year) × retailer level. Models in panels A and C use Household × Month/Quarter/Year fixed effects. Models in panels B and D use Household × Month/Quarter/Year × Retailer fixed effects. Clustered standard errors reported in parentheses.
***p < 0.001; **p < 0.01; *p < 0.05.
In summary, our results show that online purchases exhibit significantly lower variety than those made in BM stores. For instance, when we analyze all shopping trips, the cumulative variety purchased within a month, measured by the number of categories (Table 4, column (1)), is 158 (91) log points lower online (on Instacart) than offline. Focusing on retailers present in both online and offline domains, the variety is 20 (24) log points lower online (on Instacart).3
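Because the outcomes in Table 4 are in logs, the coefficients are differences in log points, and the implied percentage difference is exp(β) − 1. The sketch below applies this conversion to the monthly category coefficients reported in Table 4; the function name and the dictionary labels are hypothetical.

```python
import math


def log_points_to_percent(beta):
    """Exact percentage change implied by a coefficient on a log outcome."""
    return 100.0 * (math.exp(beta) - 1.0)


# Monthly "number of categories" coefficients from Table 4 (panels A to D).
table4_monthly_categories = {
    "online, all trips": -1.580,
    "online, same retailer": -0.203,
    "Instacart, all trips": -0.914,
    "Instacart, same retailer": -0.239,
}
implied_percent = {
    label: log_points_to_percent(beta)
    for label, beta in table4_monthly_categories.items()
}
```

The log-point and percentage readings are close for small coefficients and diverge for large ones: a coefficient of −1.580 corresponds to roughly 79% lower variety, whereas −0.203 corresponds to roughly 18% lower variety.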
Finally, we also investigate whether online trips are more likely to involve bulk purchases or stockpiling trips. Our analysis reveals that all online (Instacart) trips are 4.6% (5%) more likely to include bulk purchases than BM trips to the same retailer. The detailed results of this analysis are provided in Online Table D6.
4. Comparison of Online and Offline Characteristic Trips
4.1. Identifying Characteristic Trips
As noted, we observe close to two million shopping trips for our sample of 4,388 households. These shopping trips comprise grocery trips in which a given household makes purchases that reflect regular restocking of commonly used household grocery items and nonregular (intermittent) shopping trips, such as fill-in trips or runs to convenience stores or gas stations. Our objective is to examine the shopping histories of each household to identify their routine restocking grocery trips irrespective of the channel. In other words, we aim to identify those Instacart trips that serve as likely substitutes for a BM trip. We refer to such trips as “characteristic” trips.
There are various approaches to analyzing household-level shopping patterns, each with its advantages and disadvantages, depending on the empirical context. For example, Manchanda et al. (1999) propose a hierarchical Bayesian multicategory purchase incidence model, but such models become increasingly complex with numerous category combinations as in our setting. Ruiz et al. (2020) adopt the sequential choice probabilistic SHOPPER model, whereas Platzer and Reutterer (2016) and Reutterer et al. (2021) investigate household-specific interpurchase timing. Jindal et al. (2020) cluster shopping trips based on total dollars spent as well as the number of items and categories purchased to identify comparable trips. In contrast, our approach combines representation learning using embeddings with clustering techniques from machine learning to analyze the shopping behavior of each household and determine what constitutes a regular restocking shopping trip for that household. Our embedding-based approach is more flexible and easily scalable. It not only identifies relevant similar online and offline grocery trips but also generates vector representations for each observed trip, enabling us to evaluate trip similarities.4
We refer the reader to Online Appendix B for a detailed explanation of this methodology. Specifically, Online Appendix B.1 summarizes the technical details for our two-step approach, B.2 provides examples to build intuition, B.3 describes the different classification approaches we consider to establish the robustness and generalizability, B.4 presents a validation exercise with human coders, and B.5 provides technical details about identifying the optimal number of clusters for classification.
After classifying all trips into characteristic and noncharacteristic trips, we present summary statistics for each trip type to further demonstrate how they differ in shopping objectives and contexts. Table 5 displays key summary statistics for the shopping basket outcomes of interest for both trip types, and Figure 3 visually highlights the differences between them. Specifically, panel (a) shows density plots of the spacing (in days) between two successive trips made by a household. Characteristic trips have significantly longer spacing between successive trips than noncharacteristic trips (206% longer; t = 82.3, p < 0.001). This suggests that noncharacteristic trips may represent unplanned runs to the shops with a higher sense of urgency. The patterns in panel (b) support this interpretation: the average number of items in a basket is significantly lower for noncharacteristic trips, and the basket size of noncharacteristic trips does not vary with the total frequency of household trips. In contrast, the basket size of characteristic trips varies significantly with the frequency of trips: households that shop less frequently have larger characteristic baskets, and the number of items in the characteristic baskets decreases as households increase their shopping frequency. This might suggest that noncharacteristic trips satisfy some urgent needs of the household, whereas characteristic trips tend to satisfy less urgent grocery needs that can be met at varying frequency.
Table 5. Summary Statistics by Trip Type

| | Characteristic trip: mean | Characteristic trip: standard deviation | Noncharacteristic trip: mean | Noncharacteristic trip: standard deviation |
|---|---|---|---|---|
| Grocery amount per trip, $ | 72.64 | 52.69 | 17.64 | 22.54 |
| Number of items bought per trip | 23.93 | 17.05 | 4.92 | 6.05 |
| Number of categories bought per trip | 12.50 | 6.39 | 2.81 | 2.20 |
| Average spacing between trips, days | 10.56 | 5.39 | 3.45 | 1.91 |
| Number of trips | 467,543 | | 1,500,849 | |
| Percentage of trips to food, mass, club retailers | 90.15 | | 65.12 | |
Note. Total number of trips classified N = 1,968,392.
In Online Appendix B.4, we report the results of the validation exercise with human coders that confirm the effectiveness of our embeddings classification approach in identifying trips with different shopping objectives. Furthermore, in Online Appendix D.8, we conduct an analysis to evaluate the degree of substitutability between characteristic Instacart and BM trips and find that they are almost perfect substitutes. This analysis further validates our characteristic trip construct as our goal was to identify trips that could substitute for one another. Taken together, these findings provide compelling evidence that our classification methodology effectively differentiates between characteristic and noncharacteristic trips and successfully identifies Instacart and BM characteristic trips that serve as substitutes for one another.
4.2. Characteristic Basket Variety
Our first outcome of interest is the variety of characteristic baskets. We use the same two measures to proxy for variety as in Section 3: the number of distinct categories and of items in a given trip. Figure 4 shows the density plots of the number of unique categories and items for characteristic trips across the two channels. On average, Instacart characteristic baskets have fewer distinct categories (items), and the dispersion of the number of categories (items) is significantly lower than for BM trips.
Motivated by these apparent descriptive differences in basket variety, we investigate these patterns systematically using the econometric framework in Equation (2), which regresses a basket variety measure on an Instacart trip indicator and two sets of fixed effects.
Here, y_irt represents a basket variety measure for household i at retailer r on shopping occasion t. We consider two variety measures: (i) the number of distinct categories purchased and (ii) the number of distinct items purchased. On the right-hand side, the main variable of interest is whether any given shopping trip was made using Instacart, which is represented by an indicator variable. As such, β measures the systematic differences of Instacart characteristic baskets relative to BM characteristic baskets in the outcome variable of interest. For our baseline specification, we use two sets of highly granular fixed effects: (i) Household × Retailer × Quarter fixed effects, a triple interaction that controls for any household-, retailer-, and quarter-level unobserved differences, and (ii) Household × Month fixed effects, which account for time-varying household consumption changes across different channels. In other words, these fixed effects ultimately allow us to compare Instacart baskets to BM baskets within a given household, retailer, and quarter combination, simultaneously controlling for household-specific seasonality effects.
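Written out, the specification described above takes roughly the following form. This is a sketch reconstructed from the verbal definitions only: the symbols α and γ for the two sets of fixed effects, the indicator name Instacart_irt, the mappings q(t) and m(t) from a shopping occasion to its quarter and month, and the error term ε_irt are assumed notation and may differ from the notation of Equation (2) in the published version.

```latex
y_{irt} = \beta \, \mathrm{Instacart}_{irt}
        + \alpha_{i,\,r,\,q(t)}   % Household x Retailer x Quarter fixed effects
        + \gamma_{i,\,m(t)}       % Household x Month fixed effects
        + \varepsilon_{irt}
```

Under this form, β is identified from households that make both Instacart and BM characteristic trips to the same retailer within the same quarter.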
In Figure 5, we present the specification chart for the estimates of β using log(Number of Categories) and log(Number of Items) as outcome variables. We consider seven different specifications, each with varying sets of fixed effects, resulting in a total of 14 estimates. This approach allows us to examine the sensitivity of our results to different model specifications.5 The baseline specifications (highlighted in blue and specified in Equation (2)) include Household × Retailer × Quarter and Household × Month fixed effects with standard errors clustered at the Household × Retailer level. The most restrictive specification includes the triple interaction fixed effects at a month level: Household × Retailer × Month.6
Overall, we find that Instacart characteristic baskets have a lower variety compared with BM baskets, and this finding is robust across all 14 specifications. Specifically, using our preferred specification in Equation (2), we find that the variety of Instacart baskets is around 9.6% (14.1%) lower than the variety of BM baskets when considering distinct categories (items). Furthermore, we find that the lower variety estimates range between 5.8% and 12.8% (13.7% and 17.5%) for distinct categories (items) depending on the classification and specification used. Of note, the embedding-based classification approaches provide more conservative and precise estimates than classification based on department totals as shown in Online Figure D1. Additionally, the variety differences between Instacart and BM trips are significantly larger when evaluating all trips (see Table 4) rather than just characteristic trips, indicating that our main specification results focused on characteristic trips provide a conservative lower bound estimate of the differences in basket variety.
Next, we investigate whether the diversity of Instacart baskets varies with a household’s experience with Instacart. In particular, we consider a specification that adds an interaction term with the number of Instacart trips (as a proxy for experience with Instacart) to Equation (2). We find that, as households utilize Instacart more frequently, the previously pronounced pattern of lower variety on Instacart becomes somewhat less prominent. One possible explanation is that households are more willing to consider a wider range of products as they become more experienced and familiar with the Instacart platform; however, even the most experienced Instacart users appear to exhibit significantly lower basket variety compared with their BM counterparts (see Online Table D4 for more details).
Finally, in Online Appendix D.3, we examine the heterogeneity in variety differences across observable household characteristics, such as income, education, and race. Our findings indicate that the online versus offline basket variety differences remain largely consistent across various household characteristics.
4.3. Characteristic Basket Similarity
Our next set of results pertains to comparing household-level basket similarity between Instacart versus BM shopping trips. To obtain category-level similarity measures, we use pairwise distances between different trips. Because all trips are represented in 100-dimensional space, we can calculate the Euclidean distance between any two characteristic trips. Using these distances, we calculate a similarity score for Instacart characteristic trips for each household by taking the average of the within-household pairwise distances between all combinations of such trips. Analogously, we compute a similarity score for characteristic trips made to BM stores and then compare the two scores. An important nuance of this approach is that there are significantly more BM characteristic trips in the data compared with Instacart characteristic trips. Therefore, differences in similarity scores between the two channels may be an artifact of the disparity in the number of trips. To address this issue, we randomly sample the same number of BM characteristic trips whenever the number of Instacart characteristic trips is lower than the number of BM characteristic trips.7 Finally, we evaluate mean differences in similarity scores among Instacart and BM baskets.
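The similarity score and the subsampling step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the synthetic 100-dimensional vectors, and the noise scales are all hypothetical, and real inputs would be the category-level trip embeddings.

```python
import itertools
import math
import random


def mean_pairwise_distance(trips):
    """Average Euclidean distance over all unordered pairs of trip vectors."""
    pairs = list(itertools.combinations(trips, 2))
    if not pairs:
        raise ValueError("need at least two trips")
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def household_scores(instacart, bm, rng):
    """Return (Instacart score, BM score) for one household.

    When the household has more BM than Instacart trips, BM trips are
    randomly subsampled (without replacement) down to the Instacart count
    so that both scores are built from the same number of trips.
    """
    if len(bm) > len(instacart):
        bm = rng.sample(bm, len(instacart))
    return mean_pairwise_distance(instacart), mean_pairwise_distance(bm)


def fake_trips(rng, n, scale, dim=100):
    """Synthetic stand-ins for trip embeddings (illustration only)."""
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
instacart_trips = fake_trips(rng, n=4, scale=0.5)   # tighter cluster
bm_trips = fake_trips(rng, n=12, scale=1.0)         # more dispersed
ic_score, bm_score = household_scores(instacart_trips, bm_trips, rng)
print(f"Instacart: {ic_score:.2f}  BM (subsampled): {bm_score:.2f}")
```

A lower score means the household's trips sit closer together in the embedding space, that is, its baskets are more similar to one another.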
Figure 6(a) provides a visualization of the density plots of the average of within-household pairwise Euclidean distances of BM and Instacart characteristic trips for the households that have more than two Instacart characteristic trips. This graph shows that the distribution of Instacart pairwise distances is to the left of the distribution of BM pairwise distances, indicating that, within each household, the Instacart trips are significantly more similar to each other than the BM trips. Specifically, we find that the category-level within-household pairwise distances are 27.1% larger for BM shopping trips as compared with Instacart shopping trips (see Table 6).
Table 6. Similarity Metrics by Characteristic Trip Type
| Distance measure | Instacart | BM | p-value | Mean difference | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| Euclidean distance (category level) | 7.531 | 9.576 | <0.0000 | 2.045 | 2.035 | 2.054 |
| Jaccard similarity (retailer–item level) | 0.143 | 0.070 | <0.0000 | 0.073 | 0.066 | 0.079 |
Notes. Euclidean distance refers to the distance between two different baskets represented in the 100-dimensional space using category-level embeddings. The Jaccard similarity (retailer–item level) refers to the similarity measure computed using the unique item IDs.
Next, we investigate the relationship between a household’s experience with Instacart and the similarity of their Instacart baskets. Figure 6(b) displays pairwise Euclidean distances for Instacart and BM trips for each household (y-axis) as a function of the number of Instacart trips a household has made (x-axis). We observe that, on average, the pairwise Euclidean distances of Instacart trips are lower than those of BM trips, and this pattern is mostly consistent across the range of experience with Instacart although the gap appears to narrow. To formally test this, we regress the relative gap in distances ((BM Distance − Instacart Distance)/BM Distance) on the number of Instacart trips. The results indicate that, although the gap slightly narrows with experience with Instacart, Instacart baskets remain significantly more similar to each other than BM baskets even among the most experienced Instacart users. See Online Table D5 for the detailed results.
As an alternative method of assessing the similarity of baskets, we utilize the Jaccard index to measure the extent of overlap between items purchased in any two successive trips to the same retailer. Specifically, we compute the Jaccard index between any two consecutive characteristic trips at the Retailer × Household × Channel level. We then calculate the average Jaccard index separately for BM and Instacart characteristic trips and perform a t-test. The results are reported in Table 6 and indicate that within-retailer similarity is, on average, twice as high for Instacart shopping trips as for BM shopping trips.
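The successive-trip overlap measure can be illustrated with a short sketch. The Jaccard index of two item sets is the size of their intersection divided by the size of their union. The function names and the toy baskets below are hypothetical, and each list is assumed to hold the chronologically ordered characteristic trips of one Retailer × Household × Channel cell.

```python
def jaccard(a, b):
    """Jaccard index of two collections of item IDs: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_successive_jaccard(baskets):
    """Average Jaccard index over consecutive pairs of baskets.

    Returns None when there are fewer than two baskets.
    """
    scores = [jaccard(prev, curr) for prev, curr in zip(baskets, baskets[1:])]
    return sum(scores) / len(scores) if scores else None


# Toy baskets (illustration only), ordered by trip date.
instacart_baskets = [
    {"milk", "eggs", "bread"},
    {"milk", "eggs", "apples"},
    {"milk", "eggs", "bread"},
]
bm_baskets = [
    {"milk", "eggs", "bread", "kale"},
    {"milk", "salsa", "chips", "beer"},
    {"eggs", "rice", "beans", "milk"},
]

print(mean_successive_jaccard(instacart_baskets))  # 0.5
print(mean_successive_jaccard(bm_baskets))         # 1/7, about 0.143
```

In the toy data, the repeated staples drive the higher score for the first list, which mirrors the direction of the Table 6 comparison without reproducing its magnitudes.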
Overall, our findings suggest that, although Instacart characteristic trips have 9.6% (14.1%) lower variety in categories (items) compared with BM characteristic trips, they exhibit a 27.1% higher degree of similarity in terms of categories and more than a 100% higher degree of similarity in terms of item overlap across successive trips to the same retailer. These differences are somewhat mitigated by the household’s experience with Instacart but persist even for the most experienced users.
One plausible explanation for the differences in basket similarity is Instacart’s “buy it again” feature, which enables users to add previously purchased items to their current shopping cart. Intuitively, this feature might reduce a household’s incentive to explore the full online assortment during each shopping occasion, thereby contributing to the increased similarity of Instacart baskets. However, it is important to note that our data and setting do not permit us to establish a definitive causal relationship between successive online basket similarities and this feature. Furthermore, identifying all mechanisms that may contribute to the observed similarity differences is beyond the scope of this paper. Nevertheless, we offer an informal discussion of other potentially relevant mechanisms in Online Appendix C.3.
4.4. Characteristic Basket Composition
Next, we examine the systematic differences in basket composition between BM and Instacart shopping trips by analyzing the variation in the number of items purchased within each category. Because the Numerator categories are highly disaggregated, we aggregate similar categories to increase the power of our tests, reducing the number of categories from 217 to 88. For example, we aggregate categories such as bacon, beef, lamb, pork, poultry, meat snacks, and sausage into a more general category called meat. For each resulting aggregated category, we construct a category-specific dependent variable from the number of items purchased in that category and estimate Equation (2) separately for each of the 88 categories.
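The aggregation step can be sketched as a simple mapping from fine to aggregated categories followed by a per-basket count. Only the meat grouping below comes from the text; the mapping name, the pass-through rule for unmapped categories, and the toy basket are assumptions made for illustration.

```python
from collections import Counter

# Only the "meat" grouping is taken from the text; a full mapping would
# cover all 217 fine categories and yield 88 aggregated ones.
FINE_TO_AGGREGATE = {
    fine: "meat"
    for fine in ["bacon", "beef", "lamb", "pork", "poultry", "meat snacks", "sausage"]
}


def aggregate_counts(basket):
    """Count items per aggregated category for one basket.

    `basket` is a list of fine category labels, one entry per item bought.
    Categories missing from the mapping pass through unchanged (assumption).
    """
    return Counter(FINE_TO_AGGREGATE.get(cat, cat) for cat in basket)


basket = ["bacon", "beef", "candy", "sausage", "candy", "fresh vegetables"]
counts = aggregate_counts(basket)
print(counts["meat"], counts["candy"], counts["fresh vegetables"])  # 3 2 1
```

Each aggregated category's count would then feed one regression, giving one estimate of β per category.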
In Table 7, we present the 10 categories with the largest negative β estimates (in absolute magnitude) and the only four categories with significant positive β estimates from 88 regressions. Our results reveal that, spanning all categories, the most substantial difference—amounting to 13.6% fewer purchases in Instacart characteristic baskets—is observed in the fresh vegetables category. The next set of categories exhibits differences with noticeably lower magnitudes, and this set includes several items from the impulse purchase categories, such as candy, bakery desserts, and savory snacks such as chips. The observed differences in the fresh vegetable and impulse category purchases correspond to fewer purchases along both dimensions, namely, the intensive and extensive margins (see Online Appendix D.10). Additionally, we observe differences in fresh meat purchases, lower by about 4.3%, although these differences are partially offset by higher frozen meat purchases as reported in the “More” column of Table 7. The same column shows that all categories with systematically higher purchase incidences in Instacart baskets center around frozen foods. Contrary to the overall variety or similarity patterns in Sections 4.2 and 4.3, we find no evidence suggesting that these patterns vary with a household’s experience with Instacart in any of the categories in which differences in basket composition are identified.
Table 7. Categories that Exhibit the Largest Differences in Instacart vs. Brick & Mortar Characteristic Baskets
| Less: Category name | Estimate (standard error) | More: Category name | Estimate (standard error) |
|---|---|---|---|
| Fresh vegetables | −0.136*** (0.016) | Frozen meat | 0.025*** (0.006) |
| Candy | −0.071*** (0.007) | Frozen vegetables | 0.016* (0.006) |
| Cheese | −0.061*** (0.011) | Frozen breakfast | 0.014* (0.005) |
| Bakery desserts | −0.059*** (0.008) | Frozen fruit | 0.008** (0.002) |
| Savory snacks | −0.047*** (0.009) | | |
| Fresh meat | −0.043*** (0.012) | | |
| Bread | −0.029*** (0.008) | | |
| Sweet snacks | −0.028*** (0.007) | | |
| Deli | −0.027*** (0.007) | | |
| Pasta | −0.026*** (0.005) | | |
Notes. This table compiles the categories with the largest negative β estimates (in absolute magnitude) from 88 regressions, each for a different category, estimated using Equation (2). For each regression, the dependent variable is constructed from the number of items purchased in the respective category. Clustered standard errors are reported in parentheses.
***p < 0.001; **p < 0.01; *p < 0.05.
Although it is outside the scope of this paper to document the underlying mechanisms explaining the differences in shopping basket composition, our informal conversations with Instacart consumers and Instacart data scientists give us some clues as to why we might observe these differences. For instance, the decrease in purchases of fresh vegetables in Instacart baskets could be attributed to consumers wanting greater control over the selection of these items as they may not trust the Instacart shopper to select the freshest options. Instacart is aware of this potential issue as it has specific policies that encourage its shoppers to “look for the freshest possible items and pay special attention to expiration dates, broken seals, and the quality of fresh produce” (Instacart 2021). The lower number of impulse category purchases, on the other hand, can potentially be explained by behavioral mechanisms related to self-control. For instance, it might be easier for shoppers to stick to their shopping lists and avoid impulse purchases when they are not exposed to the visual temptations of items in physical stores (Huyghe et al. 2017).
4.5. Compensation and Adjustment Beyond Characteristic Trips
So far, we have demonstrated that there are systematic differences in basket variety, similarity, and composition between BM and Instacart characteristic trips. Next, we investigate the extent to which households adjust their grocery shopping, dining out, and/or food delivery behavior outside the focal characteristic trips. Specifically, we examine whether households compensate for the differences in their characteristic trips by altering their behavior in other shopping trips or dining occasions. For example, do households purchase additional candy during their convenience store runs to make up for not buying candy on Instacart? Similarly, do they modify their dining out or food delivery behavior to compensate for changes in their grocery shopping behavior?
We begin by examining whether there are systematic differences in grocery shopping patterns in the seven-day window surrounding focal BM and Instacart characteristic trips (i.e., three days before, three days after, and other shopping episodes on the same day as the focal trip). We investigate adjustments in both the extensive and intensive margins. The extensive margin is measured by determining whether households make more noncharacteristic trips around Instacart characteristic trips than around BM characteristic trips. The intensive margin is measured by calculating the total spending in noncharacteristic trips to see whether households spend more to compensate for fewer purchases around Instacart shopping trips than around BM shopping trips. The results, presented in the first column of Table 8, suggest that households do not have a higher frequency or spending of noncharacteristic grocery trips.
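The window construction can be sketched as follows: for each focal characteristic trip, count the noncharacteristic trips that fall within three days on either side (including the same day) and sum their spending. The function name, the tuple layout, and the dates and amounts are hypothetical; the sketch covers only the construction of the two margins, not the regressions.

```python
from datetime import date, timedelta


def window_summary(focal_day, other_trips, half_width_days=3):
    """Summarize noncharacteristic trips around one focal trip.

    `other_trips` is an iterable of (trip_date, dollars) pairs.
    Returns (number of trips, total dollars) inside the closed window
    [focal_day - half_width_days, focal_day + half_width_days].
    """
    lo = focal_day - timedelta(days=half_width_days)
    hi = focal_day + timedelta(days=half_width_days)
    in_window = [dollars for day, dollars in other_trips if lo <= day <= hi]
    return len(in_window), sum(in_window)


focal = date(2021, 3, 10)
noncharacteristic = [
    (date(2021, 3, 6), 12.00),   # four days before: outside the window
    (date(2021, 3, 7), 8.50),    # three days before: inside
    (date(2021, 3, 10), 4.25),   # same day as the focal trip: inside
    (date(2021, 3, 13), 20.00),  # three days after: inside
    (date(2021, 3, 14), 9.00),   # four days after: outside
]
n_trips, spend = window_summary(focal, noncharacteristic)
print(n_trips, spend)  # 3 32.75
```

The trip count corresponds to the extensive margin and the dollar total to the intensive margin; comparing them across Instacart and BM focal trips is what the Table 8 regressions do.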
Table 8. Extensive and Intensive Margin Differences Around Instacart Trips
| | Noncharacteristic trips | Restaurant trips | Delivery trips |
|---|---|---|---|
| Panel A. Extensive margin: by number of trips | | | |
| Instacart | −0.017* | 0.001 | 0.004 |
| | (0.008) | (0.007) | (0.003) |
| Number of observations | 434,322 | 434,322 | 434,322 |
| R² | 0.678 | 0.741 | 0.681 |
| Panel B. Intensive margin: by dollar spend | | | |
| Instacart | −0.001 | −0.001 | 0.009 |
| | (0.011) | (0.012) | (0.008) |
| Number of observations | 434,322 | 434,322 | 434,322 |
| R² | 0.661 | 0.727 | 0.670 |
Notes. The dependent variable for extensive margin regressions is log(number of noncharacteristic trips) and for intensive margin regressions is log(dollars spent in noncharacteristic trips). All models use Household × Retailer × Quarter and Household × Month fixed effects. Standard errors are reported in parentheses.
***p < 0.001; **p < 0.01; *p < 0.05.
In addition to observing all grocery receipts, we also observe restaurant and food delivery trips and the total spending on those trips. To explore whether households’ shopping behavior toward restaurants/food delivery differs when they shop for groceries online, we analyze the data on the number and spending of these trips. On average, each household in our sample is observed to have 1.12 such trips per week. The results in the second and third columns of Table 8 indicate that there are no detectable differences in either the number or spending across restaurant or delivery trips.
Finally, we examine potential category-specific adjustments related to Instacart characteristic trips. Our analysis shows no evidence that households make up for the items they did not buy via Instacart by purchasing them through other channels. Specifically, our findings indicate that, when an average household replaces a BM shopping trip with an Instacart shopping trip, it purchases 13% fewer vegetables and 5%–6% fewer impulse purchases in that week.8 Online Appendix D.9 details our category-specific analysis and reports the results.
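The back-of-the-envelope figure in note 8 follows from scaling the per-trip difference by the share of regular trips made on Instacart. As a rough check, taking the 13% figure and the one-in-five share at face value and assuming, for simplicity, that vegetable purchases are otherwise comparable across a household's regular trips:

```latex
\underbrace{0.13}_{\text{fewer vegetables per replaced trip}}
\times
\underbrace{\tfrac{1}{5}}_{\text{share of regular trips on Instacart}}
= 0.026 \approx 3\%
```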
Overall, our results uncover no apparent patterns consistent with major compensatory adjustments at the extensive or intensive margins around Instacart characteristic trips.
4.6. Robustness
In this section, we provide brief details about each of the robustness checks and other analyses we implement, which are detailed in respective online appendices. Although these robustness checks help strengthen our main findings and demonstrate that they are not artifacts of potential confounders, we emphasize that establishing robustness does not imply causality. We interpret our main findings as descriptive evidence of systematic differences in basket variety, similarity, and content across online and offline channels.9
Specifically, in Online Appendix C.1, we conduct multiple robustness checks to ensure that our documented patterns are not artifacts of pandemic-induced changes in consumption behavior. In Online Appendix C.2, we demonstrate that systematic price discrepancies between Instacart and BM channels are unlikely to explain our main findings. Finally, in Online Appendix C.3, we provide an informal discussion outlining other potential explanations of our results.
5. Managerial and Policy Implications
Our research provides implications for retail management and sheds light on consumer shopping habits in the context of health and wellness. Furthermore, we highlight the potential of the online grocery channel to amplify the inertia in consumer behavior patterns, an aspect that warrants further exploration and carries significant implications for industry competition and brand management. We discuss each of these implications next.
First, we demonstrate that it is possible to predict which products are less (or more) likely to be purchased when households complete their routine grocery shopping online as opposed to in BM stores. With this information, both BM retailers and online grocery platforms can take steps to counteract the decline in sales in specific categories. For instance, if customers are purchasing fewer fresh produce items online because of concerns about the quality, the platform could provide training to shoppers to pick higher quality produce or select better replacements in the event of stockouts.
Second, our research carries significant implications for the nutritional content of consumers’ purchases as online baskets appear to contain fewer vegetables but, at the same time, fewer unhealthy items, such as junk food. These findings might help provide guidance to food assistance programs, such as SNAP, which could consider encouraging beneficiaries to utilize offline channels to redeem their benefits in order to increase their consumption of fresh produce. Moreover, being aware of whether recipients redeem their SNAP benefits online or offline can provide policymakers with more insights into the impact of these policies (Hinnosaar 2023). Retailers and online platforms can also play a role by promoting healthier food choices among online customers and enhancing access to nutritious ingredients for households residing in food deserts with limited availability of healthy food options. Furthermore, the emergence of CRISPR-edited produce, which has the potential to reduce the perishability and enhance the quality of fresh food items, may lead to increased consumer confidence in purchasing produce through online channels (Deng et al. 2023).
Third, our findings may have implications for how brands can cultivate customer loyalty in online shopping environments. We observe that online grocery baskets exhibit considerably higher similarity than offline baskets. Although further research is needed to explicitly explore the link between basket similarities and consumer inertia across channels, these findings hold important implications for brand management. Our results suggest that brands should prioritize establishing a presence in customers’ online baskets early on as overlooking this aspect may lead to the brand being disregarded in future purchases. In the online shopping environment, traditional in-store promotional tactics, such as product displays and samples, are not applicable. Therefore, brands need to reevaluate their approaches to introducing new products. To establish a brand presence in online shopping environments, promotional activities and generating awareness through paid search and premium placements on digital shelves will become increasingly important. We are already witnessing brands investing in these strategic efforts as evidenced by the significant growth in Instacart’s revenue from its internal advertising bidding platform, which nearly doubled from 2019 to 2022 (Anderson 2022).
Finally, the growing reliance on online grocery shopping has the potential to significantly impact competition within the grocery sector. Previous studies in marketing (Danaher et al. 2003, Guo and Wang 2023) show that customers tend to display higher brand loyalty in the online channel. Consequently, this heightened brand loyalty may present challenges for new entrants seeking to establish a customer base. This aligns with prior research on barriers to entry in the retail market, indicating that consumer inertia can create difficulties for newcomers, encourage larger incumbents to raise their prices, and favor production within larger firms with higher markups, simultaneously discouraging the emergence of smaller players (Pozzi 2012, Bornstein 2020).
6. Concluding Remarks
In this paper, we begin by outlining the general patterns of grocery shopping behavior in offline and online channels. Then, to gain insights into shopping patterns when households are likely substituting their regular offline trips with online Instacart trips, we identify characteristic shopping trips for each household and measure the systematic differences in basket variety, similarity, and composition across channels. We find that Instacart trips exhibit a lower basket variety compared with offline trips, and within households, Instacart baskets have higher similarity in food contents than offline baskets. Additionally, we examine systematic basket composition differences and find significant differences in the fresh vegetable and impulse purchase categories. Finally, we highlight that, when households substitute some of their characteristic offline restocking trips with Instacart orders, they do not compensate for the reduced purchases in the fresh vegetable and impulse categories through alternative grocery shopping or restaurant trips.
As with every study, ours has limitations that might present opportunities for future research. Although we observe household purchases, we do not observe actual consumption. A priori, the fewer purchases of the types of items that we document might have ambiguous effects on consumer welfare. On one hand, research shows that approximately 40% of food is wasted by U.S. households, and this percentage increases to 50% when considering fresh produce (Goldenberg 2016). Therefore, lower purchases of specific categories may help reduce household food waste. On the other hand, our findings draw attention to fewer purchases of categories that are unequivocally healthy (vegetables) and unequivocally unhealthy (impulse purchases). Although evaluating consumer welfare is beyond the scope of this paper, future studies could investigate whether lower purchases contribute to reduced waste or actually alter consumption behavior. Examining the healthiness of foods purchased online versus offline could also offer new substantive insights into the possibility of online food deserts and ultimately contribute to the ongoing debate about food deserts in general (Kolb 2021, Caoui et al. 2022).
Another important dimension of consumer welfare that we do not address in this paper is the trade-off between differences in basket variety/composition and time savings associated with online grocery shopping (Chintagunta et al. 2012). Unfortunately, our data does not contain information needed for calculating the distances between household residences and BM stores, which could serve as a useful proxy for time savings. Nonetheless, Huang and Bronnenberg (2022) provide a promising framework for examining these trade-offs in detail as they demonstrate potential consumer welfare gains with the e-commerce channel in the retail fashion industry.
Data limitations prevent us from formally modeling the differences in consumer consideration set formation, which is likely to vary across channels. Past research in this area mainly focuses on either offline or online channels (Chakravarti and Janiszewski 2003, Bronnenberg et al. 2023) but not both. This may be an important avenue for future research as our results suggest that online channels may accelerate consumer inertia and progression through the purchase funnel. Additionally, product availability may influence consideration sets, and future research could incorporate inventory-related information or shocks into product demand forecasts (Ergin et al. 2022, Levine and Seiler 2022).
Finally, our study highlights the potential of flexible and easily scalable embedding-based methods for revealing informative household shopping patterns. Future research can expand upon these machine learning techniques to improve demand estimation (Magnolfi et al. 2022) and forecasting (Doan et al. 2018).
Acknowledgments
The authors thank conference and seminar participants from the Canadian Centre for Health Economics, American Marketing Association Winter Conference, Vienna University of Economics and Business, Marketing Science Conference, National Bureau of Economic Research Digitization Tutorial, Temple University, Instacart Data Science Seminar, Erasmus University, McGill University, Virtual Digital Economy Seminar, Marketing Modelers Group in New York, Arizona State University, eNumerate 2021, Baltic Economic Association, and Cornell University as well as Anne Byrne, Daniel Hooker, Ben Leyden, Peggy Liu, Daniel McCarthy, Mike Palazzolo, Adithya Pattabhiramaiah, Davide Proserpio, Andrey Simonov, Anna Tuchman, Kosuke Uetake, Kenneth Wilbur, Alminas Zaldokas, and Xinrong Zhu for helpful comments and discussions. The author names are listed in alphabetical order. Instacart was not involved in this research. All authors certify that they have no affiliations with or involvement in any organization or entity with any financial interest or nonfinancial interest in the subject matter or materials discussed in this manuscript.
1 In this paper, we use the terms “grocery” and “food” interchangeably.
2 We refer readers to Oblander and McCarthy (2023) for a more comprehensive investigation of the impact of COVID on consumer behavior across various digital/mobile platforms, including Instacart.
3 We note that, when limiting the analysis to multichannel retailers, the assortment differences between the two channels are usually minimal. This likely stems from these retailers maintaining similar product ranges across their platforms to ensure a uniform customer experience and explains the magnitude differences between the results reported in panels A versus C and panels B versus D in Table 4.
4 Based on our informal conversations with store managers, the application of embeddings to summarize high-dimensional information from shopping baskets is also starting to be used by in-house data analysts in some grocery retail chains (see, for example, Walmart Labs application in Mantha et al. 2020). Indeed, recent computer and data science literature shows several applications of embedding-based approaches for studying consumer behavior (see, for example, Entezari et al. 2021, Vijjali et al. 2022). In general, the use of embeddings remains somewhat nascent in marketing literature. The exceptions include using embeddings to characterize market structures (Gabel et al. 2019), predict consumer responses to marketing actions (Gabel and Timoshenko 2022), and study product-level competition (Chen et al. 2020, 2022).
5 For robustness, Online Table D3 also reports the estimates using count data (number of categories) as a dependent variable with Poisson regression. The point estimates and resulting interpretations are very similar.
6 Online Appendix D.2 presents the results of our sensitivity analysis, in which we estimate the same specifications using three different data samples from the years 2019, 2020, and 2021, respectively. This exercise helps demonstrate that our findings are robust across time. Online Figure D1 shows the specification plot of the estimates and Online Figure D2 reports estimates across all five classification approaches discussed in Online Appendix B.3.
7 We also use a method similar to bootstrapping to assess the robustness of our approach. We repeat the random draw process of N BM trips 1,000 times. We then compute the average of household-specific pairwise distances from 1,000 iterations. The result from Welch’s t-test for difference in means of Euclidean distances is quantitatively similar to the result without the bootstrapping procedure.
8 Because an average Instacart user employs Instacart for one out of every five regular grocery shopping trips (see Online Appendix C.1), a back-of-the-envelope estimate suggests that overall vegetable purchases by an average Instacart-adopting household are about 3% lower.
9 Even in an ideal randomized experiment setting, establishing causal effects may not be possible as randomizing households to forcibly use one channel over another would not be feasible.
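The resampling check described in footnote 7 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes that each trip is represented by an embedding vector, that N equals the household's number of Instacart trips, that BM trips are drawn without replacement, and that Welch's t-test is applied to household-level average distances. All function names and the synthetic data are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import mean, variance


def mean_pairwise_distance(vectors):
    """Average Euclidean distance over all unordered pairs of trip vectors."""
    return mean(math.dist(a, b) for a, b in combinations(vectors, 2))


def resampled_bm_distance(bm_trips, n_draw, n_iter=1000, rng=None):
    """Draw n_draw BM trips without replacement n_iter times and
    average the mean pairwise distance across the draws (assumed design)."""
    if rng is None:
        rng = random.Random(0)
    return mean(
        mean_pairwise_distance(rng.sample(bm_trips, n_draw))
        for _ in range(n_iter)
    )


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


if __name__ == "__main__":
    # Synthetic households (hypothetical data): 12 BM trips and 4 Instacart
    # trips each, with Instacart baskets generated to be more tightly clustered.
    rng = random.Random(42)
    bm_avg, ic_avg = [], []
    for _ in range(20):
        bm = [(rng.gauss(0, 1.0), rng.gauss(0, 1.0)) for _ in range(12)]
        ic = [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(4)]
        bm_avg.append(resampled_bm_distance(bm, len(ic), n_iter=200, rng=rng))
        ic_avg.append(mean_pairwise_distance(ic))
    t_stat, dof = welch_t(bm_avg, ic_avg)
    print(f"Welch t = {t_stat:.2f}, df = {dof:.1f}")
```

Drawing the same number of BM trips as Instacart trips keeps the two sets of pairwise distances comparable in size, and averaging over many draws reduces the dependence of the BM figure on any single random subset.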
References
- (2022) How will grocers react to Instacart’s expanded digital ad network? RetailWire. Accessed January 20, 2023, https://retailwire.com/discussion/how-will-grocers-react-to-instacarts-expanded-digital-ad-network/.
- (2004) The effects of free sample promotions on incremental brand sales. Marketing Sci. 23(3):345–363.
- (2018) Offline showrooms in omnichannel retail: Demand and operational benefits. Management Sci. 64(4):1629–1651.
- (2020) Entry and profits in an aging economy: The role of consumer inertia. Working paper, The Wharton School, University of Pennsylvania, Philadelphia.
- (2012) The evolution of brand preferences: Evidence from consumer migration. Amer. Econom. Rev. 102(6):2472–2508.
- (2023) Consumer time budgets and grocery shopping behavior. Management Sci. 70(3):1596–1612.
- (2011) Goodbye Pareto principle, hello long tail: The effect of search costs on the concentration of product sales. Management Sci. 57(8):1373–1386.
- (2022) The impact of dollar store expansion on local market structure and food access. Preprint, submitted July 25, https://dx.doi.org/10.2139/ssrn.4163102.
- (2003) The influence of macro-level motives on consideration set composition in novel purchase situations. J. Consumer Res. 30(2):244–258.
- (2022) Product2vec: Understanding product-level competition using representation learning. Working paper, New York University Stern School of Business.
- (2020) Studying product competition using representation learning. Proc. 43rd Internat. ACM SIGIR Conf. Res. Development Inform. Retrieval (Association for Computing Machinery, New York), 1261–1268.
- (1998) Inertia and variety seeking in a model of brand-purchase timing. Marketing Sci. 17(3):253–270.
- (2012) Quantifying transaction costs in online/offline grocery channel choice. Marketing Sci. 31(1):96–114.
- (2011) Preference minorities and the internet. J. Marketing Res. 48(4):670–682.
- (2008) Research note—A comparison of within-household price sensitivity across online and offline channels. Marketing Sci. 27(2):283–299.
- (2021) Informational challenges in omnichannel marketing: Remedies and future research. J. Marketing 85(1):103–120.
- (2020) Instacart surges past Walmart in online grocery market. Forbes Online (June 9), https://www.forbes.com/sites/jessedamiani/2020/06/09/instacart-surges-past-walmart-in-online-grocery-market/?sh=2defa92b1972.
- (2003) A comparison of online and offline consumer brand loyalty. Marketing Sci. 22(4):461–476.
- (2018) Changing their tune: How consumers’ adoption of online streaming affects music consumption and discovery. Marketing Sci. 37(1):5–21.
- (2000) Consumer choice behavior in online and traditional supermarkets: The effects of brand name, price, and other search attributes. Internat. J. Res. Marketing 17(1):55–78.
- (2023) Consumer acceptance of CRISPR-edited food products and implications for online grocery shopping. Working paper, Cornell SC Johnson College of Business, Ithaca, NY.
- (2018) Generating realistic sequences of customer-level transactions for retail datasets. 2018 IEEE Internat. Conf. Data Mining Workshops (IEEE, Piscataway, NJ), 820–827.
- (2023) Welfare effects of personalized rankings. Marketing Sci. 43(1):92–113.
- (2010) State dependence and alternative explanations for consumer inertia. RAND J. Econom. 41(3):417–445.
- (2021) Tensor-based complementary product recommendation. 2021 IEEE Internat. Conf. Big Data (IEEE, Piscataway, NJ), 409–415.
- (2001) Testing for choice dynamics in panel data. J. Bus. Econom. Statist. 19(2):142–152.
- (2022) An empirical analysis of intra-firm product substitutability in fashion retailing. Production Oper. Management 31(2):607–621.
- (2021) Online-exclusive or hybrid? Channel merchandising strategies for ship-to-store implementation. Management Sci. 68(8):5828–5846.
- (2009) Blockbuster culture’s next rise or fall: The impact of recommender systems on sales diversity. Management Sci. 55(5):697–712.
- (2012) Periodic advertising pulsing in a competitive market. Marketing Sci. 31(4):637–648.
- (2022) Product choice with large assortments: A scalable deep-learning model. Management Sci. 68(3):1808–1827.
- (2019) P2v-map: Mapping market structures for large retail assortments. J. Marketing Res. 56(4):557–580.
- (2020) Understanding echo chambers in e-commerce recommender systems. Proc. 43rd Internat. ACM SIGIR Conf. Res. Development Inform. Retrieval (ACM, New York), 2261–2270.
- (2016) Half of all US food produce is thrown away, new research suggests. The Guardian Online (July 13), https://www.theguardian.com/environment/2016/jul/13/us-food-waste-ugly-fruit-vegetables-perfect.
- (2023) Will online shopping lead to more brand loyalty than offline shopping? The role of uncertainty avoidance. J. Marketing Res. 61(1):92–109.
- (1997) Household heterogeneity and state dependence in a model of purchase strings: Empirical results and managerial implications. Internat. J. Res. Marketing 14(4):341–357.
- (2022) Online shopping and the healthfulness of grocery purchases. Amer. J. Agricultural Econom. 104(3):1050–1076.
- (2017) Exploring the relationship between varieties of variety and weight loss: When more variety can help people lose weight. J. Marketing Res. 54(4):619–635.
- (2023) The persistence of healthy behaviors in food purchasing. Marketing Sci. 42(3):521–537.
- (2020) The engagement-diversity connection: Evidence from a field experiment on Spotify. Proc. 21st ACM Conf. Econom. Comput. (ACM, New York), 75–76.
- (2022) Consumer transportation costs and the value of e-commerce: Evidence from the Dutch apparel industry. Marketing Sci. 42(5):984–1003.
- (2021) Browsing in the aisles has been replaced by browsing mobile apps | Jācapps. Accessed May 19, 2021, https://jacapps.com/browsing-in-the-aisles-has-been-replaced-by.-browsing-mobile-apps.
- (2017) Clicks as a healthy alternative to bricks: How online grocery shopping reduces vice purchases. J. Marketing Res. 54(1):61–74.
- Instacart (2021) Instacart help center. White paper, Instacart, San Francisco, https://www.instacart.com/help/section/360007797972/360039569911.
- IRI Worldwide (2020) The changing shape of the CPG demand curve: E-commerce. White paper, Information Resources Inc., Chicago. https://www.iriworldwide.com/IRI/media/Library/COVID-19-Changing-Shape.-of-the-Demand-Curve-Part-6-7-29-20.pdf.
- (2020) Marketing-mix response across retail formats: The role of shopping trip types. J. Marketing 84(2):114–132.
- (2023) Instacart’s revenue and profit climb ahead of public listing. The Wall Street Journal Online (February 28), https://www.wsj.com/articles/instacart-sees-revenue-profit-boost-ahead-of-public-listing-1d7891d.
- (2021) Retail Inequality: Reframing the Food Desert Debate (University of California Press, Berkeley, CA).
- (2022) Identifying state dependence in brand choice: Evidence from hurricanes. Marketing Sci. 42(5):934–957.
- (2020) Statistical inference for average treatment effects estimated by synthetic control methods. J. Amer. Statist. Assoc. 115(532):2068–2083.
- (2022) How do recommender systems lead to consumer purchases? A causal mediation analysis of a field experiment. Inform. Systems Res. 33(2):620–637.
- (2016) New product design under channel acceptance: Brick-and-mortar, online-exclusive, or brick-and-click. Production Oper. Management 25(12):2014–2034.
- (2022) Triplet embeddings for demand estimation. Preprint, submitted May 20, https://dx.doi.org/10.2139/ssrn.4113399.
- (1999) The “shopping basket”: A model for multicategory purchase incidence decisions. Marketing Sci. 18(2):95–114.
- (2020) A large-scale deep architecture for personalized grocery basket recommendations. ICASSP 2020-2020 IEEE Internat. Conf. Acoustics Speech Signal Processing (IEEE, Piscataway, NJ), 3807–3811.
- McKinsey (2022) The next horizon for grocery e-commerce. Accessed May 31, 2023, https://www.mckinsey.com/industries/retail/our-insights/the-next-horizon-for-grocery-ecommerce-beyond-the-pandemic-bump#/.
- Mercatus (2020) June 2020 online grocery scorecard: Growth in sales & HH penetration continues. Accessed May 19, 2021, https://www.brickmeetsclick.com/june-2020-online-grocery.-scorecard–growth-in-sales—hh-penetration-continues.
- Mercatus (2021) Total U.S. online grocery sales for March 2021 up 43% vs. year ago. Accessed May 19, 2021, https://www.mercatus.com/newsroom/announcements/total-u-s-online-grocery-sales-for-march-2021-up-43-versus-year-ago.
- (2010) I’ll have the ice cream soon and the vegetables later: A study of online grocery purchases and order lead time. Marketing Lett. 21(1):17–35.
- (2021) Digitization and the demand for physical works: Evidence from the Google books project. Preprint, submitted March 4, 2019, https://dx.doi.org/10.2139/ssrn.3339524.
- (2019) Mobile app introduction and online and offline purchases and product returns. Marketing Sci. 38(5):756–772.
- (2022) The omnichannel continuum: Integrating online and offline channels along the customer journey. J. Retailing 98(1):111–132.
- (2023) Estimating the long-term impact of major events on consumption patterns: Evidence from covid-19. Marketing Sci. 42(5):839–852.
- (2012) Recommendation networks and the long tail of electronic commerce. Management Inform. Systems Quart. 36(1):65–83.
- (2016) Ticking away the moments: Timing regularity helps to better predict customer activity. Marketing Sci. 35(5):779–799.
- (2012) Shopping cost and brand exploration in online grocery. Amer. Econom. J. Microeconomics 4(3):96–120.
- (2022) Online and offline retailing: What we know and directions for future research. J. Retailing 98(1):152–177.
- (2021) Leveraging purchase regularity for predicting customer behavior the easy way. Internat. J. Res. Marketing 38(1):194–215.
- (2020) Shopper: A probabilistic model of consumer choice with substitutes and complements. Ann. Appl. Statist. 14(1):1–27.
- (2021) Instacart survived Covid chaos—But can it keep delivering after the pandemic? Forbes Online (January 27), https://www.forbes.com/sites/chloesorvino/2021/01/27/instacart-survived-covid-chaos---but-can-it-keep-delivering-after-the-pandemic/?sh=1fc4a19ebfa1.
- (2011) How does popularity information affect choices? A field experiment. Management Sci. 57(5):828–842.
- …, M MT, Sathyanarayana J (2022) Foodnet: Simplifying online food ordering with contextual food combos. 5th Joint Internat. Conf. Data Sci. Management Data (9th ACM IKDD CODS 27th COMAD) (ACM, New York), 178–185.
- (2017) Can offline stores drive online sales? J. Marketing Res. 54(5):706–719.
- (2013) How video rental patterns change as consumers move online. Management Sci. 59(11):2622–2634.