Frontiers: ChatGPT Referrals to E-Commerce Websites: How Do LLMs Compare Against Traditional Channels?

Published Online: https://doi.org/10.1287/mksc.2025.0489

Abstract

We investigate organic large language model traffic (oLLM) versus traditional digital channels in e-commerce. Analyzing 12 months of first-party data from 973 websites with $20 billion combined revenue, we examine more than 50,000 transactions from ChatGPT referrals alongside 164 million transactions from traditional channels. Using regression models that account for data sparsity, we assess financial metrics (conversion rate, average order value, revenue per session) and engagement metrics (bounce rate, session duration, page views). Results are consistent across extensive robustness checks. One year after launch, oLLM exhibits conversion rates and revenue per session above paid social but below all other traditional channels. Product complexity moderates the effects: oLLM’s financial outcomes and traffic shares are stronger in complex product categories. Engagement metrics show favorable bounce rates but lower session duration and page views. Temporal analysis shows increasing conversion rates but declining average order values, yielding only moderate revenue-per-session gains over time. Cross-website analyses support growing consumer LLM proficiency as the underlying mechanism. The descriptive study positions oLLM as a new and developing channel. With low volumes and modest revenue per session, oLLM currently serves niche informational needs of proficient consumers and does not yet function as a broad conversion channel.

History: Puneet Manchanda served as the senior editor.

Supplemental Material: The online appendices and data files are available at https://doi.org/10.1287/mksc.2025.0489.

CORRECTED VERSION OF RECORD; SEE END OF ARTICLE

1. Introduction

A new digital channel emerged in August 2024 when ChatGPT, the dominant large language model (LLM) platform, began providing organic outgoing links that direct users to relevant e-commerce websites upon expression of purchase intent. For instance, consumers searching for an espresso machine receive personalized product comparisons and country-specific purchase links (Figure 1). As of the time of this study, these links are fully organic and not advertiser driven.

Figure 1. (Color online) Product Recommendations for Espresso Machines Provided by ChatGPT, Featuring Outgoing Links to Online Retailers (“Buy” Links on the Right)

This organic LLM traffic (oLLM) enters a landscape long shaped by organic and paid search (Ghose and Yang 2009, Li et al. 2016, Reisenbichler et al. 2022) as well as by referral, email, affiliate, paid social, and direct channels (Manchanda et al. 2006, Bleier and Eisenbeiss 2015, de Haan et al. 2016, Wies et al. 2023, SimilarWeb 2024). Channels differ in where and how they match consumers to sellers along the purchase funnel. Their lower-funnel effectiveness is commonly measured by conversion rates, both in academia (Ghose and Yang 2009, Rutz and Bucklin 2011) and in practice (Google 2024). How oLLM compares remains an open question.

Several mechanisms suggest oLLM may generate superior outcomes. LLMs access richer contextual information than traditional channels (Soviero et al. 2024), which have been constrained by privacy regulation (Johnson et al. 2020, Miller and Skiera 2024, Aridor et al. 2025). They synthesize information across attributes, reviews, and contextual factors (Yu et al. 2024) and can reason across dimensions that keyword-based search cannot (Luo et al. 2019, Al-Hasan et al. 2024, Yu et al. 2025). Conversational interfaces further enable preference clarification and iterative personalization (Jannach 2023, Al-Hasan et al. 2024), potentially reducing cognitive burden and increasing convenience and persuasion (Steyvers and Kumar 2024, Salvi et al. 2025). Early industry evidence is consistent with this logic: A study of 100 websites reports a 6.7% conversion rate for oLLM versus 3.9% for organic search (ThoughtMetric 2025); another in-depth study reports 15.9% versus 1.8% (Seer Interactive 2025); and value per session from oLLM is estimated to be 4.4 times higher than that from organic search (Semrush 2025).

Other evidence points in the opposite direction. Perceived recommendation usefulness may be hampered by inaccuracies in current LLM outputs (Search Engine Land 2025), declining response quality at scale (Huang and Rust 2025), and interface characteristics that can increase cognitive load (Nguyen et al. 2022). Consumer adoption of LLM-based shopping remains early-stage: technology anxiety and trust moderate adoption intentions (Foroughi et al. 2025), adoption and use correlate with digital sophistication and education (Yang et al. 2025), and only 2.1% of ChatGPT conversations involve purchasable products (Chatterji et al. 2025), reflecting delayed emergence of behavioral shifts (Padilla et al. 2025). Empirical findings consistent with these indicators include Adobe (2025) reporting oLLM conversion rates 9% lower than “non–artificial intelligence (AI)” channels, and SALT (2025) finding engagement levels 27% below organic search in most categories.

Understanding oLLM has broader implications. Changes in search costs and decision quality may influence consumer welfare (Lynch and Ariely 2000, Brown and Goolsbee 2002, Anderson and Renault 2006, Ellison and Ellison 2009, Dinerstein et al. 2018, Ursu et al. 2022). Recommendation bias may differ from advertiser-influenced channels (OpenAI 2025, Reuters 2025).1 As deep-link entry grows, website design may need to adapt, and retail media may become less important (eMarketer 2024, Wroe 2024). More generally, LLMs are reshaping online behavior (Padilla et al. 2025, Gholami et al. 2026) and fueling expectations of disruption to e-commerce channels (The Economist 2023).

Our descriptive study is the first large-scale empirical analysis of oLLM relative to traditional channels. Using 12 months of data (August 2024–July 2025) from 973 e-commerce websites with $20 billion in annual revenue, we observe more than 50,000 oLLM transactions and 164 million transactions from other channels. Through direct access to companies’ Google Analytics accounts provided by Grips Intelligence,2 we observe the channel that brought each consumer to the e-commerce website and their subsequent purchase behavior. Importantly, because these channel assignments rely on last-click attribution, we cannot capture upper-funnel contributions and will understate channels’ roles when they primarily serve discovery functions (Li and Kannan 2014, de Haan et al. 2016, Li et al. 2016, Berman 2018). Still, last-click metrics remain the industry standard and, under current data constraints, provide the most comparable basis for cross-website channel benchmarking. We interpret all results with these limitations in mind.

We compare oLLM with traditional channels across financial metrics (conversion rate, average order value, and revenue per session) and engagement metrics (bounce rate, session duration, and page views) using both model-free and regression approaches. We test robustness across alternative data processing choices, website samples, LLM platforms, and timeframes. We also explore differential effects across websites that differ in product category complexity and consumers’ LLM proficiency (Figure 2).

Figure 2. Conceptual Model

Two patterns emerge. First, oLLM exhibits conversion rates and revenue per session above paid social but below all other traditional channels. Product complexity moderates the effects: oLLM’s financial outcomes and traffic shares are stronger in complex product categories, pointing to limited perceived usefulness for simpler products. Engagement metrics reveal relatively favorable bounce rates but lower session duration and page views.

Second, temporal analysis shows increasing conversion rates. However, declining average order values partially offset these improvements, resulting in moderate revenue-per-session uplift over time. Cross-website analyses reveal that this pattern is consistent with growing consumer LLM proficiency as an underlying mechanism.

The paper proceeds as follows: We present data, metrics, and methodology, followed by descriptive evidence and regression results comparing oLLM with traditional channels. We then examine robustness, temporal dynamics, and heterogeneity in oLLM patterns across websites. We conclude with a summary of key insights and limitations.

2. Data, Metrics, and Models

2.1. Traffic Volume of oLLM and Focus on ChatGPT

One year after launch, oLLM accounts for less than 0.2% of all traffic in our data set (Figure 3), about 200 times smaller than Google’s organic search. This reflects oLLM’s low volume in the broader market (Ahrefs 2025, SE Ranking 2025).

Figure 3. (Color online) oLLM Traffic (from ChatGPT Only) Compared with Traditional Channels (log Scale)

ChatGPT dominates the oLLM channel, accounting for over 90% of observed sessions3 from LLM platforms (Figure 4). Other platforms (Perplexity 4.1%, Gemini 2.6%, Copilot 2.1%, Deepseek 0.07%, Grok 0.02%) show negligible volume. Our main analyses therefore examine ChatGPT traffic only, but we include all platforms in a robustness check (Section 4).

Figure 4. (Color online) Observed Traffic from LLMs in Our Data (Absolute Scale)

2.2. Data Set

We analyze first-party e-commerce data from Google Analytics across 973 websites. Table 1 shows overlapping 3-, 6-, and 12-month data blocks. Our main analyses use the six-month data set, capturing most oLLM sessions while avoiding potential idiosyncratic introduction effects. We aggregate data weekly to account for sparse observations. We replicate all analyses using alternative timeframes (3 and 12 months) and aggregation periods (daily, monthly) in Section 4.

Table 1. Overview of Data Sets

| Metric | 12 months (robustness) | 6 months (main) | 3 months (robustness) |
|---|---|---|---|
| Date range start | 2024-07-28 | 2025-02-02 | 2025-04-27 |
| Date range end | 2025-08-02 | 2025-08-02 | 2025-08-02 |
| Number of websites | 973 | 973 | 953 |
| Total sessions | 10,513,177,108 | 6,010,804,542 | 3,271,538,643 |
| Total revenue | $20,602,399,030 | $11,485,304,150 | $6,301,491,260 |
| Total transactions | 164,875,690 | 92,826,266 | 51,391,206 |
| Total oLLM transactions | 50,251 | 44,437 | 33,680 |
| Total oLLM sessions | 4,915,779 | 4,149,187 | 2,944,179 |
| Total oLLM revenue | $6,965,894 | $6,082,834 | $4,594,709 |
| Mean sessions per website/week | 238,627 | 254,189 | 253,509 |
| Mean revenue per website/week | $467,631 | $485,698 | $488,298 |
| Mean transactions per website/week | 3,742 | 3,925 | 3,982 |
| Mean conversion rate per website/week | 3.30% | 3.27% | 3.31% |


Notes. Weekly aggregated data. oLLM refers to traffic from ChatGPT referrals.

2.2.1. Coverage.

Following Similarweb (2025), our data set contains websites from 24 categories. The largest is “E-Commerce and Shopping” (including retailers and marketplaces) with 1.2 billion sessions, followed by “Lifestyle” (including Fashion, Beauty, and Cosmetics) with 1.1 billion sessions. Detailed category descriptions and statistics appear in Online Appendix A.1.

The data set covers all continents, with the Americas (2.7 billion sessions) and Europe (2.1 billion sessions) representing the majority. Forty-nine countries contribute at least 10 million sessions each. Detailed statistics appear in Online Appendix A.2.

2.3. Metrics

We focus our analyses on financial metrics: conversion rate (CR), average order value (AOV), and revenue per session (RPS). CR is the most widely used financial performance metric in e-commerce, capturing the share of website sessions that result in a transaction. We also measure AOV, because the probability of purchase tends to be higher when the product is cheaper, meaning CR and AOV are confounded. RPS captures the joint effect of purchase likelihood and spending per purchase, and gives meaningful intuition on the value of sessions from an advertising perspective. Table 2 provides descriptives for oLLM and eight traditional channels.

Table 2. Channel Volume Summary Statistics (February–August 2025)

| Channel | Observations | Sessions (M) | Revenue ($M) |
|---|---|---|---|
| oLLM | 30,232 | 4.1 | 6.1 |
| Paid search | 44,800 | 1,530.0 | 3,190.4 |
| Organic search | 47,244 | 1,427.3 | 2,595.4 |
| Direct | 47,247 | 1,269.6 | 2,695.3 |
| Other | 46,941 | 830.5 | 1,581.5 |
| Referral | 46,789 | 353.5 | 605.2 |
| Paid social | 27,019 | 337.5 | 145.5 |
| Email | 36,544 | 215.9 | 513.5 |
| Affiliate | 13,476 | 42.5 | 152.6 |


Note. (M) = millions; observations refer to week-website-device-channel data points.

We investigate engagement through bounce rate (BR), session duration (SD), and page views (PVs). Although engagement does not directly translate to financial results, it indicates traffic quality. High BR (visitors leaving without another click) typically signals poor fit between website content and consumer interest. Complete variable definitions appear in Online Appendix A.3.
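
As a minimal sketch of how the three financial metrics relate, the snippet below computes CR, AOV, and RPS from pooled channel totals and checks the identity RPS = CR × AOV. The `ChannelTotals` container and function names are hypothetical; the inputs are the six-month oLLM totals from Table 1. The resulting figures are pooled descriptives only and are not the regression-adjusted estimates reported later.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelTotals:
    """Pooled totals for one channel (hypothetical container for illustration)."""
    sessions: int
    transactions: int
    revenue: float


def conversion_rate(t: ChannelTotals) -> float:
    """CR: share of sessions that result in a transaction."""
    return t.transactions / t.sessions


def average_order_value(t: ChannelTotals) -> float:
    """AOV: revenue per transaction."""
    return t.revenue / t.transactions


def revenue_per_session(t: ChannelTotals) -> float:
    """RPS: revenue per session; algebraically equal to CR * AOV."""
    return t.revenue / t.sessions


# Six-month oLLM totals as reported in Table 1.
ollm = ChannelTotals(sessions=4_149_187, transactions=44_437, revenue=6_082_834.0)

cr = conversion_rate(ollm)
aov = average_order_value(ollm)
rps = revenue_per_session(ollm)

print(f"CR  = {cr:.2%}")
print(f"AOV = ${aov:,.2f}")
print(f"RPS = ${rps:.2f}")
```

Because RPS is the product of CR and AOV, a channel can raise its CR and still see little RPS gain if AOV falls at the same time.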

2.4. Modeling Approach

Sparse oLLM observations require careful modeling. At the week/website/device level, CRs may derive from few sessions and AOVs from single transactions. Our models address two challenges: (a) datapoints vary substantially in underlying sessions, and (b) ratio metrics exhibit overdispersion.

For CR and BR, we estimate quasibinomial models with dispersion parameters. For AOV, RPS, SD, and PV, we estimate weighted linear models with bounded, website-relative weights to reduce aggregation-induced heteroskedasticity and cap leverage. All models use

$$\operatorname{logit}\left(p_{it}^{\mathrm{CR}}\right) = \alpha_{c(i,t)} + \gamma_{i} + \delta_{d(i,t)} + \mu_{m(t)} \tag{1}$$

where $\alpha_{c}$ are channel fixed effects, $\gamma_{i}$ are website fixed effects, $\delta_{d}$ are device fixed effects, and $\mu_{m}$ are month fixed effects. The linear specifications follow the same fixed-effects structure, with outcomes modeled in levels rather than log-odds. Complete specifications appear in Online Appendix A.4.
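
To illustrate how Equation (1) maps fixed effects to a conversion probability, the following sketch sums the four effects on the log-odds scale and applies the inverse logit. The website, device, and month values are hypothetical and chosen only for illustration; the channel effect of 0.121 is the organic search CR coefficient reported in Table 3, with oLLM as the reference category (effect 0). It does not reproduce the estimation itself.

```python
import math


def inv_logit(x: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def predicted_cr(alpha_channel: float, gamma_website: float,
                 delta_device: float, mu_month: float) -> float:
    """Conversion probability implied by summing the four fixed effects."""
    return inv_logit(alpha_channel + gamma_website + delta_device + mu_month)


# Hypothetical fixed-effect values (assumptions, not estimates from the paper).
gamma_website = -3.5  # website baseline on the log-odds scale
delta_device = 0.2    # device shift
mu_month = 0.0        # month shift

# oLLM is the reference channel (alpha = 0); 0.121 is taken from Table 3.
cr_ollm = predicted_cr(0.0, gamma_website, delta_device, mu_month)
cr_organic = predicted_cr(0.121, gamma_website, delta_device, mu_month)

print(f"oLLM CR:           {cr_ollm:.4f}")
print(f"Organic search CR: {cr_organic:.4f}")
```

Holding website, device, and month fixed, the channel effect shifts the log-odds by a constant, so the ratio of odds between two channels does not depend on the hypothetical baseline values.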

3. Results

3.1. Model-Free Evidence

Model-free comparisons suggest lower CR, AOV, and RPS for oLLM relative to traditional channels (Figures 5 and 6). However, these distributions reflect sparse data: The median oLLM observation contains zero transactions. Without accounting for underlying session counts or website heterogeneity, such comparisons prove misleading—a pattern that persists across engagement metrics (Figures 7 and 8).

Figure 5. (Color online) Model-Free Evidence on CR by Channel
Figure 6. (Color online) Model-Free Evidence on AOV and RPS by Channel
Figure 7. (Color online) Model-Free Evidence on BR by Channel
Figure 8. (Color online) Model-Free Evidence on PV and SD by Channel
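
The sensitivity of model-free summaries to sparse observations can be sketched with a toy example. The observations below are invented for illustration and are not drawn from the data set; they show how the median and the unweighted mean of per-observation CR can both diverge from the pooled (session-weighted) CR when most observations contain few sessions.

```python
from statistics import mean, median

# Hypothetical week-website-device observations for one sparse channel,
# given as (sessions, transactions). Values are invented for illustration.
observations = [(3, 0), (5, 0), (2, 1), (8, 0), (400, 6), (1, 0), (4, 0)]

# CR computed separately for each observation, ignoring its session count.
per_obs_cr = [tx / s for s, tx in observations]

unweighted_mean_cr = mean(per_obs_cr)
median_cr = median(per_obs_cr)

# Pooled CR weights each observation by its number of sessions.
pooled_cr = sum(tx for _, tx in observations) / sum(s for s, _ in observations)

print(f"Median CR:          {median_cr:.4f}")
print(f"Unweighted mean CR: {unweighted_mean_cr:.4f}")
print(f"Pooled CR:          {pooled_cr:.4f}")
```

In this toy case a single two-session observation with one transaction dominates the unweighted mean, while the median is zero; weighting by sessions removes both distortions.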

3.2. Regression Results

Table 3 reports regression estimates with channel fixed effects αc measuring differences between each traditional channel and oLLM (the reference category). We highlight in bold coefficients with p<0.05 and favoring oLLM.

Table 3. Regression Results: Six-Month Data

| Channel / Dependent variable | CR | AOV | RPS | BR | SD | PV |
|---|---|---|---|---|---|---|
| Affiliate | 0.621*** (0.060) | 24.7*** (3.9) | 3.57*** (0.12) | 0.217*** (0.035) | −12.2* (6.8) | 0.642*** (0.057) |
| Paid Search | 0.370*** (0.058) | 4.08 (3.10) | 1.95*** (0.08) | −0.156*** (0.033) | 31.2*** (4.5) | 0.905*** (0.038) |
| Other | 0.366*** (0.059) | 5.51* (3.15) | 2.19*** (0.08) | 0.883*** (0.033) | 46.1*** (4.7) | 0.622*** (0.039) |
| Direct | 0.282*** (0.058) | 13.5*** (3.1) | 3.07*** (0.08) | 0.321*** (0.033) | 5.86 (4.56) | 0.692*** (0.038) |
| Email | 0.276*** (0.059) | 10.5*** (3.3) | 2.69*** (0.09) | 0.181*** (0.033) | 55.4*** (5.1) | 0.876*** (0.042) |
| Referral | 0.218*** (0.059) | 12.6*** (3.2) | 1.61*** (0.08) | 0.202*** (0.033) | 59.8*** (4.8) | 0.776*** (0.040) |
| Organic Search | 0.121** (0.058) | 1.83 (3.08) | 1.01*** (0.08) | −0.136*** (0.033) | 14.6*** (4.5) | 0.571*** (0.037) |
| Paid Social | −0.760*** (0.059) | 5.42 (3.49) | −0.718*** (0.091) | 0.360*** (0.033) | −131*** (5) | −0.413*** (0.045) |
| Fixed effects: Website, device, month | Yes | Yes | Yes | Yes | Yes | Yes |
| R2 [pseudo-R2] | [0.739] | 0.472 | 0.284 | [0.711] | 0.457 | 0.155 |
| Observations | 340,292 | 289,524 | 340,292 | 340,292 | 340,292 | 340,292 |


Notes. All coefficients relative to oLLM baseline. Standard errors in parentheses.

 *p < 0.1; **p < 0.05; ***p < 0.01.

3.2.1. Financial Outcomes.

Figure 9 displays channel fixed effects as log-odds coefficients. To facilitate interpretation, we convert these to percentage differences in the odds of conversion. In the channel ranking, oLLM sits between paid social and organic search. Paid social sits substantially below oLLM (coefficient: −0.760), corresponding to 53% lower odds of conversion. Organic search modestly exceeds oLLM (coefficient: 0.121), translating to 13% higher odds of conversion. Other channels show larger positive differences: Affiliate’s coefficient of 0.621 corresponds to 86% higher odds of conversion relative to oLLM. These regression-adjusted estimates substantially narrow the gaps indicated by model-free evidence.
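
The conversion from log-odds coefficients to the percentages quoted above can be sketched as follows. The function name is hypothetical; the coefficients are the CR estimates from Table 3. Strictly, exponentiating a logit coefficient yields a ratio of odds, which approximates a ratio of probabilities only when conversion rates are small.

```python
import math


def pct_change_in_odds(log_odds_coef: float) -> float:
    """Percentage difference in odds implied by a logit coefficient."""
    return (math.exp(log_odds_coef) - 1.0) * 100.0


# CR coefficients from Table 3 (oLLM is the reference category).
cr_coefficients = {
    "Affiliate": 0.621,
    "Organic Search": 0.121,
    "Paid Social": -0.760,
}

cr_pct = {ch: round(pct_change_in_odds(b)) for ch, b in cr_coefficients.items()}
print(cr_pct)  # {'Affiliate': 86, 'Organic Search': 13, 'Paid Social': -53}
```

The same transformation applied to the BR coefficients for email (0.181) and organic search (−0.136) gives the 20% and 13% figures reported in Section 3.2.2.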

Figure 9. (Color online) Regression Results for CR by Channel (log-Odds)

AOV estimates exhibit wider confidence intervals, yielding differences that are statistically significant at the 5% level for only four of eight channels (Figure 10). Affiliate shows the largest difference, at $24.7 higher than oLLM. Differences for organic search, paid search, and paid social are not statistically significant, and the difference for other is significant only at the 10% level.

Figure 10. (Color online) Regression Results for AOV by Channel

RPS results show tighter confidence intervals (Figure 11). oLLM exceeds paid social but falls significantly below remaining channels.

Figure 11. (Color online) Regression Results for RPS by Channel

3.2.2. Engagement Outcomes.

BR is the only metric where oLLM shows more favorable outcomes than most channels (Figure 12), though organic and paid search, channels optimized for low BR, still show better rates. Email shows 20% higher odds of a bounce than oLLM; organic search shows 13% lower odds.

Figure 12. (Color online) Regression Results for BR by Channel

PV analysis shows oLLM below all channels except paid social, whereas SD places oLLM far ahead of paid social and comparable to affiliate and direct channels (Figure 13).

Figure 13. (Color online) Regression Results for PV and SD by Channel

4. Robustness

The main models aggregate data at the weekly level, apply no minimum observation thresholds, cover 973 websites, analyze ChatGPT as the only oLLM channel, and span six months. We test the robustness of our findings with respect to all of these specifications. Table 4 summarizes observations and model fit. Notably, the R2 values of the CR model remain stable despite substantial variation in sample size.

Table 4. Robustness Checks Overview

| Specification | Robustness concern addressed | R² | Observations |
|---|---|---|---|
| Main model | – (reference model) | 0.739 | 340,292 |
| Data processing choices | | | |
| Daily aggregation | Effect of sparse observations for oLLM | 0.689 | 3,896,527 |
| Monthly aggregation | Effect of sparse observations for oLLM | 0.750 | 149,628 |
| Minimum 10 sessions | Effect of sparse observations for oLLM | 0.739 | 320,021 |
| Minimum 100 sessions | Effect of sparse observations for oLLM | 0.740 | 280,190 |
| Minimum 1,000 sessions | Effect of sparse observations for oLLM | 0.742 | 188,376 |
| Winsorized | Outliers | 0.727 | 340,292 |
| Observed websites | | | |
| Top 25% oLLM sessions | Selection of websites in the data | 0.700 | 91,385 |
| Top 25% total transactions | Selection of websites in the data | 0.738 | 91,942 |
| Observed LLM platforms | | | |
| ChatGPT plus other LLMs | Operationalization of oLLM | 0.746 | 408,342 |
| ChatGPT plus mobile app traffic | Operationalization of oLLM | 0.741 | 395,430 |
| Observed timeframes | | | |
| 12 months | Stability of the effects over time | 0.723 | 614,714 |
| 3 months | Stability of the effects over time | 0.727 | 188,038 |


Note. R² values shown for CR regression model denote McFadden pseudo-R2 values.

Figure 14 reports CR channel fixed effects across all specifications. The core finding is robust: oLLM exhibits lower CR than all traditional channels except paid social. Differences relative to organic search become statistically insignificant in five specifications (monthly aggregation, minimum 100 sessions, top 25% of oLLM traffic, oLLM including mobile app, and the three-month window) and exhibit a nonsignificant sign reversal in one specification (minimum 1,000 sessions). Effect sizes vary systematically: the top-25%-transactions subset shows consistently larger gaps, whereas the minimum-1,000-sessions specification yields smaller effects. Nonetheless, the overall pattern remains consistent.

Figure 14. (Color online) Robustness Check Results for CR by Channel

AOV, RPS, and engagement metrics exhibit similar robustness, with minor variation in effect sizes. Details on all tests are available in Online Appendix B.

5. Temporal Dynamics of oLLM Outcomes

The preceding analyses document oLLM’s lower CR and RPS relative to most traditional channels. Yet oLLM remains a young and rapidly developing channel, with both the underlying technology and consumer adoption still maturing. This raises a natural generalizability question: Do the observed gaps persist, narrow, or widen over the study period?

To address this question, we examine temporal trends in CR, AOV, and RPS. Channel fixed effects across 12-, 6-, and 3-month windows already suggest gradual improvement. We now model these trajectories explicitly, using multiple trend specifications to assess robustness. This analysis cannot isolate specific mechanisms driving temporal changes but can characterize the direction and pace of oLLM’s trajectory relative to established channels.

5.1. Model-Free Evidence on Time Trend

Model-free evidence (Figure 15) shows that CR increases over time for oLLM, whereas traditional channels remain stable except for seasonal peaks in November and December. Mean weekly AOV (Figure 16) exhibits substantially more volatility, especially for oLLM, and appears to decrease over the observation period. RPS remains stable for traditional channels but shows a systematic increase for oLLM (Figure 17).

Figure 15. (Color online) CR by Channel over Time
Figure 16. (Color online) AOV by Channel over Time
Figure 17. (Color online) RPS by Channel over Time

5.2. Regression Analysis with Time Trend

Given the data sparsity challenges with model-free evidence for oLLM, we estimate regression models that mirror the main specifications but include trend variables. Because the choice of trend specification substantially affects predictions, we test four alternatives: a linear trend, a centered (linear) model, a Gompertz model (asymmetric S-curve), and a sigmoid model (logistic with a hard upper limit). Complete specifications appear in Online Appendix C.1.
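
For orientation, the four trend shapes named above can be sketched with their common textbook forms. The exact specifications are given in Online Appendix C.1 and are not reproduced here, so the functional forms, parameter names, and parameter values below are assumptions for illustration only.

```python
import math


def linear_trend(t: float, intercept: float, slope: float) -> float:
    """Linear trend: grows without bound."""
    return intercept + slope * t


def centered_linear_trend(t: float, intercept: float, slope: float,
                          t_mean: float) -> float:
    """Linear trend with time centered at its mean."""
    return intercept + slope * (t - t_mean)


def gompertz(t: float, upper: float, displacement: float, rate: float) -> float:
    """Gompertz curve: asymmetric S-shape approaching `upper`."""
    return upper * math.exp(-displacement * math.exp(-rate * t))


def sigmoid(t: float, upper: float, rate: float, midpoint: float) -> float:
    """Logistic curve: symmetric S-shape with hard upper limit `upper`."""
    return upper / (1.0 + math.exp(-rate * (t - midpoint)))


# Hypothetical parameter values over a 52-week horizon.
weeks = range(52)
linear_path = [linear_trend(t, intercept=0.0, slope=0.02) for t in weeks]
gompertz_path = [gompertz(t, upper=1.0, displacement=3.0, rate=0.08) for t in weeks]
sigmoid_path = [sigmoid(t, upper=1.0, rate=0.15, midpoint=26.0) for t in weeks]
```

The practical difference is in extrapolation: the linear forms keep growing, whereas the Gompertz and sigmoid forms saturate at their upper limit, which is one reason projections can differ markedly across specifications.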

5.3. Projection of Financial Metrics

To facilitate interpretation, we create population estimates for all websites across time, allowing us to project oLLM outcomes over the next year. Although trend continuation is highly uncertain, projections may offer insight into the development speed necessary for oLLM to converge with traditional channels.

Figure 18 illustrates a slow CR increase for traditional channels and a comparatively steeper increase for oLLM. Three of the four models consistently predict that CR for oLLM will approach organic search but not reach parity within 12 months. The linear time model shows exponential growth due to the logit transformation of probabilities.
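
The exponential shape under the linear time model follows from the logit link. As a brief sketch, with $a$ and $b$ as hypothetical intercept and slope on the log-odds scale (not symbols from the paper), a linear trend in log-odds implies exponentially growing odds, and for small conversion probabilities the probability itself is close to the odds:

```latex
\operatorname{logit}(p_t) = a + b\,t
\;\Longrightarrow\;
\frac{p_t}{1 - p_t} = e^{a}\, e^{b t},
\qquad
p_t = \frac{1}{1 + e^{-(a + b t)}} \approx e^{a}\, e^{b t}
\quad \text{when } p_t \ll 1 .
```

At conversion rates of a few percent the approximation holds closely, so the projected CR path is nearly exponential over the projection horizon.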

Figure 18. (Color online) Prediction of CR Development per Channel

Figure 19 shows a substantial downward trend for oLLM AOV, which counteracts the positive development in CR and raises the question of whether the CR improvement is only an artifact of lower AOVs.

Figure 19. (Color online) Prediction of AOV Development per Channel

For RPS, we find oLLM gradually produces more valuable sessions. Similar to CR predictions, the linear model is more optimistic about oLLM outcomes, whereas the other three models consistently predict oLLM will not converge with the next best channel, organic search, within 12 months (Figure 20).

Figure 20. (Color online) Prediction of RPS Development per Channel

6. Cross-Website Variation in oLLM Outcomes

The preceding analyses document oLLM’s lower CR, AOV, and RPS relative to all traditional channels except paid social, as well as changes over time. Multiple mechanisms could explain these patterns, ranging from trust deficits and privacy concerns to differences in search intent and funnel position. We focus on two mechanisms grounded in prior research and amenable to empirical exploration with our data.

First, LLM recommendations may offer limited usefulness due to inaccuracies or interface friction (Nguyen et al. 2022, Search Engine Land 2025). Second, consumers are still developing proficiency in using LLMs (Chatterji et al. 2025, Foroughi et al. 2025, Yang et al. 2025). To investigate these conjectures, we exploit cross-sectional variation across4 the 973 websites in our data set. These analyses are descriptive rather than causal: They offer suggestive evidence on potential mechanisms but remain vulnerable to alternative explanations and should not be interpreted as definitive proof.

6.1. Product Complexity

We examine how oLLM outcomes vary with product complexity, under the assumption that LLMs are more useful when purchases require substantial information synthesis and deliberation. For simple products with straightforward decisions, conversational assistance may add little incremental value while introducing friction.

We proxy product complexity using website categories classified by three LLMs (Claude, Gemini, ChatGPT). Heavy industry and engineering, business and consumer services, vehicles, law and government, finance, and jobs and career are categorized as high-complexity categories, whereas reference materials, news and media, adult, sports, and arts and entertainment are categorized as low-complexity (see Table A.2 in the Online Appendix for complete rankings). Websites in high-complexity categories exhibit 4.6 times higher oLLM traffic shares on average, consistent with our usefulness assumption and classification.

Across all three financial metrics, oLLM outcomes are more favorable on high-complexity websites (see Online Appendix D for detailed results). When product complexity is high, oLLM CR exceeds paid social, referral, email, organic search, and direct channels (Figure 21); AOV differences relative to other channels narrow (Figure 22); and RPS moves closer to organic and paid search levels (Figure 23).

Figure 21. (Color online) CR by Channel (log-Odds), Split by Websites’ Category Complexity
Figure 22. (Color online) AOV by Channel Split by Websites’ Category Complexity
Figure 23. (Color online) RPS by Channel, Split by Websites’ Category Complexity

Taken together, these patterns are consistent with (currently) limited usefulness for simple products as one potential mechanism behind oLLM’s weaker financial metrics in Section 3. However, this interpretation does not align with the temporal pattern in Section 5, which shows decreases in AOV.

A caveat is that, although all conversions in our data are financial transactions, what constitutes a conversion can still vary across product categories (see Table A.2 in the Online Appendix), so the complexity split may partly reflect these category-level differences in conversion type rather than differences in LLM assistance per se.

6.2. LLM Proficiency

We assess whether consumer LLM proficiency moderates channel outcomes using two website-level proxies: the share of “technophiles” among visitors (from Google’s affinity segments) and average visitor age. Websites in the top technophile quartile exhibit 3.8 times greater oLLM traffic share than those in the bottom quartile. Similarly, websites with younger visitors show 5.5 times greater oLLM traffic share, supporting our proficiency proxies.

Figure 24 shows that CR gaps between oLLM and traditional channels are substantially smaller when LLM proficiency is high; oLLM CR even exceeds organic search in both analyses. In contrast, Figure 25 indicates that higher proficiency is associated with larger AOV gaps, meaning oLLM generates lower order values than traditional channels. Although the technophile split yields relatively imprecise AOV estimates, the age split produces clearly significant differences. Finally, Figure 26 shows mixed RPS patterns: oLLM gains on paid and organic search among high-technophile websites, and shows improved positioning relative to direct, affiliate, and other among younger-visitor websites (regressions in Online Appendix D).

Figure 24. (Color online) CR by Channel (log-Odds), Split by Consumer LLM Proficiency Proxies
Figure 25. (Color online) AOV by Channel, Split by Consumer LLM Proficiency Proxies
Figure 26. (Color online) RPS by Channel, Split by Consumer LLM Proficiency Proxies

These patterns mirror Section 5, where rising CR accompanies declining AOV,5 yielding only moderate RPS improvements. The parallel cross-sectional and temporal findings indicate that growing consumer LLM proficiency might contribute to oLLM’s development over time.

7. Key Insights and Limitations

Our research examines oLLM relative to traditional digital channels using 12 months of data from 973 e-commerce websites with more than 50,000 oLLM transactions. Two key patterns emerge.

7.1. Current Positioning

One year after launch, oLLM exhibits a higher CR and RPS than paid social, but lower CR and RPS than all other traditional channels. Engagement metrics show comparatively favorable BRs for oLLM, but fewer PVs and intermediate SD. Extensive robustness checks and the geographic and category diversity of the data set support the stability and external relevance of these findings. Online Appendix E discusses why results may differ from industry reports.

Gaps between oLLM and traditional channels are more pronounced in low-complexity product categories. In contrast, oLLM shows more favorable relative outcomes and approaches the RPS of organic search for high-complexity categories, which may benefit more from the extensive context, superior synthesis, and adaptive conversational interface LLMs can offer. Although exploratory, these findings indicate that oLLM’s comparatively weak financial metrics may reflect limitations in how users currently use and value LLMs as a shopping tool.

7.2. Temporal Trajectory

Over the first 12 months, CR and RPS increased, whereas AOV declined. Projections suggest that the CR gap relative to traditional channels will continue to narrow, offset by a widening AOV gap. Thus, RPS is expected to improve only moderately, reaching parity with organic search (the next-ranked channel) only under the most aggressive forecasting scenario.

This trade-off—higher conversion but lower order value—is also evident when segmenting websites by visitor LLM proficiency. Websites serving more LLM-proficient consumers display the same combination of elevated CR and reduced AOV. The robustness of this dual pattern across temporal and cross-sectional analyses points to growing consumer LLM proficiency as a likely mechanism behind the observed evolution of oLLM.

7.3. Implications

7.3.1. For Retailers.

oLLM currently offers the greatest value for retailers operating in high-complexity categories where consumers demand extensive comparison and guidance. In these settings, oLLM referrals combine comparatively strong CR and RPS with favorable BR, suggesting that they can serve as a high-intent complement to organic and paid search, even if volumes remain small. For retailers focused on low-complexity, routine purchases, current low traffic shares and weaker CR and RPS imply limited short-term gains from channel optimization alone.

7.3.2. For LLM Platforms.

Our analyses suggest that lower usefulness in low-complexity product categories aligns with the observed gap between oLLM and traditional channels. LLM platforms have launched several initiatives since the end of our observation period targeting this limitation, including instant checkout and agentic shopping. Early research indicates that agentic shopping tools may prove particularly useful for routine purchases, given a substantially higher share of observed shopping-related queries than with general-purpose LLMs (Yang et al. 2025). Such platform-based agents will compete with retailer-embedded alternatives like Amazon’s Rufus, which benefit from keeping users in familiar on-site interfaces while providing optional conversational support.

Further, recent announcements of paid advertising formats within ChatGPT raise strategic considerations that go beyond our organic-traffic data. Although oLLM is unpaid and therefore attractive for companies concerned about return on advertising spend, paid LLM traffic has the potential to redefine LLMs’ role in the channel ecosystem (Hermann et al. 2025). Depending on implementation, sponsored placements could reduce friction for simple purchases through one-tap offers or shift the balance between organic guidance and promotional content in ways that negatively affect usefulness and trust, and thus also oLLM’s positioning.

7.4. Limitations and Future Research

All analyses in this paper are descriptive rather than causal. As such, the analysis of oLLM metrics is vulnerable to confounding factors, including systematic differences across channels’ users and usage situations.

The time-trend analysis presented in Section 5 can only provide an indication of future channel outcomes. LLM platforms, retailer strategies, and consumer behaviors evolve rapidly. Although our 12-month observation period captures the initial maturation phase, the documented patterns may not persist. Disruptive developments, such as major interface redesigns, widely adopted checkout features, or paid advertising rollouts, could fundamentally alter these trajectories.

Similarly, the exploratory heterogeneity analyses in Section 6 should only be seen as initial indications of potential mechanisms. The regression framework controls for website, device, and time effects, but the analysis remains descriptive, and the product complexity and proficiency measures are constructed at the website level rather than at the level of individual products or consumers. Understanding whether the observed patterns reflect channel-specific user behavior, LLM-induced behavioral changes, or unobserved website characteristics requires further investigation.

Finally, our analyses rely on last-click attribution, which underestimates oLLM’s contribution to the purchase funnel (Li and Kannan 2014, Li et al. 2016, Berman 2018). Future work using multitouch attribution or experiments on individual websites could clarify the role of oLLM in the upper funnel and its interaction with traditional channels and supplement our broad study with more granular, in-depth insights.
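
To illustrate why last-click attribution can understate an upper-funnel channel, the following minimal sketch compares last-click credit with a simple linear multitouch rule. The journeys, channel labels, and revenue figures are hypothetical and are not drawn from the study's data. Linear attribution is only one of several possible multitouch rules and is not a method used in this paper.

```python
from collections import defaultdict

# Hypothetical converting journeys: (ordered channel touches, order revenue).
# Invented for illustration only; not data from the study.
JOURNEYS = [
    (["oLLM", "organic_search", "direct"], 120.0),
    (["oLLM", "paid_search"], 80.0),
    (["organic_search"], 50.0),
]


def last_click(journeys):
    """Credit each order's full revenue to the final touch before purchase."""
    credit = defaultdict(float)
    for touches, revenue in journeys:
        credit[touches[-1]] += revenue
    return dict(credit)


def linear_multitouch(journeys):
    """Split each order's revenue equally across all touches in the journey."""
    credit = defaultdict(float)
    for touches, revenue in journeys:
        share = revenue / len(touches)
        for channel in touches:
            credit[channel] += share
    return dict(credit)


if __name__ == "__main__":
    print("last click:", last_click(JOURNEYS))
    print("linear:    ", linear_multitouch(JOURNEYS))
```

In this toy example, oLLM opens two of the three journeys but never supplies the final click, so last-click attribution assigns it no revenue, whereas the linear rule assigns it a share of both orders. Total credited revenue is identical under both rules; only its allocation across channels differs.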

Beyond these limitations, our study raises new questions on the emerging role of oLLM in the digital channel mix—including whether treating oLLM as merely another channel understates its unique ability to shape consumer decision processes through its conversational interface. In Online Appendix F, we outline promising directions for future research on how LLMs may reshape digital commerce, channel strategy, and consumer welfare.

Acknowledgments

The authors thank Daniel Schmeh and Jan Seifried for their brilliant ideas and helpful discussions on this paper; Daniel Blaseg, Alexander Bleier, and Michel Clement for helpful feedback and early comments; and Grips Intelligence for supporting academic research by providing the extensive data used in this study entirely free of charge. This research was supported by Grips Intelligence through the provision of proprietary data and computing resources. No direct financial support was provided to the authors. No additional third-party funding was received.

The first author is employed by Grips Intelligence, a market intelligence company serving multiple clients, including various technology platforms. Grips Intelligence’s business model depends on delivering accurate, unbiased market intelligence; it has no vested interest in making any particular platform, including Google, OpenAI, or others analyzed in this study, appear more or less favorable.

The company’s involvement was limited to: (1) providing access to data, (2) covering computing costs for data processing, and (3) offering technical guidance on data collection and segmentation methodology. Grips Intelligence did not commission the study. The company had no role in formulating the research questions, designing the study, selecting analytical methods, interpreting results, or drafting the manuscript, and it did not approve the manuscript prior to submission. The authors declare no other financial or non-financial competing interests.

Endnotes

1 LLM platforms had not yet introduced affiliate-like systems or ads during our study period (Hermann et al. 2025).

2 The representativeness of Grips’ data relative to e-commerce in general has been previously assessed by comparing it with trusted proprietary data sets (e.g., Similarweb) and verifiable public ones (e.g., Shopify Quarterly Reports, U.S. E-Commerce Census). For details, see Online Appendix E in Aridor et al. (2025).

3 A session refers to a continuous period of website activity by a consumer, potentially comprising multiple page views. Sessions typically end after 30 minutes of inactivity. A single consumer may generate multiple sessions on the same website within a day.

4 Our data do not allow us to segment by product or customer characteristics and channels simultaneously. We therefore analyze differences across websites rather than within websites.

5 The declining order values could reflect increased purchase targeting that reduces exploratory browsing or heightened price consciousness from LLM transparency about alternatives.

References

Maximilian Kaiser is a PhD candidate in quantitative marketing at the University of Hamburg Business School. His research examines global e-commerce, marketing analytics, and external shocks using big data and quantitative methods. This article forms part of his cumulative doctoral dissertation. He has published in Management Science and the Journal of International Economics and previously served as Director at Grips Intelligence and Consultant at the World Bank Group.

Christian Schulze is associate professor of marketing at Frankfurt School of Finance & Management. His research spans customer strategy, e-commerce, and the digital economy, emphasizing profitable customer management and data-driven decision making. He has worked closely with companies, linking rigorous research to practical questions faced by organizations. He has published in Marketing Science, Journal of Marketing, International Journal of Research in Marketing, and MIT Sloan Management Review.

CORRECTION

In this article, “Frontiers: ChatGPT Referrals to E-Commerce Websites: How Do LLMs Compare Against Traditional Channels?” by Maximilian Kaiser and Christian Schulze (first published in Articles in Advance, April 21, 2026, Marketing Science, DOI: 10.1287/mksc.2025.0489), the authors’ acknowledgments were updated (page 16).