The robots.txt setting that hides your catalog from ChatGPT | Dan Ornstein, Retail Industry Leader, Pivotree | Ep. 4
Data vs. Commerce | June 17, 2026 | Episode 4 | 20:00 | 18.31 MB


A customer asks ChatGPT how to fix his style, gets sent to three stores, and nobody on the retail side can explain why those three and not the other thirty. That gap is where this episode sits. Matt Johnson and Floyd Blaikie talk with Dan Ornstein, Retail Industry Leader at Pivotree, about how large language models decide which retailers to recommend and what that means for the people who own product data.


Dan took the data side, and the friction surfaced fast. The marketers want brand, lifestyle photography, and emotional copy to carry the search. The machine starts with UPC codes, GTINs, inventory, and shipping policy before it cares about any of that. The conversation covers the robots.txt settings that quietly block AI crawlers, the attribution problem when a shopper leaves ChatGPT and goes straight to a store, and why content marketing still earns its place once the data foundation is solid.


👤 Guest Bio

Dan Ornstein is Pivotree's Retail Industry Leader, focused on helping retailers grow revenue through unified commerce, customer experience, product data, and practical AI. Before Pivotree he was a Partner at KPMG Canada and a Director at Publicis Sapient, working across e-commerce, omnichannel, and loyalty. On this episode he took the data side, arguing that product data completeness, not brand copy, is what gets a retailer surfaced by AI shopping agents in the first place.


📌 What We Cover

  • Why retailers suddenly see traffic from ChatGPT, Perplexity, and Gemini without doing anything to earn it
  • The order an LLM works in: product data first, then price and availability, then third-party trust signals from sites like Vogue, GQ, or Reddit
  • The robots.txt problem, where fraud and denial-of-service settings block the AI crawlers before they ever reach your catalog
  • How subjective attributes like "soft" or "puffy and warm" have to become data the model can read, like down fill rate and temperature rating
  • The attribution gap when a shopper exits ChatGPT and goes straight to the store, and why LLM referrals still convert at a higher rate
  • Which categories suit agentic shopping now (grocery, hardware) versus where brand still drives the decision (fashion, home furnishings)
  • What an e-commerce or merchandising leader should check tomorrow to confirm they show up at all


🔗 Resources Mentioned

  • ChatGPT, Perplexity, Gemini (AI assistants surfacing retailer recommendations)
  • Shopify (embedded LLM referral analytics)
  • Vogue, GQ, Reddit (third-party reference sites the models check)
  • Amazon (marketplace-seller comparison)
  • TikTok, YouTube, Instagram (social channels referenced)

[00:00:00] Welcome to Data vs. Commerce, where we explore the messy middle between database and doorstep. I'm Matt Johnson. And I'm Floyd Blaikie. Let's dig in. Guys, welcome back. Dan, I'm so excited to get into a conversation with you today about what's happening in the world of retail and agentic commerce.

[00:00:21] And I thought before we started this conversation, I thought it might be fun to just tell you a little bit of an anecdote about an experience that I had recently, real experience around agentic shopping. So, you know, sitting with my kids one night and I said, you know, hey, guys, we launched this new podcast. And, you know, I was, you know, looking at myself on the camera and I'm like, I look so basic, you know.

[00:00:47] And, of course, I'm asking my teenagers, you know, so I'm trying to get their feedback. And I'm like, I look so basic, you know, I'm just another bald guy with a beard, you know. Like, what can I do to, like, amplify my style? And they said, well, why don't you ask ChatGPT? And so that's what I did. I took a picture of myself and uploaded it to ChatGPT. And I said, hey, I need to revamp my style. I need a totally different look.

[00:01:15] Like, what are some things that would work for me? And it gave me back some really good suggestions. I mean, one of them was get glasses, but I'm, you know, then I'm like, I'll look just like Dan, you know. So I can't do that. But it gave me back really good suggestions. And then I was like, okay, okay, I like this. I like this. You know, but where should I get those? Like, I, you know, I don't want to spend a fortune. You know, I'm not going to go to Nordstrom's, but I don't want, you know, some Chinese knockoff.

[00:01:44] So tell me, like, where do I get it? And it sent me back a list of links in different storefronts. And it got me thinking, how does it know which storefronts to send me to, right? And why were some visible in there and why weren't others? And I'm pretty sure there wasn't any ads going on. If there was, then, you know, I have a complaint, but maybe there is. But you're here to fill us in, Dan. Dan, what's up with that? How's this whole thing work? Fascinating question.

[00:02:14] So in my role as, you know, our industry lead for retail, I spent a lot of time over the last few months trying to figure that out. And it is like the question I get from our retail customers. They're trying to figure out in the first place why they're even getting traffic from ChatGPT or Perplexity or Gemini. Because in many cases, up until a couple of months ago, they weren't doing anything in particular to drive traffic out of the AI platforms. And in other cases, they want to know how to get recommended.

[00:02:41] And it's really a very interesting rabbit hole to lose a lot of your life down trying to figure out how these large language models determine why they're sending me to, you know, my local Running Room store as opposed to some other place. And I guess it kind of unwraps in layers.

[00:03:02] If you use ChatGPT a lot and it's relatively present on your device, it does kind of keep track of where you go and maybe where you shop or your preferences. So it's possible that some of those results are based on somewhere you went, you know, visiting recently, just in terms of trying to get a sense. It's a library because it recommended glasses.

[00:03:26] And but beyond that, it looks at a number of different components to decide who to recommend and then why. And so much like a human shopper, it's trying to solve a mission of, you know, what are you looking for? How quickly can you get it? What's the best price? And so it's trying to optimize this model, especially in a general search like you provided, which is, hey, you know, I want to change my look around. Give me some ideas. Now give me some stores that represent those ideas versus a more specific mission.

[00:03:56] Like, hey, I'm looking for, you know, a specific black shirt and maybe check these five stores. Right. In which case it's a different scenario. But the first thing it'll look at is the core product data. You know, let's say we're looking for the shirt, that magnificent shirt you're wearing today, Matt. Right. Find me a black shirt, you know, in this size. Those two dimensions are the first thing it'll look for. You know, OK, so who has this shirt in this color in my size?

[00:04:25] And it'll usually try to do it. And then the next thing it'll start to look at is price, availability, and then trust signals like return policies and certifications about what the product quality is, where it's made, where it's manufactured, whether it's sustainable. Different things that drive consumer decision making.

[00:04:46] And the last thing it'll start to look at, and this came even more to the fore in the last couple of months, is it looks for third party references to validate that information, that this store is a place I should recommend to you. And so, you know, if it's someone like Nordstrom, for example, it doesn't necessarily know Nordstrom's brand positioning or, you know, it doesn't care about any of that because it's a machine.

[00:05:10] But it will look for reference sites in, say, Vogue or GQ or Reddit that say, yes, Nordstrom is a high quality seller. That's a place that people go for, you know, good quality, higher price items. And I'm going to recommend it because there's one near you. And I can see that the inventory is available and, you know, it's at a price point that you've indicated is something that you're going to pay.

[00:05:34] I think there's a really interesting disconnect between Matt, what you tried to do with ChatGPT and then Dan, what you were saying about how it looks for information. So when Matt bought his beautiful black collared shirt, you know, he probably bought it the old fashioned way. But he could have gone and typed in, I need a black button down shirt. Here's the size of the specs. But he didn't. He did a vibes based search, right? He went on ChatGPT and said, hey, how do I make myself, you know, look cool?

[00:06:02] He roasted you a little bit in the process, Dan. Like, I don't know if you just stand for this bald guy with a beard hate, but you can sort that out offline. It takes much to know. That's very much kind of like a vibes sort of thing. It's very subjective. And I think that a lot of retailers have invested a lot in this, right? Because, you know, Matt's probably going to be swayed by like beautiful lifestyle photography. Matt's going to be swayed by a brand story because, you know, he's trying to become something.

[00:06:32] It's inherently an emotional search. Sorry, Matt, but it is. So how does that work in agentic commerce search? Do AI agents care about all of that? They don't care so much about it. They care about it in the context of looking for a third party signal to indicate that this is a brand or a store to recommend.

[00:06:53] They don't care that, you know, the copy that might be on the website of that retailer creates this amazing, you know, emotional association to the shirt. And it's going to make you be the most coolest, amazing person ever that it's not going to care about. But it will look for other the way we think of information and certain adjectives that we would associate that are kind of meaningless to a machine like soft or, you know, the drape or things like this.

[00:07:22] Those become attributes that need to be added into the product information in a way that the language model is going to be able to interpret it. So that when the search is something like, I'm looking for a winter jacket that, you know, is really puffy and warm and the rest, it can go and figure that out based on a down fill rate and a temperature rating. And so there's an interpretation there that needs to happen. The other thing that, you know, when all this AI stuff came out and we all started talking about it and you see this all over LinkedIn, data is the key thing.
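The interpretation step Dan describes can be sketched in a few lines of Python. This is only an illustration: the `Jacket` record, the field names, and the threshold values are hypothetical and not taken from any retailer's catalog or any model's actual behavior. It shows how words like "puffy" and "warm" only become matchable once the product record carries measurable attributes.

```python
from dataclasses import dataclass


@dataclass
class Jacket:
    sku: str
    down_fill: int       # hypothetical field: down fill rating, e.g. 650 or 800
    temp_rating_c: int   # hypothetical field: lowest comfortable temperature, Celsius


# Hypothetical thresholds that translate shopper adjectives into data checks.
SUBJECTIVE_TERMS = {
    "puffy": lambda j: j.down_fill >= 700,
    "warm": lambda j: j.temp_rating_c <= -20,
}


def matches(jacket: Jacket, terms: list[str]) -> bool:
    """True if the jacket satisfies every subjective term in the request."""
    return all(SUBJECTIVE_TERMS[t](jacket) for t in terms)


catalog = [
    Jacket("PARKA-800", down_fill=800, temp_rating_c=-30),
    Jacket("SHELL-LITE", down_fill=550, temp_rating_c=0),
]
print([j.sku for j in catalog if matches(j, ["puffy", "warm"])])
```

A jacket with no fill or temperature data at all could not be evaluated by a check like this, which is the point of the passage: the adjective in the copy is not enough.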

[00:07:52] If you don't have product data or company, you know, you'll never be found. And that's only partly true. What we found running our agentic readiness assessment for our retail clients and other companies we work with is, first thing is, you better make sure your security settings aren't set in such a way that they block all robots. Right? So robots.txt.

[00:08:13] There are many, many sites that are set up for fraud and denial of service and other cyber security reasons that just stop AI, that stop Perplexity, stop ChatGPT, stop Gemini from even coming to your site. And so the first step is, hey, I need to let these things in, keep out the malicious actors, because otherwise you won't even be considered to start with, you'll just be skipped.
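One way to run the robots.txt check described here is with Python's standard `urllib.robotparser`. The sketch below is a rough starting point under stated assumptions: the user-agent tokens in `AI_CRAWLERS` are ones the vendors have published for their crawlers, but the list is neither complete nor guaranteed current, and the sample file and `shop.example` URL are made up for illustration. It also only covers robots.txt; a firewall or bot-protection rule can still block a crawler that robots.txt allows.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler tokens; check each vendor's documentation for the current list.
AI_CRAWLERS = ["GPTBot", "OAI-SearchBot", "PerplexityBot", "Google-Extended"]


def blocked_crawlers(robots_txt: str, url: str, agents=AI_CRAWLERS) -> list[str]:
    """Return the crawlers that this robots.txt text forbids from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Made-up robots.txt: one AI crawler is shut out, everyone else may browse products.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /checkout/
"""

print(blocked_crawlers(SAMPLE, "https://shop.example/products/black-shirt"))
```

To test a live site, the same parser can load the file directly with `set_url(...)` followed by `read()` instead of `parse(...)`.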

[00:08:33] The next layer is the product information, the basic unit of measure, UPC codes, GTIN, you know, looking for a pair of size eight and a half Saucony, Gel-Kayano running shoes. Yes, this is in fact this product. And then it moves on to get information that is stored in other places, but that matters to completing the mission in terms of, you know, inventory availability, shipping costs, taxes, other elements of that.
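A basic hygiene check on the identifiers mentioned here is the standard GS1 check digit, which applies to UPC-A (12 digits) and the other GTIN lengths. The sketch below validates only the arithmetic; it cannot tell whether a code is actually registered or assigned to the right product. The function name is illustrative.

```python
def gtin_is_valid(code: str) -> bool:
    """Check the GS1 check digit of a GTIN-8, GTIN-12 (UPC-A), GTIN-13, or GTIN-14."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    *body, check = (int(c) for c in code)
    # Working right to left from the digit next to the check digit,
    # weights alternate 3, 1, 3, 1, ...
    total = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10 == check


print(gtin_is_valid("036000291452"))   # a 12-digit UPC-A
print(gtin_is_valid("4006381333931"))  # a 13-digit GTIN
print(gtin_is_valid("036000291453"))   # same UPC-A with the last digit changed
```

Running a check like this across a catalog export is a cheap way to find mistyped or truncated codes before worrying about richer attributes.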

[00:09:01] But the emotional elements of it still remain important when, as a seller, you can train these models to look for you. There are ads coming out, Matt, to the comment you made earlier, ads are starting to come out. I think it's ChatGPT or Gemini for sure. They will be at Google eventually, but ChatGPT has started it where you can influence the models to a certain degree with ads.

[00:09:24] But even beyond that, to get the recommendation does require content marketing because the models look at that content marketing and you can train the models to look for certain keywords so that they will recommend you. And it takes time and there are tools out there that help you monitor how you're doing. And the other big difference is most people use an example like Matt's now when they're going shopping on ChatGPT or going for a search.

[00:09:49] They have a wide open context and intent as opposed to an SEO type search, which is very specific usually to a kind of product, at least if not a brand or a store. These are intent oriented, meaning I'm going to a beach wedding in July. What should I bring? Right? Right. That's wide open.

[00:10:08] And trying to understand whether or not you're going to show up in these prompts and which prompts are resulting in the models recommending your store or your brand or someone else has become a big part of what marketing and merchandising teams are starting to try to figure out. That makes sense. I'm not losing. Yeah. And while, you know, it's the percentage growth in traffic is high, what we've seen talking to our retail customers is the numbers aren't necessarily high.

[00:10:37] You know, they're not selling millions through these channels yet, but it's growing fast. And so they end up with yet another channel to support. So at first it was my store and, you know, analog advertising. And we all know and love. Then it became e-commerce. And how do I influence search engines to find me and my site? And what do I do on my site? Then we had social channels. How do I sell through and promote myself either through influencers or ads or other things in TikTok and YouTube and Instagram? And now there's another channel.

[00:11:05] And this channel has its own behavior and its own quirks and things that need to be put in place to influence it. And the marketing race just continues and expands. And then it becomes a question of where should I spend my dollars and what sort of and how can I get the most bang for my buck? The good thing is that at least as it relates to on-site information, so product data, availability, price, all of that also helps SEO. So there's no, it's not like I've got to have, you know, all this investment is only down one channel. It helps all of it.

[00:11:34] And so there's benefit to be gained across the effort. Can that be measured yet? Because, like, I'm, you know, my background is marketing. So I can go into a tool and I can see, you know, am I number three or number four for this high-intent search term? But I don't think there's really a way for retailers to do that with LLM search. How can they know that it's working so that they can double down on that channel? There are emerging platforms out there that are starting to, that are measuring your recommendation rates.

[00:12:03] And how much are you getting recommended versus others? Where's that traffic coming from? And the prompts that are resulting in your brand or store coming up. Some of the platforms like Shopify have recently released embedded analytics that show your traffic coming from the different LLMs in addition to all the other channels. So as part of that referral traffic, the key missing piece in the journey from an LLM to your store is the scenario where I don't click through.

[00:12:32] And so trying to figure out the attribution of somebody went and did a search on, let's say, ChatGPT and came up with a bunch of examples. And I chose to go to ABC store. If I exit ChatGPT and then go straight to ABC store, then I don't know necessarily where it came from. And so in those scenarios, companies are starting to try to figure it out by looking at the shopping journey on their site and trying to gauge it, because those people who are coming from ChatGPT to the store have a very high intent.

[00:13:02] They're going to move down the funnel to convert much faster. And we're seeing that referrals from the LLMs have converted at a much higher rate than they do through organic or paid search. But it is still a bit hard to attribute. And there's still a lot of effort going on to try to figure out how you're ranking, how to continuously train those models so that you don't fall behind. And like we did with SEO, continuously make sure that all the keywords that could be out there are there. Now it's which prompts and in which scenarios am I showing up.
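The attribution gap is easy to see in a small referrer classifier. This is a sketch, not how Shopify or any analytics platform actually does it: the host list is an assumption based on the assistants' public web addresses, and the function name is made up. A click-through from an assistant carries a referrer; a shopper who leaves the assistant and goes straight to the store arrives with none and lands in "direct".

```python
from typing import Optional
from urllib.parse import urlsplit

# Assumed referrer hosts for the assistants named in the episode.
LLM_REFERRER_HOSTS = {
    "chatgpt.com": "ChatGPT",
    "chat.openai.com": "ChatGPT",
    "perplexity.ai": "Perplexity",
    "gemini.google.com": "Gemini",
}


def classify_referrer(referrer: Optional[str]) -> str:
    """Label a session by its referrer: an LLM name, 'other', or 'direct'."""
    if not referrer:
        return "direct"  # no referrer: the visit cannot be tied to any source
    host = (urlsplit(referrer).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return LLM_REFERRER_HOSTS.get(host, "other")


for ref in ["https://chatgpt.com/", "https://www.perplexity.ai/search?q=shirts", None]:
    print(ref, "->", classify_referrer(ref))
```

Comparing conversion rates between the LLM-labeled sessions and the rest is the measurement the speakers describe; the "direct" bucket is where the unattributed LLM visits hide.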

[00:13:30] And you can get pretty specific with them with some of these tools down to you can introduce personas and segments into them as well to see if your target segment is actually being influenced by the effort you're putting out. But it is certainly early days to figure out this race. Dan, this all makes so much sense. You know, one of the things I was thinking about as hearing you talk about conversion and everything, you know, through LLMs was the fact that as like as consumers, we're changing our buying behavior.

[00:14:00] We're changing the way we think. Like I think for so many years we've been like just overrun by so many options that at least like from my own personal experience, like going and having my ChatGPT tell me exactly the five shirts and two pairs of jeans, you know, that I need to buy. To me, that is an unbelievable time saver instead of spending hours potentially shopping or going to different stores in person.

[00:14:29] What is this, you know, trend to AI tell us about the consumer and how do retailers need to think about where to spend their time? Particularly, I think around content marketing, you said, because isn't that where like this context is coming from? How does it know that I like rugged industrial outfits if there is not some sort of rugged industrial blog or Reddit, you know, thread out there? That's exactly right.

[00:14:58] So, so I think it's somewhat category dependent. Like if you think of certain of all the different things that we buy as consumers, some lend themselves to this AI agent really easily because they're very defined and very repetitive. So I think of grocery as like a prime candidate where, you know, my list is already in my grocery app. If it takes me five minutes to place my order every week and have it delivered, it's a lot.

[00:15:21] But the extra step of, hey, order the same stuff and add, you know, a watermelon this summer that this week because I'm having a picnic all through voice command when it gets there is a major time saver with categories that are very well defined like that. And there's many consumer categories like that when you think of, you know, hardware stores and other things that are very less emotional driven and more specification driven.

[00:15:45] And those become very important when it comes to the data piece and figuring out as a retailer in those categories how to make sure that I'm going to be the one that gets chosen. In fashion, apparel, home furnishings, things that are a little more emotionally oriented and have a lot of, you know, style and taste to them, it becomes a bit different. But I think in many ways it comes back to the age old question of brand.

[00:16:12] Right. If you are a brand oriented company, you are going to continue to need to invest in brand and brand awareness and brand positioning so that the consumer actually puts you into the search criteria. Right. You want to be a favorite who's there and recommended. And if not, then it becomes much like it is today. How do I get into that consideration set? What do I need to know about those models to get there?

[00:16:38] In some ways, depending on the category, sellers who've been selling on Amazon for all these years have a bit of a leg up because they're playing the same game in the LLM models that they're playing in Amazon, leaving the ads aside for the moment. All about how do I get to the top of that list? It's all about the content and the product and the trustability and the visuals and the rest of that in the LLM context. You know, still need to understand what those are that's going to get you to the top of that recommendation list.

[00:17:04] Those who don't compete in that channel or in marketplaces, they're back to they have that dual challenge of brand awareness positioning across different sources, as well as their own content on their own website to make sure that, you know, I could be recommended. I could have the product. But if I don't have shipping policies, I'm going to get ranked lower in the recommendation list than someone who has very clear shipping policies that are advantageous to the consumer.

[00:17:33] The marketer in me loves to hear that brand isn't dead, loves to hear that the content is still important. And I think like we've barely scratched the surface. I think, Dan, especially with the level of your expertise and how you can optimize for a kind of agentic search. But I don't know, like, let's meet a lot of retailers where they're at because this is so new. I think step one is just making sure you show up at all.

[00:17:55] So if you're an e-commerce leader, if you're in charge of merchandise, if you're in charge of digital, like what do you check tomorrow to make sure that you're showing up in the first place? You check the basics of product information and making sure that it's complete in terms of the attributes that the models will be looking for to confirm that this thing that the shopper is looking for is in fact at your store. You check the other information that rounds out that decision in terms of price, inventory, availability.

[00:18:24] If pickup in store or delivery to me is important, you know, is there store inventory that can be surfaced as well? And as a second step, start looking at the places that you're going to be cross referenced and make sure that you're showing up. Some of it is also geography based. You know, most of these models are trained in the United States. And so there's much more content about the brands in the United States if you operate there. If you don't operate there, you're at a bit of a disadvantage in some ways.

[00:18:50] Something to consider for global operators who sell into North America. But really, it comes down to the basics of product data first and then the content on your site and other information we don't necessarily think about as marketers that determines that decision making online: price, availability, and other policies that influence my decision to buy from you. Man, this has been so great, Dan.

[00:19:14] I think we really need you to come back and tell us more about, you know, what this looks like in terms of converting, right? Like it's one thing to get found, right? And then we can, I think we can talk about the other end of it in terms of like, what does a good customer experience look like? What do we need on the page to make a decision? But this was so helpful. Thank you. It was my pleasure. Looking forward to coming back. Thanks for having me. See ya.

[00:19:43] Thanks for tuning in to this episode of Data vs. Commerce. New episodes drop weekly. So if you're responsible for any part of how products get from a database to a doorstep, subscribe now on Apple, Spotify, or wherever you listen.