A Google bot scrapes pricing info by adding items to carts (wsj.com)
362 points by psim1 on July 1, 2020 | 277 comments


This bot is simply trying to get the final price (with tax and shipping) which is ridiculous because e-commerce storefronts should do that in the first place without going through the whole checkout process.

I have always found that kind of shady, but it's probably known to increase conversions.

What I found interesting is that this is an open attack vector for e-commerce sites. Multiple bots can hit a website, add items, and start the checkout process. This creates an unprecedented influx of cart behavior data that ruins any possible use of the data coming from legitimate customers. Maybe cleaning the data wouldn't be that hard, but someone who knows what they are doing can really make it hard (separate IPs, emails, and cart behavior).

I doubt Shopify or Magento have anything to prevent this.


Not all shipping charges can be calculated ahead of time. For example, you may offer free shipping on orders over $50. You may charge $9.99 for the first item, $5.99 for each additional item. You may charge by weight of the whole order. You may have oversized items or packages that can be combined to reduce shipping charges. Some items may ship together as OTR Freight, while others can go via the local postal service. Buying multiple items changes this calculation.

So, yes, you can estimate shipping for a single item but you can't always present the per-item shipping charge as it depends on the context of the whole order.
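
To make the point concrete, here is a minimal sketch of an order-level shipping rule. It combines two of the examples above (free shipping over $50, otherwise $9.99 for the first item and $5.99 for each additional one) into a single hypothetical policy; the function name and the way the rules are combined are assumptions for illustration, not any real store's logic.

```python
from decimal import Decimal

# Hypothetical policy built from the figures mentioned in the comment above.
FREE_SHIPPING_THRESHOLD = Decimal("50.00")  # free shipping on orders over $50
FIRST_ITEM_FEE = Decimal("9.99")            # charge for the first item
ADDITIONAL_ITEM_FEE = Decimal("5.99")       # charge for each additional item


def cart_shipping(item_prices):
    """Return the shipping charge for a whole cart under the assumed policy."""
    if not item_prices:
        return Decimal("0.00")
    subtotal = sum(item_prices, Decimal("0.00"))
    if subtotal > FREE_SHIPPING_THRESHOLD:
        return Decimal("0.00")
    return FIRST_ITEM_FEE + ADDITIONAL_ITEM_FEE * (len(item_prices) - 1)


# The same $20 item "costs" a different amount to ship depending on the cart.
one = cart_shipping([Decimal("20.00")])
two = cart_shipping([Decimal("20.00")] * 2)
three = cart_shipping([Decimal("20.00")] * 3)
print(one, two, three)  # 9.99 15.98 0.00
```

The per-item share of shipping is $9.99, $7.99, or $0 depending on what else is in the cart, so there is no single per-item number a product page could show.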


What this poster said.

Yes, a lot of smaller e-commerce platforms could do this, but finalizing order value can be a very complex workflow for bigger merchants with more varied SKU mixes.

I’ve worked in multi-billion dollar Ecom companies where the programs to refine the order checkout process get scoped as a multi-year effort accounting for a couple of decades of legacy cruft... even if you separate the “product/tax/shipping” calculations from the “customer/credit/rewards” dependencies. But it’s often not worth separating them because they’re very inter-dependent. More so when you involve drop shipping or made-to-order things.


How does that change by having the bot add items to the cart? You haven't solved anything

You are still left with the same scenario as if the store listed the individual shipping price on the front page

Google isn't going to know what other items you _might_ add to show you a "real" shipping cost


I’d assume parent’s point is regarding the “which is ridiculous because e-commerce storefronts should do that in the first place without going through the whole checkout process.” part.

There are a lot of legitimate cases where showing the shipping price upfront is just not doable or valuable to the customer.

BTW there are a surprising number of shops for specialized goods that won’t even list the final price at the end. The customer places an order, and they update it with a finalized price after a human looks at the content, and from there the customer is free to pay the transaction or give up the order.


Even the Y2K-style ecommerce stores usually had a separate S&H section for some guidance. These days the H part (handling) seems less in vogue (perhaps still common on ebay), while S part is pretty predictable if not free.

It's the T (taxes) part that may still be a tipping point these days, but that's just between the vendor and your state.


We are in agreement that there needs to be an explanation of what's going on, and not just "we'll set some price and you won't know why".

In my experience, the biggest fluctuations were on international shipping by small vendors. Lego bricks, for instance, where it makes a big difference if you request 5 small pieces that weigh 20g total and can wait 3 months, or if it's 500+g in a middle-sized box and you want it in 2 days.

Even with average indication on what to expect, depending on the combination you are requesting the vendor might use a different carrier, different shipping method and so on. They could make it more simple with a range of arbitrary standard fees, but then it costs a lot more to the customer, putting the vendor at a disadvantage price wise. In particular people have visceral reactions to overly high shipping prices.


And what's even more interesting: a human would do exactly the same thing.

Add items to the cart, check the total price, and then decide whether to buy. (I remember trying to order some stuff from a Japanese plamo store that didn't provide exact prices before checkout. I went through the process, but even the cheapest delivery option was way too high, about 2x the price of the whole order.)


I've had that sort of experience with a Japanese store once--the prices were good but the shipping (the only thing they offered was international FedEx) killed it. The US is just as bad--we don't have slow international options.


It matters because if you purchase multiple things, your average shipping cost per item changes. If you only calculate shipping based on the first item's shipping cost, it will be inaccurate.


But Google is only showing a result for a single item.


No one said the bot is getting good data. I assume it's trying to get the best possible outcome by adding to the cart, but I doubt it's getting the real final price for many merchants even by doing that.


You could GeoIP the user's IP address and display an initial tax + shipping estimate.


Tax isn't always only region based. You also need to account for VAT, which has a bunch of conditions around it that you can't assume.


Include the statistical average tax in the price, and then up it or down it slightly at check out like a tax return. The user will be happy that the advertised price is a good estimate of the final price. Same for shipping.


that's true that the calculations can get complicated pretty quickly in ecommerce, but google probably has all the data it needs (origin zip, destination zip, likely carrier(s), possibly even the weight/size of each item) to provide a pretty good estimate in most cases. they could even calculate a range for [1 item per box, all items in 1 box].

the important bit is to present it as a separate line item (with grand total) so that consumers can decide how much to trust the estimate.

that would be an even clearer shot across the bow of amazon, walmart, and the like, who provide comparison across their own platform merchants, but not across all merchants everywhere.
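
a rough sketch of the [1 item per box, all items in 1 box] range idea, for what it's worth. the rate function is a made-up stand-in (flat base fee plus a per-kg charge), not any carrier's real pricing, and the weights are invented for the example.

```python
def box_rate(weight_kg):
    """Hypothetical carrier rate for one box: flat base fee plus a per-kg charge."""
    return 5.00 + 2.00 * weight_kg


def shipping_range(weights_kg):
    """Return (low, high) estimates: everything in one box vs. one box per item."""
    together = box_rate(sum(weights_kg))
    separate = sum(box_rate(w) for w in weights_kg)
    return min(together, separate), max(together, separate)


low, high = shipping_range([0.5, 1.0, 2.5])
print(f"Shipping estimate: ${low:.2f} - ${high:.2f}")
```

shown as its own line item next to the grand total, a range like this lets the shopper see how much of the total is an estimate.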


Google guessing seems like a terrible idea. That will just confuse the consumer when they go to purchase and find a different value, possibly creating a customer service problem for the vendor through no fault of their own.


True, but you can at least show the total shipping cost for the shopping cart, given a zip code or a similar indication, without completing the whole account creation/checkout process.


Is Google indexing the shipping cost?

I don't see how this is relevant because ecommerce sites will change the price in the cart (or reveal it) before shipping is even calculated.


Even assuming you already know the customer's shipping address and ignore the multiple-items problem, this is still difficult to accomplish from a computational cost perspective. Calculating shipping cost is likely at least an order of magnitude more expensive than simply looking up list prices in a database: you have to look up the customer's address, go through a bunch of tax rules, figure out the shipping cost (however that works, I honestly don't know but I assume it's non-trivial), etc. Now consider the fact that prices displayed at checkout make up a tiny fraction of the prices requested by the site. Every time an item appears anywhere on the site you probably want to display a price with it. So now your infrastructure costs for handling pricing requests go up by an order of magnitude, since all of them now require expensive pricing computation, whereas only a tiny fraction did before.

On top of all this, if all you're displaying is list price you can cache that very effectively and significantly reduce the load on your backend, probably by at least another order of magnitude. As with many things, items loaded on ecommerce sites tend to follow a Pareto distribution, for which caching is very effective. Adding a shipping address to the mix will destroy this caching ability, so not only are your requests 10x more expensive, 10x more of them now make it to the backend. There are various tricks you can do to try to have your cake and eat it too, but none of them are easy or simple. At the end of the day, while this is definitely a useful and desirable feature for customers, it has significant cost for both development time and hardware.

TL;DR this is actually a much more difficult technical problem to solve cost effectively at scale than it initially appears.
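
A toy illustration of the caching point, under simplifying assumptions: the cache never evicts, the SKUs and postal codes are invented, and every product is requested equally often from every location (real traffic would be skewed). It only shows how the number of distinct cache keys, and therefore backend computations, grows once the destination becomes part of the key.

```python
from itertools import product


def count_backend_calls(request_keys):
    """Count cache misses for a cache that never evicts (a simplification)."""
    seen = set()
    misses = 0
    for key in request_keys:
        if key not in seen:
            seen.add(key)
            misses += 1
    return misses


# Hypothetical traffic: 3 products viewed from 4 postal codes, 5 times each.
skus = ["sku-1", "sku-2", "sku-3"]
postal_codes = ["10001", "60601", "94105", "73301"]
requests = list(product(skus, postal_codes)) * 5  # 60 requests in total

# List price: the cache key is the SKU alone.
list_price_misses = count_backend_calls(sku for sku, _ in requests)
# Delivered price: the cache key must include the destination as well.
delivered_price_misses = count_backend_calls(requests)

print(list_price_misses, delivered_price_misses)  # 3 12
```

With thousands of SKUs and tens of thousands of postal codes, the key space multiplies the same way, which is where the hit rate goes.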


In many cases, this is not only computationally expensive on the server, it also requires one or more requests to external APIs which further slows things down. Imagine needing to query the API of your tax vendor, then also your shipping provider of choice for every single item displayed on a page. Even if you did this client side, asynchronously, it would be a lot of extra requests for something that most shoppers won't even pay attention to.

I ran an ecommerce platform company for many years and you had merchants with very complex shipping and tax schemes, or you had merchants that made it super simple with a basic rate table. The complex merchants had margin on every order at the cost of processing external API calls. The simple rate table merchants had great margin on some, lost money on others but were happy with their average shipping margin.


Having worked with a major eCom platform as well, this is exactly the standard case. Both shipping and tax are complex problems which do not have a simple solution for scraping by a search engine.

Shipping is often highly dependent on the location of the buyer and often involves full estimate calls to each carrier's API (USPS, FedEx, UPS). The only major data point I would focus on is whether the shipping is free or flat rate.

Tax is even more complicated. Merchants often outsource tax calculations to a third-party service such as Avalara, which calculates unbelievably complex taxing schemes even down to the zip code, as tax laws are becoming increasingly complex.

Because of these reasons taxes and shipping are not widely useful data points for search engines. That may change in the future, however. I could imagine it becoming another SEO topic to be accounted for, similar to meta tags on product pages.


Well we can change this by including shipping as the total price and not give deals on shipping. Deals on shipping are dark patterns.


I don't understand. So if the item is $5 and shipping is $5 regardless of the number of items being bought, then if I bought 2 I should pay $20 instead of $15?


What don't you understand about deals on shipping being dark patterns?


It seems that you are saying that properly charging for shipping is a dark pattern? Or do you mean a different kind of deal? Can you elaborate? As far as I know, the shipping charge doesn't scale linearly with the number of items, so to me including shipping within the item price is going to overcharge the customer.


> This bot is simply trying to get the final price (with tax and shipping) which is ridiculous because e-commerce storefronts should do that in the first place without going through the whole checkout process.

It's usually not possible because you don't know how much the shipping + taxes are until the customer enters the billing information.


I just used a website that had a simple form at the bottom of the cart. It had one text input for the postal code, and a button to get the rates to that postal code based on what was in my cart. IMHO, this is how it's done right, since all you need to know is general location and weight.


Maybe I haven’t done enough online shopping recently, but as far back as I remember this used to be the norm: enter postcode to calculate shipping, get precise final price without even adding to cart. Is it not the case anymore?


On many websites it still is. But recently many smaller independent stores use the Shopify platform where shipping is the penultimate step, before billing. You have to give address, email(!), phone number(!, mandatory), etc. before getting the price. I normally just use a fake email and number to get shipping price, and then do the actual checkout in incognito. Pretty sure if you do enter your real info and don't continue with the purchase then you'll get email spam telling you to buy stuff.


Also increasing conversion rates via the sunk-cost fallacy. If I see up front that the air conditioner I'm ordering costs $30 to ship, I'll check another site. But if I already decided on this one and I just did all the work to go find my credit card, enter my billing info - maybe I'll just say "ehhh. Fine." and purchase it anyways.


Phone number is required or otherwise highly encouraged by some shippers, like FedEx.


I'm pretty sure most sites (e.g. Amazon) do the same thing. Probably for the reason you mentioned: so they can have your contact information to send you spam later.


My girlfriend uses this as a pretty effective tactic to get discounts - you just have to wait a couple of days and they'll send you an email with a lower price to keep down abandonment rates


Haven't received any "items in your cart" emails from Amazon personally, but I will end up seeing ads for the products later on.


Is that US only? You need country too.


For commercial scale products you need more than that. The full address. Is the address residential vs commercial with a loading dock? That and more factors impact the shipping price a lot! Logistics companies have people who have to research an address and look at Google Earth photos of the property to answer these questions.


I will bail from the purchasing process on sites that are unwilling to give me a final price before I enter payment information.

A postal code should suffice, and I'm not providing more personal information if the site is unwilling to say what I'll be charged upfront.


Isn't that most of them? Unless you assume they eat variable shipping by 'fronting' you a fixed price. Apologies if I'm misunderstanding what you're saying.


It is still typical (for which I am thankful), but that seems to be shifting a bit.

A relatively common example would be shops using the Shopify platform.


Yeah that's true. But from a UX perspective there are ways to make this less opaque. Perhaps a call to action at the top of the listing with an entry box to enter the Zip Code so an approximate final price can be calculated.

Any good UX designer can come up with a solution for this in a couple of hours or less. There's just no motivation to make it happen because this obfuscation of data is particularly optimized and useful for the sellers.


Amazon just lists it as a "subtotal," which I think is probably the best way to do it. I don't know about the "UX designer in a couple of hours" line: automatic shipping is a nightmarish bag of worms and isn't really a UX problem. What do you do if they order multiple SKUs that don't pack nicely into one box or are warehoused in different locations? Or if there is something weird about their shipping address? What do you use for box size and weight, and what approximation did you use for dunnage?

You can approximate UPS/FedEx costs by fitting a trend line; they are decently modeled by a linear (base charge + K*distance) function, but when you go to buy the label you might be way off. This puts you in the lose-lose-lose of either eating the difference on wrong estimates, overcharging for shipping and losing conversions, or just making people hate you by increasing the shipping over the estimate. Making a shipping API call is noticeably slow, so most people require an interaction after the address is entered.

Tim Sweeney's hot take was that "the two hardest problems in computer science are cache invalidation and shopping carts!"
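
For anyone curious what "fitting a trend line" looks like, here is a minimal least-squares sketch of the (base charge + K*distance) model. The distances and quotes are invented placeholder numbers, not real UPS or FedEx rates, and a real fit would also need weight and package dimensions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = base + k * x; returns (base, k)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    k = sxy / sxx
    base = mean_y - k * mean_x
    return base, k


# Made-up past label purchases: distance in km and the price paid.
distances_km = [100, 500, 1000, 2000]
paid = [8.50, 10.50, 13.00, 18.00]

base, k = fit_line(distances_km, paid)


def estimate(distance_km):
    """Estimated shipping cost from the fitted line."""
    return base + k * distance_km


print(f"base={base:.2f}, k={k:.4f}, estimate for 1500 km = {estimate(1500):.2f}")
```

The fit only tells you the typical cost; the gap between the estimate and the label you actually buy is exactly the risk described above.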


e-commerce sites seem to be asking for my location all the time anyways.

If they're doing that anyways, they should have everything they need to hazard a pretty good guess (and then they have an actual inducement for me to provide it).


Agree!


ZIP code box that doesn't oblige you to provide any more accurate data, and also without it, it should still be possible to display brackets. "Shipping: $6 - $24, [enter ZIP code for detailed quote]".


They could at least include the tax in the price. That's normally fixed depending on the item category e.g. low or high tax rate. Only the US is being weird with its taxes.


Tax depends on where the buyer is located.


Canada has provincial sales taxes.

It's really not that weird when you consider the US is not a unitary state.


I don't know if this is true, but I've been to many websites that actually claim that they have an extra deal they can't show you until you put the item in the cart. I used to see those a lot a few years ago. I'm not sure if it was a real legal thing or just a trick to get people to add it, but that was definitely a thing and not related to tax+shipping.


It is a contractual obligation with the product manufacturer.

Something like: we will allow you to sell our product but only if you don't advertise discounts more than x%.


> It's usually not possible because you don't know how much the shipping + taxes are until the customer enters the billing information.

Sure, they show you different content depending on your IP address and lots of shady heuristics, but when it comes to estimating a shipping cost, it is absolutely impossible: you could be anywhere, who knows where you are. I say bullshit.


All EU prices must be tax included.

All hospitality prices must include cleaning and service fees.

It is only the US where hidden fees and charges may apply and the price regulation, if any, is more tilted towards corporations.

Only upfront shipping charge is tricky because it depends on so many factors.


I find not including tax better tbh. That way people are reminded how much tax they are paying with every purchase.


Every EU bill should come with the price without VAT (value-added tax) and with the VAT applied, leading to the same result.

The only difference is that the VAT is a certain % depending on your country (mostly 15-25%), making it easy for a merchant to calculate it with a single data point. Percentage is also always a fixed round number, so you can calculate it in your head when you stumble upon "+ VAT" somewhere without providing that data point to the merchant for calculation purposes.
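
A minimal sketch of that single-data-point calculation, using decimal arithmetic to avoid float rounding. The rates below are illustrative inputs rather than a lookup table of current national rates, and rounding half-up to the cent is an assumption; actual rounding rules vary by jurisdiction.

```python
from decimal import Decimal, ROUND_HALF_UP


def add_vat(net, rate_percent):
    """Return the gross price for a net price and a VAT rate in percent."""
    gross = net * (Decimal(100) + rate_percent) / Decimal(100)
    # Assumed rounding rule: half-up to two decimal places.
    return gross.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(add_vat(Decimal("100.00"), Decimal("20")))    # 120.00
print(add_vat(Decimal("49.99"), Decimal("25")))     # 62.49
print(add_vat(Decimal("100.00"), Decimal("17.5")))  # 117.50, works for non-integer rates too
```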


Yeah, you can still get the info, but how many people really care? I am just saying that people would be more aware of how much they are being taxed. IMO making it obvious like the US does would be more helpful, even if it is not as convenient.

I used to live in Turkey, where taxes on some stuff are insane, but most people won't ever know because it is not as obvious. In a similar vein, income tax is also quite hidden in Turkey, at least from the perspective of the employee.


Yeah. It's easy in the EU.

My favorite example of how absurd it can get in the US: one side of my friend's street has 9.5% sales tax, the other side has 7.25%. same state, city, and zip.


You are right, normally VAT is a fixed integer value. But that is not guaranteed; historically we had decimal values, too. (My father encountered this once at IBM in the time of punch cards. It turns out that having to change this on machines with limited RAM is quite difficult, akin to the Y2K problem.)


The UK was 17.5% VAT for years.


We don't break out how much the purchase price was reduced by the use of public goods, so it seems like breaking out the tax would be more misleading than informative.


Not only that, but certain brands on certain sites won't show the price, with that message "add to cart to see the price!"

I've heard varying explanations as to why, but at the end of the day it doesn't matter. Adding to the cart is the only way to scrape the price.


How do you show a final price if you don't have all the information needed: tax locality, shipping preference, total cart value discounts etc.


If I'm remembering right, Best Buy used to have "deals" on items that they "couldn't show you" until item was in cart. They may still be doing this. Best Buy's justification for it was that its agreements with manufacturers prevented it from displaying items below certain prices on their site. I'd never seen this elsewhere to know how pervasive these agreements were (or if Best Buy was just taking losses on certain items).


> Best Buy used to have "deals" on items that they "couldn't show you" until item was in cart. They may still be doing this. Best Buy's justification for it was that its agreements with manufacturers prevented it from displaying items below certain prices on their site. I'd never seen this elsewhere to know how pervasive these agreements were

This was also pretty common on Amazon.


Newegg still does this. Here's an item from their 4th of July sale.

https://www.newegg.com/p/2AM-008Y-00003?Item=9SIAEG2BMZ4393


KitchenAid is famous for these kinds of agreements as a measure to ensure the public perception of their value doesn’t go down whenever there are sales.


I used to build and manage ecomm sites. We had several manufacturers/brands that we had to agree not to openly display a retail price below a certain amount.

Incidentally, for almost every brand in our industry, that number was 1.8X listed wholesale.

We could sell for less, but not list a price of less. Implementing "add to cart to see price" was good enough at the time to keep them happy.


They still do it.


Makes you wonder whether the smart thing to do is just make it convenient for the bots to get the info out so they won't ruin your data and waste your bandwidth. Can't fight them, join them. Can't really stop free information flow.


Or they could stop spying on their customers and trying to figure out how to add dark patterns to maximize "engagement" and "conversion" lol.

I mean yeah, I get that if you inadvertently make the checkout button hard to find, you'll lose potential sales, but I don't think you need intricate data about what your customers are doing to figure that out.


Not sure about dark patterns, but you're talking about magnitudes of tens if not hundreds of thousands of visitors here.

Increasing conversion rate by even a few percentage points has huge revenue implications.


They already do that, there’s all kinds of XML and JSON and whatnot standards to communicate product info, inventory, whatnot. The reason Google is doing this is because this information cannot be trusted all the time, there will always be bad actors.

The process may eventually evolve into a cat-and-mouse game, where malicious e-commerce sites try to detect these Google crawlers and serve different price info to them, but let’s hope it doesn’t get this far.


Given the possibility you get detected and it impacts your organic search ranking... I'm not sure any serious vendor would risk it. And if they do, let them burn.


Or quietly detect the bots and feed them junk data after they've gone through the hoops. Not saying it's the better option, but knowing the business maybe the more likely one.


> Or quietly detect the bots and feed them junk data

Just a few customers with alternative browsers being detected as "bots" will poison your reputation and income stream.


No, it definitely won’t. In fact our card payment system has been updated to think alternative os and browsers are high-risk and even decline payment. We didn’t have any revenue loss.


That's why bots now mimic users beyond user agents, even going so far as loading page assets and javascript. Unless you're using something like recaptcha V3, it's going to be difficult to detect them, and even that requires some interactions first.


As someone who uses an alternative browser… I kinda doubt it. As a group we wouldn’t even move the analytics needle let alone revenue.


Nah. Google doesn't even let people log into their own email with alternative browsers, and they are doing fine.


Which is why it should be a chrome extension (like Honey); exfiltrate the data out while providing the end user financial benefits. Messing with the data breaks the user experience and impacts revenue of the target site.


There's a lot that goes into when and where you can show the final item price.

Assuming a simple product, you don't know where the user lives so you can't apply the correct taxes yet. In AUS this is easy because it's the same nationally. But in the US there are dozens of tax combinations that could be applied depending on the location.

Shipping obviously depends on where you live and what you're buying, and few places charge per item shipping these days so it doesn't even make sense to include shipping in a single item cost.

Then you have customer group discounts, some customers get different pricing when they are logged in, even item combinations in the cart can have different prices, you get the idea. It is usually not possible to calculate ahead of time.


>which is ridiculous because e-commerce storefronts should do that in the first place without going through the whole checkout process.

On top of the other reasons mentioned, the seller may have a contract with the manufacturer that covers a "minimum advertised price"

"The FTC says that the price displayed in a secure or encrypted shopping cart isn’t subject to MAP because it’s technically not advertising."

(https://www.thebalancesmb.com/what-is-minimum-advertised-pri...)


It seems like it will also mess up item availability information. If an item has limited stock, bots adding it to carts could make it appear out of stock to real customers.


Most sites don't subtract it from inventory until it's on a completed order.

The main exceptions I can think of are venue and plane tickets, and hotel rooms. These might put a hold on a specific piece of inventory for a short time. They usually tell you when they do.


As a developer of automotive ecommerce sites, I can say it is very common for sites to list a higher price in the catalog and show the true (generally lower) price in the cart. This is because the business models are highly margin sensitive, and competitive pricing can have a big impact, so it's a measure to try to mask real pricing.


Also, it seems like a clever hack for automated scraping; after all, most carts are pretty uniform in their structure.


> e-commerce storefronts should do that in the first place without going through the whole checkout process

Yes but how would you verify this or hold them accountable?


There's a tiktok meme doing this to harass the Trump campaign's online store.


For people saying this is to calculate the final price with shipping and tax, it's not (or at least not entirely). It is for this new sales conversion dark pattern where prices aren't listed until you add to cart.

Ebay sellers are particularly bad offenders: https://www.ebay.com/itm/Open-Box-Certified-Samsung-Galaxy-1...


Google disagrees with you:

> When The Wall Street Journal contacted Google in June, a spokesman at the internet giant, after a few days of digging, provided an update: The mystery shopper is a bot of its own creation. The purpose: making sure the all-in price for the product, including tax and shipping, matches the listing on its Google Shopping platform or in advertisements.


this is what we've seen as well. it validates that whatever price, promo, shipping and taxes you've put into your feed is what ends up in the final checkout and there's no bait-and-switch going on between the feed and reality.

it's rather annoying because it creates dozens of "abandoned" carts per day which we have to continually clear out (based on Google's known ip address ranges) so our reps can go through actual abandoned carts.
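
a rough sketch of that cleanup step, in case it helps anyone. the networks below are the RFC 5737 documentation ranges used as placeholders, not google's actual ranges, and the cart records are a made-up shape; you'd load the real ranges and your own cart rows instead.

```python
import ipaddress

# Placeholder networks (RFC 5737 documentation blocks), NOT real crawler ranges.
CRAWLER_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_crawler_ip(ip):
    """True if the address falls inside any of the configured crawler ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in CRAWLER_RANGES)


def human_abandoned_carts(carts):
    """Drop carts created from crawler addresses, keep the rest for the reps."""
    return [cart for cart in carts if not is_crawler_ip(cart["ip"])]


# Hypothetical abandoned-cart records.
carts = [
    {"id": 1, "ip": "192.0.2.17"},
    {"id": 2, "ip": "203.0.113.5"},
    {"id": 3, "ip": "198.51.100.200"},
]
kept = human_abandoned_carts(carts)
print([cart["id"] for cart in kept])  # [2]
```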


This is more likely just a contractual MAP (Minimum Advertised Price) policy by the manufacturer, not a dark pattern that is of the retailer's choosing.

https://www.thebalancesmb.com/what-is-minimum-advertised-pri...


I personally believe that anything that can be automated by software should be automated by software. If it takes programmatically clicking exactly the same A, B, C, D sequence to display what the user wants, that clicking should be done by the machine, not the human.


What am I missing on here? That item has the price listed without having to Add To Cart.


The modal that pops up is not in the dom until you click the "See details" link, which has target="javascript:;". The "Add to cart" button is an actual link. I wouldn't be surprised if Google just doesn't want to run javascript to extract pricing information if it doesn't necessarily have to.


That clearly has a see details button that shows the price.


Most dark patterns have a non-intuitive way of circumventing them (the small-font faded-color "no, thank you" button comes to mind). That is Ebay's.

Other examples here: https://ux.stackexchange.com/questions/83050/price-too-low-t...

Amazon example from a few years ago: https://lh5.googleusercontent.com/ztyT6xTPaTr9TtP8LwlRJBE6RV...


That sparked a funny idea in my head, what if we tricked product managers industry wide to follow KPIs and A/B tests that resulted in a better user experience for consumers, instead of experiences that coincidentally slightly upticked "engagement".

Because it seems like this mystery shopper is already doing that.


"Messing up your competitor's A/B test" is not unheard of as a tactic in highly competitive ecommerce settings.


Do software engineers actually implement that? That seems pretty immoral. I'd rather let them run the a/b test and steal whatever solution they end up with.


I can't find reasons why this would be immoral. I'd say it's rather aggressive and won't earn you a good reputation, for sure. But it's sort of fair game. Compared to many business practices (lobbying, forced arbitration, patent trolling, DMCA, price dumping, etc.) this is an extremely mild one.


Generally active sabotage is frowned upon as opposed to winning in fair competition.


Eh, I am sure you could convince yourself it isn't immoral... everyone in HN seems to think things like google analytics are bad because of the privacy implications, and doesn't have a problem blocking them (which would also 'mess with a/b tests'). You could just argue that you are hindering their user spying.

Not a great argument, but good enough to allow a developer to sleep at night.


True, but this is not "messing up your competitors A/B test".


In some contexts you do that or you're fired. Some people can't afford to be fired, so they do it.


Consider companies like Uber....


Given that engagement metrics have been heavily interfered with for many years, as a result of bots and other activities, and yet PMs still rely on them it seems unlikely that they will be pulled away from that spectacle anytime soon. I like your idea though.


> "and yet PMs still rely on them it seems unlikely that they will be pulled away from that spectacle anytime soon."

I think they meant, since PMs will never stop using metrics, we should write bots that skew those metrics in favor of an experience for the consumer rather than the perceived increase in engagement.


What are some examples of good KPIs and A/B tests for better consumer user experiences? Engagement is obviously deeply flawed if a good consumer user experience is your goal, but it does have the nice property of being easily measured. Do you rely on users constantly rating their experience on a numeric scale?



Thanks! I was not aware you could use Web Archive for that. All the more reason to Love that site!


I'm not sure archive.is and archive.org are the same site.


They're not the same!


robots.txt, man, if you don't want search engines to visit certain parts of your site, use robots.txt!

I once heard a tale of an angry site owner calling Google (back when Google itself was novel): Google had deleted his whole website! It turned out he had a "DELETE" button on each page, which generated a plain GET request. So Googlebot visited the site, followed links to every page, and then of course followed every link that generated GET requests, because they are supposed to be safe.

Don't be like that site owner.


That has nothing to do with robots.txt, the problem is doing things in response to GET requests. I've said it before and I'll say it again: you do not do things on GET requests.
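
For what it's worth, here is a minimal sketch of that rule in Python, assuming a toy in-memory page store. The `handle` function, the `/delete/` path and the `pages` dict are made up for illustration, not anyone's real setup. Safe methods only read; the destructive action is only reachable via POST.

```python
# Hypothetical toy router: all names and paths are illustrative assumptions.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}  # methods that must not change state


def handle(method, path, pages):
    """Return (status, body) for a request against an in-memory page store."""
    if path.startswith("/delete/"):
        if method != "POST":
            # A link-following crawler lands here and changes nothing.
            return 405, "Method Not Allowed"
        name = path[len("/delete/"):]
        if pages.pop(name, None) is None:
            return 404, "Not Found"
        return 200, "Deleted"
    if method in SAFE_METHODS:
        name = path.lstrip("/")
        if name in pages:
            return 200, pages[name]
        return 404, "Not Found"
    return 405, "Method Not Allowed"
```

A crawler that follows every link only ever issues GETs, so under this layout it can read pages but never reaches the delete branch.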


I think you're thinking of this: https://thedailywtf.com/articles/The_Spider_of_Doom - which obviously has two issues: the auth, and the actions on GET.


I'm curious though, why didn't Google properly tag their robot in this case? Or did the reporter not know about user agents? It seems strange for them to crawl with a "mystery bot".


How do I use robots.txt to tell Google not to add items to the shopping cart?


Well, theoretically, your Add To Cart button could have an href with a path that’s banned in robots.txt, but overridden with JS.

But most online stores should be happy to have Google crawling their prices and showing up under the Shopping results.


Erm... hide the shopping cart page behind robots.txt?


As someone who has seen way too many robots.txt files that's exactly how you do it.
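
If you want to check what such a file would actually block, Python's standard `urllib.robotparser` can evaluate it. This is only a sketch: the `/cart` and `/checkout` paths are assumptions about how a store lays out its URLs, and nothing in this thread establishes that the cart-filling bot honors robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the paths are assumptions about the store's URLs.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart
Disallow: /checkout
"""


def allowed(user_agent, path, robots_txt=ROBOTS_TXT):
    """Return True if a compliant crawler may fetch `path` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)
```

With these rules `allowed("Googlebot", "/cart/add")` comes back false while product pages stay crawlable. Note the match is a plain prefix match, so `/cart` would also cover a path like `/cartoons`.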


Protip: You will often get a discount coupon if you go through most of the checkout process (you need to provide an email) and then wait a couple of days. Many stores automate abandoned checkout promotions.


Yes! This is also something that is common with smaller online retailers. Don't expect this with B&H, Adorama, or Newegg. Frequently these small companies give one time codes you won't see or be able to gain elsewhere.


For a while there were registrars that gave a discount when you abandoned your cart.


A good example of "proof of work" used for price differentiation.


It's just price data collection. In particular, MAP policies can be skirted by not publishing a final price but having a price below MAP in the cart, which is a common tactic that online sellers utilize. By pretending to walk through the cart, all sorts of data about pricing, taxes, etc. can be learned. It's not entirely uncommon to see different prices at different times, for different user agents, for different locations, etc. I used to work for a company that built huge price collection systems, and built many of them...


MAP == Minimum Advertised Price


The real problem with this is from the merchant side of things.

This bot generates thousands of "Abandoned Carts" on one of our sites... thousands...

We send cart reminders to Abandoned Carts after a few days, sometimes with a coupon offer to complete checkout.

This bot is responsible for thousands of bounced emails each week, which impacts our metrics with Mandrill among other things.

Maybe we shouldn't care, but it's sloppy and ruins all sorts of stats we keep track of regarding cart abandonment rates, recapture rates and more.


>We send cart reminders to Abandoned Carts after a few days, sometimes with a coupon offer to complete checkout.

I consider this spammy behaviour, and mark the emails as such. I can only hope this discourages such practices in the future.


It doesn't. If you mark it as Spam through most email programs, it's reported to the sender (Mandrill in our case) and Mandrill automatically black-lists your email address so we don't continue to send to someone that doesn't want the emails.

That's a win-win.


Still an annoying and anti-consumer practice. Another "growth marketing" tactic that doesn't take into account the number of people who never visit that site again because of the spammy stuff.


The overwhelming majority of folks aren't so principled as to black-ball a website they like, selling products they like, from brands they like, and prices they like all because they received a cart reminder email with a special coupon inside.

Maybe you are? Just don't project that onto everyone else.


Do your users consent to contact before you send them reminders or coupons? If not, you've earned your bounce rate.


Of course we have consent. Not sure what kind of question that is?

Violate CAN-SPAM Act and risk a $16,000 fine per instance? No legit business is going to do that.

Just ask Papa Johns how painful those fines/settlements can be[1].

[1] https://topclassactions.com/lawsuit-settlements/lawsuit-news...


I personally find the 'email abandoned carts' behavior to be a dark pattern


Dark pattern or not, it's super effective - particularly when accompanied with a coupon ;)

Besides, the user is opting-in to receiving these emails. They don't have to provide an email address - so some are probably playing the game and seeing if they get a coupon or not.

As an aside - if the internet worked the way SV hipster brogrammers thought it should work, nobody would use it.

Yes, ads are crazy effective - you can ignore or block them, we don't care because enough people don't block them and are happy to click.

Yes, emails are crazy effective - you can ignore or opt-out or never opt-in, we don't care, you just cost us money if you're not engaged anyway so we'd rather you not be on our mailing list.


So you are saying these bot accounts have opted in to receiving emails? You don't validate the email when someone signs up?


The site terms and conditions are displayed very publicly and accessible to anyone who cares to read them.

By entering your email address you are opting-in for transaction related emails, including new order confirmations, shipment notifications, and yes cart reminders. It's spelled out for you.

It's the same for almost every ecommerce site.

That's different than marketing emails, which require a separate explicit opt-in - ie. the user has to go and type their email address into another form and click "Sign up".

It doesn't get any more transparent than that.

Don't call something a dark pattern just because you can't be bothered to understand what you're consenting to when using someone else's website and start entering information like your email address or more. That's entirely on you.


I am not sure what your conception of a dark pattern is, but the idea that adding something to a TOS means it can't be a dark pattern is simply false.

The whole concept of a dark pattern is about UX choices that lead people to agree to things that they don't actually want; it isn't about whether you break your TOS or not.

I am saying, I think that if you asked people point blank "do you want websites to email you reminders if you leave something in the cart?", most people (myself included) would say, "no, I don't want to get that email"

You can put whatever you want in the TOS, but it doesn't mean users like it, and a user agreeing to something doesn't mean they like all the things they are agreeing to.

Whether it is "on me" or not, it is still a dark pattern.

Also, as a user entering my email, I am not promising you that I will always accept email to that address from you. If you try to send me email and I reject it, that is 'on you' to deal with what that rejection does to your spam scores.


That's a one-time rejection, and is rightfully treated like an unsubscribe request. No harm done to either party - and the merchant is actually happy since we don't want to bother you. Guess what? Angry customers don't buy your products. Seems intuitive.

You might think people don't want these emails just because you don't want these emails. That's pretty biased.

Ask any ecommerce company - these emails, both marketing and transactional, are wanted by most people shopping on the site. The statistics simply prove your argument is false.

This is a common theme among techies. You don't like something... say ads... so you assume nobody should like them and they should be done away with entirely. That's an absurdly short view of the world.


I never said no one wants them; I was very careful to say that I considered it a dark pattern, not that it was a certain dark pattern.

Also, I don't think any amount of statistics is going to be able to show you if people truly want emails or ads... just because they increase sales doesn't mean people want them. Ads can be both effective and unwanted.


> I never said no one wants them; I was very careful to say that I considered it a dark pattern, not that it was a certain dark pattern.

OK fine, perhaps I interpreted your OP incorrectly. That's your prerogative, and you can do whatever you please.

> Also, I don't think any amount of statistics is going to be able to show you if people truly want emails or ads... just because they increase sales doesn't mean people want them. Ads can be both effective and unwanted.

True, but we're not just talking about increased sales.

We, and most companies that are serious about this stuff, track open rates, number of times the same person opened the same email, click rates, text link vs image link clicks, page dwell time after clicking through, session length and bounce rate, which pages they browse, which products they view, were the products related to the email that initiated the session, did they add something to their cart, how frequently this individual engages with our content, how long they were dormant, order frequency, etc.

Basically, we're interested in how "Engaged" you are with the site, brand(s), products and content. People who open every email, click on a bunch of links, hang out on the site for 20 minutes and add stuff to their cart are highly engaged, and are doing actions that indicate they like what they are seeing/receiving.

Remember, the people signing up for marketing emails are the most likely to be engaged with your brand/product/website. They've actively said, yes, please send me content from your company. If they lose interest some day, no problem, either our stats will show this and we'll unsubscribe them automatically, or they'll actively unsubscribe themselves.

Ads might be a different beast - however, you'd be surprised how many people click ads, and then buy products. It's immense. Clearly, the value provided there was getting the right product in front of them, matching what they were looking for, and offering it at a price that's attractive to that customer. In this scenario, I'd say it's wanted too - they got what they were looking for quickly and effectively. Everyone is happy in that scenario.

Everyone else can just run an ad blocker and choose not to subscribe to marketing emails. I do both myself... but I'd never assert these things were unwanted by a lot of people or ineffective.


> By entering your email address you are opting-in for transaction related emails

You realize that you can't send marketing emails without double opt-in, right? "Hey, you forgot to buy these ..." is definitely a marketing email.


> You realize that you can't send marketing emails without double opt-in, right?

That's not true.

> "Hey, you forgot to buy these ..." is definitely a marketing email

Also untrue.


Can’t you useragent sniff the bot and cut it off?

If you want help coding this, or advice, I'm happy to help (for free).


If you cut it off, you get penalized in Google Merchant Tools, and possibly have your product feed suppressed, which will dramatically impact your search visibility for both text and product searches. It can also impact your Google Ads if you link that with your product feed, and more.

So, effectively, no you cannot cut this bot off.

To make it worse, the bot doesn't always follow the same pattern. Sometimes slightly different names, addresses, etc.

We initially thought it was fraud attempts, but none of them actually attempt a checkout. They just enter all their info on the checkout page, get the final quote, and bail.

It would have been nice if Google told people about this instead of it just happening. Or allowed you to schedule a time slot for it to do what it's going to do.


It sounds like you interpreted malux85's suggestion as cutting the bot off from putting things in the cart altogether. What I understood is that, given it's recognizable by user agent, those carts can be marked as created by a bot, to exclude them from statistics and reminder mails only.
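
Roughly what I mean, as a sketch in Python. It assumes the bot can be recognized from its user agent string, which this thread doesn't actually confirm, and the `Cart` fields and marker list are hypothetical. The bot still gets its price quote; its carts are just tagged and skipped downstream.

```python
from dataclasses import dataclass

# Assumption: substrings that identify a crawler in the User-Agent header.
BOT_UA_MARKERS = ("bot", "crawler", "spider")


@dataclass
class Cart:
    email: str
    user_agent: str
    items: int
    is_bot: bool = False


def tag_cart(cart):
    """Mark the cart as bot-created if its user agent contains a known marker."""
    ua = cart.user_agent.lower()
    cart.is_bot = any(marker in ua for marker in BOT_UA_MARKERS)
    return cart


def abandonment_stats(carts):
    """Count abandoned carts, keeping bot-created ones out of the headline number."""
    human = [c for c in carts if not c.is_bot]
    return {"abandoned": len(human), "excluded_bots": len(carts) - len(human)}


def reminder_recipients(carts):
    """Only non-empty, human-created carts get a reminder email."""
    return [c.email for c in carts if not c.is_bot and c.items > 0]
```

If the user agent turns out to be indistinguishable from a browser, the same tagging step could key off something else (for example the address patterns), but that would be guesswork on my part.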


Perhaps. Not everyone is lucky enough to have built their own ecommerce platform, so many people are at the whims of whatever tools Shopify, BigCommerce, 3dCart or others provide.

For this particular problem, none of those platforms can provide any assistance.


I wonder why Google didn't include a "cleanup" routine that empties the shopping cart after the data is collected. It seems like it would be a trivial thing to do, unless I am missing something. I guess the answer is because it would not benefit them in any way.


We'd still have an abandoned cart, since the session was created and held some sort of data - but it would be far less disruptive for sure.

Empty Abandoned Carts are useless anyway (for stats and other things - we track bounce rates in other ways), so that would be a large improvement from what is going on right now.


Good lord, that sucks, hopefully a google engineer is in this thread...


What kind of emails does the bot use?


johnsmithus95@gmail.com john.smithus74@gmail.com johnsmith.us43@gmail.com

and more variations...


Are there legal implications to Google bots transacting with websites under false pretenses?

I mean their normal web crawler identifies itself as such. Here, I feel like they're committing (very) minor fraud by putting in fake shopper information and actively hiding their identity. Not a big deal if it were just some Joe Schmoe somewhere, but at their scale might it border on harassment? The robot equivalent of a prank call?


Probably a violation of the CFAA. Lots of people hate it because they think it's overreaching, and lots of companies use it to legally threaten scrapers and security research. But in this case Google is doing mass unauthorized use of other people's computers.


I think that's outdated information. ToS violations aren't prosecutable under CFAA since April.[1]

1. https://www.eff.org/deeplinks/2020/04/federal-judge-rules-it...


If I'm doing price comparison between online vendors, I will---as a human---put some items in the cart and get right to the edge of checkout to determine what my final bill would be. I may not close the sale if I'm looking at a better option elsewhere.

How is what I'm doing materially different from what Google's doing? Is scale a factor that matters for CFAA?


Maybe you are violating the CFAA by doing that? It's a very broad law.


I think the FTC should introduce a law that says that shops should be more transparent about their prices. That would solve the entire problem in the first place.


You should worry more about sellers engaging in anti-competitive behavior like bait-and-switch or price fixing.


Genuine question, is this not considered a DoS attack?

Let's imagine I have my online stock linked to limited physical items/assets, ex tickets for a show, which will get reserved for a period of time. This will be preventing genuine clients from buying them.


I'm thinking - if I forbid this in my site's Terms of Service, will DoJ go after Google for CFAA violations like they did to Aaron?


Yeah.. probably depend$ on how loud you can make yourself heard..

RIP Aaron


You can always update your robots.txt or block the Googlebot UA. (lol)


Possibly it is lower traffic than a full on dos?


Yes, in regards to traffic. But it's still denying me from providing a service to real customers.


Would it be too much for Google to program the bot to get the final price, and then delete all the items from the cart? Seems rather rude, even for Google.


I abandon carts more often than not. Pretty much for the same reason as the bot: I wanna know how much I'm actually getting charged with taxes, shipping and coupons. I'll do similar orders on multiple stores, and only finalize the best deal if I'm satisfied with it. Sometimes I just "walk away" because nobody's selling at my pricepoint.

Is this rude? I really don't care.


Do you consider visiting a site and then leaving the site before looking at anything there rude? People can't change their mind?

Putting something in a cart and leaving it there should be inconsequential to the seller. The only thing it might affect is their analytics. By the way, I consider analyzing my site visits and other data about me to be much more rude than abandoning a cart, especially when most sites don't even tell me they're doing it.


Is abandoning a cart really rude behavior? I sometimes do it just to see if they'll spam me as a test of if I want to do business with a site.


It's not rude at a consumer level, where (in general) you're at least considering making the purchase. It's arguably rude at a bot level, depending on the frequency, where there is 0% chance of conversion.


The entire purpose of the bot is to provide listings to consumers who are looking to buy.

If it were a consumer journalist doing it to get the price for a news article (in a for-profit publication) about the product, would it be “rude”? If not, how is it rude for Google's bot?


Because bots will do it at a much larger scale than individual humans. The first law of web robotics applies here: the bot should not harm the website it's crawling, or through inaction allow it to come to harm.

I didn't read the article due to the paywall, but I assume that the problem is that these goods are reserved for that (non-)customer until the shopping cart times out? That is directly costing the merchant money, either in lost sales or in having to maintain extra inventory.

So yeah, that bot really should have been programmed to end the session with an empty basket one way or another.
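
To make the cost concrete, here is a toy model in Python of stock that is held while it sits in a cart. This assumes the store reserves inventory at add-to-cart time, which I'm only guessing at; the `Inventory` class and its 15 minute default hold are invented for the example.

```python
import time


class Inventory:
    """Toy stock counter where items in a cart are held until a timeout."""

    def __init__(self, stock, hold_seconds=900):
        self.stock = stock
        self.hold_seconds = hold_seconds
        self.holds = {}  # cart_id -> (quantity, expires_at)

    def _expire(self, now):
        # Drop holds whose timeout has passed.
        expired = [c for c, (_, exp) in self.holds.items() if exp <= now]
        for cart_id in expired:
            del self.holds[cart_id]

    def available(self, now=None):
        now = time.time() if now is None else now
        self._expire(now)
        return self.stock - sum(qty for qty, _ in self.holds.values())

    def reserve(self, cart_id, quantity, now=None):
        """Hold `quantity` units for a cart; refuse if not enough is free."""
        now = time.time() if now is None else now
        if quantity > self.available(now):
            return False
        held = self.holds.get(cart_id, (0, 0))[0]
        self.holds[cart_id] = (held + quantity, now + self.hold_seconds)
        return True

    def release(self, cart_id):
        """What a polite bot would trigger by emptying its cart."""
        self.holds.pop(cart_id, None)
```

Until the hold expires or `release` is called, a real customer asking for the same units is turned away, which is the lost sale described above.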


An abandoned cart reminder email sent a few hours later has a ridiculously high conversion rate - around 15% in my experience. Online vendors aren’t going to stop that practice, especially when the big e-commerce platforms make it easy to do.


Reason enough that they should be illegal or at least strictly and specifically opt in, not as part of general marketing consent.

If a customer really wants or needs something they’ll go back and buy it. The world doesn’t need the excess consumption of people psychologically manipulated into buying stuff they weren’t going to.


Such a bot could be used to damage ad tracking


I wouldn't fault them for that. I've observed that some sites are most likely gaming the system by detecting Google bots and serving them artificially lower prices, so that they appear in index summaries; then, when you open the product, its real price is always higher than the one reported in the index.


yep, I see this type of behaviour constantly - faked prices for Gbot, fake prices on Cache, significantly higher price for end user.

It's also infuriating to sort by price and get inflated fake shipping prices to "make up the total"


I used to work at a company that provided APIs used for search/personalization/autosuggest for a whole bunch of huge e-commerce companies. Since the entire integration with the customer site was API based, we worked off of tracking pixels, API requests and cookies to determine shopping behaviour. A lot of this went into determining things like ranking (If someone searches "Tshirt" what shows up on the first page and in what order etc.)

Since we were only running search and not payment processing, the tracking pixel/API for "Add to Cart" was a big thing for us. The whole product ran on revenue-share so we were paid per X ATCs

Interesting to see if any of the customers were affected by bots doing ATC and how it was handled if it was.


Digital shopping cart abandonment/Inventory Exhaustion/Hoarder bots are an interesting type of DDoS.

There's a popular moment of people using it atm https://heavy.com/news/2020/06/shopping-card-abandonment-tik...


It would be cool if Google could manage to become a storefront for the entire web, thereby eliminating Amazon.


Between the two, personally, I would rather an advertising company like Google be eliminated, or at least be regulated to protect user privacy. For me, it's easy enough to avoid Amazon, harder to avoid Google since every website uses them to spy on users.


For Google (or anyone) to become a storefront for the entire web, they'd need to handle scams (and errors) well.

eBay is a cesspool. Aliexpress is worse. Random web sites are bad. Amazon isn't perfect, but it's better.

Amazon also has customer service; they've always made me whole. Random web sites, I'm basically SOL. Aliexpress and eBay are random. Someone flips a coin, heads seller wins, tails buyer wins, regardless of who the scammer is.

I mostly buy from Amazon since my odds of not having problems are that much higher.


Exactly this, the customer service for the average consumer from Amazon is very difficult to beat and is Google's biggest weakness.

Bought some cables from Amazon Basic, one ended up not working, another had some cosmetic damage but works fine. They refunded both, sent out replacements, and just told me to discard them, it wasn't worth it for Amazon to pay to have it shipped back.

Of course if you abuse this too much Amazon will ban you. If you are an honest consumer though, their customer service generally provides a great experience.

I still remember a time when everyone was afraid of purchasing stuff over the internet, Amazon has so greatly reduced the friction and concern that sometimes I find myself going from "hmm, I need something" to "it will be here tomorrow" in the matter of a minute or two.

Although more competition in this space would ultimately benefit the consumer, it seems unlikely that Google is going to be the source of that competition. They've got shopping results integrated into their search engine, and it's a feature I've maybe browsed from time to time, but I often just end up searching and purchasing on amazon directly. I don't know if I would be super comfortable purchasing from Google in the same way that I am with Amazon, too many horror stories of App Developers / YouTube Creators / etc getting caught in some sort of Machine Learning Customer Support system.

Curious if others use the Google Shopping thing in the search engine and what their experiences are with it.


> Exactly this, the customer service for the average consumer from Amazon is very difficult to beat and is Google's biggest weakness.

Amazon's customer service is a robot, which switches to someone in a callcenter in India, and then finally switches to a local person. I know because I recently had to contact them.

Not sure how this is "difficult to beat".


It's difficult to beat because prices are a race to the bottom, and small players have no effective way to build up and manage reputations.

If I need a widget, and Vendor A charges a buck, while Vendor B charges two bucks, all else being equal, I'll buy from Vendor A. Bad customer service helps both vendors compete with each other, but prevents small companies, collectively, from competing with Amazon.

On eBay, small players do manage reputations, but only for a few weeks. If a product fails (or is discovered to be a fake) after 60 days, the seller is all good. Next sucker! There are things I'll buy there, but far more I won't.

Google itself has the problem that culturally, it relies on algorithms which know better than you do, and is not a service company. It does great tech, but holds human beings outside of Google in open contempt. That's fine for running a search engine, AdWords, or Gmail, but it crashes and burns for ecommerce.


It's difficult to beat because those people can issue refunds. Now you need to deal with buyers scamming the system, sellers scamming the system, customer service employees scamming the system, and idiots. That's a lot of complexity to balance that can cost you a lot of money if not done well. In an industry with very low margins.


You and I have a very different definition of "cool".


I sure would love for my ability to buy goods to be blocked by some badly coded ML algorithm with my only recourse being to yell about it on social media. Yeah, I'll take Amazon any day of the week.


This feels like a great way to get data on how all these different e-commerce companies approach remarketing.


I think I've seen most of Google's technologies dissected and/or explained in detail over the years. Lots of their own papers too. But if you look into how and what they're doing regarding data collection, including scraping, there's nothing.


Funny, one quick gig I did in my college years was writing shopping-bot protection against "guaranteed lowest price" scrapers like TigerDirect or RFD.

Back then, the goal was exactly the opposite.


When and why did news cease being news and start being short stories and opinion? This entire article could have been cut down to the last few paragraphs and nothing of value would have been lost.

Look at The New York Times in 1921 [0]. Generally the stories are factual and to the point. The entire front page seems to be pure news. There's very little storytelling here, at most there are a few timelines of events.

Look at The New York Times today [1]. There's a bunch of factual and useful Coronavirus information but ~15% of the page is dedicated to "Opinion", the second article appears to be pure speculation, the third article is a bunch of storytime fluff around a little bit of news and the front page has a mix of actual news and opinion pieces being passed off as news.

When did this happen? Why? Did people lose interest in actual news? Is there less actual news to report?

Perhaps this is regional? Take for example the story about the San Quentin prison. NYTimes [2] has the same drawn-out nonsense as this Google story, while Al Jazeera [3] adds a lot of background but sticks to factual reporting.

[0]: https://archive.org/details/NYTimes_jul16_31_1921

[1]: http://archive.is/oiiXU

[2]: https://www.nytimes.com/2020/06/30/us/san-quentin-prison-cor...

[3]: https://www.aljazeera.com/news/2020/07/san-quentin-prison-se...


Maybe you don't know this, but the "A-hed" article of the WSJ is the humorous, light-hearted take on some cultural phenomenon that appears every couple of days. It's got a distinct separation (graphically) from the rest of the news, and is written not to be taken too seriously. (It's not so apparent in the online version, if you haven't read it before).

So you don't have to worry that it's some broad decline in journalistic standards (at least based on this)... The WSJ is one of the few quite reputable news rooms out there.

You can read about A-hed articles here: https://www.wsj.com/articles/SB10001424052702303362404575580...

And there was even a book published a few years ago with collections of these kinds of amusing stories: https://www.amazon.com/Floating-Off-Page-Stories-Journals/dp...


> It's not so apparent in the online version, if you haven't read it before.

I think this is the core issue that leads to sentiment like OP's. Real news still exists, but it's the highly editorialized and opinionated articles that are shared more widely. News agencies universally are terrible at obviously differentiating the two to users.

99% of people, when they open an article, scroll directly to the content. But any distinguishing features (in this case, a small-font A-hed link) are tucked away at the top. In this case, the A-hed link takes you to the A-hed home page but still does not offer any context as to what A-hed is.

> So you don't have to worry that it's some broad decline in journalistic standards (at least based on this).

As long as the WSJ does such a bad job at separating "WSJ the proper news room" and "A-hed the not proper news room", their brand will suffer. OP is proof of that, and I think it's safe to assume the average HN reader is more astute than the average citizen. A tiny link to nowhere useful is not enough of a UI change for us to blame the user. The onus is on the news agencies to do a better job giving context for articles.


Is it the news agency's fault, or the reader's fault? In this case, it really isn't clear that it's not "real news". But I see plenty of people on social media sharing articles from news sites where they're clearly marked as an editorial or opinion piece, and believing or treating them as if they were exactly the same as a news article that attempts to paint a neutral picture of the facts.

Do newspapers need to put an explanation of what an opinion piece is at the top of every opinion piece?