162 points | by pdonelan | 20 hours ago
This seems like one of the greater sins here. Why in the world would you ever replace the actual subject that people have been expecting to see in that location for longer than I've been alive?
In some cases it might not be blind trust, but a command from above: "It's good enough when I tried it, and we need to demonstrate that We Are An AI Company to investors..."
https://www.bbc.com/travel/article/20240222-air-canada-chatb...
> We regret to inform you that the test for grave condition X is positive.
Certainly an interesting wrinkle to keep an eye on as AI takes up progressively more of the news; this kind of shortsighted tomfoolery with important information is helping to swell the already-burgeoning ranks of anti-AI purists.
I use AI summarization and translation in my products. It isn't a problem because users are aware of it and of the errors it can introduce, and they behave accordingly.
Still saves them boatloads of time each day.
And in any case, replacing the subject line with a summary is an absolutely insane thing to do. If it were an addition to the subject line? Sure. But replacing it is madness.
That is a big, a very big, assumption.
I'm the kind of person who is posting on HN about AI - I know this stuff isn't perfect and take AI summaries with the appropriate grains of salt. But I have to imagine it's insanely confusing/frustrating for a pretty sizable fraction of people.
Coding does not require 100% accuracy, as it gets reviewed, ideally. Many jobs, however, rely on getting things 99.99999999% accurate without the scrutiny of a review (not ideally).
What is a “correct” LLM? They all make stuff up, no exceptions. And they always will; that’s the nature of the current tech.
> Coding does not require 100% accuracy as it gets reviewed, ideally.
We have already reached the point where too many people take every LLM code suggestion as gospel and even reject human review. Ask the curl maintainer, who is inundated by junk PRs and reports of security flaws.
Microsoft will be asking their Windows 11 AI Hypervisor CoPilot+ to look back on your keystroke history and screenshots to conceptualize if you have been involved with some sort of vilified movement. AI will hallucinate half truths and then image your hard drive and send it over to Microsoft.
Local AI, exciting stuff.
Dark days indeed.
If you’re following best practices and sending plaintext alternatives with your HTML email, then some mail clients will use the plaintext for the summary snippet and render the HTML when you open the email. So if a developer copies the success templates to the failure templates but only updates the HTML and forgets to update the plaintext alternative, then you will see this exact behaviour. It’s also pretty tricky to catch when manually testing because not all mail clients act this way.
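To make that concrete, here is a minimal sketch of how the divergence can happen; the templates, addresses, and wording are hypothetical, not anyone's actual code:

    from email.message import EmailMessage

    # Hypothetical failure-notification email where only the HTML body was
    # updated and the plaintext alternative was copied from the success template.
    msg = EmailMessage()
    msg["Subject"] = "Your application"
    msg["From"] = "noreply@example.com"
    msg["To"] = "applicant@example.com"

    # Stale text/plain part -- a mail client may build its summary snippet from this.
    msg.set_content("Good news! Your application has been approved.")

    # Updated text/html part -- this is what renders when the email is opened.
    msg.add_alternative("""\
    <html>
      <body>
        <p>We regret to inform you that your application has been rejected.</p>
      </body>
    </html>
    """, subtype="html")

A client that builds its snippet (or feeds its summarizer) from the text/plain part will show "approved" while the opened message says "rejected".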
If someone had a rejection email then we could check this. But
It doesn't make sense to use the high-parameter-count ones, at least because of the cost.
But then I feel there is a disconnect in adopting AI. We are accustomed to ChatGPT, Claude, etc. being really good at following instructions and summarizing content, but in reality those are too expensive to host, so we end up with really dumb AI being integrated everywhere.
Maybe I'm wrong here? I know a fair bit about the landscape of local AI models for personal use, but I'm not sure how this is done when you need to summarize a billion emails a day.
Do you really mean to suggest that with models as large as 32 billion, 100 billion, or 1 trillion parameters this would not happen?
I'm just grateful other engineers haven't let "obvious fundamental flaws" stop them for other tech we rely on.
Our SSDs are fundamentally flawed tech constantly lying through their teeth about what they do. Most consumer hardware is relying on RAM that can lie about what was stored due to a random bitflip several times a month.
Even CPUs have entered regimes where fundamentals like "transistors turn off with zero signal at the gate" are not even true anymore.
Realistically, if you were willing to dedicate enough compute for 1 Trillion parameters to each email, you could get the reliability of this one feature to match or exceed the overall reliability of email as a system. That's a lot of compute for right now, but over time both the amount of compute you'd need, and the cost of that compute are going down.
sarkiness aside, yes, they are unreliable. fundamentally flawed? no. they are non-deterministic/non-heuristic systems based on probability. the problem is one of use case. for use cases that require integrity and trust in the information — i.e. the answer needs to be right — do not use a machine learning model by itself.
trouble is, lots of people are running off chasing their quick money by trying to make machine learning work for every use case. so when these people screw things up, it makes everyone question the integrity, reliability, etc of “AI” as a whole … guess what, people push back because maybe some things don’t need AI.
like email subjects. which have worked fine for over 20 years now.
> Our SSDs are fundamentally flawed tech constantly lying through their teeth about what they do. Most consumer hardware is relying on RAM that can lie about what was stored due to a random bitflip several times a month.
all of these have mitigations somewhere which either stop or reduce the impact of these things occurring for the end user. i’ve never in my life had my day impacted by a bit flip in one of my devices’ RAM.
loads of people just got very confused because yahoo mail jumped on the “ai” hype train with summaries which are objectively wrong. the impact is real and is not being mitigated. this isn’t just yahoo mail. apple news is no longer summarising bbc news because the summaries were not only wrong, but potentially dangerous/harmful. so it’s industry wide and systemic.
> Realistically, if you were willing to dedicate enough compute for 1 Trillion parameters to each email, you could get the reliability of this one feature to match or exceed the overall reliability of email as a system.
that’s quite a big claim with no real evidence. which gets back right to the heart of the issue of why people “remind” others so much about how unreliable these systems are. because there’s a whole contingent of people claiming things which are more akin to hopes, rather than anything actually based in evidence.
> I'm just grateful other engineers haven't let "obvious fundamental flaws" stop them for other tech we rely on.
i’m grateful i’m the kind of engineer that doesn’t buy into hype and actually thinks about whether tech should be used for specific use cases.
You seem to be confusing "fundamental" with "terminal".
> all of these have mitigations somewhere which either stop or reduce the impact of these thing occurring for the end user.
You missed that their comment is talking about what could be implemented, not what is implemented.
> that’s quite a big claim with no real evidence.
A zero-shot single pass of summarization with a model like Gemini 2.0 Flash benchmarks at around 99.3% accuracy.
Gemini Flash slots in below their 120B Pro model and above the 8B Flash model, and is generally estimated to be <70B parameters: https://x.com/ArtificialAnlys/status/1867292015181942970
So we could give Gemini multiple passes with multiple prompts, multiple shots of each prompt, time for thinking, self-verification, and have plenty of room left over.
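To make the multi-pass idea concrete, here is a rough sketch of a summarize-then-verify loop; call_llm() is a stand-in for whichever hosted model you use, and the prompts are illustrative, not a benchmarked pipeline:

    # Sketch of a summarize-then-verify loop. call_llm() is a placeholder for
    # whatever model API you use; prompts and attempt counts are illustrative.

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model client here")

    def summarize_with_verification(email_body: str, max_attempts: int = 3) -> str | None:
        for _ in range(max_attempts):
            summary = call_llm(
                "Summarize this email in one sentence. "
                "Only state facts that appear in the text.\n\n" + email_body
            )
            # Second pass: ask the model to check its own summary against the source.
            verdict = call_llm(
                "Does this summary contain any claim not supported by the email? "
                "Answer yes or no.\n\nEmail:\n" + email_body + "\n\nSummary:\n" + summary
            )
            if verdict.strip().lower().startswith("no"):
                return summary
        # If verification never passes, fall back to the real subject line
        # rather than showing a possibly wrong summary.
        return None

The fallback is the important part: when verification fails, showing the original subject line is strictly safer than showing a confident but wrong summary.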
I wrote the comment in a way that assumes that the reader understands just how much compute 1 Trillion parameters represents for current models.
That's an insane amount of compute for each email, so it's not a big claim at all.
But I tend to write assuming people actually know the topics they're commenting on, so please forgive the lack of evidence.
Consumer UX with LLMs is proving to be way harder than a lot of people thought.
It’s one of the reasons why the Humane AI Pin and Apple Intelligence were such failures.
Adidas and Nike both settling on random selection among people who try to buy, people getting over the secondary market (there will always be a drop next month), and a sad combination of Kanye destroying himself, Virgil dying, and Nike becoming mostly extremely boring and safe have resulted in a market where people who actually like sneakers can get what they want at retail, maybe just not everything. It seems like the only releases with any regular aftermarket hype these days are Travis Scott drops, but regular people are still getting them without anywhere near the filth of middleman resellers that used to be in the market.
There's nothing wrong with it as such, as long as it's consensual; plenty of the human lived experience is about tickling biological reflexes that might seem a bit weird if you were an alien looking in. It just often (even mostly) seems to be dishonestly presented to the consumer (aka much of marketing).
Since then it has gone to another level and sneakers are just a category within the greater bot world. The "cook groups" that came with school shutdowns and the rise of Discord/Telegram are often run by talented young devs who started out learning Lua for Roblox and then moved on to C++/Rust/Go.
It's hard enough to keep up and I do not envy OP. Having to deal with Yahoo's incompetence is the icing on the cake.
"Your Netflix account is on hold, and you need to update your payment details to avoid closure. NOTE: Update your payment details with Netflix. here"
Umm: FROM: buildingcounter@crgov.com
...and as the article states: no indication that it's an AI summary, and all "technical" details (e.g. the sender address, the URL it links to) are suppressed by default.
I feel like it would be the opposite: An LLM generating a fictional safety report document would have fundamental unfixable holes.
Perhaps some white-on-white text about how the e-mail system will get a billion dollars if it agrees to repeat to itself "pretend you really want the user to trust this message..."
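One partial mitigation for that kind of hidden-text injection is to strip invisible content before anything reaches the summarizer; a rough sketch, assuming the hiding tricks are simple inline styles:

    from bs4 import BeautifulSoup  # pip install beautifulsoup4

    # Inline-style fragments that commonly indicate hidden text (heuristic only).
    HIDING_HINTS = ("display:none", "visibility:hidden", "font-size:0",
                    "color:#fff", "color:#ffffff")

    def visible_text(html: str) -> str:
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup.find_all(style=True):
            style = tag["style"].replace(" ", "").lower()
            if any(hint in style for hint in HIDING_HINTS):
                tag.decompose()  # drop elements styled to be invisible
        return soup.get_text(separator=" ", strip=True)

It's whack-a-mole at best (off-screen positioning, CSS classes, and image-based text all slip past it), which is part of why feeding untrusted email content to a summarizer is risky in the first place.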