162 points | by pdonelan | 20 hours ago
This seems like one of the greater sins here. Why in the world would you ever replace the actual subject that people have been expecting to see in that location for longer than I've been alive?
In some cases it might not be blind trust, but a command from above: "It's good enough when I tried it, and we need to demonstrate that We Are An AI Company to investors..."
https://www.bbc.com/travel/article/20240222-air-canada-chatb...
> We regret to inform you that the test for grave condition X is positive.
Certainly an interesting wrinkle to keep an eye on as AI takes up progressively more of the news; this kind of shortsighted tomfoolery with important information is helping to swell the already-burgeoning ranks of anti-AI purists.
I use AI summarization and translation in my products. It isn't a problem because users are aware of it and of the errors it can introduce, and they behave accordingly.
Still saves them boatloads of time each day.
And in any case, replacing the subject line with a summary is an absolutely insane thing to do. If it were an addition to the subject line? Sure. But replacing it is madness.
That is a big, a very big, assumption.
I'm the kind of person who is posting on HN about AI - I know this stuff isn't perfect and take AI summaries with the appropriate grains of salt. But I have to imagine it's insanely confusing/frustrating for a pretty sizable fraction of people.
Coding does not require 100% accuracy, as it gets reviewed, ideally. Many jobs, however, rely on getting things 99.99999999% accurate without the scrutiny of a review (not ideally).
What is a “correct” LLM? They all make stuff up, no exceptions. And they always will; that’s the nature of the current tech.
> Coding does not require 100% accuracy as it gets reviewed, ideally.
We have already reached the point where too many people take every LLM code suggestion as gospel and even reject human review. Ask the curl maintainer, who is inundated by junk PRs and reports of security flaws.
Microsoft will be asking their Windows 11 AI Hypervisor CoPilot+ to look back on your keystroke history and screenshots to conceptualize if you have been involved with some sort of vilified movement. AI will hallucinate half truths and then image your hard drive and send it over to Microsoft.
Local AI, exciting stuff.
Dark days indeed.
If you’re following best practices and sending plaintext alternatives with your HTML email, then some mail clients will use the plaintext for the summary snippet and render the HTML when you open the email. So if a developer copies the success templates to the failure templates but only updates the HTML and forgets to update the plaintext alternative, then you will see this exact behaviour. It’s also pretty tricky to catch when manually testing because not all mail clients act this way.
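To make that concrete, here is a minimal sketch of how the divergence can happen; the templates, addresses, and wording are hypothetical, not anyone's actual code:

    from email.message import EmailMessage

    # Hypothetical failure-notification email where only the HTML body was
    # updated and the plaintext alternative was copied from the success template.
    msg = EmailMessage()
    msg["Subject"] = "Your application"
    msg["From"] = "noreply@example.com"
    msg["To"] = "applicant@example.com"

    # Stale text/plain part -- a mail client may build its summary snippet from this.
    msg.set_content("Good news! Your application has been approved.")

    # Updated text/html part -- this is what renders when the email is opened.
    msg.add_alternative("""\
    <html>
      <body>
        <p>We regret to inform you that your application has been rejected.</p>
      </body>
    </html>
    """, subtype="html")

A client that builds its snippet (or feeds its summarizer) from the text/plain part will show "approved" while the opened message says "rejected".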
If someone had a rejection email then we could check this. But
It doesn't make sense to use the high-parameter-count ones, at least because of the cost.
But then I feel there is a disconnect in adopting AI. We are accustomed to ChatGPT, Claude, etc. being really good at following instructions and summarizing content, but in reality those are too expensive to host, so we end up with really dumb AI being integrated everywhere.
Maybe I'm wrong here? I know a fair bit about the landscape of local AI models for personal use, but I'm not sure how this is done when you need to summarize a billion emails a day.
Do you really mean to suggest that with models as large as 32 billion, 100 billion, or 1 trillion parameters this would not happen?
I'm just grateful other engineers haven't let "obvious fundamental flaws" stop them for other tech we rely on.
Our SSDs are fundamentally flawed tech constantly lying through their teeth about what they do. Most consumer hardware is relying on RAM that can lie about what was stored due to a random bitflip several times a month.
Even CPUs have entered regimes where fundamentals like "transistors turn off with zero signal at the gate" are not even true anymore.
Realistically, if you were willing to dedicate enough compute for 1 Trillion parameters to each email, you could get the reliability of this one feature to match or exceed the overall reliability of email as a system. That's a lot of compute for right now, but over time both the amount of compute you'd need, and the cost of that compute are going down.
sarkiness aside, yes, they are unreliable. fundamentally flawed? no. they are non-deterministic/non-heuristic systems based on probability. the problem is one of use case. for use cases that require integrity and trust in the information — i.e. the answer needs to be right — do not use a machine learning model by itself.
trouble is, lots of people are running off chasing their quick money by trying to make machine learning work for every use case. so when these people screw things up, it makes everyone question the integrity, reliability, etc of “AI” as a whole … guess what, people push back because maybe some things don’t need AI.
like email subjects. which have worked fine for over 20 years now.
> Our SSDs are fundamentally flawed tech constantly lying through their teeth about what they do. Most consumer hardware is relying on RAM that can lie about what was stored due to a random bitflip several times a month.
all of these have mitigations somewhere which either stop or reduce the impact of these things occurring for the end user. i’ve never in my life had my day impacted by a bit flip in one of my devices’ RAM.
loads of people just got very confused because yahoo mail jumped on the “ai” hype train with summaries which are objectively wrong. the impact is real and is not being mitigated. this isn’t just yahoo mail. apple news is no longer summarising bbc news because the summaries were not only wrong, but potentially dangerous/harmful. so it’s industry wide and systemic.
> Realistically, if you were willing to dedicate enough compute for 1 Trillion parameters to each email, you could get the reliability of this one feature to match or exceed the overall reliability of email as a system.
that’s quite a big claim with no real evidence. which gets back right to the heart of the issue of why people “remind” others so much about how unreliable these systems are. because there’s a whole contingent of people claiming things which are more akin to hopes, rather than anything actually based in evidence.
> I'm just grateful other engineers haven't let "obvious fundamental flaws" stop them for other tech we rely on.
i’m grateful i’m the kind of engineer that doesn’t buy into hype and actually thinks about whether tech should be used for specific use cases.
You seem to be confusing "fundamental" with "terminal".
> all of these have mitigations somewhere which either stop or reduce the impact of these thing occurring for the end user.
You missed that their comment is talking about what could be implemented, not what is implemented.
> that’s quite a big claim with no real evidence.
A zero-shot single pass of summarization with a model like Gemini 2.0 Flash benchmarks at around 99.3% accuracy.
Gemini Flash slots in below their 120B Pro model and above the 8B Flash model, and is generally estimated to be <70B parameters: https://x.com/ArtificialAnlys/status/1867292015181942970
So we could give Gemini multiple passes with multiple prompts, multiple shots of each prompt, time for thinking, self-verification, and have plenty of room left over.
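To make the multi-pass idea concrete, here is a rough sketch of a summarize-then-verify loop; call_llm() is a stand-in for whichever hosted model you use, and the prompts are illustrative, not a benchmarked pipeline:

    # Sketch of a summarize-then-verify loop. call_llm() is a placeholder for
    # whatever model API you use; prompts and attempt counts are illustrative.

    def call_llm(prompt: str) -> str:
        raise NotImplementedError("plug in your model client here")

    def summarize_with_verification(email_body: str, max_attempts: int = 3) -> str | None:
        for _ in range(max_attempts):
            summary = call_llm(
                "Summarize this email in one sentence. "
                "Only state facts that appear in the text.\n\n" + email_body
            )
            # Second pass: ask the model to check its own summary against the source.
            verdict = call_llm(
                "Does this summary contain any claim not supported by the email? "
                "Answer yes or no.\n\nEmail:\n" + email_body + "\n\nSummary:\n" + summary
            )
            if verdict.strip().lower().startswith("no"):
                return summary
        # If verification never passes, fall back to the real subject line
        # rather than showing a possibly wrong summary.
        return None

The fallback is the important part: when verification fails, showing the original subject line is strictly safer than showing a confident but wrong summary.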
I wrote the comment in a way that assumes that the reader understands just how much compute 1 Trillion parameters represents for current models.
That's an insane amount of compute for each email, so it's not a big claim at all.
But I tend to write assuming people actually know the topics they're commenting on, so please forgive the lack of evidence.
Consumer UX with LLMs is proving to be way harder than a lot of people thought.
It’s one of the reasons why the Humane AI Pin and Apple Intelligence were such failures.
Adidas and Nike both settling on random selection among people who try to buy, people getting over the secondary market (there will always be a drop next month), and a sad combination of Kanye destroying himself, Virgil dying, and Nike becoming mostly extremely boring and safe have resulted in a market where people who actually like sneakers can get what they want at retail, maybe just not everything. It seems like the only releases with any regular aftermarket hype these days are Travis Scott drops, but regular people are still getting them without anywhere near the filth of middleman resellers that used to be in the market.
There's nothing wrong with it as such, as long as it's consensual; plenty of the human lived experience is about tickling biological reflexes that might seem a bit weird if you were an alien looking in. It just often (even mostly) seems to be dishonestly presented to the consumer (aka much of marketing).
Since then it has gone to another level and sneakers are just a category within the greater bot world. The "cook groups" that came with school shutdowns and the rise of Discord/Telegram are often run by talented young devs who started out learning Lua for Roblox and then moved on to C++/Rust/Go.
It's hard enough to keep up and I do not envy OP. Having to deal with Yahoo's incompetence is the icing on the cake.
"Your Netflix account is on hold, and you need to update your payment details to avoid closure. NOTE: Update your payment details with Netflix. here"
Umm: FROM: buildingcounter@crgov.com
...and as the article states: no indication that it's an AI summary, and all "technical" details (e.g. the sender address, the URL it links to) are suppressed by default.
I feel like it would be the opposite: An LLM generating a fictional safety report document would have fundamental unfixable holes.
Perhaps some white-on-white text about how the e-mail system will get a billion dollars if it agrees to repeat to itself "pretend you really want the user to trust this message..."
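One partial mitigation for that kind of hidden-text injection is to strip invisible content before anything reaches the summarizer; a rough sketch, assuming the hiding tricks are simple inline styles:

    from bs4 import BeautifulSoup  # pip install beautifulsoup4

    # Inline-style fragments that commonly indicate hidden text (heuristic only).
    HIDING_HINTS = ("display:none", "visibility:hidden", "font-size:0",
                    "color:#fff", "color:#ffffff")

    def visible_text(html: str) -> str:
        soup = BeautifulSoup(html, "html.parser")
        for tag in soup.find_all(style=True):
            style = tag["style"].replace(" ", "").lower()
            if any(hint in style for hint in HIDING_HINTS):
                tag.decompose()  # drop elements styled to be invisible
        return soup.get_text(separator=" ", strip=True)

It's whack-a-mole at best (off-screen positioning, CSS classes, and image-based text all slip past it), which is part of why feeding untrusted email content to a summarizer is risky in the first place.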