The future promised in Star Trek and even Apple's Knowledge Navigator [2] from 1987 still feels distant. In those visions, users simply asked questions and received reliable answers - nobody ever had to fact-check them.
Combining two broken systems - compromised search engines and unreliable LLMs - seems unlikely to yield that vision. Legacy ad-based search has devolved into a wasteland of misaligned incentives and conflicts of interest, and has proliferated content farms across the web optimized for ads and algorithms instead of humans.
The path forward requires solving the core challenge: actually surfacing the content people want to see, not what intermediaries want them to see - which means a different business model in search, one with no intermediaries. I do not see a way around this. Advancing models without advancing search is like having a Michelin-star chef work with spoiled ingredients.
I am cautiously optimistic we will eventually get there, but boy, we will need a fundamentally different setup in terms of incentives involved in information consumption, both in tech and society.
With the search paradigm this wasn't as much of an issue, because the answers were presented as "here's a bunch of websites that appear to deal with the question you asked". It was then up to the reader to decide which of those sites they wanted to visit, and therefore which viewpoints they got to see.
With an LLM answering the question, this is critical.
To paraphrase a recent conversation I had with a friend: "in the USA, can illegal immigrants vote?" has a single truthful answer ("no" obviously). But there are many places around the web saying other things (which is why my friend was confused). An LLM trawling the web could very conceivably come up with a non-truthful answer.
This is possibly a bad example, because the truth is very clearly written down by the government, based on exact laws. It just happened to be a recent example that I encountered of how the internet leads people astray.
A better example might be "is dietary saturated fat a major factor for heart disease in Western countries?". The current government publications (which answer "yes") for this are probably wrong based on recent research. The government cannot be relied upon as a source of truth for this.
And, generally, allowing the government to decide what is true is probably a path we (as a civilisation) do not want to take. We're seeing how that pans out in Australia and it's not good.
But for anything remotely subjective, context dependent, or time sensitive I need to know the source. And this isn’t just for hot button political stuff — there’s all sorts of questions like “is the weather good this weekend?”, “Are puffy jackets cool again?”, “How much should I spend on a vacation?”
Er, no, the meaning of the question is ambiguous, so I'm not sure "has a single truthful answer" is accurate. What does "can" mean? If you mean "permitted", then no. But if you mean can they vote anyway and get away with it? It's clearly happened before (as rare as it might have been), so technically the answer to that would be yes.
Equally "can" is used to substitute for other precise words. Humans are good at inferring context, and if someone asked me "can illegals vote" I'd say "no". Just like if someone said "can you pass the salt" I pass the salt, I don't say "yes".
If the inferred context is wrong then the "truth" is wrong, but as with talking to humans it's possible to refine the context with a follow-up question.
It literally did not occur to me at first that the question might be asking about legality, because the entire modern political discourse surrounding illegal immigrants and voting has been about whether they can cast votes despite not legally being allowed to. The answer to "is this legal" would have been such an obvious "no" to people on both sides of the debate --- and thus the question so silly --- that I only realized legality was the intended question after reading further into the comment.
How are we seeing how that pans out when Australia's misinformation bill is still just a proposal?
Google could have done it and kind of tried, although their AI sucks too much. I'm very surprised that OpenAI hasn't done this sooner as well. Their initial implementation of web search was sad. I don't mean to be super critical, as I think OpenAI is generally very, very good at what they do, but their initial browse-the-web feature was a giant hack that I would expect from an intern who isn't being given good guidance by their mentors.
Once mainstream engines start getting on par with Kagi, there's gonna be a massive wave of destruction and opportunity. I'm guessing there will be a lot of new paywalls popping up, and lots of access deals with the search engines. This will raise the barrier to entry for new search entrants even further, and will further fragment information access between the haves and have-nots.
I'm also cautiously optimistic though. We'll get there, but it's gonna be a bit shaky for a minute or two.
But I don't understand how all of these AI results (note I haven't used Kagi so I don't know if it's different) don't fundamentally and irretrievably break the economics of the web. The "old deal", if you will, is that many publishers would put stuff out on the web for free, but with the hope that they could monetize it (somehow, even just with something like AdSense ads) on the backend. This "deal" was already getting a lot worse over the past years as Google had done more and more to keep people from ever needing to click through in the first place. Sure, these AI results include citations, but the click-through rates are probably abysmal.
Why would anyone ever publish stuff on the web for free unless it was just a hobby? There are a lot of high quality sites that need some return (quality creators need to eat) to be feasible, and those have to start going away. I mean, personally, for recipes I always start with ChatGPT now (I get just the recipe instead of "the history of the domestication of the tomato" that Google essentially forced on recipe sites for SEO competitive reasons), but why would any site now ever want to publish (or create) new high quality recipes?
Can someone please explain how the open web, at least the part of the web that requires some sort of viable funding model for creators, can survive this?
That's exactly what the old deal was, and it's what made the old web so good. If every paid or ad-funded site died tomorrow, the web would be pretty much healed.
Yes a few sites take this too far and ruin search results for everyone. But taking the possibility away would also cut the produced content by a lot.
YouTube, for example, had some good content before monetization, but there are a lot of great documentary-like channels now that simply wouldn't be possible without ads. There is also clickbait trash, yes, but I'd rather have both than neither.
So who pays for all of this?
The web needs to be monetized, just not via advertising. Maybe it's microtransactions, maybe subscriptions, maybe something else, but this idea of "we get everything we want for free and nobody tries to use it for their own agenda" will never return. That only exists for hobby technologies. Once they are mainstream they get incorporated into the mainstream economic model. Our mainstream model is capitalism, so it will be ever present in any form of the internet.
The main question is how people/resources can be paid for while maintaining healthy incentives.
The web needs patrons, contributions, and cost allocation, not necessarily monetization and shareholder capitalism where there is a never ending shuffle of IP and org ownership to maximize returns (unnecessarily imho). How many times was Reddit flipped until its current CEO juiced it for IPO and profitability? Now it is a curated forum for ML training.
I (as well as many other consumers of this content) donate to APM Marketplace [1] because we can afford it and want it to continue. This is, in fits and starts, the way imho. We piece together the means to deliver disenshittification (aggregating small donations, large donations, grants, etc).
(Tangentially, APM Marketplace has recently covered food stores [2] and childcare centers [3] that have incorporated as nonprofits because a for-profit model simply will not succeed; food for thought at a meta level as we discuss economic sustainability and how to deliver outcomes in non-conventional ways)
[1] https://www.marketplace.org/
[2] https://www.marketplace.org/2024/10/24/colorados-oldest-busi...
[3] https://www.marketplace.org/2024/08/22/daycare-rural-areas-c...
But, like on OTA TV, you can get all the shopping channels you want.
....is that a problem? most of what we actually like is the stuff that's made 'for fun', and even if not, killing off some good stuff while killing off nearly all the bad stuff is a pretty good deal imo.
Why would anyone, especially a passionate hobbyist, make a website knowing it will never be seen, and only be used as a source for some company's profit?
I think the best bloggers write because they need to express themselves, not because they need an audience. They always seem surprised to discover that they have an audience.
There is absolutely a set of people who write in order to be read by a large audience, but I'm not sure they're the critical people. If we lost all of them because they couldn't attract an audience, I don't think we'd lose too much.
LLMs do away with that. 95% of folks aren't going to feel great if all of the time spent producing content is then just "put into the blender to be churned out" by an LLM with no traffic back to the original site.
Ad-driven social networks will continue to exist as well.
The age of the ad-driven blog website is probably at an end. But there will be countless people posting stuff online for free anyway.
That may help with SEO, but another reason is copyright law.
Recipes can't be copyrighted, but stories can. Here is how ChatGPT explained it to me:
> Recipes themselves, particularly the list of ingredients and steps, generally can't be copyrighted because they're considered functional instructions. However, the unique way a recipe is presented—such as personal stories, anecdotes, or detailed explanations—can be copyrighted. By adding this extra content, bloggers and recipe creators can make their work distinctive and protectable under copyright law, which also encourages people to stay on their page longer (a bonus for ad revenue).
> In many cases, though, bloggers also do this to build a connection with readers, share cooking tips, or explain why a recipe is special to them. So while copyright plays a role, storytelling has other motivations, too.
Why indeed, person who posted for free* on the Internet?
As a side note, consider that ads can be woven into and boosted in LLM results just as easily as in index lookups.
* assuming that you're not shilling here by presenting the frame that the new shiny is magically immune to revenue pressures
Then this whole category is not known for "high quality recipes", so the general state wouldn't change much?
The funding model for the open web will be for the open web content to be the top of the funnel for curated content and/or walled gardens.
I think many business models already treated the web this way. Specifically, get people away from the 800-pound gorilla rent-seekers like Google and Amazon, and get them into your own ecosystem.
- Your local grammar pedant
The chatgpt approach to search just feels forced and not as intuitive.
So that is a solid advantage that Google is going to have, but the maps business alone wouldn't be able to keep it in the S&P list for long.
Local results means that if I search for "driving laws", Google gives me .gov sites for my state as the top results, while Kagi's first page gives me results for 8 other states (including Alaska!) but not for my state.
There are a lot of kinds of queries that benefit from knowing the user's location even though they aren't actually looking for a place that exists on a map.
(I'm a happy paying Kagi user, but OP is right that this is its weakest point by far.)
It's not terrible—as I said, I'm a happy customer—but it's not a habit I have and it feels like something that should be configurable once in a settings menu. I don't even really want to have it detect my location live, I just want to be able to tell it where I live and have it prioritize content that's local when given the chance.
* They're not ad funded. Sergey Brin and Larry Page called this out in 1998 and it is just as true as ever: you need the economics to align. Kagi wins if people keep paying for it. Google wins if you click on Search ads or if you visit a page filled with their non-Search ads.
* Partially because of the economic alignment, Kagi has robust features for customizing your search results. The classic example is that you can block Pinterest, but it also allows gentler up- and down-weights. I have Wikipedia get a boost whenever its results are relevant, which is by itself a huge improvement over Google lately. Meanwhile, I don't see Fandom wikis unless there's absolutely nothing else.
I hope to see more innovation from Kagi in the customization side of things, because I think that's what's going to make the biggest difference in preventing SEO gaming. If users can react instantly to block your site because it's filled with garbage, then it won't matter as much if you find a brief exploit that gets you into the first page of the natural search results. On Google Fandom is impossible to avoid. On Kagi it just takes one click.
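To make the idea concrete, here's a rough sketch of what per-domain weighting amounts to at re-rank time (this is just my mental model, not Kagi's actual algorithm; the domains and weights are made up):

```typescript
// Hypothetical sketch of per-domain re-ranking; not Kagi's real implementation.
// A weight of 0 blocks a domain, <1 lowers it, >1 raises it.
type SearchResult = { url: string; score: number };

const domainWeights: Record<string, number> = {
  "pinterest.com": 0,    // blocked entirely
  "wikipedia.org": 1.5,  // boosted whenever its results are relevant
  "fandom.com": 0.2,     // heavily lowered, surfaces only if little else matches
};

function weightFor(host: string): number {
  for (const [domain, w] of Object.entries(domainWeights)) {
    if (host === domain || host.endsWith("." + domain)) return w;
  }
  return 1; // neutral for domains the user hasn't touched
}

function rerank(results: SearchResult[]): SearchResult[] {
  return results
    .map(r => ({ ...r, score: r.score * weightFor(new URL(r.url).hostname) }))
    .filter(r => r.score > 0) // weight 0 means never shown
    .sort((a, b) => b.score - a.score);
}
```

The point is that the preference data lives with the user, so a site that games the base ranking still gets zeroed out for everyone who has blocked it.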
I would say: 1) The UI. You’re still performing normal searches in Kagi. But if you hit q, or end your query with a question mark, you get an LLM-synthesized answer at the top, but can still browse and click through the normal search results.
2) Kagi has personalization, i.e. you can uprank/downrank/block domains, so the synthesized LLM answer should usually be better because it has your personalized search as input.
In addition to all that's been written above, you can configure personal filters, so that (for example) you never ever see a pinterest page in your search results. Things like that are IMO killer features today.
Thankfully, Kagi also has a toggle to completely turn that crap (AI) off so it never appears.
Personally, I have absolutely no use for a product that can randomly generate false information. I'm not even interested until that's solved.
(If/when it ever is though, at that point I'm open to taking a look)
So yeah, Kagi definitely "leads the way" on this. By giving the user a choice to not waste time presenting AI crap. :)
give me ai hallucinations over google every day of the week and twice on sunday…
Google isn't paid for keywords, that's not how search works. They sell ad space, Google does not rank up search content for payment.
And also the obvious point is, you don't need to trust Google because they merely point you to content, they don't produce the content. They're an index for real existing content on the web which you can judge for yourself. A search index unlike an AI model, does not output uniform or even synthetic content.
I think it's already compelling enough to replace the current paradigm. Search is pretty much dead to me. I have to end every search with "reddit" to get remotely useful results.
The concern I have with LLMs replacing search is that once it starts being monetized with ads or propaganda, it's going to be very dangerous. The context of the results is scrubbed away.
Not to mention that users consuming most content through a middleman completely breaks most publishers' business models. Traditional search is a mutually beneficial arrangement, but LLM search is parasitic.
Expect to see a lot more technical countermeasures and/or lawsuits against LLM search engines which regurgitate so much material that they effectively replace the need to visit the original publisher.
Traditional search is mutually beneficial... to search providers and publishers. At expense of the users. LLM search is becoming popular because it lets users, for however short time this will last, escape the fruits of the "mutually beneficial arrangement".
If anything, that arrangement of publishers and providers became an actual parasite on society at large these days. Publishers, in particular, will keep whining about being cut off; I have zero sympathy - people reach for LLMs precisely because publishers have been publishing trash and poison, entirely intentionally, optimizing for the parasitic business model, and it got so bad that the major use of LLMs is wading through that sea of bullshit, so that we don't have to.
The ad-driven business model of publishing has been a disaster for society, and deserves to be burned down completely.
(Unfortunately, LLMs will work only for a short while, they're very much vulnerable to capture by advertisers - which means also by those publishers who now theatrically whine.)
Just a wild guess, but at best that content is probably pretty mediocre quality. It's probably Mikkelsen Twins ebook-level garbage.
(In fact, there's value in trying to filter excess crap out of existing training sets.)
What happens when you need to search something new? Just hallucinations all the way down?
- Publishers no longer show you ads, they just get paid out of BAT.
- Brave shows you ads, but Brave does not depend on that to survive. Because of that there is no weird conflict of interest like with Google/Facebook, where the party that surfaces your content is also the party providing you with ads.
- Users can just browse the web without ads as a threat vector, but as long as you have BAT (either via opt-in Brave ads or by purchasing it directly) you are not a freeloader either.
You opt-in to the ads, you get them in your notifications, and every time you tap on one of them you get a few BAT. You browse, the BAT get paid out to whichever sites you visit (or linger on, depending on your configuration). You can opt out of the ads at any time. Brave didn't pre-mine their own coins. And you can buy BAT if you want to support sites without watching ads.
Showing people ads is part of the problem to be solved.
And the website gets paid either way.
Not every person/site can run on Patreon or sponsorship deals. And paywalling a lot of the web would exclude vast swathes of people.
2. "How else would you achieve X than by manipulating people visiting your website into paying for things they probably don't need, and be misinformed and tracked by powerful commercial and political entities?" - I can but shrug at this question.
3. The vast majority of written content is never rewarded or compensated monetarily, ads or no ads.
You do. The ad broker sells access to your eyeballs to a company, and then gives part of that money to whichever parties have a monetization agreement in the content.
> "How else would you achieve X than by manipulating people visiting your website into paying for things they probably don't need, and be misinformed and tracked by powerful commercial and political entities?" - I can but shrug at this question.
Always fun to see people with strong opinions be critically misinformed.
Brave’s ads don’t have tracking, by design.
> The vast majority of written content is never rewarded or compensated monetarily, ads or no ads.
By that logic we should stop paying for art?
But yes: the original Web served its (non-profit-motivated) creators and readers. The past two decades of advertising-based web has served publishers and advertisers, precisely as you note. LLM is mixing that up for the moment but I sincerely doubt that it will last.
That said, I welcome the coming ad/pub pain with unbridled glee.
Out of the frying pan and into the fire, as they say.
Bullshit. Users have shown time and time and time again that they prefer (generally, at large) free content, which has to be supported by ads, over actually paying directly for the labor of others.
> The ad-driven business model of publishing has been a disaster for society, and deserves to be burned down completely.
I tend to agree, but people can't expect content, which needs sizable amounts of time and money to produce, for free - it needs some sort of workable funding model. LLMs are only viable now because they were able to slurp up all that ad-supported content before they broke the funding model. That window is closing, and fast.
The way Reddit limited access to their API and got Google to pay for access. Some variation of that, but on a wider scale.
The whole thing needs a reframe. Ad-driven business only works because it's a race to the bottom. Now we are approaching the bottom, and it's not gonna be as competitive. Throwback to the 90s when you paid for a search engine?
If you can charge the user (the customer- NOT the product) and then pay bespoke data providers (of which publishers fall under) then the model makes more sense, and LLM providers are normal middlemen, not parasites.
The shift is already underway imo - my age cohort (28 y/o) does not consume traditional publications directly. It's all through summarization like podcast interviews, YouTube essays, social media (Reddit), etc.
:-)
Separate monthly fees for separate services is absolutely unsustainable already. The economic model to make the internet work has not yet been discovered, but $20 a month for a search engine is not it.
Search means either:
* Stack Overflow. Damaged by its new owner, but the idea lives on.
* Reddit. Google tries to fuck it up with "auto translation"?
* GitLab or GitHub if something needs a bugfix.
The rest of the internet is either an entire ****show or pure gold-pressed latinum, but hardly navigable thanks to monopolies like Google and Microsoft. PS: ChatGPT's answers are already declining because its source is Stack Overflow? And... well... those sources are humans.
I've heard reports that requesting verbatim results via the tbs=li:1 parameter has helped some people postpone entirely giving up on Google.
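(For anyone curious, verbatim mode is just an extra query parameter on the results URL; a minimal sketch of building such a URL, assuming only the `q` and `tbs=li:1` parameters mentioned above:)

```typescript
// Build a Google results URL with verbatim mode (tbs=li:1), which asks the
// engine to match the query terms literally instead of rewriting them.
function verbatimSearchUrl(query: string): string {
  const params = new URLSearchParams({ q: query, tbs: "li:1" });
  return `https://www.google.com/search?${params}`;
}

// verbatimSearchUrl("error ENOENT no such file")
//   -> "https://www.google.com/search?q=error+ENOENT+no+such+file&tbs=li%3A1"
```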
Personally I've already been on Kagi for a while and am not planning on ever needing to go back.
I remember reading a Google Search engineer on here explain that the engine just latches onto some unrendered text in the HTML code. For example: hidden navbars, prefetch, sitemaps.
I was kinda shocked that Google themselves, having infinite resources, couldn't get the engine to figure out which sections actually get rendered... so that might have been a good excuse.
I worry that there's a confusion here--and in these debates in general--between:
1. Has the user given enough information that what they want could be found
2. Is the rest of the system set up to actually contain and deliver what they wanted
While Aunt Tillie might still have problems with #1, the reason things seem to be Going To Shit is more on #2, which is why even "power users" are complaining.
It doesn't matter how convenient #1 becomes for Aunt Tillie, it won't solve the deeper problems of slop and spam and site reputation.
Google really does seem determined to completely destroy internet search.
If you post something wrong on the Internet, someone will correct you.
You can suffix: "site:reddit.com" and get results for that particular site only.
Yes there are workarounds, but I like using the native OS text expansion and it works everywhere except Google.
https://chatgpt.com/share/6723f225-bd74-8000-bfef-4f7f8687b0...
https://chatgpt.com/share/6724116c-13b8-8003-bb2a-4d2ca49da4...
https://chatgpt.com/share/672414aa-bcc8-8003-beec-ba4eae83a0...
I guess it matches my own biases well :'D
If I made an AC that used 10x less energy I'd be a billionaire; comparing against one of the most costly energy uses, one that has no simple replacement, is not a good metric.
The way temperatures have been changing in Europe in the past decade, you may not have A/C at home now, but I bet you'll have it in ten years, tops. So will everyone else and their dogs.
As I said, in buildings that are attached to income streams, be it hostels, shops or restaurants, it's of course something that can be balanced against losses and profits. In a personal home, it will just eat some of your budget.
And with electricity prices on the rise (and thus basically everything in common goods) and salary stagnation on the other hand, I doubt people here will suddenly rush to AC on a massive scale. Plus the government is apparently pushing alternative approaches, but I'm just discovering that as this thread launched me on the track of investigating the topic.
Personally, I doubt I'll jump to AC anytime soon. It's just out of reach for my income, all the more when there is basically no chance of seeing electricity prices plummet while my salary has a good chance of staying frozen as it has been for the last two years. And it's not like I feel like the most unlucky person in town, to be clear; my situation is far from the worst I can witness around me.
Not to mention that central A/C in the North American sense, with an air handler and ducts, is just never coming to France; it's such an outdated technology, and forced-air heating is generally considered to suck there.
I understand the Olympic Village had the same system and many teams brought their own portable AC units. https://apnews.com/article/olympics-air-conditioning-paris-0...
I suppose that you do have heating in the home?
Shops, restaurants, airports and things like that which are attached to revenue streams have them.
I've never been in a billionaire's palace, that said.
It made me look at some statistics:
https://www.statista.com/statistics/911064/worldwide-air-con...
https://worldpopulationreview.com/country-rankings/air-condi...
https://www.eia.gov/todayinenergy/detail.php?id=52558
https://www.rfi.fr/en/france/20220723-france-does-not-use-mu...
https://www.reddit.com/r/AskFrance/comments/vhs8dn/how_commo...
Apparently, Japan, USA and now China are huge users of AC in personal homes (like more than 90% of them). That's in sharp contrast with what is observed in most of Europe, including France where I live.
I never had the opportunity to travel to any of these countries, so indeed I was totally blind to this extreme gap in usage from my own personal experience.
I never said that boats or AC don't exist. Both exist, and I have seen and experienced many of them in commercial contexts. But not everyone can afford them, plus the cost to operate them.
Sure, I should broaden my horizons and even consider looking at people enjoying their private jets and helicopters. But a mere wage slave like myself will never have the chance to afford one, that's for sure.
Now let's get back to the initial context: mere mortals around me are definitely all using the internet as soon as their parents let them, and even a homeless person can afford entry-level mobile access (2€/month) with a phone they can receive for nothing at charity organizations like Emmaus. So the affordability of access to online search is several orders of magnitude below AC.
https://archive.nytimes.com/krugman.blogs.nytimes.com/2015/0...
https://archive.nytimes.com/krugman.blogs.nytimes.com/2015/0...
In hot and humid places, having AC was always a priority a hundred steps above having internet access, until cheap smart phones arrived.
And they use a lot of energy, just like heating uses a lot of energy in colder climates.
Plumbing is also generally not considered a luxury over here. But at the level of mankind, I do feel particularly privileged in this regard. I remain amazed we have water flowing at will, and even the possibility of taking a hot shower every day. This is not a private-jet level of privilege, but I try to keep myself aware of how incredibly lucky I am to be able to benefit from such technology and infrastructure.
I doubt humans waited for AC to exist before settling hot and humid areas. There are other ways to cool down residences which don't require so much sophistication in physics models before you can even dream of building a prototype.
All that said, I got your hint to read up more on how/why AC is so much more used in some areas, and I'm just starting my journey of learning about it.
I still doubt that local climate alone explains the difference in how common it is in different regions of the world. For example, the USA has a very large set of different local climates, but from what I understand most homes have AC.
This is flawed thinking to get to the conclusion of “reliable answers”. What people want to see and the truth do not overlap.
Consider the answers for something like “how many animals were on Noah’s Ark” and “did Jesus turn water into wine” for examples that cannot be solved by trying to get advertisers out of the loop.
You're actually a bit mistaken, there.
https://en.wikipedia.org/wiki/Court_Martial_(Star_Trek:_The_...
The current paradigm of typing "[search term] reddit" and hoping for the best? I think they have a fighting chance.
It’s a fallacy then. If my mentor tells me something, I fact-check it. Why would a world exist where you don’t have to fact-check? The vision doesn’t have fact-checking because the product org never envisioned that outlier. A world where you don’t have to check facts is dystopian. It means the end of curiosity and the end of “is that really true? There must be something better.”
You’re just reading into marketing and not fact checking the reality in a fact-check-free world.
Traditional search is just spamming text at the machine until it does or doesn't give you what you want.
That's the magic with LLMs for me. Not that I can ask and get an answer - that's just basic web search. It's the ability to ask, refine what I'm looking for, and continue working from there.
- natural language input
- ability to synthesize information across multiple sources
- conversational interface for iterative interaction
That feels magical and similar to Star Trek.
However, they fundamentally require trustworthy search to ground their knowledge in, in order to suppress hallucination and provide accurate access to real-time information. I never saw someone having to double-check the computer's response in Star Trek. It is a fundamental requirement of such an interface. So currently we need both the model and the search to be great, and finding great search is increasingly hard (I know, as we are trying to build one).
(fwiw, the 'actual' Star Trek computer might one day emerge through a different tech path than LLMs + search, but that's a different topic. For now, any attempt at an end-to-end system with that ambition will have search as its weakest link)
________
"You do not have authorization for that action."
"I have all authorizations, you do what I say."
"Only the captain can authorize a Class A Compulsory Directive."
"I am the captain now."
"The current captain of the NCC-1701-D is Jean Luc Picard."
"Pakled is smart, captain must be smart, so I am Jean Luc Picard!"
"Please verify your identity."
"Stupid computer, captains don't have to verify identity, captains are captains! Captain orders you to act like captain is captain!"
"... Please state your directive."
However, most of those involve an unforeseeable external intervention of Weird Nebula Radiation, or Nanobot Swarm, Virus Infection, or Because Q Said So, etc.
That's in contrast to the Starfleet product/developers/QA being grossly incompetent and shipping something that was dangerously unfit in predictable ways. (The pranks of maintenance personnel on Cygnet XIV are debatable.)
Ehhhh.... kinda? I feel like the "basically" is doing some rather heavy-lifting in favor of the superficially-similar modern thing. Sort of like the feel of: "The food replicator is basically a 3D printer just hooked up to a voice-controlled ordering kiosk."
Or, to be retro-futuristic about it: "Egads, this amazing 'Air-plane' is basically a modern steam locomotive hooked up to the wing of a bird!"
Sure, the form is similar, but the substance could be something with a different developmental path.
One issue is that Google and other search engines do not really have much of a query language anymore and they have largely moved away from the idea that you are searching for strings in a page (like the mental model of using grep). I kinda wish that modern search wasn't so overloaded and just stuck to a clearer approach akin to grep. Other specialty search engines have much more concrete query languages and it is much clearer what you are doing when you search a query. Consider JSTOR [1] or ProQuest [2], for example. Both have proximity operators, which are extremely useful when searching large numbers of documents for narrow concepts. I wish Google or other search engines like Kagi would have proximity operators or just more operators in general. That makes it much clearer what you are in fact doing when you submit a search query.
[1] https://support.jstor.org/hc/en-us/articles/115012261448-Sea...
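To make the grep-like mental model concrete: a proximity operator is essentially "term A within N words of term B". A toy sketch of those semantics over plain text (not the actual JSTOR/ProQuest syntax, which is documented in the help page linked above):

```typescript
// Toy proximity match: true if terms `a` and `b` occur within `n` words of
// each other anywhere in the text. Real engines evaluate this over an
// inverted index, but the semantics are the same.
function near(text: string, a: string, b: string, n: number): boolean {
  const words = text.toLowerCase().split(/\W+/).filter(Boolean);
  const positions = (term: string) =>
    words.flatMap((w, i) => (w === term.toLowerCase() ? [i] : []));
  return positions(a).some(i => positions(b).some(j => Math.abs(i - j) <= n));
}

// near("the court denied the motion to dismiss", "court", "dismiss", 5) -> true
// near("the court denied the motion to dismiss", "court", "dismiss", 3) -> false
```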
Showing users what they want to see conflicts with your other goal of receiving reliable answers that don't need to be fact-checked.
Also a lot of questions people ask don't have one right answer, or even a good answer. Reliable human knowledge is much smaller than human curiosity.
Limiting responses to curated information sources is the way forward. Encyclopedias, news outlets, research journals, and so on.
No, they're not infallible. But they're infinitely better than anonymous web sites.
> actually surfacing the content people want to see, not what intermediaries want them to see
Requires two assumptions, 1) the content people want to see actually exists, 2) people know what it is they want to see. Most content is only created in the first place because somebody wants another person to see it, and people need to be exposed to a range of content before having an idea about what else they might want to see. Most of the time what people want to see is… what other people are seeing. Look at music for example.
Google started off with just web search, but now you can get unit conversions and math and stuff. ChatGPT started in the other direction and is moving to envelop search. Not being directed to sites that also mostly serve Google ads is a double benefit. I'll gladly pay $20/30/mo for an ad-free experience, particularly if it improves 2x in quality over the next year or two. It's starting to feel like a feature-complete product already.
It's interesting how prescient it was, but I'm more struck wondering--would anyone in 1987 have predicted it would take 40+ years to achieve this? Obviously this was speculative at the time but I know history is rife with examples of AI experts since the 60s proclaiming AGI was only a few years away
Is this time really different? There's certainly been a huge jump in capabilities in just a few years but given the long history of overoptimistic predictions I'm not confident
In the past there was a lot of overconfidence in the ability of things to scale. See Cyc (https://en.m.wikipedia.org/wiki/Cyc)
40+ makes it sound like you think it will ever be achieved. I'm not convinced.
What we need is an easier way to verify sources and their trustworthiness. I don't want an answer according to SEO spam. I want to form my own opinion based on a range of trustworthy sources or opinions of people I trust.
This also seems like a slightly ridiculous premise. Any confident statement about the real world is never fully reliable. If Star Trek were realistic, the computer would have been wrong once in a while (preferably with dramatically disastrous consequences) - just as the humans it was likely built around are frequently wrong, even via consensus.
If I'm asking ChatGPT to put an itinerary together for a trip (OpenAI's suggestion, not mine), my expectation is that places on that itinerary exist. I can forgive them being closed or even out of business but not wholly fabricated.
Without this level of reliability, how could this feature be useful?
It drives me crazy that my kids' teachers go on and on about how inaccurate Wikipedia is, and that just anybody can update the articles. They want to teach the kids to go to the library and search books.
In a few years time they will be going on and on about how inaccurate ChatGippity is and that they should use Wikipedia.
If all these "AI" companies gave a couple of million to support Wikipedia, they would do the world a lot more good.
100%. Students who can do the work know the winning move is to use it as a way to find the sources you actually use.
I just straight-up don't agree with this, nor with the idea that what people consider "facts" are nearly as reliable as is implied. What we actually refer to via "fact" is "consensus". Truth is an a priori concept, whereas we're discussing a posteriori claims. Any "reasonable" AI would give an indication of its degree of certainty, and there's no reliable or consensus-driven methodology to produce this manually, let alone automatically. The closest we come is the institution of "science", which can't even - as it stands - reliably address the vast majority of claims made about the world today.
And this is even before discussing the thorny topic of the ways in which language binds to reality, to which I refer you to Wittgenstein, a person likely far more intelligent and epistemologically honest than anyone influencing AI work today.
Yes, wikipedia does tend to cohere with reality, or at least it sometimes does in my experience. That observation is wildly different from an expectation that it does in the present, or will in the future, reflect reality. Furthermore, it's not terribly difficult to find instances where it's blatantly not correct. For instance, I've been in a wikipedia war over whether or not the Soviet Union killed 20 million christians for being christians (spoiler: they did not, and this is in fact more people than died in camps or gulags over the entire history of the state). However, because there are theologians at accredited universities who have published this claim, presumably with a beef against the soviet union for whatever reason (presumably "anticommunism"), it's considered within the bounds of accuracy by wikipedia.
EDIT0: I'm not trying to claim wikipedia isn't useful; I read it every day and generally take what it says to be meaningful and vaguely accurate. but the idea that you should trust what you read on it seems ridiculous. As always, it's only as reliable as the sources it cites, which are only as reliable as the people and institutions that produce that cited work.
EDIT1: nice to see someone else from western mass on here; cheers. I grew up in the berkshires.
EDIT2: to add on to the child comment, wikipedia is occasionally so hilariously unreliable it makes the news. Eg https://www.theguardian.com/uk-news/2020/aug/26/shock-an-aw-...
"The Great Patriotic War changed Joseph Stalin’s position on the Orthodox Church. In 1943, after Stalin met with loyal Metropolitans, the government let them choose a new Patriarch, with government support and funding, and permitted believers to celebrate Easter, Christmas and other holidays. Stalin legalized Orthodoxy once again."
https://www.rbth.com/history/329361-russian-orthodox-church-...
Sure, cuz fact-checking works so well for us today. I'm sure we'll resolve the epistemological issues involved with the ridiculous concept of "fact-checking" around when we invent summoning food from thin (edit: thick) air and traveling faster than light.
There is no fact checking; there are only degrees of certainty. "fact-checking" is simply a comfortable delusion that makes western media feel better about engaging in telling inherently unverifiable narratives about the world.
1. On price; race to the bottom or do free with ads
2. Differentiation
3. Focus - targeting a specific market segment
Some things don't change. Land grabbers tend to head down route 1.
We are currently in the growth phase of VC-funded products where everything is almost free or highly subsidized (save for chat subs) - I am not looking forward to when quality drops and revenue is the driving function.
We all have to pay for these models somehow - either VC lose their equity stakes and it goes to zero (some will) or ads will fill in where subs don’t. Political ads in AI is going to wreak havoc or undermine the remainder of the product.
Navigation is the only thing that works, but Waze was way better at that, and the only reason they killed it (cough, bought it) was to get eyeballs looking at the feed.
s/outcome/search result/
Honestly I kind of think we really need open source databases/models and local ai for stuff like this.
Even then I wonder about data pollution and model censorship.
What would censors do for models you can ask political questions?
Regarding incentives - with Perplexity, ChatGPT search et al. skinning web content - where does it leave the incentive to publish good, original web content?
The only incentivised publishing today is in social media silos, where it is primarily engagement bait. It's the new SEO.
Besides Poe's Web Search, the other search engine I use - for news, but also for points of view and deep-dive blog-type content - is Twitter. Believe it or not. Google search is so compromised today with censorship (of all kinds, not just the politically motivated), not to mention Twitter is just more timely, that you miss HUGE parts of the internet - and the world - if you rely on Google for your news or these other things.
The only time I prefer google is when I need to find a pointer/link I already know exists or should exist, or to search reddit or HN.
Even with perfect knowledge right now, there’s no guarantee that knowledge will remain relevant when it reaches another person at the fastest speed knowledge is able to travel. A reasonable answer on one side of the universe could be seen as nonsensical on the other side - for instance, the belief that we might one day populate a planet which no longer exists.
As soon as you leave the local reference frame (the area in a system from which observable events can realistically be considered happening “right now”), fact checking is indeed required.
Fundamentally it feels like that can't happen though, because there is no money in it, but a reality where my phone is an all-knowing voice I can reliably get info from instead of a distraction machine would be awesome.
I do "no screen" days sometimes and tried to do one using ChatGPT voice mode so I could look things up without staring at a screen. It was miles from replacing search, but I would adopt it in a second if it could.
We need to stop adopting this subscription-model-society mentality and retake _our_ internet. Internet culture was at one point about sharing and creating, simply for the sake of it. We tinkered and created in our free time, because we liked it and wanted to share with the world. There was something novel to this.
We are hackers, we only care about learning and exploring. If you want to fix a broken system, look to the generations of old: they didn't create and share simply to make money, they did it because they loved the idea of an open and free information superhighway, a place where we could share thoughts, ideas and information at the touch of a few keystrokes. We _have_ to hold on to this ethos, or we will lose whatever little is left of this idea.
I see things like Kagi and am instantly met with some new service, locked behind a paywall, promising lush green fields of bliss. This is part of the problem. (Not saying Kagi is a bad service.) I see a normalized stigma around people who value privacy, who as a result are being locked out behind the excuse of "mAliCiOuS" activity. I see monstrous giants getting away with undermining net neutrality and well-established protocols for their own benefit.
I implore you all, young and old, (re)connect to the hacker ethos, and fight for a free and open internet. Make your very existence an act of rebellion.
Thank you for reading my delirium.
Man, it was pretty incredible!
I asked a lot of questions about myself (whom I know best, of course) and first of all, it answered all my queries super quickly, letting me drill in further. After reading through its brief, on-point answers and the sources it provided, I'm just shocked at how well it worked, while giving me the feeling that yes, it can potentially - fundamentally - change things. There are problems to solve here, but to me it seems that if this is where we're at today, then in the future it has the potential to change things to some extent for sure!
Will it replace Google in the mass market? No. Why? Power. I don't mean how good the product is. I mean the literal electricity.
There are key metrics that Google doesn't disclose as part of its financials. These include things like RPM (revenue per thousand searches), but it must also include something like the cost of running a thousand searches when you amortize everything involved: all the indexing, the software development and so on. That will get reduced to a certain amount of CPU time and storage.
If I had to guess, I would guess that ChatGPT uses orders of magnitude more CPU power and electricity than the average Google search.
Imagine trying to serve 10-50M+ (just guessing) ChatGPT searches every second. What kind of computing infrastructure would that take? How much would it cost? How would you monetize it?
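A purely illustrative back-of-envelope - every number below is an assumption, since neither company publishes per-query costs - just to show how fast the gap compounds at that scale:

```typescript
// All figures are assumptions for illustration only.
const queriesPerDay = 8.5e9;         // rough public estimate of Google's daily search volume
const costPerClassicSearch = 0.0002; // assumed: a small fraction of a cent per query
const costPerLlmAnswer = 0.01;       // assumed: about a cent per generated answer

const classicDailyCost = queriesPerDay * costPerClassicSearch; // ~ $1.7M/day
const llmDailyCost = queriesPerDay * costPerLlmAnswer;         // ~ $85M/day

console.log(`classic search: ~$${(classicDailyCost / 1e6).toFixed(1)}M/day`);
console.log(`LLM answers:    ~$${(llmDailyCost / 1e6).toFixed(1)}M/day`);
```

Even if those assumed costs are off by a lot, answering every query with a model that costs orders of magnitude more per request is exactly why the monetization question matters.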
In the short term, I wonder what happens to a lot of the other startups in the AI search space - companies like Perplexity or Glean, for example.
"The search will be personal and contextual and excitingly so!"
---
Brrrr... someone is hell-bent on the extermination of the last aspects of humanity.
Holy crap, this will be the next armageddon, because people will further alienate themselves from other people and create layers upon layers of impenetrable personal bubbles around themselves.
Kagi does the same thing Google does, just in different packaging. And these predictions, bleh - copycats and shills in a nicer package.
I mean, Star Trek is a fictional science-fantasy world so it's natural that tech works without a hitch. It's not clear how we get there from where we are now.
> Path forward requires solving the core challenge: actually surfacing the content people want to see, not what intermediaries want them to see
These traps and patterns are not inevitable. They happen by choice. If you're actively polluting the world with AI generated drivel or SEO garbage, you're working against humanity, and you're sacrificing the gift of knowing right from wrong, abandoning life as a human to live as some insectoid automaton that's mind controlled by "business" pheromones. We are all working together every day to produce the greatest art project in the universe, the most complex society of life known to exist. Our selfish choices will tarnish the painting or create dissonance in the music accordingly.
The problem will be fixed only with culture at an individual level, especially as technology enables individuals to make more of an impact. It starts with voting against Trump next week, rejecting the biggest undue handout to a failed grifter who has no respect for law, order, or anyone other than himself.
Making things quicker and easier always wins in tech and in life.
growing up, we had the philosophical "the speaking tree" https://www.speakingtree.in/
If trees could talk, what would they tell us? Maybe we similarly need a talking AI.
To put this differently, I'm not any more interested in seeing stormfront articles from an LLM than I am from google, but I trust neither to make a value judgement about which is "good" versus "bad" information. And sometimes I want to read an opinion, sometimes I want to find some obscure forum post on a topic rather than the robot telling me no "reliable sources" are available.
Basically I want a model that is aligned to do exactly what I say, no more and no less, just like a computer should. Not a model that's aligned to the "values" of some random SV tech bro. Palmer Luckey had a take on the ethics of defense companies a while back. He noted that SV CEOs should not be the ones indirectly deciding US foreign policy by doing or not doing business. I think similar logic applies here: those same SV CEOs should not be deciding what information is and is not acceptable. Google was bad enough in this respect - c.f. suppressing Trump on Rogan recently - but OpenAI could be much worse in this respect because the abstraction between information and consumer is much more significant.
This is a bit like asking for news that’s not biased.
A model has to make choices (or however one might want to describe that without anthropomorphizing the big pile of statistics) to produce a response. For many of these, there’s no such thing as a “correct” choice. You can do a completely random choice, but the results from that tend not to be great. That’s where RLHF comes in, for example: train the model so that its choices are aligned with certain user expectations, societal norms, etc.
The closest thing you could get to what you’re asking for is a model that’s trained with your particular biases - basically, you’d be the H in RLHF.
That's market-induced bias--which isn't ethically better/worse than activist bias, just qualitatively different.
In the AI/search space, I think activist bias is likely more than zero, but as a product gets more and more popular (and big decisions about how it behaves/where it's sold become less subject to the whims of individual leaders) activist bias shrinks in proportion to market-motivated bias.
"who won the warriors game last night" returns last night's score directly.
"who won the world series yesterday" returns last night's score directly, while "who won the world series" returns an overview of the series.
No ads.
Yes, Google has their own AI divisions, tons of money and SEO is to blame for part of their crappiness. But they've also _explicitly_ focused on ad-dollars over algorithmic purity if one is to believe the reports of their internal politics and if those are true they have probably lost a ton of people who they'd need right now to turn the ship around quickly.
I can still search things, and I get results, but they're an ordered list of popular places the engine is directing me to. Some kind of filtering is occurring on nearly every search I make that's making the results feel entirely useless.
Image search stopped working sometime ago and now it just runs an AI filter on whatever image you search for, tells you there's a man in the picture and gives up.
YouTube recommendations are always hundreds of videos I've watched already, with maybe 1-2 recommendations for new channels, when I know there are millions of content creators out there struggling whom it will never introduce me to. What happened to the rabbit holes of crazy YouTube stuff you could go down?
This product is a shell of its old self, why did it stop working?
Image search isn't great either but it still often gives me something close and that usually satisfies my image searching needs.
I still find YouTube recommendations quite good for me, but there are occasional ones I've watched already. I still go down its fun (and educational!) rabbit holes all the time.
Yes. However, I found that https://scholar.google.com still works perfectly well. It feels just as the old Google without all the crap they've been adding in the last years.
I can’t imagine the cost this would have on scientific productivity in the West.
Oddly, its biggest strength is being irrelevant to the decision makers; if the bean counters noticed the few million they are losing running Scholar, there would be ads + Gemini all over it.
YouTube search is also completely useless now.
https://www.youtube.com/results?search_query=path+of+exile+2...
https://www.youtube.com/results?search_query=ramen+restauran...
When I search these three things the first 5 results are exactly what I want.
What exactly are you searching for that makes it "completely useless"? Genuine question.
Try searching for a particular video, one that is not super popular. What I want is a complete list of results that match my query. What I get is YouTube trying to recommend videos to me.
This may make it better for the non-tech folks who search for things in unclear language, and likely make it worse for those who search with precision (i.e. much of the HN crowd).
Most importantly, they make their money off ads, and it probably makes sense to optimize for the non-tech folks. The ones that don't run ad blockers and accidentally click the barely differentiable ads.
In short - I suspect they're just using new tech to make more money.
If there are no sponsored links - the result is crap.
Google is good at searching, they just have no incentive to show you results.
“ to organize the world's information and make it universally accessible and useful” was a nice mission while it lasted
First search (“products that do X”) got me a bunch of those comparisons sites, none of them containing the one I was trying to find
Second search (“ycombinator startup that does X”) got me a page of spam, but at least I found the product name
Third search (company name) got me an ENTIRE PAGE of ads and SEO optimized pages before the actual link to the actual product
The level of sponsored results for some queries is way OTT, and obviously any kind of search like "best laptop 2024" is never going to give you good results (probably because they don't exist), but other than that I'm still pretty happy with Google Search.
Genuinely interested: have you tried to spend a few weeks on an alternative?
I decided to try DuckDuckGo a few years ago. Not because it was obviously better, but to see if I could get used to it. After a few weeks, I had completely stopped falling back to Google when not finding what I wanted. I stayed on DDG for a couple years. Then same thing with Kagi: I just decided to try. It's been 1.5-2 years now and I'm disappointed when I can't use Kagi (which has my customizations, like some websites I ignore and some that I pin).
I guess my point is that it's not necessarily that you have to try something else when Google is unbearable. Maybe you can try something else and then realize (or not) that Google was not better.
1. Search Google for "ukrainian who shot his commanding officer" without quotes
2. Google serves me nothing but MSM articles of Russian this or that. The word Russian wasn't even in my search string.
3. Add the Google operator MINUS SIGN Russia
4. Results:
a) Policeman feared Chris Kaba would kill, court told
b) Media: Russian Repeated Offender Kills Five More His ...
c) President Volodymyr Zelenskyy and First Lady Olena ...
d) Ukrainian Galician Army
e) Article from 2017 entitled Killed Defense Intelligence Officer Was "The First Donetsk Cyborg"
f) Shots fired at car carrying Ukrainian President Zelenskiy's ...
5. Go to yandex.com and search the original query
6. It comes up on Yandex immediately with the original query
And its first two results are for a dead site and a deleted article, so fine user-experience reasons to exclude?
And the Google results have a link to a Reddit story about such an incident on the first page.
If I search "Mullica Hill tornado" on yt itself, I get nothing but useless 1 min local news clips. If I search the same term on Reddit, I get first person footage of the tornado passing over people's houses—hosted on Youtube! Tornado enthusiasts still occasionally dredge up "lost media" of events like the 2011 Alabama outbreak that have been on the site this entire time, but are effectively impossible to view via the algorithm, even with the precise date and location specified.
> Based on the elements you’ve described—eyes, a pineapple, a bucket of water, a house of cards, chess, and a time loop—it’s challenging to identify a single music video that encompasses all these features.
Has google ever indexed all the lyrics and scenes in a video to allow for such a weird search to be successful?
What the parent is referring to is favoring annoying ad-filled garbage over an equally relevant but straightforward result.
The hidden variable is that ad-riddled spam sites also invest in SEO, which is why they rank higher. I am not aware of any evidence that Google is using number of Google ads as a ranking factor directly. But I would push back and say that “SEO” is something Google should be doing, not websites, and a properly optimized search engine would be penalizing obvious garbage.
There are several meaningful differences between surfing YouTube and surfing the web. These include ownership, access, review, exposure, and more.
So if 95% of traffic/users/whatever metric are not using a web browser for those activities, is it really the web? It can't be called the web just 'cause they use HTTPS. It's gotta be a 'world wide web' experience, which I think a good proxy for would be using a web browser.
I got no horse in this race, just thinking out loud about it.
Another common phenomenon these days is that lots of businesses don’t even bother having a web presence - it’s all instagram, WhatsApp and tiktok accounts, mostly only accessible via apps (or worse, chat platforms like discord)
Agree YouTube and Instagram are probably mostly apps which puts them in the “Internet” category but not “world wide web”.
With ChatGPT, I can give a thumbs up or thumbs down; this means that OpenAI will optimize for users' thumbs up.
With Google, the feedback is if I click on an Ad; this means that Google optimizes for clickbait.
Heck, Google even promoted the `ping`[0] anchor attribute feature so they can log what link you click without slowing you down. (Firefox doesn’t support ping, which means when Firefox users click on a Google search result link they’re sent to an internal google.com URL first and then redirected for logging purposes)
[0] https://developer.mozilla.org/en-US/docs/Web/API/HTMLAnchorE...
I am talking about relevance, or returning what I asked for. If I ask for reviews of a SaaS product, Google will usually return a rival vendor's biased review.
If ChatGpt search returns a review written by “the professional association of xxx developers” or another unbiased site, I will give it a thumbs up. I believe other people will do the same.
I just got back from a job where my tech used AI on his phone every time he needed to search for something. The results were hilariously bad, but if things keep improving one day they might not be. Google needs to be ready for that.
I'm not familiar with what you're referring to here. Happen to have a link?
This especially happens after they dominate the market.
Take for example IE6, Intel, Facebook, IBM, and now Google.
They have everything they need to keep things from going off the rails, management however has a tendency to delusionally assume their ship is so unsinkable that they're not even manning their stations.
It becomes Clayton Christensenesque - they're dismissive of the competition as not real threats and don't realize their cash cow is running on only fumes and inertia until it's too late.
I’m not sure Facebook fits in considering they at least managed to get some other products along the way, and may get more.
I certainly don’t think Google fits the bill. Google is failing because they let their cash cow ruin everything else, not because they let it stagnate while they chased the next moonshot. Google Cloud could have easily been competitive with AWS and Azure in European Enterprise, but it’s not even considered an option because Google Advertising wouldn’t let it exist without data harvesting. Google had an Office 365 equivalent long before Microsoft took every organisation online. But Google failed to sell it because… well…
It’s very typical MBA though. Google has killed profitable products because those products weren’t growing enough. A silly metric really, but one which isn’t surprising when your CEO is a former McKinsey.
It couldn’t happen to a nicer company though, and at least it won’t kill people unlike Boeing.
* Boeing is a consequence of the "Jack Welch" effect - gutting the core in service of short term gains for stock-holders.
* The MBA type, typified by John Sculley at Apple, is about calcifying the current offerings, presuming market segments with predictable demand supported by historicals. This works well for defensives such as utilities, consumer products and health care, but not for markets with dynamic consumer relationships such as technology.
* The Google Cloud example is the Xerox PARC phenomenon. Xerox was organizationally structured for investment payoffs only characteristically similar to their mainline products, thus they couldn't properly allocate resources to things, such as desktop computing, with different kinds of curves. This is similar to how the franchise retailer Blockbuster responded so slowly to the centralized mail-order subscription Netflix. The institutional structure is only so flexible. This is similar to Conway's Law.
* The "ruin everything else" is a generalized form of a "brand extension failure". Examples include Harley Davidson perfume, Bic underwear, McDonalds Pizza, and Heinz cleaning vinegar - an over-leveraged commitment to a wildly successful core offering makes other ventures impossible.
This is not that. It's yet something else. Abstractly it's "X is a wild success, let's make Y another X instead of working on X+1"
Organizations suffer from varying degrees of ailments and they can create codependencies making the unraveling hard. Often it devolves into politics of power brokers with the company's survival dependent on the competency of the influential instead of the influence of the competent. A brutal struggle to control a sinking ship.
The crisis of the third century happens every day.
can you name these products?
E.g. for me, how much Google (and Silicon Valley in general) have enabled twisted ideologies to flourish. All in search of ad dollars by virtue of eyeballs on screens, to the detriment of everything.
Considering the value of time, past consumer surplus is especially valuable now.
Sure, there are systematic flaws causing SEO to ruin the information provided: but it isn't clear what Google can do to fight the emergent system.
I'm not sure that Bing/DDG are any better.
I use search (DDG web, Google/Apple maps, YouTube) all the time and I am regularly given results that are extremely valuable to me (and mostly only cost me a small amount of my time, e.g. YouTube adverts). Blaming SEO on Google seems thoughtless to me. Google appears to be as much a victim of human cybersystems as we are.
The problem I see with search is that the input is deeply hostile to what the consumers of search want. If the LLM's are particularly tuned to try and filter out that hostility, maybe I can see this going somewhere, but I suspect that just starts another arms race that the garbage producers are likely to win.
Currently I do find that Perplexity works substantially better than Google for finding what I need, but it remains to be seen if they're able to stay useful as a larger and larger portion of online content is just AI-generated garbage.
This comes off as condescending. As things have gotten more algorithmic over the last two decades, I've noticed a matching decrease in the accuracy and relevance of the information I seek from the systems I interact with that employ these algorithms.
Yes, you're right that there are processing algorithms behind the scenes interpreting the data for us. But you're wrong: I fucking hate it, it's made things worse, and layering more on top will not make things any better.
There is no way to SEO the entire corpus of human knowledge. ChatGPT is very good for gleaning facts that are hard to surface in today's garbage search engines.
Why would you assume that?
We don’t have a way of finding objective information, why would we be able to train a model to do so?
This time it was, "Did Paul Edwin Zimmer write a fourth Dark Border novel?" (Real answer: Yes, Ingulf the Mad. You can find the answer on his Wikipedia page.[1])
ChatGPT's[2] answer: "Yes, Paul Edwin Zimmer wrote a fourth novel in the Dark Border series titled "The Dark Border." This book was published after the original trilogy, which included "The Dark Border," "The Gilded Age," and "The Silver Sphere." If you're interested in the themes or plot, let me know!" (Note: these are not the titles of the 2nd and 3rd novels in the series. Also, it gave me the same name for the putative 1st and 4th books.)
Pure hallucination.
1. https://en.wikipedia.org/wiki/Paul_Edwin_Zimmer
2. https://chatgpt.com/
I had to refresh my knowledge by visiting fandom websites to review the episode it selected as the answer, since ChatGPT's tendency to mix things up and provide entirely made-up episodes makes it hard to use for bar trivia (and makes me doubt myself too). The same goes for other TV series such as House MD and Scrubs.
No, Paul Edwin Zimmer did not write a fourth novel in the Dark Border series. The trilogy consists of "The Dark Border," "The Dark Border: The Return," and "The Dark Border: The Reckoning." After these, he focused on other projects and did not continue the series.
Yes, Paul Edwin Zimmer wrote a fourth novel in his Dark Border series titled Ingulf the Mad, published in 1989. This installment focuses on the characters Ingulf Mac Fingold and Carrol Mac Lir, detailing their meeting and the acquisition of their mystical swords. Notably, Istvan Divega, the protagonist of the earlier books, does not appear in this novel.
> As of my knowledge cutoff in October 2023, Paul Edwin Zimmer did not publish a fourth novel in the Dark Border series. The series comprises three books: > 1. The Lost Prince (1982) > 2. King Chondos' Ride (1982) > 3. A Gathering of Heroes (1987) > > Paul Edwin Zimmer had plans to continue the series, but he passed away in 1997 before any additional novels were completed or published. There have been no posthumous releases of a fourth Dark Border novel. If there have been developments after October 2023, I recommend checking recent publications or official announcements for the most up-to-date information.
DuckAssist Result: The fourth novel in Paul Edwin Zimmer's Dark Border series is titled "Ingulf the Mad." This book focuses on characters Ingulf Mac Fingold and Carrol Mac Lir, detailing their meeting and the acquisition of their mystic swords, while the main character from the earlier novels, Istvan Divega, does not appear.
(With source as wikipedia)
Does someone else have good search skills but mingle traditional search engines with LLMs anyways? Why?
I use LLMs every day but wouldn't trust one to perform searches for me yet. I feel like you have to type more for a result that's slower and wordier, and that might stop early when it amasses what it thinks are answers from low effort SEO farms.
LLMs on the other hand (free ChatGPT is the only one I've used for this, not sure which models) give me an opportunity to describe in detail what I'm looking for, and I can provide extra context if the LLM doesn't immediately give me an answer. Given LLM's propensity for hallucinations, I don't take its answers as solid truth, but I'll use the keywords, terms, and phrases in what it gives me to leverage traditional search engines to find a more authoritative source of information.
---
Separately, I'll also use LLMs to search for what I suspect is obscure-enough knowledge that it would prove difficult to wade through more popular sites in traditional search engine results pages.
For me this is typically a multi-step process. The results of a first search give me more ideas of terms to search for, and after some iteration I usually find the right terms. It’s a bit of an art to search for content that maybe isn’t your end goal, but will help you search for what you actually seek.
LLMs can be useful for that first step, but I always revert to Google for the final search.
Also, Google Verbatim search is essential.
Things that were previously "log a Jira ticket and think about it when I have a full uninterrupted day" can now be approached with half an hour to spare. This is a game changer because "having a full uninterrupted day" almost never happens.
It's like having a very senior coworker who knows a lot of stuff and booking a 30m meeting to brainstorm with them and quickly reject useless paths vs dig more into promising ones, vs. sitting all day researching on your own.
The ideas simply flow much faster with this approach.
I use it to get a high level familiarity with what's likely possible vs what's not, and then confirm with normal search.
I use LLMs also for non-work things like getting high level understanding of taxation, inheritance etc laws in a country I moved in, to get some starting point for further research.
I agree that you can't TRUST them, but half the links regular search turns up are also garbage, so that's not really worse, per se.
Oddly, Microsoft recently changed the search version of Copilot to remove all the links to source material. Now it's like talking to an annoying growth-stage-startup middle manager in every way, including the inability to back up their assertions and a propensity to use phrases like "anyway, let's try to keep things moving".
Happy to see this feature set added into ChatGPT – particularly when I'm looking for academic research in/on a subject I'm not familiar with.
The main hard part of searching isn’t formulating queries to write in the Google search bar, it’s clicking on links, and reading/skimming until you find the specific answer you want.
Getting one sentence direct answers is a much superior UX compared to getting 10 links you have to read through yourself.
Google does offer an AI summary for factual searches and I ignore it as it often hallucinates. Perplexity has the same problem. OpenAI would need to solve that for this to be truly useful
For instance I searched for the number to dial to set call forwarding on carrier X the other day, and it gave wrong results because it returned carrier Y.
If we assume that people want a 'direct answer', then of course a direct answer is better. But maybe some of us don't want a 'direct answer'? I want to know who's saying what, and in which context, so I can draw my own conclusions.
1) I know a little bit about something, but I need to be able to look up the knowledge tree for more context: `What are the opposing viewpoints to Adam Smith's thesis on economics?` `Describe the different categories of compilers.`
2) I have a very specific search in mind but it's in a domain that has a lot of specific terminology that doesn't surface easily in a google search unless you use that specific terminology: `Name the different kinds of music chords and explain each one.`
LLMs are great when a search engine would only surface knowledge that's either too general or too specific and the search engine can't tell the semantic difference between the two.
Sometimes when I'm searching I need to be able to search at different levels of understanding to move forward.
But I'm not keeping my hopes up, I doubt the model has been explicitly fine-tuned to double check its embedded knowledge of these types of facts, and conversely it probably hasn't even been successfully fine-tuned to only search when it truly doesn't know something (i.e. it will probably search in cases where it could've just answered without the search). At least the behavior I'm seeing now from some 15 minutes of testing indicates this, but time will tell.
I feel like there is a mental architecture to searching where you try and isolate aspects of what you are searching for that are distinct within the broad category of similar but irrelevant things. That kind of mental model I would hope still works well.
For instance consider this query.
"Which clothing outlets on AliExpress are most recommended in forum discussions for providing high quality cloths, favour discussions where there is active engagement between multiple people."
OpenAI search produces a list of candidate stores from this query. Are the results any good? It's going to be quite hard to tell for a while. I know searching for information like this on Google is close to worthless due to SEO pollution.
It's possible that we have at least a brief golden-age of search where the rules have changed sufficiently that attempts to game the system are mitigated. It will be a hard fought battle to see if AI Search can filter out people trying to game AI search.
I think we will need laws to say AI advice should be subject to similar constraints as legal, medical, and financial advice where there is an obligation to act in the interests of the person being advised. I don't want to have AI search delivering the results of the highest bidder.
I consider myself quite anti LLM hype and I have to admit it has been working amazingly well for me.
- I count calories... I always eat out, at somewhat healthy chains (Cava, Chipotle, etc). I tell GPT (via voice while driving there, or after eating) what I've eaten for half the day at those places, and then later for dinner. It calculates a calorie count estimation for half the day and then the remainder at dinner. I have checked whether GPT is getting the right calories for things off websites, and it has.
- I have hiking friends who live an hour or two away, and once a month we meet up and hike at a new place that's an hour's drive or less for everyone. GPT suggests such hikes quickly (it used to take many Google searches to do this). Our drives to these new hikes suggested by GPT have always been under an hour.
So far the information with those examples has been accurate. Always enjoy hearing how others use LLMs... what research are you getting done in one or two queries which used to take MANY google searches?
Case in point: Visual Basic for Applications (the Excel macro language). This language has a broad pool of reference material and of Stack Overflow answers. It doesn't have a lot of good explicatory material because the early 2000s Internet material is aging out, being deleted as people retire or lose interest, etc.
(To be frank, Microsoft would like nothing more than to kill this off completely, but VBA exists and is insanely more powerful than the current alternatives, so it lives on.)
> "Can you provide a list of the ten most important recent publications related to high-temperature helium-cooled pebble-bed reactors and the specific characteristics of their graphite pebble fuel which address past problems in fuel disintegration and dust generation?"
These were more focused and relevant results than a Google Scholar keyword-style search.
However, it did rather poorly when asked for direct links to the documentation for a set of Python libraries. Gave some junk links or just failed entirely in 3/4 of the cases.
The LLMs are nice because they are not yet enshittified to the point of uselessness.
When it starts with this you KNOW it's going to be maximum bad faith horsefuckery in the rest of the "question."
Which is also exactly something a bad-faith commenter would say, but if I lose either way, I'd rather just ask the question ¯\_(ツ)_/¯
The comment was dripping with condescension towards the use of LLM search, and that’s coming from a huge OpenAI skeptic.
Replacing "really for" with "more for e.g" would have been closer to the intended comment. I'll take that L.
Though if I can clarify: "can't be arsed to search" isn't a normative or judgemental claim against anyone, in the same way that "can't remember a phone number/directions" isn't. I'm speaking under the assumption there's a point between now and heat death where massaging search engine queries may literally not be as useful a 'skill' anymore. So there's less utility in young/old people taking the time to learn it.
But I can see how it sounds when I try to squeeze that into a shorter message.
This becomes even better if the information you want is in multiple different places. The canonical question for that used to be "what was the phase of the moon when John Lennon was shot?". There didn't used to be an answer to this in Google - but the AI search was able to break it down, find the date John Lennon was shot (easily available on Google), find the moon phase on that day (again, easily available on Google) and put them together to produce the new answer.
For a more tech relevant example, "what is the smallest AWS EC2 instance I can run a Tomcat server in?"
You 100% can get this information yourself. It just takes much more time than having an AI do it.
If you don't like that (like I do), you can also manually add it under Site Search using
I can definitely see this new search feature being useful though. The old one was already useful because (if you asked) you could have it visit each result and pull some data out for you and integrate it all together, faster than you could do the same manually.
It's often hobbled by robots.txt forbidding it to visit pages, though. What I really want is for it to use my browser to visit the pages instead of doing it server side, so it can use my logged-in accounts and ignore robots.txt.
They really had the potential to do something interesting, but were just focused on their ad metrics with the "good enough" search box. What have they been doing all the time?
[0] https://static1.makeuseofimages.com/wordpress/wp-content/upl...
https://www.newyorker.com/news/amy-davidson/tech-companies-s...
And, yet, aside from Aramco, they are the most profitable companies in the history of the world.
OpenAI is Microsoft, technical details don't matter here, only money.
In some fields of CS, places like MS research garner nearly 50% of all top conference publications.
If they're collecting data it doesn't even work; I make no effort to hide from them and none of their ads are targeted to me. Meta, though, they're good at it.
For example `8 hours ago: "autohotkey hotkeys"` with 4 links to pages which I visited while searching.
But this is a Chrome feature, not a Google Search feature. https://myactivity.google.com/myactivity does (sometimes? can't see it right now) have a grouping feature of all the searches made, but this is more of a search log than a search management feature.
So chrome://history/grouped is the closest to what I mean, but I can't pin or manage these history groups, enrich them with comments or even files, like pdf's which could then get stored in Google Drive, as well as get indexed for better searches.
I might be mistaken, but I think Firefox mobile does something similar in grouping search sessions.
It won't directly match ChatGPT logs, and OpenAI would just be pouring precious compute into a bottomless pit trying to partial-match.
Serve users a random version and A/B test along the way.
I’m sure they will/are tackling this at the model level. Train them to both generate good completions while also embedding text with good performance at separating generated and human text.
The issue is with the query itself. You're assuming that there's some oracle that will understand your question and surface the relevant information for you. Most likely, it will use the words themselves as part of the query, which SEO sites will exploit.
A more pragmatic search workflow would be to just search for "most common programming languages used" [0], then use the Wikipedia page to get the relevant information [1]. Much more legwork, but with sources. And still quite fast.
[0]: (Screenshot) https://ibb.co/ggBLy8G
[1]: (Screenshot) https://ibb.co/H4g5bDf
is this a real question you needed an answer to, or a hypothetical you posed to test the quality of search results?
of course you're going to get listicles for a query like that, because it sounds like a query specifically chosen to find low-quality listicles.
I got:
Programming Language Latest Version Release Date
Python 3.12.0 October 2, 2024 ???!!!!!
Java SE 22 March 2024
JavaScript ECMAScript 2024 June 2024
C++ C++23 December 2023
C# 12.0 September 2024
Ruby 3.3.0 December 25, 2023
PHP 8.3.0 November 2023
Swift 6.0 September 2024
Go 1.22 August 2024 !!!
Rust 1.75 October 2024 !!!!
Kotlin 2.0.0 May 21, 2024 !!!
This is pretty bad. (???!!! added by me)
However, my follow-up query "Provide primary web page for each language listed above" was quite decent:
Here are the primary websites for the programming languages mentioned:
Python: https://www.python.org/
Java: https://www.oracle.com/java/
JavaScript: https://developer.mozilla.org/en-US/docs/Web/JavaScript
C++: https://isocpp.org/
C#: https://learn.microsoft.com/en-us/dotnet/csharp/
Ruby: https://www.ruby-lang.org/en/
PHP: https://www.php.net/
Swift: https://www.swift.org/
Go: https://go.dev/
Rust: https://www.rust-lang.org/
Kotlin: https://kotlinlang.org/
The problem was with the 3rd query "Provide latest version for each language as mentioned on the primary website for that language."
That brought back the first result basically unchanged.
So certainly this is a work in progress but very promising.
Also they do share the most blocked/raised/lowered etc sites: https://kagi.com/stats?stat=leaderboard
We've had this problem of "good defaults" before with ad-tracker blocking domain lists. I'm sure it'll be sooner rather than later that some community lists become popular and begin being followed en masse.
I'd assume right now the SEO target is still mainly Google rather than ChatGPT, but that's only an "I reckon", not a citation.
If and when ChatGPT does become the main target for SEO spam, then Googling may start giving good results again.
As of October 31, 2024, the latest version of Java is Java 23, released on September 17, 2024. The most recent Long-Term Support (LTS) version is Java 21, released on September 19, 2023.
Which all seems correct and accurate.
If my content is now going to be ingested and shown by an LLM or AI agent, what's the point of giving it away for free? I know it won't happen, but I would love it if these types of agents had to pay to show a summarization of another website. It's only fair when done en masse like this.
If someone only creates for money, only publishes on the web to get people to look at advertisements, well... I think there are plenty of other people who don't feel that way that will fill the void left behind in their departure.
To me it seems weird so many people think the internet only exists because advertising props it up. The internet existed and was a wonderful place before advertising became widespread, and most services and websites will continue to exist after advertising is gone (if that ever happens). What encourages people to believe in some sort of great collapse?
If people stop visiting websites because LLMs give them what they want, websites will stop existing. Don't believe me? Check how many "fansites" exist now about topics compared to ten years ago, when there weren't social networks. They have been replaced by influencers with huge followings on Instagram, TikTok, Twitter and more. The same will happen.
- The clickbait, SEO-optimized garbage that today fills 95% of search results could entirely disappear as a business model because they have nothing interesting to offer and the LLM company won't pay for low quality content.
- The average Joe blogging on their website won't go anywhere because they aren't profiting from it to begin with. And the LLM linking back to the page with a reference would be a nice touch. Same logic applies to things like Open Libra and projects that are fundamentally about open information and not about driving ad revenue.
But, on the other hand, I don't think LLM-based search will fundamentally change anything. Ad revenue will get in the way as always and the LLM-based search will start injecting advertisements in its results. How other companies manage to advertise on this new platform will be figured out. What LLM-based search does is give Microsoft and others the opportunity to take down Google as the canonical search engine. A paradigm shift, but not one that benefits the end user.
Frankly the example they posted seems like a fairly happy one, where the user is explicitly implying that they’re seeking a specific physical product to introduce to their life. We’ve all seen where those monetization incentives lead over time though.
But you’re right—not even so much as a tiny word “Ad” like Google does…
Why do I care if Google succeeds or dies?
If anything I want them to die for ad infested they've made the internet. I don't want ads in either chatGPT or Google Search.
If that's what they want to do in this space, which is not a given.
Google will suck all your data even if you pay, and link the entire earth of services to your identity.
For now, chatgpt doesn't care, and I already pay for what they provide.
May they kill Google.
20-year-old me would freak out hearing me say that; they used to be my heroes.
Unless enough people all pay, the whole thing stops working. But there aren’t enough people who will pay because most people don’t care.
Tldr: the ad supported business model fundamentally doesn’t work if you let all your best products (you) opt out by paying. It requires them to pay an amount far in excess of what they would be willing to pay for the system to work.
The modern maxim is: any content platform large enough to host an ad sales department will sell ads
Vanishingly few (valuable) consumers have zero tolerance for ads, so not selling ads means leaving huge sums of money on the table once you get to a certain scale. Large organizations have demonstrated that they can't resist that opportunity.
The road out is to either convince everyone to have zero tolerance for ads (good luck), to just personally opt for disperse, smaller vendors that distinguish themselves in a niche by not indulging, or to just support and use adversarial ad blockers in order to take personal control. Hoping that the next behemoth that everybody wants to use will protect you from ads is a non-starter. Sooner or later, they're going to take your money and serve you ads, just like the others.
Advertising has far less protection than is ordinarily afforded to the kind of speech you might do as a person.
They did not get addicted to selling ads, have billions in revenue from paying subscribers, and don't have to wean themselves off of ads (as Google and Meta would love to do).
However it refused to search the internet for instances of my email address. Not sure what the benefit of hiding powerful functionality is.
I would like it to work as follows:
- using a model with an extremely large context size, analyze the top n results for a particular search phrase and store it in a vector db and let my chat session interact with it.
- it does not have to return results immediately, it could ask follow-up questions to reduce scope and improve the signal to noise ratio, or decide to augment the initial search on its own, once the searcher's goal is better understood.
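Roughly the shape I have in mind, as a minimal sketch: embed() is a placeholder for whatever embedding model you'd actually use, and the pages are whatever the top-n results returned (none of this is any particular product's API):

    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Placeholder: swap in the embedding model of your choice.
        raise NotImplementedError

    def build_index(pages: list[str], chunk_size: int = 800) -> list[tuple[str, np.ndarray]]:
        """Chunk the fetched result pages and embed each chunk."""
        index = []
        for page in pages:
            for i in range(0, len(page), chunk_size):
                chunk = page[i:i + chunk_size]
                index.append((chunk, embed(chunk)))
        return index

    def retrieve(question: str, index: list[tuple[str, np.ndarray]], k: int = 5) -> list[str]:
        """Return the k chunks most similar to the question (cosine similarity)."""
        q = embed(question)
        scored = sorted(
            ((float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))), chunk)
             for chunk, v in index),
            reverse=True,
        )
        return [chunk for _, chunk in scored[:k]]

    # The chat session would only ever see the retrieved chunks as context, and a
    # follow-up question simply triggers another retrieve() pass over the same index.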
I have been using http://www.Perplexity.AI since January 2023 for this exact reason. Unfortunately, since that incredible first UI (of yesteryear) it has been downgraded extensively (including no longer displaying footnotes, just adding numerals to the ends of factual sentences with tooltips).
The greatest thing about Perplexity is still that you do not have to log in (although it will bug you, particularly after lengthier or insightful conversations). Once a search/hybrid/chatGPT (how I've used it for almost yearS now) requires a log-in, it will immediately not be able to compete well with open-facing search engines like Google.
I have this same hesitation about using a pay search engine like Kagi, but am definitely intrigued by some of the other commenters' descriptions of the AI Assistant part of their service offering.
The search engine uses AI to connect people - and it actually works at this point, kinda.
There aren't any ads in their demo, we haven't seen the real deal yet, but I'll be watching HN for that day.
[1] https://www.thestack.technology/microsoft-earnings-openai/
Google was a very profitable business 10 years ago and the search was still decent. In the last decade they absolutely butchered their core product (and the internet along with it) in an effort to squeeze more ad dollars out, because it's not the level of profitability that they need to maintain, but the growth of that profitability.
Microsoft was a ridiculously profitable company, but that is not enough, they must show growth. So they add increasingly user hostile features to their core product because the current crop of management needs to see geometric growth during their 5 year tenure. And then in 5 years, the next crop of goobers will need to show geometric growth as well to justify their bonuses.
Think about this for a moment: the entire ecosystem is built on the (entirely preposterous) premise that there must be constant geometric growth. Nobody needs to make a decision or even accept that this is long term sustainable, every participant just wants the system to keep doing this during their particular 5-10 year tenure.
It's an interesting showcase of essentially an evolutionary algorithm/swarm optimizer falling into a local optimum while a much better global optimum is out of reach because the real world is something like a Rastrigin function with copious amounts of noise with an unknowable but fat tailed distribution.
<rant/> by a hedge fund professional.
I've never heard it framed like this before, that's beautiful.
> how tf is it reading private repos ?!
I usually assume good faith, but in this particular case, I believe the chance that this repo was public before and the author just changed it to private to bait attention is far higher than that Bing/ChatGPT can actually read private repo on GitHub.
I have a private repo named "portland-things" and I asked "does this user have a repo related to portland?" and it responded with "yes it's called 'pdx'" but that's not correct at all.
So in this case at least, GPT Search is far inferior and dangerously incorrect were someone to rely on these search results for weather information.
https://chromewebstore.google.com/detail/chatgpt-search/ejcf...
To the downvoters: you should check your definition of innovation (hint: SV angel)
I’m happy OpenAI is advancing LLM-based search but I won’t be using it in earnest until it’s local.
The fundamental problem is that an ads-based business model is much more lucrative than a subscription-based one. It's even more extreme when you take a prospective view into account, since you have control over which ads are shown, which gives you a large margin for future revenue improvements compared to rigid subscription models. Unless you have a way to change this dynamic, you're going to eventually see ads in search results, regardless of the format.
Perplexity sounds like a parody startup name from the Silicon Valley TV show. Way too complicated and unnatural.
It’s all about familiarity. Once people learn it, it’s not hard.
Perplexity is just a nonsensical word (for those unfamiliar with the concept) that is too long and hard to spell. They'd be better off just chopping it down to Lexity, or Lex, or Plexity, or Plex, etc.
At first I thought it was some piano piece like "Mazurkas, Op. 59" by Chopin, or had something to do with some French guy in the AI field.
https://en.wikipedia.org/wiki/Perplexity
Reasonable name for a language model startup.
That said, anecdotally, I find it's a bit hit-or-miss: if it's a hit it's a huge improvement over Google (and a minor improvement over ChatGPT); if it's a miss it's still good, but you get the feeling you won't get anywhere further by asking more questions.
Longer term, it seems what will be left is “AI Optimized” content, which turns LLM search engines into shills for advertisers. Or these new search engines will have to compensate content producers somehow.
Definitely search is ripe for disruption.
Can confirm that free users who signed up for the waitlist can use it right now (even if they didn't actually get in)
I just did the same search with chatGPT and it gave 6 bullet-point options with a reasonable description, though that was likely based off marketing copy. Half were white-label rebrands of the same option but that's not really ChatGPT's fault, and even then it was the one that best met what I've been looking for.
I'm going to use this as my daily driver for a few weeks.
The contemporary web is basically an epiphenomenon of Google, and they've failed to defend it. I hope OpenAI puts a huge dent in their market share.
Perplexity gave me the correct and best answers, with links to the relevant arxiv papers.
The new ChatGPT search gave me only cadical as answer, plus 2 irrelevant wrong answers (not multi-threaded), but missed all other multi-threaded solvers. => It's crap.
Neither Google nor ddg gave me any relevant links. Couldn't try kagi, since my trial phase is over.
Looks like the fellow who was invited to the Google funeral was right. Google search is dead.
> The Yogar-CBMC and JCBMC solvers are notable multi-threaded variants of the CBMC (C Bounded Model Checker) framework: ...
Followed by further details and references. The search results themselves look relevant and reasonable to me, but again, outside my area of expertise.
More info on model distillation: https://openai.com/index/api-model-distillation/
That being said, for all the talk about how bad google has become, I still prefer it to an unbroken bing.
There's some sticking power/network-effect/sticky-defaults effects, too, though.
It's _trivial_ to do a google search from anywhere on an android device with at most a tap or two. You can probably get close if a 3rd party has a well integrated native app but that'll require work on the user's behalf to make it the default (where possible).
Same goes for the default search engine for browsers/operating systems ... etc.
I will absolutely be firing off queries to google and GPTSearch in parallel and doing a quick comparison between the two. I am especially curious to see how well queries like "I need the PCI-e 4 10-gig SFP+ card that is best supported / most popular with the /r/homelab community" goes. Google struggles to do anything other than link to forums where people are already asking similar questions.
The issue is that “AI search” has been a hot topic for a while now. Google (the default everywhere) just rolled out their version to billions of users. Perplexity has been iterating and acquiring customers for a while. Obviously OpenAI has great potential and brand recognition, but are enough people still interested in switching that haven’t yet?
I guess they could be using Bing as their search backend, which would mostly get around the blocking issue (except for searching Reddit which blocks Bingbot now).
Edit: I understand there is a freerider/economic issue here, unsure how to solve that as the balance between search engine/gen AI systems and content stores/providers becomes more adversarial.
I wonder to what degree -- for example, do they respect the Crawl-delay directive? For example, HN itself has a 30-second crawl-delay (https://news.ycombinator.com/robots.txt), meaning that crawlers are supposed to wait 30 seconds before requesting the next page. I doubt ChatGPT will delay a user's search of HN by up to 30 seconds, even though that's what robots.txt instructs them to do.
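For what it's worth, honoring the directive is not much code; a minimal sketch using Python's standard library (the user-agent string and the example URLs are just placeholders):

    import time
    import urllib.request
    from urllib.robotparser import RobotFileParser

    AGENT = "ExampleSearchBot"  # placeholder UA, not any real crawler's token

    rp = RobotFileParser("https://news.ycombinator.com/robots.txt")
    rp.read()

    delay = rp.crawl_delay(AGENT) or 0  # HN's robots.txt currently asks for 30 seconds

    for url in ["https://news.ycombinator.com/item?id=1",
                "https://news.ycombinator.com/item?id=2"]:
        if not rp.can_fetch(AGENT, url):
            continue
        urllib.request.urlopen(url)  # fetch and process the page
        time.sleep(delay)            # wait out the Crawl-delay between requests

The murkier question is whether an on-demand fetch made on behalf of a single user should be treated as crawling at all.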
The average person just does not discover content without the search engine recommending it.
Me too! I've really started to dislike Google search recently and am super excited we now have more viable options!
You can even search “what [music genre] is playing in [city] this November” and it lists them.
All of those would normally take multiple clicks/manual filtering or ad filled aggregator sites.
It feels like it might be. It feels tasteful in the same way that Apple ecosystem integrations just work really nicely and intuitively. But then again, there is an art to keying and retrieving embeddings, and it might just be that.
I think this might actually be my main pain point with LLMs. Personally, I don’t want this.
I understand it might be helpful for other people. But, I prefer highly specific, advanced search functionality, such as site: or filetype: in google/ddg searches.
scryfall.com for magic the gathering cards is a great example. I’d much prefer typing a few brief flags such as “id=r” instead of “Get me all red identity cards.” And I know I’m getting all red identity cards with scryfall’s current search functionality.
They are also composable, so I can add/drop ones easily instead of perfectly rephrasing a whole sentence because I wanted to change one clause.
I’d need the same level of trust in the LLM’s filtering capabilities as I do in those boolean or regex matching field filters. An escape hatch to hard filters probably would be best for my experience searching things.
We all know how this is going to end.
(I sweat every day as I contemplate my web dev self huddling in the tall shadow of AI approaching)
This is incredible and a direct threat to Google’s core biz.
As we used to say in the street "garbage in, garbage out": https://chatgpt.com/share/6723e865-d458-8011-b2ef-1a579026e6...
Today I was looking for an old (and useful) chat I had a few months ago and I had to export the whole chat history, wait for the zip file, and write a Python script to find what I was looking for :/
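For anyone stuck doing the same: the export zip contains a conversations.json you can scan. A rough sketch of the script (the field names are what my export looked like, so treat them as assumptions):

    import json

    QUERY = "vector db"  # whatever phrase you remember from the lost chat

    with open("conversations.json", encoding="utf-8") as f:
        conversations = json.load(f)

    for conv in conversations:
        # Each conversation has a title plus a "mapping" of message nodes;
        # these key names come from my export and may change over time.
        for node in conv.get("mapping", {}).values():
            message = node.get("message") or {}
            parts = (message.get("content") or {}).get("parts") or []
            text = " ".join(p for p in parts if isinstance(p, str))
            if QUERY.lower() in text.lower():
                print(conv.get("title", "(untitled)"))
                break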
For example, if you ask LLMs to build code using the three.js library, nearly all of them will reference version r128. Presumably because that version has the largest representation in the training data set. Now, you can turn this on and ask it to reference the latest version, and it will search the web and find r170 and the latest documentation to consider in its response.
I was already doing this before by adding "search the web for the latest version first" in my prompts, now I can just click a button. That's useful.
So they probably wouldn't notice the warning signs anyways.
Of course, layering an LLM on top of garbage will still produce garbage.
The only other way to kill the web without killing LLMs in the process would be to create a way for people to upload structured public content directly into an LLM’s training. That would delay public content into release batches unless training can be sped up significantly.
Hopefully ChatGPT's version works very well. Phind was more of a kludge to demonstrate what combining chat AI and search can do.
What were the limitations you ran into?
For context, I first tried this procession of searches on the Mac OS app.
1. "Who won the world series" 2. Who was the MVP?" 3. "Give me his bio"
My observations:
1. UX: The "search" button feels oddly placed, but I can't put my finger on why. But once I got that it is a toggle, it wasn't a big deal.
2. The first result had 3 logos, headlines and timestamps delineated, and was easy to read. The second and third ones included a "Sources" button that opened a fly-open menu. Clicking those opened a web link. The third result also included images in the fly-open.
3. Citations were also inlined. The third result, for the bio, included a citation per paragraph.
4. It wasn't as fast as google. Which makes sense, given it's going through the LLM. But it will take a while to rewire my brain to expect slower responses to search.
5. Overall, I found the chat interface a very intuitive interface.
The second search I asked was "Give me a plan for a Thanksgiving meal."
I got a long response that felt like a weird mashup of LLM-generated content and search results:
1. A list of menu selections
2. Links to some recipes
3. Preparation timeline
4. Shopping list
5. Additional tips
There were 15 citations listed in the popup button, but only 3 inlined.
This was... not great. A traditional list of search results feels better here.
Overall, I like the direction. Innovation in search has been dead for close to 10 years, and this feels like I'd use it for certain inquiries.
I asked “Please find articles about planning a Thanksgiving meal for a family reunion.” It returned links to: GatheredAgain FavFamilyRecipes TastesBetterFromScratch etc. I like that it is returning niche sites I do not know about.
Bonus points for then being able to ask for the results in a specific format.
I’m looking forward to seeing how a feature built search engine starts to look.
Google is instant, why would I wait for a bunch of text to generate just to get basic information.
This can subtly (or not so subtly) rephrase and reshape the way we read about and think about every topic.
I've tried Gemini Flash, given it links to websites, but it claimed to have the knowledge, or to be able to read the page, only so many times (the kind of query: "summarize https://foo-bar/news-1").
A former colleague who is mediocre on innovation but arbitrarily good at self promotion (TLDR tried boosting on ads, got a huge win, almost got fired for ethics breaches, did nothing interesting again) had the audacity to post a weird circuit board background with glowing flowing hair about his “innovations at OpenAI”.
He’s not the only total fraud I know there; it’s just that every morally weak person I know went there, almost as a rule, and none of them (save one) is any fucking good.
GPT-5 never happens. Prove me wrong.
If there is a bias towards chatgpt like tools of even ~5%, it would be worth investigating why this is. My hunch is just the conversational aspect of describing at a high level and finding answers and avoiding all the distraction of several dozen windows to do something is worth it.
I've heard people find RAG not to work very well, is that accurate? Is it just about using the right embeddings?
I suppose ideally you just put the sources in the context window, which becomes limiting with large amounts of text?
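The "just put the sources in the context window" approach is basically prompt stuffing with a budget, which is exactly where large amounts of text become the limit. A minimal sketch (the character budget stands in for a real token limit, and the model call is a placeholder):

    def build_prompt(question: str, sources: list[str], budget_chars: int = 20_000) -> str:
        """Concatenate sources until a rough budget is hit, then append the question."""
        picked, used = [], 0
        for src in sources:
            if used + len(src) > budget_chars:
                break  # with lots of text, most sources never make it in
            picked.append(src)
            used += len(src)
        context = "\n\n---\n\n".join(picked)
        return f"Answer using only the sources below.\n\n{context}\n\nQuestion: {question}"

    # prompt = build_prompt("What changed in the latest release?", fetched_pages)
    # answer = some_llm(prompt)  # placeholder for whichever model you call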
I can't believe that I'm saying this, but now after more than 20 years of using Google I'm finally paying for search.
So ChatGPT‘s search looks rather rudimentary compared with Perplexity.
I'd like to see search (or research in broader sense) a more controllable activity with the ability to specify context + sources easily in the form of apps, agents and content.
It sucks but it’ll happen for sure
As a very experienced SEO, this is pretty exciting nonetheless, a new front in the online war opening up.
If they're using their own scraper/search algorithms, it'll be interesting to see how they weigh the winners and losers compared to how Google does it.
That's not an introduction, that's a teaser trailer.
If they want this to be a viable search it needs to be available quickly, and anonymously from something quick to type in.
Google would have been annoying as shit if you had to go to google.com/search , let alone then log in.
1. Mobile apps: Don't want to see intrusive apps
2. YouTube: Don't want to be interrupted with ads and no I don't want to buy premium service.
This kind of permanence is a huge loss
Or is this something they’ve already solved?
Edit: ohh, only Pro users? Right. ok. They made it seem like this was the big search launch and to go to chatgpt.com to get into it. Moving on.
Sure normal search is policed too, but usually not based on moral judgments but on legal necessities.
I think, I don't even want it to talk to me without searching the web first anymore. I want just sources and summaries. So I hope search will perform better.
For someone who's used online keyword-based search since the 1980s (computerised library catalogues, at the time), it's jarring for me to get over the distinction of querying for documents (old school) as opposed to asking direct questions, but that's precisely what LLM-based GPTs facilitate.
And as I'd noted this past June, it's a sea change in online search:
[O]ne of the upsides of GPTs / LLMs is that they provide direct answers to questions, though those answers may be hallucinations or generalisations. Even then, the directness is refreshing, though I expect it to also get polluted rapidly through both AI SEO manipulation and advertising / general enshittification of AI engines.
<https://toot.cat/@dredmorbius/112577405443953191>
I generally rely on Kagi's LLM for various reasons, but foremost is that it relies on current Web search and cites its findings specifically, which makes validating responses and detecting hallucinations far easier. ChatGPT specifically would hallucinate not only its responses but the citations it provided when those were requested, which curbed my enthusiasm greatly. It'll be interesting to see how its search-oriented offering fares.
I strongly agree with Temporal's excellent observation that this is, at least for the moment, a strong shift of the Web back to serving readers rather than advertisers (first and foremost) and publishers (distant second): <https://news.ycombinator.com/item?id=42011414>.
What I strongly suspect is that any successful GPT search tools will be rapidly engrossed by existing search monopolies. For those who defend the "free market" on the basis that competitors can emerge, the countervailing force of mergers and acquisitions must be noted, as well as the fact that these almost always effectively destroy that competitive potential, at least over the past half century or so.
The results seem to be better for strongly represented languages (e.g., English, Spanish, German, French), less so for those which may be less prevalent online (e.g., Yoruba):
(I tried getting the top Hacker News posts but the result was 5 days old?)
"How much tea tree oil by volume is in Dr Bronners tea tree oil liquid soap?"
A. ... However, the exact volume or percentage of tea tree oil in the formulation is not publicly disclosed by the manufacturer.
(which is incorrect, as the manufacturer disclosed it to me and I published it on the web)
One conclusion is that the web indexing is relatively shallow.
However ...
"Where does the founder of rsync.net live?"
A. "John Kozubik, the founder and CEO of rsync.net, resides in the San Francisco Bay Area."
... and the source is kozubik.com ... which means they did index my page but only retain, or weight, some of it ?
Meanwhile ... ublock showing >3k denials during this five minute interaction. I guess we can conclude something about where they are directing their time and energy.
Does not support search for anyone wondering.
Repeating the same query in the same chat session gave me an accurate answer.
'beef vindaloo recipe from nagi [0], but then just strip all the shit out and give me the recipe'
...and got exactly what I wanted. The web's dead, baby.
[0]: The lady behind recipetineats.com, a terrific recipe site. And she's an A+ human.
Subjectively, I'm not switching for quick searches - Google remains lightning fast and is good enough. But I already use GPT/Claude/etc for conceptual searches and deeper analysis.
---
[used leica q3] ==> google (product listings and website; Chatgpt told me about the Leica q3 and mentioned ebay)
[value of mac air m1] ==> neither!! (google was useless videos and crap; chatgpt gave me a price range and useful explanation... which made no sense - used was the same price or higher than new!)
[vogue lyrics] ==> google wins (gave me the lyrics; Chatgpt whined about copyright restrictions and sent me to a youtube video)
[weather in nyc] ==> tie (both provided correct, rich detail about the current weather)
[root causes of ww1] ==> tie (both identified Militarism, Alliances, Imperialism, Nationalism, explained each and then mentioned the assassination of the Archduke as the triggering event)
[bohemia to midtown] ==> equally bad (both figured out that it's a request for local directions, but neither just gave me directions until I gave a specific destination)
[bohemia to penn station] ==> ??? (chatgpt correctly gave me bohemia ny where google picked some obscure local listing; otoh chatgpt wrote out directions where google gave me a nice map)
[btc to usd] ==> tie (both got today's price)
[what time is it in stockholm] ==> tie (both got it right)
[iphone 16 vs 14] ==> chatgpt wins (nice comparison; google didn't pop search labs and just gave me websites)
[ffmpeg to clip the last 3 secs of a video] ==> chatgpt ?! (I didn't love either answer TBH)
[456+789] ==> google (both gave the answer, but google included a nice calculator)
...and the stuff people really want:
[porn] ==> google (gpt whined about policy violations; google gave pornhub and other "useful" results)
It's able to query the relevant documentation, put it in its context and then use that to generate code. It's extremely relevant to giving existing models superior functionality.
Only then will the comparison with profitable search engines be sensible. Before enshittification all VC-backed products are a delight to use, but after the honeymoon phase there’ll be ads all over the results/responses.
Hopefully this also provides a strong negative force against SEO and, again, all the crap that comes nowadays thanks to Google.
Actually I am logged into my iCloud on my MacBook, so I guess that's why I'm seeing the search on that device (I'm not seeing it on another where I'm not logged into iCloud).
So it is not it's own search engine and is still using Bing for its results just like the rest of them.
To provide relevant responses to your questions, ChatGPT searches based on your prompts and may share disassociated search queries with third-party search providers such as Bing. For more information, see our Privacy Policy and Microsoft’s privacy policy. ChatGPT also collects general location information based on your IP address and may share it with third-party search providers to improve the accuracy of your results.
https://help.openai.com/en/articles/9237897-chatgpt-search (TFA links to it in the How it works section)
Many search engines use the Bing index but return different results
This doesn't matter if the results are user-hostile, as both search engines are.
I don't understand how this point doesn't come up more often. I don't want to waste time to formulate a question when I can just use 2-3 keywords to get a result with traditional search.
Imagine having to type "What does this error message mean..." Everytime you paste an error message into Google/ddg/kagi.
>Currently, I don’t have access to real-time data like time and date. You can check the time on your device or search "current time" online for the latest local time wherever you are.
Oh dear, we're off to a bad and slow start already.
I provided a query to a _somewhat_ common question with Java apps and TLS:
Query:
> identify root cause of this Java exception: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Answer:
> The exception you're encountering, PKIX path building failed, typically indicates an issue with SSL/TLS certificate validation. Specifically, the error unable to find valid certification path to requested target suggests that the Java application cannot find a trusted certificate for the server it is trying to connect to. Here are the common root causes and solutions:
- Missing Certificate in Truststore: The server's SSL certificate (or the certificate of its issuing authority) is not present in the Java truststore. Solution: Import the server's certificate into the Java truststore using the keytool command. You can obtain the certificate by using a browser or tools like openssl.
- Self-Signed Certificate: If the server is using a self-signed certificate, it won't be trusted by default. Solution: Again, import the self-signed certificate into the Java truststore.
- Incorrect Truststore Configuration: The application might be pointing to an incorrect truststore or not configured to use one at all. Solution: Ensure the application is configured to use the correct truststore and that the path is correctly set.
- Expired Certificate: The certificate used by the server might be expired. Solution: Check the server's certificate validity and renew it if necessary.
- Intermediate Certificates Missing: If the server's certificate chain is incomplete (i.e., missing intermediate certificates), it can lead to this error. Solution: Ensure that the complete certificate chain is presented by the server or manually import any missing intermediate certificates.
- Network Issues: There may be issues with the network or firewalls blocking access to the certificate validation services. Solution: Check network configurations and ensure proper access.
To troubleshoot further, you can enable Java SSL debugging by adding the following JVM argument:
    -Djavax.net.debug=ssl
This will provide more detailed output regarding the SSL handshake and help identify the specific issue.
It’s not a bad suggestion, these are my usual go-tos when diagnosing cause.
But then a query into what the sources are for this information is:
> The information is based on common knowledge about Java security, SSL/TLS configurations, and best practices for handling certificates. It reflects general principles and troubleshooting steps recognized in Java development and security documentation. If you're looking for specific references, the Oracle documentation on Java security and the keytool command, as well as community forums and resources on SSL/TLS issues, often cover these topics.
“It’s just common knowledge bro. RTFM”
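To be fair, the truststore import it suggests is easy enough to script. A small sketch for grabbing the server's certificate with Python's standard library (host and filenames are placeholders), which can then be fed to keytool:

    import ssl

    HOST, PORT = "internal.example.com", 443  # placeholder server

    # Fetch the PEM certificate the server presents during the TLS handshake,
    # then import it with e.g.:
    #   keytool -importcert -alias internal-example -file server.pem -cacerts
    pem = ssl.get_server_certificate((HOST, PORT))
    with open("server.pem", "w") as f:
        f.write(pem)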
The truth is, I haven't used ChatGPT at all since spring of this year. Claude's Sonnet 3.5 has replaced it. I pay very little attention to what OpenAI releases and simply wait for Anthropic to implement it.
I also started using Gemini, which already outperforms Perplexity and this, and I will not switch.
I think everybody is so constantly caught up in their infatuation with OpenAI and other characters that they don't realize Google and Anthropic are actually building a moat, which some like Gary Marcus keep insisting is impossible.
I'm a realist and I can see that while Google has been slower to start, it reminds me of the search engine wars of 2000s, it is dominating and winning over users.
Related: I haven't paid for Gemini since about a month after release, but the morally corrupt query of "Show me articles from left and right leaning news sites about <headline topic>" would result in Gemini censoring right-leaning URLs with a "url removed" placeholder and belittling statements about the concerns of showing me right-leaning content. Perplexity had no issue with such dastardly prompts.
I want a tool, not a curated experience, so Gemini is in my "will not use" list for the foreseeable future.
I admit I haven't tried this lately, but I also have no desire to help fund that sort of behavior.
It's not looking good for Google. I'd hope Anthropic and Gemini could capture more consumer market share from OpenAI, but it's not looking good. Tell the average person about Claude or Gemini and the only thing you'll hear is "oh, so like ChatGPT". Artifacts would not be enough to convince them.
I don't think OpenAI dominance in consumer benefits anyone in the long run.
Smell that!? A large part of Google's search business is on fire right now!
There are three types of search: informational, transactional and navigational.
LLMs are competing hard and fast for informational search. Once upon a time we offered 2.5 keywords to the Google Gods only to be ultimately passed to Stack Overflow.
That game is up. Google is losing it faster than you can say, "anti-competitive practices in the search engine industry."
Transactional and navigational search remain.