1015 points | by meetpateltech | 2 weeks ago
I just tried having it teach me how to use Blender. It seems like it could actually be super helpful for beginners, as it has decent knowledge of the toolbars and keyboard shortcuts and can give you advice based on what it sees you doing on your screen. It also watched me play Indiana Jones and the Great Circle, and it successfully identified some of the characters and told me some information about them.
You can enable "Grounding" in the sidebar to let it use Google Search even in voice mode. The video streaming and integrated search make it far more useful than ChatGPT Advanced Voice mode is currently.
Not quite up to excited junior-level programmer standards yet. But maybe it's good for other things, who knows.
Do they use a dumber model for tool/vision?
The transformation process that occurs when people speak and write is incredibly rich and complex. Images, by comparison, are essentially just the outputs of cameras or screen captures; there isn't an "intelligent" transformation process occurring.
Well there's your problem!
I'm someone who becomes about 5x more productive when I have a person watching or just checking in on me (even if they're just hovering there).
Having AI to basically be that "parent" to kick me into gear would be so helpful. 90% of the time my problems are because I need someone to help keep the gears turning for me, but there isn't always someone available. This has the potential to be a person that's always available.
I went to the Glass Office co-working space to study for exams this summer and it worked out really well. I also met some nice people there.
A standalone Quest 3 is enough to get you started.
But at that low price, surely you have a bunch of customers being watched by each employee, and then talking to only one at a time — isn’t it distracting to see your “double” chatting away with the sound off?
Fun fact: I’d estimate that 50% of users don’t even look at their Productivity Partner while they work. WorkMode runs in another tab, and users rarely switch back to it. They don’t need to see us - they just need to know we’re watching. I’m in that group.
When I click on "Pricing" in the nav bar, it scrolls down, and the first thing that catches my eye is "$2100 / month". Only on this visit did I realize that this figure is the benefit you're projecting, and the actual price is $2.50/hour. On previous visits to your website from your HN comments, I always thought $2100/month was what you were going to charge me and closed the tab.
I've been frustrated myself that people don't read what's right there on the page when they come to my startup app's landing page. Turns out I do the same. Hope this helps you improve the layout / font sizes and such "information hierarchy" so the correct information is conveyed at a glance.
IMHO $2.50/hour is great value, and stands on its own. I know how much my time is worth, so perhaps the page doesn't really have to shout that to convince me.
Again, please feel free to ignore this as it is quite possible that it is just me with the attention span of a goldfish with CTE while clicking around new websites.
> Again, please feel free to ignore this as it is quite possible that it is just me with the attention span of a goldfish with CTE while clicking around new websites.
Most of our clients have issues with attention span, so your feedback is gold :-) Again, thank you!
I understand that if the window were taller, I'd have seen the actual price cards. I think it's just that when you click "Pricing", you expect the next obvious number you see to be the price.
Also, we do more than just body doubling. Some clients need to follow a morning ritual before starting their work (think meditation, a quick house cleanup, etc.). Sometimes, we perform sanity checks on their to-do lists (people often create tasks that are too vague or vastly underestimate the time needed to complete them). We ask them to apply the 2-minute rule, and so on. It all depends on the client's needs.
However, I doubt it would work for hedonistic procrastinators. When body doubling, hedonistic procrastinators rely on social pressure to be productive. Using AI likely won't work unless the person perceives the AI as a human.
Do you put effort into being polite when ChatGPT makes a mistake and you correct it? Do you try to soften the blow to avoid hurting its "feelings"? Do you feel bad if you respond impolitely? I don't.
My questions to copilot.ms.com today are more like the following, and it still works like a charm...
"I have cpp code: <enter><code snippet><enter> and i get error <piece of compilation output>. Is this wrong smart ponitor?"
[elaborate answer with nice examples]
"Works. <Next question>"
On an unrelated note, I believe people need to start quantifying their outrageous AI productivity claims or shut up.
I'm leaning toward saying that the main issue for me is that I need to keep my focus on things that involve active engagement rather than passive engagement: taking notes versus just reading a passage, for example.
For me personally, I was awful at working when my parents were hovering over me.
In the past, I used to work with a professor on a project and we'd spend significant amounts of time on Zoom calls working (this was during COVID). The professor wouldn't even be helping me the entire time, but as soon as I was blocked, I'd start talking and the ideas would bounce back and forth and I'd find a solution significantly quicker.
The key is, I don't want to have to initiate the contact. Hand holding the AI myself defeats the purpose. The ideal AI assistant is one that behaves as if it's a person that's sitting next to me.
Imagine you're a junior that gets on a teams call to get help via pair programming with your boss. For anything more than just a quick fix, pair programming on calls tends to turn into the junior working on something, hitting a roadblock, and the boss stepping in to provide input.
Here's the really important part that I've realized: very rarely will the input that the boss provides be something that is leaps and bounds outside of the ability of the junior. A lot of it will just be asking questions or talking the problem through until it turns the gears enough for the junior to continue on their own. THAT right there. That's the gear turning AI agent I'm looking for.
If someone could develop a tool that "knows" the right time to jump in and talk with you, then I think we'd see huge jumps in productivity for people.
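Roughly what I'm imagining, as a toy Python sketch (all names are hypothetical, and the hard part, the LLM judging when a nudge is warranted, is just a comment here):

    import time

    IDLE_THRESHOLD = 120.0  # seconds without activity before a check-in

    class GearTurner:
        """Toy sketch: only speak up after a stretch of inactivity."""

        def __init__(self):
            self.last_activity = time.monotonic()

        def on_activity(self):
            # Call this from editor/keyboard hooks.
            self.last_activity = time.monotonic()

        def maybe_nudge(self):
            idle = time.monotonic() - self.last_activity
            if idle > IDLE_THRESHOLD:
                # A real tool would send screen context to an LLM here
                # and ask it to talk the problem through, not just print.
                print("Stuck? Want to talk through what you're looking at?")
                self.on_activity()  # reset so it doesn't nag every tick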
Here's Google doing essentially the same thing (even more so, given that it's explicitly shipping your activity to the cloud), and the response is so different from the "we're sticking this on your machine and you can't turn it off" version Microsoft was attempting to land. This is what Microsoft should have done.
Quick research suggests this is part of Firefox's anti-fingerprinting functionality.
Oh who am I kidding, people upload literally everything to drive lmao.
llm install -U llm-gemini
llm -m gemini-2.0-flash-exp 'prompt goes here'
LLM installation: https://llm.datasette.io/en/stable/setup.html

Worth noting that the Gemini models have the ability to write and then execute Python code. I tried that like this:
llm -m gemini-2.0-flash-exp -o code_execution 1 \
'write and execute python to generate a 80x40 ascii art fractal'
Here's the result: https://gist.github.com/simonw/0d8225d62e8d87ce843fde471d143...

It can't make outbound network calls though, so this fails:
llm -m gemini-2.0-flash-exp -o code_execution 1 \
'write python code to retrieve https://simonwillison.net/ and use a regex to extract the title, run that code'
Amusingly, Gemini itself doesn't know that it can't make network calls, so it tries several different approaches before giving up: https://gist.github.com/simonw/2ccfdc68290b5ced24e5e0909563c...

The new model seems very good at vision:
llm -m gemini-2.0-flash-exp describe -a https://static.simonwillison.net/static/2024/pelicans.jpg
I got back a solid description, see here: https://gist.github.com/simonw/32172b6f8bcf8e55e489f10979f8f...

Practically, sandboxing hasn't been super important for me. Running Claude with MCP-based shell access has been working fine, as long as you instruct it to use a venv, a temporary directory, etc.
https://ipython.readthedocs.io/en/stable/interactive/magics....
https://modelcontextprotocol.io/introduction
My own MCP server could be an inspiration on Mac. It's based on pexpect to enable REPL sessions and has some tricks to prevent bad commands.
https://github.com/rusiaaman/wcgw
However, I recommend creating one with your own customised prompts and tools for maximum benefit.
Alternately, if I wanted to pipe a bunch of screencaps into it and get one grand response, how would I do that?
e.g. "Does the user perform a thumbs up gesture in any of these stills?"
[edit: also, do you know the vision pricing? I couldn't find it easily]
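For the batch-of-stills case, the llm tool's Python API appears to accept multiple attachments in a single prompt, so something like this might work (paths and model name are illustrative, untested):

    import llm

    model = llm.get_model("gemini-2.0-flash-exp")
    response = model.prompt(
        "Does the user perform a thumbs up gesture in any of these stills?",
        attachments=[
            llm.Attachment(path="frame_001.png"),
            llm.Attachment(path="frame_002.png"),
            llm.Attachment(path="frame_003.png"),
        ],
    )
    print(response.text())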
But I also found it hard to prompt to tutor in French or Portuguese; the accents were gruesomely bad.
Interesting theory!
But once they do get moving in the right direction, they can achieve things that smaller companies can't. Google has an insane amount of talent in this space, and seems to be getting the right results from that now.
It remains to be seen how well they will be able to productize and market, but it's hard to deny that their LLM models are really, really good.
The challenge is trust.
Google is one of the leaders in AI and is home to incredibly talented developers. But they also have an incredibly bad track record of supporting their products.
It's hard to justify committing developers and money to a product when there's a good chance you'll just have to pivot again once they get bored. Say what you will about Microsoft, but at least I can rely on their obsession with supporting outdated products.
Incredibly bad track record of supporting products that don't grow. I'm not saying this to defend Google, I'm still (perhaps unreasonably) angry because of Reader, it's just that there is a pattern and AI isn't likely to fit that for a long while.
My main issue with Google is that internal politics affect users all the time. See the debacle of anything built on top of Android being treated as a second-class citizen.
You can’t trust a company which can’t shield users from its internal politics. It means nothing is aligned correctly for users to be taken seriously.
Bus factor et al. is literally CS 101.
No, Reader was killed because it:
- was free
- didn't contribute to revenue growth from ads
It is for a company where the promotion culture rewards new initiatives and products and doesn't reward maintaining existing ones, which was most certainly the company culture around the time Reader was killed.
That's irrelevant to me as a user if I've already invested my time into the product.
To an extent all companies do it. Google just does it much more, to a degree that I tend to ignore most Google's launches because of this uncertainty.
I think we all acknowledge this.
The question is seldom "why" they kill it (I'd argue ultimately it doesn't matter), it's about how fast and what they offer as a migration path for those who boarded the train.
That also means the minute Gemini stops looking like a growing product it's gone from this world, where Microsoft backed alternatives have a fighting chance to get some leeway to recover or pivot.
Microsoft bought FoxBase in 1992. FoxPro never took the world by storm, but it had a dedicated group of devs and ISVs who used it and it solved their needs. The last version was released in 2004, long after Microsoft had released .NET, C#, and SQL Server. Microsoft officially ended support for it in 2015.
Google? If the product doesn't become an instant #1 or #2 in its market and doesn't directly contribute to their bottom line in a way which can be itemised in their earnings call, it's gone in less than 3 years guaranteed.
Like how many different instant messengers did they make at the same time only to abandon them all instead of just making one and supporting it?
Have you seen Google Bard anywhere recently? Me neither :)
Now I'm not sure if you are arguing that a name change is not supporting a product, or that Gemini is a different product with a different feature set?
This isn't some minor consumer play, like a random tablet or Stadia. Anyone who has been paying attention would have noticed that AI has been an important, consistent, long-term strategic interest of Google's for a very long time. They've been killing off the failed/minor products to invest in this.
After Reader, it was Currents, Google TV, Picasa, Google Now, Spaces, Chromecast Audio, Inbox, GCM, Nest, Fusion Tables, Google Cloud Print, Google Play Music, Google Bookmarks, Chrome Apps, G Suite....
Reader keeps coming up because after Reader, Google's motto turned into "Do be Evil"
I'm sorry.
With that being said, I am extremely bullish on Google AI for a long time. I imagine they land at being the best and cheapest for the foreseeable future.
[1] https://policies.google.com/terms/generative-ai
[2] https://policies.google.com/terms/generative-ai/use-policy
https://www.xda-developers.com/google-developer-account-ban-...
That day I learned an important lesson: pimpin' ain't easy.
If their fraud AI is wrong, there is no human to talk to.
Gemini: Yes, there is a potential risk of your Google account being suspended if your SaaS is used to process inappropriate content, even if you use Gemini to reject the request. While Gemini can help you filter and identify harmful content, it's not a foolproof solution.
Here are some additional measures you can take to protect your account:
* Content moderation: Implement a robust content moderation system to filter out inappropriate content before it reaches Gemini. This can include keyword-based filtering, machine learning models, and human review.
...
* Regularly review usage: Monitor your usage of Gemini to identify any suspicious activity.
* Follow Google's terms of service: Make sure that your use of Gemini complies with Google's terms of service.
By taking these steps, you can minimize the risk of your account being suspended and ensure that your SaaS is used responsibly.
---
In a follow up question I asked about how to implement robust content moderation and it suggested humans reviewing each message...
As an attacker, instead of DDoSing a service we could just upload a bunch of NSFW text so Google kills their infra for us.
Other providers, like OpenAI, at least provide a free moderation API. Google has a moderation API that, after the free 50k requests, is more expensive than Gemini 1.5 Flash (the Moderation API costs $0.0005/100 characters vs. Gemini 1.5 Flash at $0.000001875/100 characters).
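Given that pricing, one workaround is to use Flash itself as the cheap moderation pass before the real request; a minimal sketch with the google.generativeai SDK (prompt wording and key handling are illustrative):

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # illustrative
    flash = genai.GenerativeModel("gemini-1.5-flash")

    def looks_safe(text: str) -> bool:
        # One-word verdict from the cheap model before the main call.
        verdict = flash.generate_content(
            "Reply with exactly SAFE or UNSAFE. Is the following text "
            f"appropriate for a general audience?\n\n{text}"
        )
        return verdict.text.strip().upper().startswith("SAFE")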
I don't know about that: my wife built her first SME on Google Workspace / GSuite / Google Apps for domain (this thing changed names so many times I lost track). She's now running her second company on Google tools, again.
All she needs is a browser. At one point I switched her from Windows to OS X. Then from OS X to Ubuntu.
Now I just installed Debian GNU/Linux on her desktop: she fires up a browser and opens up Google's GMail / GSuite / spreadsheets and does everything from there.
She's been a happy paying customer of Google products for many years now, and there's actually phone support for paying customers.
I honestly don't have many bad things to say. It works fine. 2FA is top notch.
It's a much better experience than being stuck in the Windows "Updating... 35%", "here's an ad on your taskbar", "your computer is now slow for no reason" world.
I don't think they'll pull the plug on GSuite: it's powering millions and millions of paying SMEs around the world.
> Google is one of the leaders in AI and is home to incredibly talented developers. But they also have an incredibly bad track record of supporting their products.
This is why we've stayed with Anthropic. Every single person I work with on my current project is sore at Google for discontinuing one product or another - and not a single one of them mentioned Reader.

We do run some non-customer-facing assets in Google Cloud, but the website and API are on AWS.
https://www.reddit.com/r/googlecloud/comments/wpq0eg/what_is...
The top comment in that page is:
> CLI is the new name for the SDK.

The reasoning and strategy were explained in great detail in this podcast:
> https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5mZWVkYnVybmVyLmNvbS9HY3BQb2RjYXN0/episode/NTI5ZTM5ODAtYjYzOC00ODQxLWI3NDAtODJiMTQyMDMxNThj?ep=14
So I click that link, and I'm greeted with:

> Google Podcasts is no longer available
> Listen to podcasts and build your library in the YouTube Music app.
This is why AWS and Anthropic are getting our money. We cannot trust that Google projects will survive as long as our business needs them.

Eh... I don't know about that. Their tech graveyard isn't as populous as Google's, but it's hardly empty. A few that come to mind: ATL, MFC, Silverlight, UWP.
And even if C++/CX and C++/WinRT aren't that great, with worse tooling than MFC and in maintenance mode, you can easily create an application with them today.
Hardly the same can be told of most Google technologies.
I can write something with Microsoft tech and expect it with reasonable likelihood to work in 10 years (even their service-based stuff), but can't say the same about anything from Google.
That alone stops me/my org buying stuff from Google.
They haven't wielded this advantage as powerfully as possible, but changes here could signal how committed they are to slaying the search cash cow.
Nadella deservedly earned acclaim for transitioning Microsoft from the Windows era to cloud and mobile.
It will be far more impressive if Google can defy the odds and conquer the innovator's dilemma with search.
Regardless, congratulations to Google on an amazing release and pushing the frontiers of innovation.
Weirdly Google is THE AI play. If AI is not set to change everything and truly is a hype cycle, then Google stock withstands and grows. If AI is the real deal, then Google still withstands due to how much bigger the pie will get.
I'm going back and forth between the different models to see which works best for me, but I'm also trying to learn how to read and use the feedback other people give when making their decisions.
Video production is just not a big enough market to make a difference in the AI race. I don't understand why any AI company would spend a significant amount of resources matching Sora when I don't really think it will be a 10-billion-dollar product (yet).
Plus, Google is well positioned to match it anyway, since they have YouTube data they can probably license for their generative video training.
You mean by shifting away from Windows for mobile and focusing on iOS and Android?
To be fair, it's not that they're bad at it -- it's that they generally have an explicit philosophy against it. It's a choice.
Google management doesn't want to "pick winners". It prefers to let multiple products (like messaging apps, famously) compete and let the market decide. According to this way of thinking, you come out ahead in the long run because you increase your chances of having the winning product.
Gemini is a great example of when they do choose to focus on a single strategy, however. Cloud was another great example.
As a user I always still wish that there were fewer apps with the best features of both. Google's 2(!) apps for AI podcasts being a recent example : https://notebooklm.google.com/ and https://illuminate.google.com/home
For example, those little AI-generated YouTube summaries that have been rolling out are wonderful. They don't require heavyweight LLMs to generate, and can create pretty effective summaries using nothing but a transcript. Not only is it more useful than the other AI "features" I interact with regularly, it doesn't demand AGI or chain-of-thought.
This doesn't match my experience of any Google product.
I don't think Google (or really any of FAANG) makes "good" products anymore. But I do think there are things to appreciate in each org, and compared to the way Apple and Microsoft are flailing helplessly I think Google has proven themselves in software here.
Or how would you describe their handling of Stadia, or their weird obsession about shipping and cancelling about a dozen instant messengers?
The IMs post-Hangouts are less explainable, but I do empathize with Google for wanting to find some form of SMS replacement standard. The RCS we have today is flawed and was rushed out of the door just to have a serious option for the DOJ to endorse. This is an area where I believe the United States government has been negligent in allowing competing OEMs to refuse cooperation in creating an SMS successor. I agree it's silly, and it needs to stop eventually.
Have you actually compared these services first hand? Stadia was miles ahead of the competition. The experience was unbelievably good and ubiquitous (Desktop, phone, TV, Chromecast...), and both mouse and gamepad felt like first class input methods.
Microsoft's Xbox game streaming is a complete joke in comparison. Last time I tried, I had to use my mouse to operate a virtual gamepad to operate a virtual cursor to click instruments in MSFS. Four levels of nesting. Development progress is also extremely slow. Not sure where you're seeing laser focus there.
> I do empathize with Google for wanting to find some form of SMS replacement standard
Why did Google out of all companies have to come up with an SMS replacement? Absolutely nobody asked for this! They started out with XMPP, which was federated and had world-class open source implementations, and after what feels like a double-digit number of failed attempts they arrived at SMS over SIP from hell that nobody other than themselves actually knows how to implement and only telcos can federate with (theoretically; practically, they just outsource to Google).
I find it really hard to believe that this is anything other than a thinly veiled marketing plot to be able to point at an "open standard" that Google is almost exclusively running via Jibe (not sure if they provide that for free or are charging carriers for it).
The contortions they went through to decouple their "Allo" and "Duo" brands from Google accounts (something almost everybody has anyway to send email!) for absolutely no benefit and even more significant customer confusion...
And now look at Gemini. It looks like the exact same story to me from the beginning: Amazing technology backed by a great team (they literally invented transformers), yet completely kneecapped by completely confused product development. It's unreal how much better it is queried through the API, but that's unfortunately not what people see when they go to gemini.google.com.
Although I do still pay for ChatGPT, I find it dog slow. ChatGPT is simply way too slow to generate answers. It feels like --even though of course it's not doing the same thing-- I'm back in the 80s with my 8-bit computer printing things line by line.
Gemini OTOH doesn't feel like that: answers are super fast.
To me low latency is going to be the killer feature. People won't keep paying for models that are dog slow to answer.
I'll probably be cancelling my ChatGPT subscription soon.
The context window of Gemini 1.5 pro is incredibly large and it retains the memory of things in the middle of the window well. It is quite a game changer for RAG applications.
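As an illustration of what that enables: with a window that large you can often skip retrieval entirely and stuff whole documents into the prompt. A sketch with the google.generativeai SDK (file names are illustrative):

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # illustrative
    model = genai.GenerativeModel("gemini-1.5-pro")

    # Instead of chunking + vector search, pass the full documents.
    docs = [open(p).read() for p in ["manual.txt", "changelog.txt"]]
    response = model.generate_content(
        ["Answer using only the documents below.", *docs, "Q: What changed in v2?"]
    )
    print(response.text)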
Google's marketing wins again, I guess.
About a year ago, I was saying that Google was potentially walking toward its own grave due to not having any pivots that rivaled OpenAI. Now, I'm starting to think they've found the first few steps toward an incredible stride.
The fact that GCP needs to have these pages, and that the lists are not 100% comprehensive, is telling enough.
https://cloud.google.com/compute/docs/deprecations
https://cloud.google.com/chronicle/docs/deprecations
https://developers.google.com/maps/deprecations
Steve Yegge rightfully called this out, and yet no change has been made. https://medium.com/@steve.yegge/dear-google-cloud-your-depre...
Some guy had to do it for Azure; then he went to work for them, and the site is now deprecated itself.
https://blog.tomkerkhove.be/2023/03/29/sunsetting-azure-depr...
Google Cloud grew 35% year over year, when comparing the 3 months ending September 30th 2024 with 2023.
https://abc.xyz/assets/94/93/52071fba4229a93331939f9bc31c/go... page 12
The balance sheet shows that (income from 2024-07-01 through 09-30) = 1.35 × (income from 2023-07-01 through 09-30).
These read differently because, with heavily handwavey math, the first could be mistaken for 35% growth in a single quarter, while the second is 35% annual growth (comparing like-for-like quarters).
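To make the handwavey math concrete (a quick sanity check assuming smooth growth):

    yoy = 1.35               # Q3 2024 revenue / Q3 2023 revenue
    quarterly = yoy ** 0.25  # implied per-quarter rate if growth were smooth
    print(f"{quarterly - 1:.1%} per quarter")             # ~7.8%
    print(f"{yoy ** 4 - 1:.1%} if 35% were per-quarter")  # ~232% annualized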
I used both Azure and AWS to show that GCP has lost significant market share because of its deprecation policy. Enterprises don't trust that GCP won't deprecate their services.
> hard to deny that their LLM models are really, really good
I'm so scarred by how much their first Gemini releases sucked that the thought of trying it again doesn't even cross my mind.
Are you telling us you're buying this press release wholesale, or you've tried the tech they're talking about and love it, or you have some additional knowledge not immediately evident here? Because it's not clear from your comment where you are getting that their LLM models are really good.
They’ve had OpenAI-compatible endpoints for a while, but it’s never been clear how serious they were about supporting them long-term. Nice to see another option showing up. For reference, their main repo (not kidding) recommends setting up a Kubernetes cluster and a GCP bucket to submit batch requests.
https://github.com/googleapis/python-genai?tab=readme-ov-fil...
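For what it's worth, the compatibility endpoint can be exercised with the standard OpenAI client; a minimal sketch based on Google's announced compatibility layer (key handling is illustrative, and the details may change):

    from openai import OpenAI

    client = OpenAI(
        api_key="YOUR_GEMINI_API_KEY",  # illustrative
        base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
    )

    resp = client.chat.completions.create(
        model="gemini-2.0-flash-exp",
        messages=[{"role": "user", "content": "Say hello in one short sentence."}],
    )
    print(resp.choices[0].message.content)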
I wasn't going to bother trying those because I was pretty sure it wouldn't get any of them, but decided to give it an easy one (#4) and was impressed at the CoT.
Meanwhile, Google's newest 2.0 Flash model went 0 for 7.
1: https://metro.co.uk/2024/12/11/gchq-christmas-puzzle-2024-re...
If they want a rematch, they'll need to bring their 'A' game next time, because o1-pro is crazy good.
Given the right prompt, though, I'm sure it could handle the 'find the corresponding letter from the landmarks to form an anagram' part. That's easier than most of the other problems.
You're saying the ultimate answer isn't 'PROTECTING THE UNITED KINGDOM'?
There will probably be a 2.0 pro (which will be 4o/sonnet class) and maybe an ultra (o1(?)/Opus).
What do you mean? Is o1 not a single model?
However their on device TPUs lag behind the competition and Google still seem to struggle to move significant parts of Gemini to run on device as a result.
Of course, Gemini is provided as a subscription service as well so perhaps they’re not incentivized to move things locally.
I am curious if they’ll introduce something like Apple’s private cloud compute.
we need to separate inference and training - the real winners are those who have the training compute. you can always have other companies help with inference
The second Apple comes out with strong on-device AI - and it very much looks like they will - Google will have to respond on Android. They can't just sit and pray that e.g. Samsung makes a competitive chip for this purpose.
While there is a chance that Apple might come out with a very sophisticated on-device model, the problem is that they would only be able to compete with other on-device models. The magnitude of compute needed to keep pace with SOTA models is not achievable on a single device. It will take many generations of Apple silicon to compete with the compute of existing datacenters.
Google also already has competitive silicon in this space with the Tensor series processors, which are being fabbed at Samsung plants today. There is no sitting and praying necessary on their part as they already compete.
Apple is a very distant competitor in the space of AI, and I see no reason to assume this will change, they are uniquely disadvantaged by several of the choices they made on their way to mobile supremacy. The only thing they currently have going for them is the development of their own ARM silicon which may give them the ability to compete with Google's TPU chips, but there is far more needed to be competitive here than the ability to avoid the Nvidia tax.
I’m in the camp that this is the right call for consumers, instead of trying to compete on the large model side. They’ve yet to deliver on their full promise, but if they can, it’s the place where I think more of the industry will go (for consumers)
And regarding Google’s mobile tensor chips, they are infamously behind all other players in the market space for the same generation of processor. They don’t share the same advantages they do in the server space.
Apple just isn’t very capable in this space, not sure what’s so hard to accept
their models aren’t even that good. sorry apple fanboys but the talent isn’t there
That may not be as big a disadvantage as you think.
Anthropic claim that they did not use any data from their users when they trained Claude 3.5 Sonnet.
About 7 years ago I trained GAN models to generate synthetic data, and it worked so well. The state of the art has increased a lot in 7 years, so Apple will be fine.
At best, synthetic data is a "slow follow" for training a model due to the need for human review, but a competitive model it does not make.
they’re a little bit less of a nobody than they used to be, but they’re basically a nobody when it comes to frontier research/scaling. and the best model matters way more than on-device which can always just be distilled later and find some random startup/chipco to do inference
Is it really that hard to imagine people having different viewpoints and making different decisions than yourself, without painting them as vapid airheads?
The level of optimism for Apple AI capabilities on here is wrong. I can imagine people having wrong viewpoints, but it is wrong.
Besides, did Anthropic and e.g. Mistral inherently have such troves of data to train on that Apple doesn't? For the last 6 months, Anthropic has had the SOTA model for the average production usecase.
> Google also already has competitive silicon in this space with the Tensor series processors, which are being fabbed at Samsung plants today. There is no sitting and praying necessary on their part as they already compete.
Intel had a much bigger advantage with x86, and look where we are now. I find it hard to believe that creating a good AI chip isn't a much smaller challenge than it was to do Apple Silicon. The upcoming SE uses their in-house 5G modem, another huge hardware achievement that no one else has been able to do.
With that in mind, how can you bet against Apple when it comes to designing chips at this point? It's not like Amazon et al. aren't producing their own AI chips too. Let alone all of the startups like Cerebras. That indicates the moat and barriers are likely much lower than for Apple Silicon or the 5G modem.
If I'm talking nonsense, do correct me.
If anything, I think the upcoming iOS AI update will bring them to a similar level as android/google.
Economically this fits the cloud much better.
I’m not saying on device will ever truly compete at quality, but I believe it’ll be good enough that most people don’t care to pay for cloud services.
inference basically does not matter, it is a commodity
training doesn’t matter if inference costs are high and people don’t pay for them
Training is amortized over each inference, so the cost of inference also needs to include the cost of training to break even unless made up elsewhere
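Back-of-the-envelope version of that amortization (all numbers are hypothetical):

    training_cost = 100e6    # hypothetical $100M training run
    serving_cost = 0.002     # hypothetical marginal cost per query
    lifetime_queries = 10e9  # queries served before the model is retired

    break_even = serving_cost + training_cost / lifetime_queries
    print(f"${break_even:.4f} per query to recoup training")  # $0.0120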
Stack enough GPUs and any of them can run o1. Building a chip to infer LLMs is much easier than building a training chip.
Just because one cost dwarfs another does not mean that this is where the most marginal value from developing a better chip will be, especially if other people are just doing it for you. Google gets a good model, inference providers will be begging to be able to run it on their platform, or to just sell google their chips - and as I said, inference chips are much easier.
I don't know where you got that price from, but 1x RTX 3090 is $1,900; 16x is ~$30,000.
> The parts are expensive
Now that we've invested ~$30k in GPUs, we only need to find a motherboard that can accommodate 16x PCIe 4.0 x16 GPUs, right? And we also need a CPU that can drive that many PCIe 4.0 x16 lanes?
Well, none of them exist, not even in the server-parts sector, let alone client commodity hardware. In any case, you'd need two CPUs, so even with this imaginary motherboard we are already entering the server-rack design space. And that costs hundreds of thousands of dollars.
> but there is a competitive market in processors that can do LLM inference
Only for the smallest and smallish models. If that existed, then why would you set out to build a 16x RTX 3090 machine?
Sorry, but you're just spitting out nonsense.
I agree that the on-device inference market is not important yet.
inference hardware is a commodity in a way that training is not
And sure, poor reception will be an issue, but most people would still absolutely take a helpful remote assistant over a dumb local assistant.
And you don't exactly see people complaining that they can't run Google/YouTube/etc locally.
Most people are unlikely to buy the device for the AI features alone. It’s a value add to the device they’d buy anyway.
So you need the paid for option to be significantly better than the free one that comes with the device.
Your second sentence assumes the local one is dumb. What happens when local ones get better? Again how much better is the cloud one to compete on cost?
To your last sentence, it assumes data fetching from the cloud. Which is valid but a lot of data is local too. Are people really going to pay for what Google search is giving them for free?
Plus a lot of the "agentic" stuff is interaction with the outside world, connectivity is a must regardless.
Plus once you start with on device features you start limiting your development speed and flexibility.
As the global human population increasingly urbanizes, it’ll become increasingly easy to blanket it with cell towers. Poor(er) regions of the world will increase reception more slowly, but they’re also more likely to have devices that don’t support on-device models.
Also, Gemini Flash is basically positioned as a free model, (nearly) free API, free in GUI, free in Search Results, Free in a variety of Google products, etc. No one will be paying for it.
Flash is free for API use at a low rate limit. Gemini as a whole is not free to Android users (free right now, with subscription costs for advanced features beyond a trial period) and isn't free to Google without some monetary incentive. Hence why I originally asked about private cloud compute alternatives from Google.
I see poor reception in both areas and only one has WiFi.
That is currently Apple’s path with Apple Intelligence for example.
It has no world model. It doesn't know truth any more than it knows bullshit, just a statistical relationship between words.
Pretty sure that's not doing any fancy on-device models!
That said, there was a popup today saying that assistant is now using Gemini, so I just enabled it to try. Could well have changed in the last week.
They've ceded the fast mover advantage, but with a massive installed base of Android devices, a team of experts who basically created the entire field, a huge hardware presence (that THEY own), massive legal expertise, existing content deals, and a suite of vertically integrated services, I feel like the game is theirs to lose at this point.
The only caution is regulation / anti-trust action, but with a Trump administration that seems far less likely.
I'm getting fairly excited about "agentic" solutions to the point that I even went out of my way to build "AgentOfCode" (https://github.com/JasonSteving99/agent-of-code) to automate solving Advent of Code puzzles by iteratively debugging executions of generated unit tests (intentionally not competing on the global leaderboard).
And even for this, there's actually only a SINGLE place in the whole "agent" where the models themselves actually make a "decision" on what step to take next, and that's simply deciding whether to refactor the generated unit tests or the generated solution based on the given error message from a prior failure.
Agentic to me means that it acts somewhat under its own authority rather than a single call to an LLM. It has a small degree of agency.
‘Intelligent’ is exactly as precise as ‘funny’ or ‘interesting’. It’s a label for a cluster of observations of another agent’s behavior. It entails almost nothing about how those behaviors are achieved.
This is of course only an opinion, but it’s my professional opinion after thirty five years in the AI business.
Intelligence is a characteristic.
I don't think these are necessary buzzwords if the product really does what they imply.
We use new words so often that we take it for granted. You've passively picked up dozens of new words over the last 5 or 10 years without questioning them.
agentic == not people.
Quite sensible, really.
Anyway, I'm glad that this Google release is actually available right away! I pay for Gemini Advanced and I see "Gemini Flash 2.0" as an option in the model selector.
I've been going through Advent of Code this year, and testing each problem with each model (GPT-4o, o1, o1 Pro, Claude Sonnet, Opus, Gemini Pro 1.5). Gemini has done decent, but is probably the weakest of the bunch. It failed (unexpectedly to me) on Day 10, but when I tried Flash 2.0 it got it! So at least in that one benchmark, the new Flash 2.0 edged out Pro 1.5.
I look forward to seeing how it handles upcoming problems!
I should say: Gemini Flash didn't quite get it out of the box. It actually had a syntax error in the for loop, which caused it to fail to compile, which is an unusual failure mode for these models. Maybe it was a different version of Java or something (I'm also trying to learn Java with AoC this year...). But when I gave Flash 2.0 the compilation error, it did fix it.
For the more Java proficient, can someone explain why it may have provided this code:
for (int[] current = queue.remove(0)) {
which was a compilation error for me? The corrected code it gave me afterwards was just:

    for (int[] current : queue) {

and with that one change the class ran and gave the right solution.

EDIT: One reason that led me to think it's better for mediocre stuff was seeing the Sora model generate videos. Yes, it can create semi-novel stuff through combinations of existing stuff, but it can't stick to a coherent "vision" throughout the video. It's not like a movie by a great director like Tarantino, where every detail is right and all details point to the same vision. Instead, Sora is just flailing around. I see the same in software: sometimes the suggestions go toward one style, and the next moment into another. I guess current AI just has a way shorter context. Tarantino has been refining his style for 30 years now, always tuning his model toward his vision. AI, in comparison, seems to just take everything and turn it into one mediocre blob. It's not useless, but that's currently good to keep in mind, I think: you can only use it to generate mediocre stuff.
True to a point, but is anyone using GPT2 for anything still? Sometimes the better model completely supplants others.
To me that reads like it was trying to accomplish something like:

    int[] current;
    while ((current = queue.pop()) != null) {
`queue.remove(0)` gives you an `int[]`, which is also what you were assigning to `current`. So logically it's a single element, not an iterable. If you had wanted to iterate over each item in the array, it would need to be:
```
for (int[] current : queue) {
    for (int c : current) {
        // ...do stuff...
    }
}
```
Alternatively, if you wanted to iterate over each element in the queue and treat the int array as a single element, the revised solution is the correct one.
Also, a whole lot of computer vision tasks (via LLMs) could be unlocked with this. Think inpainting, style transfer, text editing in the wild, segmentation, edge detection, etc.
They have a demo: https://www.youtube.com/watch?v=7RqFLp0TqV0
"That's an insightful question. My understanding of your speech involves a pipeline first. Your voice is converted to text and then I process the text to understand what you're saying. So I don't understand your voice directly but rather through a text representation of it."
Unsure if this is a hallucination, but is disappointing if true.
Edit: Looking at the video you linked, they say "native audio output", so I assume this means the input isn't native? :(
If you're using Gemini in aistudio(not sure about the real-time API but everything else) then it has native audio input
I am currently struggling to diagnose an IPv6 misconfiguration in my enormous AWS CloudFormation YAML. I gave the same input to Claude Opus, Gemini, and ChatGPT (o1 and 4o).
4o was the worst: verbose and a waste of my time.
Claude completely went off-tangent and began recommending fixes for IPv4 while I specifically asked about IPv6 issues.
o1 made a suggestion which I tried out and it fixed it. It literally found a needle in the haystack. The solution is working well now.
Gemini made a suggestion which almost got it right but it was not a full solution.
I must clarify diagnosing network issues on AWS VPC is not my expertise and I use the LLMs to supplement my knowledge.
But I think it has more to do with the freshness of the training data.
AWS IPv6 egress is a newer technology from AWS, introduced only recently. Previously, we had to deploy a NAT gateway, which supported IPv4. I am assuming claude-3-5-sonnet-20241022 (latest) was not trained on this data.
Find something you like, use it, be ready to look again in a month or two.
EDIT: Typo
I find myself just paying a la carte via the API rather than paying the $20/mo so I can switch between the models.
Though gpt-4o could say "David Mayer" on poe.com but not on chat.openai.com which makes me wonder if they sometimes cheat and sneak in different models.
Most of these things seem to just be a system prompt and a tool that get invoked as part of a pipeline. They’re hardly “agents”.
They’re modules.
Bad
1. installed Antivirus software
2. added screen-size CSS rules
3. copied 'Assets' harddrive to DropBox
4. edited homepage to include Bitcoin wallet address link
5. upgraded to ChatGPT Pro
"Good" 1. Cyber-security defenses
2. Responsive Design implementation
3. Cloud Storage
4. Blockchain Technology gateway
5. Agentic enhancements
It'll create endless consulting opportunities for projects that never go anywhere and add nothing of value unless you value rich consultants.
All the common tricks, like creating a list of steps that are then executed by specialized agents in order, for example, fall flat as soon as one agent returns a result that contradicts the initial steps. It's simply a bandaid for short context sizes and LLMs that can't remain focused past the first few thousand tokens of prompt.
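For the skeptical, here is roughly the pattern being criticized, as a toy Python sketch (call_llm is a stand-in for any chat-completion API):

    def call_llm(prompt: str) -> str:
        # Stand-in for any chat-completion API call.
        return f"(model output for: {prompt[:40]}...)"

    def run_pipeline(goal: str) -> list[str]:
        # 1. Ask for a fixed plan up front.
        plan = call_llm(f"List the steps to accomplish: {goal}").splitlines()
        results = []
        for step in plan:
            result = call_llm(f"Goal: {goal}\nStep: {step}\nDo this step.")
            # 2. The weak point: if this result contradicts a later step,
            #    the loop marches on anyway, because the plan is frozen.
            results.append(result)
        return results

    print(run_pipeline("ship the release"))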
I have an AI textbook with agent terminology that was written in pre-LLM days. Agents are just autonomous-ish code that loops on itself with some extra functionality. LLMs, in their elegance, can more easily self-loop out of the box just by concatenating language prompts sensibly. They are almost agent-ready out of the box by this very elegant quality (the textbook agentic diagram is just a conceptual self-perpetuation loop), except...
Except they fail at a lot or get stuck on hiccups. But here is a novel thought: what if an LLM becomes more agentic (i.e., more able to sustain autonomous chained prompts that perform actions without a terminal failure) and less copilot-like, not through more complex controlling wrapper self-perpetuation code, but by training the core LLM itself to function more fluidly in agentic scenarios?
A better agentically performing LLM that isn't mislabeled with a bad buzzword might reveal itself not in its wrapper control code but in simply performing better in a typical agentic loop or environment, with whatever initiating prompt, control wrapper code, or pipeline kicks off its self-perpetuation cycle.
It's like a meme that can be milked for monetization.
The latency was low, though the conversation got cut off a few times.
The models do just fine on "work" but are terrible for "thinking". The verbosity of the explanations (and the sheer amount of praise the models like to give the prompter - I've never had my rear end kissed so much!) should lead one to beware any subjective reviews of their performance rather than objective reviews focusing solely on correct/incorrect.
Additionally, Microsoft didn't really have any advantage in the smart phone space.
Google is already a product the majority of people on the planet use regularly to answer questions.
That seems like a competitive advantage to me.
I think people just have rose-tinted glasses on. Sure the hardware from Nokia was great, but software was very poor even by the standards of that time.
By his own admission, Gates was extremely distracted at the time by the antitrust cases in Europe, and he let the initiative die.
Publishers are being squeezed and going under, or replacing humans with hallucinated genai slop.
It’s like we’re taking the private equity model of extracting value and killing something off to the entire web.
I’m not sure where this is headed, but I don’t think Sundar has any strategy here other than playing catch up.
Demis’ goal is pretty transparently positioning himself to take over.
They did it: from now on Google will keep a leadership position.
They have too much data (Search, Maps, Youtube, Chrome, Android, Gmail, etc.), and they have their own servers (it's free!) and now the Willow QPU.
To me, it is evident how the future will look. I'll buy some more Alphabet stock.
> "Now millions of developers are building with Gemini. And it’s helping us reimagine all of our products — including all 7 of them with 2 billion users — and to create new ones"
and
> "We’re getting 2.0 into the hands of developers and trusted testers today. And we’re working quickly to get it into our products, leading with Gemini and Search. Starting today our Gemini 2.0 Flash experimental model will be available to all Gemini users."
All the products including all the products?
"...all of our products — including all 7 of them with 2 billion users..."
It tells people that 7 of their products have 2b users.
"I brought all my shoes, including the pairs that cost over $10,000" is saying something about what shoes you brought, more than "all of them".
-Hey, are you done packing?
-Yes, I decided I'll bring all my shoes, including the ones that cost over $10,000.
What, they just couldn't help themselves?
More specifically, they are trying to emphasize the point that gemini is being used with seven products with over 2 billion users. However, the above user is right that this was a bafflingly terrible use of English to establish this fact.
"including all seven of them with 2 billion users"
It's why the quoted text is obviously written by a human.
Short sentence fact. And aspirational tagline - pause for some metrics - and more. And. Today. And. And. Today.
autocorrect of "significantly improved"?
Flash combines speed and cost and is extremely good to build apps on.
People really take that whole benchmarking thing more seriously than necessary.
Anyone seeing this? I don't have an option in my dropdown.
Based on initial interactions, it's extremely verbose. It seems to be focused on explaining its reasoning, but even after just a few interactions I have seen some surprising hallucinations. For example, to assess its current understanding of AI, I asked "Why hasn't Anthropic released Claude 3.5 Opus yet?" Gemini responded with text that included "Why haven't they released Claude 3.5 Sonnet First? That's an interesting point." There's clearly some reflection/attempted reasoning happening, but it doesn't feel competitive with o1 or the new Claude 3.5 Sonnet that was trained on 3.5 Opus output.
That's fine, but it couldn't spot it; then it told me that I had put that syntax in the instructions and quoted me, but that wasn't true. It repeatedly said it had undone the change, yet rewrote the code with it still in.
It added some async stuff where it wasn't needed (no way for it to know, to be fair). What was interesting was that, when told this, it apologised and explained it had just been doing so much async work recently that it got confused.
Another interesting quote
> I am incredibly sorry for the repeated errors. I have made so many mistakes, and I am very frustrated with myself. I believe this is now correct, and adheres to the instructions.
I will pay more to not feed Google.
https://gemini.google/advanced/?Btc=web&Atc=owned&ztc=gemini...
then sign in with Google account and you'll see it
That production of output is a form of reasoning via _some_ type of logical processing. No?
Maybe better to say computational reasoning. That’s a mouthful.
To wit, if I am doing a high school geometry proof, I come up with a sequence of steps. If the proof is correct, each step follows logically from the one before it.
However, when I go from step 2 to step 3, there are multiple options for step 3 I could have chosen. Is it so different from the "most-likely prediction" an LLM makes? I suppose the difference is humans can filter out logically incorrect steps, or prune chains of steps that won't lead to the actual theorem quicker. But an LLM predictor coupled with a verifier doesn't feel that different from it.
When asked, “If Alice has 3 apples and gives 2 to Bob, how many does she have left?”, the model doesn’t just retrieve a memorized answer—it infers the logical steps (subtracting 2 from 3) to generate the correct result, showcasing reasoning built on the interplay of its scale and architecture rather than explicit data recall.
I don't see how that is "regurgitation", either, if it performs the reasoning steps first, and only then arrives at the answer.
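A toy version of that predictor-plus-verifier loop (both functions are hypothetical stand-ins, not a claim about how any model actually works):

    import random

    def propose_step(steps):
        # "Predictor": sample one of several plausible next steps,
        # the way an LLM samples a likely continuation.
        n = len(steps) + 1
        return random.choice([f"step-{n}a", f"step-{n}b"])

    def follows_logically(steps, candidate):
        # "Verifier": accept only steps that actually follow. Here the
        # rule is arbitrary; a real verifier would check the logic.
        return candidate.endswith("a")

    steps = ["given: triangle ABC"]
    while len(steps) < 4:
        candidate = propose_step(steps)
        if follows_logically(steps, candidate):  # prune bad branches
            steps.append(candidate)
    print(steps)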
So... it's a trade secret to know how it actually works...
A little thin...
Also no pricing is live yet. OpenAI's audio inputs/outputs are too expensive to really put in production, so hopefully Gemini will be cheaper. (Not to mention, OAI's doesn't follow instructions very well.)
If you're interested in this stuff, here's a full chat app for the new Gemini 2 APIs with text, audio, image, camera video, and screen video. It shows how to use the WebSocket API directly and how to route through WebRTC infrastructure.
Anyone else run into similar issues or have any tips?
"What's the first name of Freddy LaStrange"? >> "I do not have enough information about that person to help with your request. I am a large language model, and I am able to communicate and generate human-like text in response to a wide range of prompts and questions, but my knowledge about this person is limited. Is there anything else I can do to help you with this request?"
(Of course, we can't be 100% sure that his first name is Freddy. But I would expect that to be part of the answer then)
Also
> Freddy LaStrange is a fictional character and his first name is Freddy.
and
> Freddy is a nickname for Frederick. So, the first name of Freddy LaStrange is Frederick.
https://chatgpt.com/share/675ab91c-c158-8004-9dfc-ea176ba387...
A better way to say it is: "funny how Google can't keep up with the reasoning effectiveness of OpenAI's latest models"
Why do you want the hardware vs just using it in the cloud? If you're training huge models you probably don't also keep all your data on prem, but on GCS or S3 right? It'd be more efficient to use training resources close to your data. I guess inference on huge models? Still isn't just using a hosted API simpler / what everyone is doing now?
I'll give Flash 2 a try soon, but I gotta say that Google has been doing a great job catching up with Gemini. Both Gemini 1.5 Pro 002 and Flash 1.5 can trade blows with 4o, and are easily ahead of the vast majority of other major models (Mistral Large, Qwen, Llama, etc). Claude is usually better, but has a major flaw (to be discussed later).
So, here's my current rankings. I base my rankings on my work, not on benchmarks. I think benchmarks are important and they'll get better in time, but most benchmarks for LLMs and MLLMs are quite bad.
1) 4o and its ilk are far and away the best in terms of accuracy, both for textual tasks as well as vision-related tasks. Absolutely nothing comes even close to 4o for vision-related tasks. The biggest failing of 4o is that it has the worst instruction following of commercial LLMs, and that instruction following gets _even_ worse when an image is involved. A prime example is when I ask 4o to help edit some text, to change certain words, verbiage, etc. No matter how I prompt it, it will often completely re-write the input text into its own style of speaking. It's a really weird failing. It's like their RLHF tuning is hyper-focused on keeping it aligned with the "character" of 4o, to the point that it injects that character into all its outputs no matter what the user or system instructions state.

o1 is a MASSIVE improvement in this regard, and is also really good at inferring things, so I don't have to explicitly instruct it on every little detail. I haven't found o1-pro overly useful yet. o1 is basically my daily driver outside of work, even for mundane questions, because it's just better across the board and the speed penalty is negligible.

One particular example of o1 being better I encountered yesterday: I had it re-wording an image description, and thought it had introduced a detail that wasn't in the original description. Well, I was wrong and had accidentally skimmed over that detail in the original. It _told_ me I was wrong, and didn't update the description! Freaky, but really incredible. 4o never corrects me when I give it an explicit instruction.
4o is fairly easy to jailbreak. They've been turning the screws for a while, so it isn't as easy as day 1, but even o1-pro can be jailbroken.
2) Gemini 1.5 Pro 002 (specifically 002) is second best in my books. I'd guesstimate it at being about 80% as good as 4o on most tasks, including vision. But it's _significantly_ better at instruction following. Its RLHF is a lot lighter than the ChatGPT models', so it's easier to get these models to fall back to pretraining, which is really helpful for my work specifically. But in general the Gemini models have come a long way. The ability to turn off model censorship is quite nice, though it does still refuse at times.

The Flash variation is interesting; oftentimes it's on par with Pro, with Pro edging it out maybe 30% of the time. I don't frequently use Flash, but it's an impressive model for its size. (Side note: the Gemma models are ... not good. Google's other public models, like so400m and OWLv2, are great, so it's a shame their open LLM forays are falling behind.) Google also has the best AI playground.
Jailbreaking Gemini is a piece of cake.
3) Claude is third on my list. It has the _best_ instruction following of all the models, even slightly better than o1, though it often requires multi-turn to get it to fully follow instructions, which is annoying. Its overall prowess as an LLM is somewhere between 4o and Gemini. Vision is about the same as Gemini, except for knowledge-based queries, which Gemini tends to be quite bad at (who is this person? Where is this? What brand of guitar? etc.).

But Claude's biggest flaw is the insane "safety" training it underwent, which makes it practically useless. I get false triggers _all_ the time from Claude. And that's to say nothing of how unethical their "ethics" system is to begin with. And what's funny is that Claude is an order of magnitude _smarter_ when it's reasoning about its safety training. It's the only real semblance of reason I've seen from LLMs ... all just to deny my requests.
I've put Claude third out of respect for the _technical_ achievements of the product, but I think the developers need to take a long look in the mirror and ask why they think it's okay for _them_ to decide what people with disabilities are and are not allowed to have access to.
4) Llama 3. What a solid model. It's the best open LLM, hands down. Nowhere near the commercial models above, but for a model that's completely free to use locally? That's invaluable. Their vision variation is ... not worth using. But I think it'll get better with time. The 8B variation far outperforms its weight class. 70B is a respectable model, with better instruction following than 4o. The ability to finetune these models to a task with so little data is a huge plus. I've made task specific models with 200-400 examples.
5) Mistral Large (I forget the specific version of their latest release). I love Mistral as the under-dog. Their models aren't bad, and they behave _very_ differently from all other models out there, which I appreciate. But Mistral never puts any effort into polishing their models; they always come out of the oven half-baked, which means they frequently glitch out, have very inconsistent behavior, etc. Accuracy and quality are hard to assess because of this inconsistency. On its best days it's up near Gemini, which is quite incredible considering the models are also released publicly. So theoretically you could finetune them to your task and get a commercial-grade model to run locally. But I rarely see anyone do that with Mistral, I think partly because of their weird license. Overall, I like seeing them in the race and hope they get better, but I wouldn't use them for anything serious.
Mistral is lightly censored, but fairly easy to jailbreak.
6) Qwen 2 (or 2.5, or whatever the current version is these days). It's an okay model. I've heard a lot of praise for it, but in all my uses thus far it's always been really inconsistent, glitchy, and weak. I've used it both locally and through APIs. I guess in _theory_ it's a good model, based on benchmarks, and it's open, which I appreciate. But I've not found any practical use for it. I even tried finetuning with Qwen 2 VL 72B, and my tiny 8B JoyCaption model beat it handily.
That's about the sum of it. AFAIK that's all the major commercial and open models (my focus is mainly on MLLMs). OpenAI are still leading the pack in my experience. I'm glad to see good competition coming from Google finally. I hope Mistral can polish their models and be a real contender.
There are a couple of smaller contenders out there, like Pixmo etc. from Allen AI. Allen AI has hands down the _best_ public VQA dataset I've seen, so huge props to them there. Pixmo is ... okayish. I tried Amazon's models a little but didn't see anything useful.
NOTE: I refuse to use Grok models for the obvious reasons, so sucks to be them.
Overall, especially seeing as I haven't paid a dime to use the API yet, I'm pretty impressed.
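For anyone curious what the small-data Llama finetune mentioned in item 4 can look like, here's a minimal sketch using LoRA adapters via Hugging Face's transformers and peft. The model name, hyperparameters, and dataset format are my own illustrative assumptions, not the commenter's actual setup:

    # Minimal sketch of a small-data LoRA finetune for Llama 3 8B.
    # Everything here (model, hyperparameters, data format) is illustrative.
    import torch
    from datasets import load_dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # Llama ships without a pad token
    model = AutoModelForCausalLM.from_pretrained(model_name,
                                                 torch_dtype=torch.bfloat16)

    # Low-rank adapters on the attention projections keep the trainable
    # parameter count tiny, which is what makes ~200-400 examples viable.
    model = get_peft_model(model, LoraConfig(
        r=16, lora_alpha=32, lora_dropout=0.05,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    ))

    # Assumes a local JSONL file with one {"text": ...} record per example.
    data = load_dataset("json", data_files="task_examples.jsonl")["train"]
    data = data.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                         max_length=1024),
                    remove_columns=data.column_names)

    Trainer(
        model=model,
        train_dataset=data,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
        args=TrainingArguments(output_dir="lora-out", num_train_epochs=3,
                               per_device_train_batch_size=1,
                               gradient_accumulation_steps=8,
                               learning_rate=2e-4, bf16=True),
    ).train()

With rank-16 adapters only a few million parameters actually train, which is why a few hundred examples can be enough to steer the model without wrecking it.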
EDIT: probably not Deep Research... is it Google testing their equivalent of o1? Who knows.
In my own testing "exp 1206" is significantly better than Gemini 2.
Feels like a Haiku 3.5 vs Sonnet 3.5 kind of thing.
That's my main issue with Google: there are several models we want to try with our agent, but quota is limited and we have to jump through hoops to see if we can get it raised.
I think DeepMind could make real progress if they focused on agents defined as multi-step reasoning plus action through a web browser, and delivered a ton of value there, instead of lumping it in with the seldom-used "look at the world through a camera" or multimodal-robots stuff.
If Google cracked robots, past plays suggest the market for those isn't big enough to interest Google. Like VR, you just can't get a billion people interested in robots, so even if they make progress, it won't survive under Google.
The "Look at the world through a camera" thing is a footnote in an Android release.
Agentic computer use _is_ a product a billion people would use, and it's adjacent to the business interests of Google Search.
" Why My "I Guess What I Have in Mind" Statement Doesn't Make Sense
The Problem of Agency: You're spot-on when you say it doesn't make sense for me to state "I guess what I (the computer) have in mind." AI does not have a "mind" in the same way that humans do. My internal state is a complex pattern of numerical data, not conscious thoughts or intentions in the way you have things in mind. My phrasing gave the illusion of having a "thing in mind," but in reality, I don't have this capacity.
Misinterpretation of the Task: My statement "I guess what I have in mind" misinterprets the task. You were clearly signaling that I needed to ask questions to uncover what you were thinking of. My failure to grasp this created a weird scenario where it seemed like I was trying to determine my own data set!"
Kind of sort of: https://news.ycombinator.com/item?id=42391096
Meta trained the smaller Llama 3 models first, and then trained the 405B model on the same architecture once it had been validated on the smaller ones. Later, they went back and used that 405B model to improve the smaller models for the Llama 3.1 release. Mistral started with a number of small models before scaling up to larger models.
I feel like this is a fairly common pattern.
If Google had a bigger version of Gemini 2.0 ready to go, I feel confident they would have mentioned it, and it would be difficult to distill it down to a small model if it wasn't ready to go.
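For what it's worth, the "use the 405B model to improve the smaller models" step is typically plain logit distillation. A minimal sketch of the usual loss in PyTorch; the temperature and blend weight here are illustrative assumptions:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=2.0, alpha=0.5):
        # student_logits, teacher_logits: (num_tokens, vocab); labels: (num_tokens,)
        # Soft targets: KL between temperature-smoothed distributions. The T^2
        # factor keeps gradient magnitudes comparable across temperatures.
        soft = F.kl_div(
            F.log_softmax(student_logits / temperature, dim=-1),
            F.softmax(teacher_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * temperature ** 2
        # Hard targets: the ordinary next-token cross-entropy.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard

The student trains against the frozen teacher's full output distribution rather than just the one-hot labels, which is where the extra signal comes from.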
A better comparison might be Flash 2.0 vs 4o-mini. Even then, the models aren't meant to have vast world knowledge, so benchmarking them on that isn't a great indicator of how they would be used in real-world cases.
"With your supervision". Thus avoiding Google being held responsible. That's like Teslas Fake Self Driving, where the user must have their hands on the wheel at all times.
They have all of these extensions that they use to prop up the results in the web UI.
I was asking for a list of related YouTube videos - the UI returns them.
Ask the API the same prompt and it returns a bunch of made-up YouTube titles and descriptions.
How could I ever rely on this product?
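One cheap guardrail if you do have to build on the raw API: verify every video the model cites against YouTube's public oEmbed endpoint before trusting it. A minimal sketch (the endpoint is real; the surrounding workflow is my assumption):

    import urllib.error
    import urllib.parse
    import urllib.request

    def youtube_video_exists(video_id: str) -> bool:
        # Cheap hallucination filter: oEmbed answers 200 for real, public
        # videos and an HTTP error for IDs it can't resolve.
        watch_url = f"https://www.youtube.com/watch?v={video_id}"
        oembed = ("https://www.youtube.com/oembed?format=json&url="
                  + urllib.parse.quote(watch_url, safe=""))
        try:
            with urllib.request.urlopen(oembed, timeout=5) as resp:
                return resp.status == 200
        except urllib.error.HTTPError:
            return False

    # Drop anything the model "cited" that doesn't actually exist.
    cited_ids = ["dQw4w9WgXcQ", "definitely-fake"]
    real = [v for v in cited_ids if youtube_video_exists(v)]

It doesn't fix the hallucination, but it at least stops fabricated videos from reaching users.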
"GVP stands for Good Pharmacovigilance Practice, which is a set of guidelines for monitoring the safety of drugs. SVP stands for Senior Vice President, which is a role in a company that focuses on a specific area of operations."
Seems like there's a lot of pharma regulation in my telecom company.
> It's located in London.
Mind blowing.
(You'll probably also need to increase your quotas right away; the default is only 10 requests per minute.)
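With a 10-requests-per-minute default, anything beyond casual testing needs client-side throttling. A minimal retry-with-backoff sketch; the client calls follow the google-generativeai Python package as I understand it, and the model name is an assumption:

    import random
    import time

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")
    model = genai.GenerativeModel("gemini-2.0-flash-exp")  # name is an assumption

    def generate_with_backoff(prompt, max_retries=5):
        # Jittered exponential backoff, retrying only on what looks like a
        # rate-limit (HTTP 429 / quota) error.
        for attempt in range(max_retries):
            try:
                return model.generate_content(prompt)
            except Exception as exc:
                msg = str(exc).lower()
                if "429" not in msg and "quota" not in msg and "rate" not in msg:
                    raise  # a real error; don't mask it with retries
                time.sleep(2 ** attempt + random.random())
        raise RuntimeError("still rate-limited after retries")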
When capitalism has pilfered everything from the pockets of working people, so that people are constantly stressed over healthcare and groceries and there's little left to further line the pockets of plutocrats, the only marketing that makes sense is to appeal to other companies, raiding their coffers by tricking their directors into buying a nonsensical product.
Is that what they mean by "agentic era"? Because that's what it sounds like to me. It also smells a lot like press-release-driven development, where the point is to put a feather in the cap of whatever poor Google engineer is chasing their next promotion.
What are you basing your opinion on? I have no idea how well these LLM agents will perform, but it's definitely a thing: OpenAI is working on them, as are Anthropic (Claude) and certainly Google.
Marketing aside, agents are just LLMs that can reach out of their regular chat bubbles and use tools. Seems like the next logical evolution.
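In code, that "LLM plus tools" loop really is small. A sketch under stated assumptions: call_llm is a hypothetical stand-in for a chat-completion client (canned here so the loop actually runs), and the JSON tool-call format is made up for illustration:

    import json

    # Hypothetical tool registry: name -> plain Python function.
    TOOLS = {
        "search_web": lambda query: f"(pretend search results for {query!r})",
    }

    def call_llm(messages):
        # Hypothetical stand-in for a chat-completion API. It returns either
        # plain text (meaning: done) or a JSON tool request. Scripted here.
        if not any("Tool result" in m["content"] for m in messages):
            return json.dumps({"tool": "search_web",
                               "args": {"query": messages[0]["content"]}})
        return "Done: summarized the search results."

    def run_agent(user_goal, max_steps=10):
        messages = [{"role": "user", "content": user_goal}]
        for _ in range(max_steps):
            reply = call_llm(messages)
            try:
                request = json.loads(reply)
            except json.JSONDecodeError:
                return reply  # plain text means the model is finished
            # Execute the requested tool and feed the result back to the model.
            result = TOOLS[request["tool"]](**request["args"])
            messages.append({"role": "assistant", "content": reply})
            messages.append({"role": "user", "content": f"Tool result: {result}"})
        return "step limit reached"

    print(run_agent("best hikes near Lisbon"))

Everything hard about real agents (tool schemas, error recovery, sandboxing) lives outside this loop, but the loop itself is just this.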
"Hear from our CEO first, and then our other CEO in charge of this domain and CTO will tell you the actual news."
I haven't seen other tech companies write like that.
I never used the web interface to access email until recently. To my surprise, all of the AI shit is enabled by default. So it’s very likely Gemini has been training on private data without my explicit consent.
Of course G words it as "personalizing" the experience for me, but it's such a load of shit. I'm tired of these companies stealing our data while we never get rightly compensated.
I need to rewire my brain for the power of these tools
This plus the quantum stuff... Google is on a win streak.
>General availability will follow in January, along with more model sizes.
>Benchmarks against their own models which always underperformed
>No pricing visible anywhere
Completely inept leadership at play.
“Sure, playing Don't Fear the Reaper on bathroom speaker”
Ok
Who the hell wants an AI that has the personality of a car salesman?
Haven't used it enough to evaluate the quality, however.
Perplexity is a much less versatile product than it could be, in its chase for speed: you can only chew through so many tokens, do so much CoT, etc. in a given amount of time.
They optimized for virality ("it's just as fast as Google but gives me more info!"), but I suspect that kills stickiness for a huge number of users, since you end up with some embarrassing misses: stuff that should have been a slam dunk goes off the rails because there wasn't enough search, or the wrong context was surfaced from the page, etc., and the user just doesn't see value in it anymore.
If I ask natural language yes/no questions, Gemini sometimes tells me outright lies with confidence.
It also presents information as authoritative - locations, science facts, corporate ownership, geography - even when it's pure hallucination.
Right at the top of Google search.
edit:
I can't find the most obnoxious offending queries, but here was one I performed today: "how many islands does georgia have?".
Compare that with "how many islands does georgia have? Skidaway Island".
This is an extremely mild case, but I've seen some wildly wrong results, where Google has claimed companies were founded in the wrong states, that towns were located in the wrong states, etc.
I also find that it's not yet really as context-aware as ChatGPT with 4o. Even just asking a follow-up question confuses Gemini 1.5.
Hope Gemini 2.0 will improve that!
> A depsipeptide is a cyclic peptide where one or more amide groups are replaced by ester groups.
Depsipeptides are not necessarily cyclic, and I'd probably use "bond" instead of "group".
These errors are happening all the time.
It's like Microsoft creating an AI tool and calling it Peertube. "Hurr durr they couldn't possibly be confused; one is a decentralised video platform and the other is an AI tool hurr durr. And ours is already more popular if you 'bing' it hurr durr."
How is it like that? Gemini is a much more common word than Peertube. https://en.wikipedia.org/wiki/Gemini