29 comments

  • ntulpule 6 hours ago
    Hi, I lead the teams responsible for our internal developer tools, including AI features. We work very closely with Google DeepMind to adapt Gemini models for Google-scale coding and other software engineering use cases. Google has a unique, massive monorepo, which poses a lot of fun challenges when it comes to deploying AI capabilities at scale.

    1. We take a lot of care to make sure the AI recommendations are safe and have a high quality bar (regular monitoring, code provenance tracking, adversarial testing, and more).

    2. We also do regular A/B tests and randomized controlled trials to ensure these features are improving SWE productivity and throughput.

    3. We see similar efficiencies across all programming languages and frameworks used internally at Google, and engineers across all tenure and experience cohorts show similar gains in productivity.

    You can read more on our approach here:

    https://research.google/blog/ai-in-software-engineering-at-g...

    • reverius42 6 hours ago
      To me the most interesting part of this is the claim that you can accurately and meaningfully measure software engineering productivity.
      • ozim 4 hours ago
        You can - but not at the level of a single developer, and you cannot use those measures to manage the productivity of a specific dev.

        For teams you can measure meaningful outcomes and improve team metrics.

        You shouldn't really compare teams, but even that is possible if you know what the teams are doing.

        If you are some disconnected manager who thinks he can make decisions or improvements by reducing things to single numbers - yeah, that's not possible.

      • UncleMeat 1 hour ago
        At scale you can do this in a bunch of interesting ways. For example, you could measure "amount of time between opening a crash log and writing the first character of a new change" across 10,000s of engineers. Yes, each individual data point is highly messy. Alice might start coding as a means of investigation. Bob might like to think about the crash over dinner. Carol might get a really hard bug while David gets a really easy one. But at scale you can see how changes in the tools change this metric.

        None of this works to evaluate individuals or even teams. But it can be effective at evaluating tools.
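
        A toy version of that aggregation - numbers and arm names made up, only the shape of the comparison matters:

            from statistics import median

            # (crash_log_opened_ts, first_keystroke_ts) pairs in seconds,
            # one per engineer-incident; individual points are noisy.
            control   = [(0, 340), (0, 95), (0, 1200), (0, 410), (0, 75)]
            with_tool = [(0, 210), (0, 80), (0, 900), (0, 260), (0, 60)]

            def median_latency(events):
                return median(end - start for start, end in events)

            # Alice/Bob/Carol noise washes out at scale; the delta between
            # arms is what you'd attribute to the tooling change.
            print(median_latency(control), median_latency(with_tool))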

      • valval 5 hours ago
        You can come up with measures for it and then watch them, that’s for sure.
    • hitradostava 4 hours ago
      I'm continually surprised by the amount of negativity that accompanies these sorts of statements. The direction of travel is very clear - LLM-based systems will be writing more and more code at all companies.

      I don't think this is a bad thing - if it can be accompanied by an increase in software quality, which is possible. Right now it's very hit and miss, and everyone has examples of LLMs producing buggy or ridiculous code. But once the tooling improves to:

      1. align produced code better to existing patterns and architecture

      2. fix the feedback loop - with TDD, other LLM agents reviewing code, feeding in compile errors, letting other LLM agents interact with the produced code, etc. (a sketch of that loop below)

      Then we will definitely start seeing more and more code produced by LLMs. Don't look at the state of the art now, look at the direction of travel.
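
      For point 2, a minimal sketch of such a feedback loop, with generate_code as a stand-in for whatever model call you use (nothing here is anyone's actual tooling):

          import py_compile
          import tempfile

          def generate_code(prompt: str) -> str:
              raise NotImplementedError("plug in your LLM call here")

          def generate_until_it_compiles(prompt: str, max_rounds: int = 3) -> str:
              # Feed compile errors back to the model until the snippet builds.
              for _ in range(max_rounds):
                  code = generate_code(prompt)
                  with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
                      f.write(code)
                  try:
                      py_compile.compile(f.name, doraise=True)
                      return code  # builds; hand off to tests / review agents next
                  except py_compile.PyCompileError as err:
                      prompt += f"\n\nPrevious attempt failed to compile:\n{err}"
              raise RuntimeError("no compilable candidate within budget")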

      • latexr 1 hour ago
        > if this can be accompanied by an increase in software quality

        That’s a huge “if”, and by your own admission not what’s happening now.

        > other LLM agents reviewing code, feeding in compile errors, letting other LLM agents interact with the produced code, etc.

        What a stupid future. Machines which make errors being “corrected” by machines which make errors in a death spiral. An unbelievable waste of figurative and literal energy.

        > Then we will definitely start seeing more and more code produced by LLMs.

        We’re already there. And there’s a lot of bad code being pumped out. Which will in turn be fed back to the LLMs.

        > Don't look at the state of the art now, look at the direction of travel.

        That’s what leads to the eternal “in five years” which eventually sinks everyone’s trust.

    • fhdsgbbcaA 6 hours ago
      I've been thinking a lot lately about how an LLM trained on really high-quality code would perform.

      I'm far from impressed with the output of GPT/Claude; all they've done is weight against Stack Overflow - which is still low-quality code relative to Google's.

      What is the probability Google makes this a real product, or is it too likely to autocomplete trade secrets?

    • gamesetmath 5 hours ago
      [flagged]
    • pixxel 3 hours ago
      [flagged]
  • imaginebit 8 hours ago
    I think he's trying to promote AI; somehow it raises questions about their code quality among some
    • dietr1ch 8 hours ago
      I think it just shows how much noise there is in coding. Code gets reviewed anyways (although review quality was going down rapidly the more PMs were added to the team)

      Most of the code must be what could be snippets (opening files and handling errors with absl::, and moving data from proto to proto). One thing that doesn't help here is that when writing for many engineers on different teams to read, spelling out simple code instead of depending on too many abstractions seems to be preferred by most teams.

      I guess that LLMs do provide smarter snippets that I don't need to fill out in detail, and when it understands types and whether things compile, it gets quite good and "smart" when it comes to writing down boilerplate.
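
      To give a feel for the proto-to-proto shuffling, something like this - dataclasses standing in for generated protobuf messages, all names invented:

          from dataclasses import dataclass, field

          @dataclass
          class UserRequest:   # stand-in for a generated proto message
              user_id: int = 0
              name: str = ""
              tags: list = field(default_factory=list)

          @dataclass
          class UserRecord:    # stand-in for its sibling message
              id: int = 0
              display_name: str = ""
              labels: list = field(default_factory=list)

          def request_to_record(req: UserRequest) -> UserRecord:
              # The whole function is field renames - exactly the kind of
              # spelled-out simple code an autocompleter handles well.
              return UserRecord(
                  id=req.user_id,
                  display_name=req.name,
                  labels=list(req.tags),
              )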

  • nosbo 6 hours ago
    I don't write code as I'm a sysadmin. Mostly just scripts. But is this like saying intellisense writes 25% of my code? Because I use autocomplete to shortcut stuff or to create a for loop to fill with things I want to do.
    • n_ary 6 hours ago
      You just made it less attractive to the target corps who are supposed to buy this product from Google. Saying "intellisense" means corps already have licenses for various of these, and some are even mostly free. Saying "AI generates 25% of our code" sounds more attractive to corps, because it feels like something new and novel, and you can imagine laying off 25% of the personnel to justify buying this product from Google.

      When someone who uses a product says it, there is a 50% chance of it being true, but when someone far away from the user says it, it is 100% promotion of the product and a setup for trust-building toward a future sale.

  • mergisi 1 hour ago
    I've been following the integration of AI into coding with great interest. It's remarkable to see that over a quarter of Google's new code is now AI-generated. In line with this trend, I've been working on a tool called AI2sql https://ai2sql.io/ that uses AI to convert natural language into SQL queries. It's been helpful in streamlining database interactions without needing deep SQL expertise. I'm curious—has anyone else here been leveraging AI tools to assist with code generation or simplify complex programming tasks?
  • ausbah 9 hours ago
    i would be way more impressed if LLMs could do code compression. more code == more things that can break, and when llms can generate boatloads of it with a click you can imagine what might happen
    • Scene_Cast2 9 hours ago
      This actually sparked an idea for me. Could code complexity be measured as cumulative entropy as measured by running LLM token predictions on a codebase? Notably, verbose boilerplate would be pretty low entropy, and straightforward code should be decently low as well.
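
      Roughly, with a small causal LM from the transformers library (model choice arbitrary, just to sketch the metric):

          import math
          import torch
          from transformers import AutoModelForCausalLM, AutoTokenizer

          def cumulative_entropy_bits(code: str, model_name: str = "gpt2") -> float:
              # Total surprisal of a file under the model: sum of -log2 p(token).
              tok = AutoTokenizer.from_pretrained(model_name)
              model = AutoModelForCausalLM.from_pretrained(model_name)
              ids = tok(code, return_tensors="pt").input_ids
              with torch.no_grad():
                  loss = model(ids, labels=ids).loss  # mean cross-entropy (nats/token)
              n_predicted = ids.shape[1] - 1          # the first token isn't predicted
              return loss.item() * n_predicted / math.log(2)

      Verbose boilerplate should come out cheap per byte; dividing by file size would give a density you could compare across files.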
      • jeffparsons 8 hours ago
        Not quite, I think. Some kinds of redundancy are good, and some are bad. Good redundancy tends to reduce mistakes rather than introduce them. E.g. there's lots of redundancy in natural languages, and it helps resolve ambiguity and fill in blanks or corruption if you didn't hear something properly. Similarly, a lot of "entropy" in code could be reduced by shortening names, deleting types, etc., but all those things were helping to clarify intent to other humans, thereby reducing mistakes. But some of it is copy+paste of rules that should be enforced in one place. Teaching a computer to understand the difference is... hard.

        Although, if we were to ignore all this for a second, you could also make similar estimates with, e.g., gzip: the higher the compression ratio attained, the more "verbose"/"fluffy" the code is.
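
        Concretely, stdlib only (fluff_ratio is just an illustrative name for a crude proxy):

            import gzip

            def fluff_ratio(source: str) -> float:
                # Raw size over compressed size: higher means more redundancy,
                # whether the good kind (clear names) or the bad (copy+paste).
                raw = source.encode("utf-8")
                return len(raw) / len(gzip.compress(raw))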

        Fun tangent: there are a lot of researchers who believe that compression and intelligence are equivalent or at least very tightly linked.

        • 8note 7 hours ago
          Interpreting this comment, it would predict low complexity for code copied unnecessarily.

          I'm not sure though. If it's copied a bunch of times, and it actually doesn't matter because each usecase of the copying is linearly independent, does it matter that it was copied?

          Over time, you'd still see copies that get changed independently show up as increased entropy

    • ks2048 9 hours ago
      I agree. It seems like counting lines of generated code is like counting bytes/instructions of compiled code - who cares? If “code” becomes prompts, then AI should lead to much smaller code than before.

      I’m aware that the difference is that AI-generated code can be read and modified by humans. But that quantity is bad because humans have to understand it to read or modify it.

      • latexr 58 minutes ago
        > If “code” becomes prompts, then AI should lead to much smaller code than before.

        What’s the point of shorter code if you can’t trust it to do what it’s supposed to?

        I’ll take 20 lines of code that do what they should consistently over 1 line that may or may not do the task depending on the direction of the wind.

      • TZubiri 8 hours ago
        What's that line about accounting for lines of code on the wrong side of the balance sheet?
    • AlexandrB 8 hours ago
      Exactly this. Code is a liability, if you can do the same thing with less code you're often better off.
      • EasyMark 7 hours ago
        Not if it's already stable and has been running for years. Legacy doesn't necessarily mean "needs replacement because of technical debt". I've seen lots of people want to replace code that has been running basically bug-free for years because "there are better coding styles and practices now"
    • 8note 7 hours ago
      How would it know which edge cases are being useful and which ones aren't?

      I understand more code as being more edge cases

    • asah 8 hours ago
      meh - the LLM code I'm seeing isn't particularly more verbose. And as others have said, if you want tighter code, just add that to the prompt.

      fun story: today I had an LLM write me a non-trivial perl one-liner. It tried to be verbose but I insisted and it gave me one tight line.

  • pixelat3d 5 hours ago
    Sooo... is this why Google sucks now?
  • mjbale116 8 hours ago
    If you manage to convince software engineers that you are doing them a favour by employing them, then they will approach any workplace negotiations with a specific mindset which will make them grab the first number that gets thrown at them.

    These statements are brilliant.

  • ChrisArchitect 6 hours ago
    Related:

    Alphabet ($GOOG) 2024 Q3 earnings release

    https://news.ycombinator.com/item?id=41988811

  • rcarmo 4 hours ago
    There is a running gag among my friends using Google Chat (or whatever their corporate IM tool is now called) that this explains a lot of what they’re experiencing while using it…
  • FactKnower69 9 hours ago
    this is the type of thing you should be desperate to hide and cover up with secret internal memos promising immediate termination for leakers; what in god's name could be going through his head that he chose to announce this to the public
    • eob 9 hours ago
      So GCS customers will trust their codegen product. (Engineers aren’t the buyer; corp suite is)
    • hn_throwaway_99 8 hours ago
      I don't understand why you think this at all. Care to explain?
    • dartharva 8 hours ago
      Why? Especially when said AI helpers are a part of what the company itself is selling?
    • joeevans1000 8 hours ago
      These companies are competing to be the next codegen service provider.
    • foota 8 hours ago
      Translation: They'd love to lay off all the engineers.
      • TheNewsIsHere 2 hours ago
        By some intuitive measures, it's surprising they have very many engineers still writing their code. Google's product quality isn't what it once was. There is no amount of AI accelerators and energy they can burn through to fix that without humans.
      • sfmz 8 hours ago
        We should watch for dev layoffs as a sign/signal of the impact of generated code. I remember reading about an anime shop that fired 80% of its illustrators due to AI images.
    • lesuorac 8 hours ago
      Well, the article has a paywall so it might go into this.

      I'm not sure this stat is as important as people make it out to be. If I start typing `for` and the AI auto-completes `for(int i=0; i<args.length; i++) {`, then a lot more than 25% of the code is AI-written, but it's also not significant. I could've figured out how to write the for-loop, and it's not a meaningful amount of time saved, because most of the time goes into figuring out and testing, which the AI doesn't do.

    • dyauspitr 8 hours ago
      I don't think the public cares whether their code is written by machines or real people as long as the product works.
      • Nullabillity 8 hours ago
        Just today, Google Calendar asked me whether I wanted the "easy" or "depressed" colour scheme.
        • mattigames 7 hours ago
          It's for when you have an upcoming funeral; the calendar is just trying to dress appropriately.
        • Mistletoe 8 hours ago
          Ironically, your comment brightened my day.
  • nine_zeros 8 hours ago
    Writing more code means more needs to be maintained and they are cleverly hiding that fact. Software is a lot more like complex plumbing than people want to admit:

    More lines == more shit to maintain. Complex lines == the shit is unmanageable.

    But wall street investors love simplistic narratives such as More X == More revenue. So here we are. Pretty clever marketing imo.

  • jrockway 9 hours ago
    When I was there, way more than 25% of the code was copying one proto into another proto, or so people complained. What sort of memes are people making now that this task has been automated?
    • hn_throwaway_99 8 hours ago
      I am very interested in how this 25% number is calculated, and whether it's a lot of boilerplate that in the past would have just been big copy-paste jobs, like a lot of protobuffers work. Would be curious if any Googlers could comment.

      Not that I'm really discounting the value of AI here. For example, I've found a ton of value and saved time getting AI to write CDKTF (basically, Terraform in Typescript) config scripts for me. I don't write Terraform that often, there are a ton of options I always forget, etc. So asking ChatGPT to write a Terraform config for, say, a new scheduled task saves me from a lot of manual lookup.

      But at the same time, the AI isn't really writing the complicated logic pieces for me. I think that comes down to the fact that when I do need to write complicated logic, I'm a decent enough programmer that it's probably faster for me to write it out in a high-level programming language than write it in English first.

    • dietr1ch 8 hours ago
      I miss old memegen, but it got ruined by HR :/
      • rcarmo 4 hours ago
        I am reliably told that it is alive and well, even if it’s changed a bit.
  • kev009 9 hours ago
    I would hope a CEO, especially a technical one, would have enough sense to couple that statement to some useful business metric, because in isolation it might be an announcement of public humiliation.
    • dmix 8 hours ago
      The elitism of programmers who think the boilerplate code they write for 25% of the job - code that's already been written by 1000 other people before - is in fact a valuable use of company time to write by hand again.

      IMO it's only really an issue if a competent human wasn't involved in the process - basically a person who could have written it if needed, who then does the work of connecting it to the useful stuff and has appropriate QA/testing in place... the latter often taking far more effort than the actual writing-the-code time itself, even when a human does it.

      • marcosdumay 8 hours ago
        If 25% of your code is boilerplate, you have a serious architectural problem.

        That said, I've seen even higher ratios. But never in any place that survived for long.

        • TheNewsIsHere 2 hours ago
          To add: it’s been my experience that it’s the company that thinks the boilerplate code is some special, secret, proprietary thing that no other business could possibly have produced.

          Not the developer who has written the same effective stanza 10 times before.

        • hn_throwaway_99 8 hours ago
          Depends on how you define "boilerplate". E.g. Terraform configs account for a significant number of the total lines in one of my repos. It's not really "boilerplate" in that it's not the exact same everywhere, but it is boilerplate in the sense that setting up, say, a pretty standard Cloud SQL instance can take many, many lines of code just because there are so many config options.
        • 8note 7 hours ago
          Is it though? It seems to me like a team ownership boundary question rather than an architecture question.

          Architecturally, it sounds like different architecture components map somewhere close to 1:1 to teams, rather than teams hacking components to be more closely coupled to each other because they have the same ownership.

          I'd see too much boilerplate as being an organization/management issue rather than a code architecture issue

        • dmix 8 hours ago
          You're probably thinking of just raw codebases, your company source code repo. Programmers do far, far more boilerplate stuff than raw code they commit with git. Debugging, data processing, system scripts, writing SQL queries, etc.

          Combine that with generic functions, framework boilerplate, OS/browser stuff, or explicit x-y-z code, and your "boilerplate" (i.e. repetitive, easily reproducible code) easily gets to 25% of what your programmers write every month. If your job is >75% pure human-cognition problem solving, you're probably in a higher tier of jobs than the vast majority of programmers on the planet.

        • cryptoz 8 hours ago
          Android mobile development has gotten so …architectured that I would guess most apps have a much higher rate of “boilerplate” than you’d hope for.

          Everything is getting forced into a scalable, general-purpose shape, such that most apps have to add a ridiculous amount of boilerplate.

      • kev009 7 hours ago
        Doing the same thing but faster might just mean you are masturbating more furiously. Show me the money, especially from a CEO.
      • mistrial9 8 hours ago
        you probably underestimate the endless miles of verbose code that are possible, by human or machine but especially by machine.
    • dyauspitr 8 hours ago
      Or a statement of pride that the intelligence they created is capable of lofty tasks.
  • joeevans1000 8 hours ago
    I read these threads and the usual 'I have to fix the AI code for longer than it would have taken to write it from scratch' and can't help but feel folks are truly trying to downplay what is going to eat the software industry alive.
  • tylerchilds 9 hours ago
    if the golden rule is that code is a liability, what does this headline imply?
    • eddd-ddde 6 hours ago
      The code would be getting written anyway; it's an invariant. The difference is less time wasted typing keys (albeit a small amount of time) and, more importantly (in my experience), it helps A LOT with discoverability.

      With g3's immense amount of context, LLMs can vastly help you discover how other people are using existing libraries.

    • JimDabell 2 hours ago
      Nothing at all. The headline talks about the proportion of code written by AI. Contrary to what a lot of comments here are assuming, it does not say that the volume of code written has increased.

      Google could be writing the same amount of code with fewer developers (they have had multiple layoffs lately), or their developers could be focusing more of their time and attention on the code they do write.

    • danielmarkbruce 8 hours ago
      I'm sure google won't pay you money to take all their code off their hands.
      • AlexandrB 8 hours ago
        But they would pay me money to audit it for security.
        • danielmarkbruce 8 hours ago
          yup, you can get paid all kinds of money to fix/guard/check billion/trillion dollar assets..
  • croes 8 hours ago
    Related?

    > New tool bypasses Google Chrome’s new cookie encryption system

    https://news.ycombinator.com/item?id=41988648

  • an_d_rew 8 hours ago
    Huh.

    That may explain why google search has, in the past couple of months, become so unusable for me that I switched (happily) to kagi.

    • twarge 8 hours ago
      Which uses Google results?
  • Tier3r 7 hours ago
    Google is getting enshittified. It's already visible in many small ways. I was just using Google Maps, and in the route they called X (bus) Interchange "X International". I can only assume this happened because they are using AI to summarise routes now. Why in the world are they doing that? They have exact location names available.
  • hipadev23 8 hours ago
    Google is now mass-producing techdebt at rates not seen since Martin Fowler’s first design pattern blogposts.
    • joeevans1000 8 hours ago
      Not really technical debt when you will be able to regenerate 20K lines of code in a minute, then QA and deploy it automatically.
      • kibwen 7 hours ago
        So a fresh, new ledger of technical debt every morning, impossible to ever pay off?
      • 1attice 7 hours ago
        Assuming, of course:

        - You know which 20K lines need changing

        - You have perfect QA

        - Nothing ever goes wrong in deployment.

        I think there's a tendency in our industry to only take the hypotenuse of curves at the steepest point

        • TheNewsIsHere 2 hours ago
          That is a fantastic way to put it. I’d argue that you’ve described a bubble, which fits perfectly with the topic and where _most_ of it will eventually end up.
  • microtherion 8 hours ago
    [flagged]
  • Tiktaalik 8 hours ago
    [flagged]
  • calmbonsai 8 hours ago
    [flagged]
  • pyuser583 8 hours ago
    [flagged]
    • YPPH 8 hours ago
      Actually 0%, assembly language is assembled to machine code, not compiled.
      • ndesaulniers 8 hours ago
        Inline asm has to go through the compiler to get wired up by the register allocator.
  • bakugo 9 hours ago
    [flagged]
  • evbogue 8 hours ago
    I'd be turning off the autocomplete in my IDE if I were at Google. Seems to double as a keylogger.
  • 1oooqooq 6 hours ago
    this only means employees signed up to use the new toys and they are paying for enough seats for all employees.

    it's like companies paying for all those todo-list and tutorial apps left running on aws ec2 instances in 2007ish.

    I'd be worried if i were a google investor. lol.

    • fragmede 6 hours ago
      I'm not sure I get your point. Google created Gemini and whatever internal LLM their employees are using for code generation. Who are they paying, and for what seats? Not Microsoft or OpenAI or Anthropic...
  • ultra_nick 8 hours ago
    Why work at big businesses anymore? Let's just create more startups.
    • IAmGraydon 8 hours ago
      Risk appetite.