29 comments

  • ntulpule 8 hours ago
    Hi, I lead the teams responsible for our internal developer tools, including AI features. We work very closely with Google DeepMind to adapt Gemini models for Google-scale coding and other software engineering use cases. Google has a unique, massive monorepo, which poses a lot of fun challenges when it comes to deploying AI capabilities at scale.

    1. We take a lot of care to make sure the AI recommendations are safe and meet a high quality bar (regular monitoring, code provenance tracking, adversarial testing, and more).

    2. We also run regular A/B tests and randomized controlled trials to ensure these features are improving SWE productivity and throughput.

    3. We see similar efficiencies across all programming languages and frameworks used internally at Google, and engineers across all tenure and experience cohorts show similar gains in productivity.

    You can read more on our approach here:

    https://research.google/blog/ai-in-software-engineering-at-g...

    • LinuxBender 43 minutes ago
      Is AI ready to crawl through all open source and find/fix all the potential security bugs, or all bugs for that matter? If so, will that become a commercial service or a free service?

      Will AI be able to detect bugs and back doors that require multiple pieces of code working together, rather than sitting in a single piece of code? Humans have a hard time with this.

      - Hypothetical example: an authentication bug in sshd that requires a flaw in systemd, which in turn requires a flaw in udev or nss or PAM or some underlying library ... yet looking at each individual library or daemon, there are no bugs that a professional penetration-testing organization such as NCC Group or Google's Project Zero would find. In other words, will AI soon be able to find more complex bugs in a year than Tavis has found in his career? Will AIs start to compete with one another, find all the state-sponsored complex bugs, and ultimately build a map that suggests a common set of developers who may need to be notified? Will there be a table that logs where AI found things that professional human penetration testers could not?

    • mysterydip 16 minutes ago
      I assume the amount of monitoring effort is less than the effort that would be required for humans to write the AI-generated code themselves, but do you have numbers on what that ROI looks like? Is it more like 10% or 200%?
    • reverius42 8 hours ago
      To me the most interesting part of this is the claim that you can accurately and meaningfully measure software engineering productivity.
      • ozim 6 hours ago
        You can - but not at the level of a single developer, and you cannot use those measures to manage the productivity of a specific dev.

        For teams you can measure meaningful outcomes and improve team metrics.

        You shouldn't really compare teams, but even that is possible if you know what the teams are doing.

        If you are some disconnected manager who thinks he can make decisions or improvements by reducing things to single numbers - yeah, that's not possible.

        • deely3 1 hour ago
          > For teams you can measure meaningful outcomes and improve team metrics.

          How? Which metrics?

          • ozim 30 minutes ago
            That is what we pay managers for: to figure that out. They should work out which metrics, and how, by knowing the team, being familiar with the domain, understanding company dynamics, understanding the customer, and understanding market dynamics.
      • UncleMeat 3 hours ago
        At scale you can do this in a bunch of interesting ways. For example, you could measure "amount of time between opening a crash log and writing the first character of a new change" across 10,000s of engineers. Yes, each individual data point is highly messy. Alice might start coding as a means of investigation. Bob might like to think about the crash over dinner. Carol might get a really hard bug while David gets a really easy one. But at scale you can see how changes in the tools change this metric.

        None of this works to evaluate individuals or even teams. But it can be effective at evaluating tools.
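
        As a minimal sketch of that kind of aggregate (the event-log schema here is hypothetical, not Google's actual telemetry):

          from statistics import median

          def time_to_first_edit(episodes):
              # episodes: (t_opened_crash_log, t_first_keystroke) pairs in
              # seconds, one per debugging episode.
              return median(t_edit - t_open for t_open, t_edit in episodes)

          # Each episode is noisy (Alice codes to investigate, Bob mulls it
          # over dinner), but a before/after comparison across thousands of
          # engineers can still show whether a tool shifted the distribution.
          before = [(0, 340), (10, 95), (5, 1200), (7, 260)]
          after = [(0, 150), (3, 80), (9, 900), (2, 210)]
          print(time_to_first_edit(before), time_to_first_edit(after))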

      • valval 7 hours ago
        You can come up with measures for it and then watch them, that’s for sure.
        • lr1970 21 minutes ago
          When a metric becomes the target, it ceases to be a good metric. Once developers discover how it works, they will type the first character immediately after opening the log.

          edit: typo

    • hitradostava 6 hours ago
      I'm continually surprised by the amount of negativity that accompanies these sorts of statements. The direction of travel is very clear - LLM-based systems will be writing more and more code at all companies.

      I don't think this is a bad thing, if it can be accompanied by an increase in software quality, which is possible. Right now it's very hit and miss, and everyone has examples of LLMs producing buggy or ridiculous code. But once the tooling improves to:

      1. align produced code better with existing patterns and architecture

      2. fix the feedback loop - with TDD, other LLM agents reviewing code, feeding in compile errors, letting other LLM agents interact with the produced code, etc.

      then we will definitely start seeing more and more code produced by LLMs. Don't look at the state of the art now; look at the direction of travel.

      • latexr 3 hours ago
        > if this can be accompanied by an increase in software quality

        That’s a huge “if”, and by your own admission not what’s happening now.

        > other LLM agents reviewing code, feeding in compile errors, letting other LLM agents interact with the produced code, etc.

        What a stupid future. Machines which make errors being “corrected” by machines which make errors in a death spiral. An unbelievable waste of figurative and literal energy.

        > Then we will definitely start seeing more and more code produced by LLMs.

        We’re already there. And there’s a lot of bad code being pumped out. Which will in turn be fed back to the LLMs.

        > Don't look at the state of the art not, look at the direction of travel.

        That’s what leads to the eternal “in five years” which eventually sinks everyone’s trust.

    • fhdsgbbcaA 8 hours ago
      I've been thinking a lot lately about how an LLM trained on really high-quality code would perform.

      I'm far from impressed with the output of GPT/Claude; all they've done is weight toward Stack Overflow - which is still low-quality code relative to Google's.

      What is the probability Google makes this a real product, or is it too likely to autocomplete trade secrets?

  • imaginebit 10 hours ago
    I think he's trying to promote AI; somehow it raises questions about their code quality among some.
    • dietr1ch 10 hours ago
      I think it just shows how much noise there is in coding. Code gets reviewed anyway (although review quality went down rapidly the more PMs were added to the team).

      Most of the code must be what could be snippets (opening files and handling errors with absl::, and moving data from proto to proto). One thing that doesn't help here is that, when writing for many engineers on different teams to read, most teams seem to prefer spelling out simple code instead of depending on too many abstractions.

      I guess that LLMs do provide smarter snippets that I don't need to fill out in detail, and when the tooling understands types and whether things compile, it gets quite good and "smart" when it comes to writing down boilerplate.
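
      As a stand-in illustration (plain dataclasses rather than real protos; all names here are invented), the kind of spelled-out field shuffling in question:

        from dataclasses import dataclass, field

        @dataclass
        class UserProto:  # stand-in for a generated proto message
            id: int = 0
            full_name: str = ""
            roles: list = field(default_factory=list)

        @dataclass
        class AccountProto:  # stand-in for a second, near-identical message
            id: int = 0
            display_name: str = ""
            roles: list = field(default_factory=list)

        def user_to_account(user: UserProto) -> AccountProto:
            # Spelled out field by field for readability across teams;
            # exactly the kind of snippet an LLM can fill out.
            account = AccountProto()
            account.id = user.id
            account.display_name = user.full_name
            account.roles.extend(user.roles)
            return account

        print(user_to_account(UserProto(1, "Ada Lovelace", ["admin"])))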

  • nosbo 8 hours ago
    I don't write code, as I'm a sysadmin - mostly just scripts. But is this like saying IntelliSense writes 25% of my code? Because I use autocomplete to shortcut stuff, or to create a for loop to fill with things I want to do.
    • n_ary 8 hours ago
      You just made it less attractive to the target corps who are supposed to buy this product from Google. Saying "IntelliSense" means corps already have licenses for various of these, and some are even mostly free. Saying "AI generates 25% of our code" sounds more attractive to corps, because it feels like something new and novel, and you can imagine laying off 25% of the personnel to justify buying this product from Google.

      When someone who uses a product says it, there is a 50% chance of it being true; but when someone far from the user says it, it is 100% promotion of the product and a setup for trust-building toward a future sale.

  • ausbah 11 hours ago
    I would be way more impressed if LLMs could do code compression. More code == more things that can break, and when LLMs can generate boatloads of it with a click, you can imagine what might happen.
    • Scene_Cast2 11 hours ago
      This actually sparked an idea for me. Could code complexity be measured as the cumulative entropy of LLM token predictions run over a codebase? Notably, verbose boilerplate would be pretty low entropy, and straightforward code should be decently low as well.
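
      A minimal sketch of the idea, assuming the Hugging Face transformers package and a small causal LM ("gpt2" purely as a placeholder):

        import torch
        from transformers import AutoModelForCausalLM, AutoTokenizer

        tok = AutoTokenizer.from_pretrained("gpt2")
        model = AutoModelForCausalLM.from_pretrained("gpt2")

        def cumulative_entropy(path):
            ids = tok(open(path).read(), return_tensors="pt",
                      truncation=True).input_ids
            with torch.no_grad():
                loss = model(ids, labels=ids).loss  # mean NLL per token
            return loss.item() * ids.numel()  # ~ total nats over the file

        # Lower totals suggest predictable boilerplate; higher totals suggest
        # denser (or merely more surprising) code.
        print(cumulative_entropy("main.py"))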
      • jeffparsons 10 hours ago
        Not quite, I think. Some kinds of redundancy are good, and some are bad. Good redundancy tends to reduce mistakes rather than introduce them. E.g. there's lots of redundancy in natural languages, and it helps resolve ambiguity and fill in blanks or corruption if you didn't hear something properly. Similarly, a lot of "entropy" in code could be reduced by shortening names, deleting types, etc., but all those things were helping to clarify intent to other humans, thereby reducing mistakes. But some of it is copy+paste of rules that should be enforced in one place. Teaching a computer to understand the difference is... hard.

        Although, if we were to ignore all this for a second, you could also make similar estimates with, e.g., gzip: the higher the compression ratio attained, the more "verbose"/"fluffy" the code is.
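
        In concrete terms, a rough sketch (zlib standing in for gzip):

          import zlib

          def fluff_ratio(path):
              raw = open(path, "rb").read()
              # Higher ratio = more redundancy = more "fluff" by this proxy.
              return len(raw) / len(zlib.compress(raw, 9))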

        Fun tangent: there are a lot of researchers who believe that compression and intelligence are equivalent or at least very tightly linked.

        • 8note 10 hours ago
          Interpreting this comment: it would predict low complexity for code copied unnecessarily.

          I'm not sure though. If it's copied a bunch of times, and it actually doesn't matter because each use case of the copying is linearly independent, does it matter that it was copied?

          Over time, you'd still see copies that get changed independently show up as increased entropy.

    • ks2048 11 hours ago
      I agree. It seems like counting lines of generated code is like counting bytes/instructions of compiled code - who cares? If "code" becomes prompts, then AI should lead to much smaller code than before.

      I'm aware that the difference is that AI-generated code can be read and modified by humans. But then quantity is bad, because humans have to understand it in order to read or modify it.

      • latexr 3 hours ago
        > If “code” becomes prompts, then AI should lead to much smaller code than before.

        What’s the point of shorter code if you can’t trust it to do what it’s supposed to?

        I’ll take 20 lines of code that do what they should consistently over 1 line that may or may not do the task depending on the direction of the wind.

      • TZubiri 10 hours ago
        What's that line about accounting for lines of code on the wrong side of the balance sheet?
    • AlexandrB 10 hours ago
      Exactly this. Code is a liability; if you can do the same thing with less code, you're often better off.
      • EasyMark 10 hours ago
        Not if it's already stable and has been running for years. Legacy doesn't necessarily mean "needs replacement because of technical debt". I've seen lots of people want to replace code that has been running basically bug-free for years because "there are better coding styles and practices now".
    • 8note 10 hours ago
      How would it know which edge cases are useful and which ones aren't?

      I understand more code as being more edge cases

    • asah 10 hours ago
      meh - the LLM code I'm seeing isn't particularly more verbose. And as others have said, if you want tighter code, just add that to the prompt.

      fun story: today I had an LLM write me a non-trivial Perl one-liner. It tried to be verbose, but I insisted and it gave me one tight line.

  • pixelat3d 7 hours ago
    Sooo... is this why Google sucks now?
  • mergisi 3 hours ago
    I've been following the integration of AI into coding with great interest. It's remarkable to see that over a quarter of Google's new code is now AI-generated. In line with this trend, I've been working on a tool called AI2sql https://ai2sql.io/ that uses AI to convert natural language into SQL queries. It's been helpful in streamlining database interactions without needing deep SQL expertise. I'm curious—has anyone else here been leveraging AI tools to assist with code generation or simplify complex programming tasks?
  • mjbale116 10 hours ago
    If you manage to convince software engineers that you are doing them a favour by employing them, then they will approach any workplace negotiation with a specific mindset, which will make them grab the first number that gets thrown at them.

    These statements are brilliant.

  • ChrisArchitect 8 hours ago
    Related:

    Alphabet ($GOOG) 2024 Q3 earnings release

    https://news.ycombinator.com/item?id=41988811

  • rcarmo 6 hours ago
    There is a running gag among my friends using Google Chat (or whatever their corporate IM tool is now called) that this explains a lot of what they’re experiencing while using it…
  • FactKnower69 11 hours ago
    this is the type of thing you should be desperate to hide and cover up with secret internal memos promising immediate termination for leakers; what in god's name could be going through his head that he chose to announce this to the public?
    • eob 11 hours ago
      So GCS customers will trust their codegen product. (Engineers aren’t the buyer; corp suite is)
    • hn_throwaway_99 10 hours ago
      I don't understand why you think this at all. Care to explain?
    • dartharva 10 hours ago
      Why? Especially when said AI helpers are a part of what the company itself is selling?
    • joeevans1000 10 hours ago
      These companies are competing to be the next codegen service provider.
    • foota 10 hours ago
      Translation: They'd love to lay off all the engineers.
      • TheNewsIsHere 5 hours ago
        By some intuitive measures, it's surprising they still have very many humans writing their code at all. Google's product quality isn't what it once was. There is no amount of AI accelerators and energy they can burn through to fix that without humans.
      • sfmz 10 hours ago
        We should watch for dev layoffs as a sign/signal of the impact of generated code. I remember reading about an anime shop that fired 80% of its illustrators due to AI images.
    • lesuorac 10 hours ago
      Well, the article has a paywall, so it might go into this.

      I'm not sure this stat is as important as people make it out to be. If I start off `for` and the AI auto-completes `for(int i=0; i<args.length; i++) {`, then a lot more than 25% of the code is AI-written, but it's also not significant. I could've figured out how to write the for loop, and it's also not a meaningful amount of time saved, because most of the time is figuring out and testing, which the AI doesn't do.

    • dyauspitr 11 hours ago
      I don't think the public cares whether their code is written by machines or real people as long as the product works.
      • Nullabillity 10 hours ago
        Just today, Google Calendar asked me whether I wanted the "easy" or "depressed" colour scheme.
        • mattigames 10 hours ago
          It's for when you have an upcoming funeral; the calendar is just trying to dress appropriately.
        • Mistletoe 10 hours ago
          Ironically, your comment brightened my day.
  • nine_zeros 10 hours ago
    Writing more code means more needs to be maintained, and they are cleverly hiding that fact. Software is a lot more like complex plumbing than people want to admit:

    More lines == more shit to maintain. Complex lines == the shit is unmanageable.

    But Wall Street investors love simplistic narratives such as more X == more revenue. So here we are. Pretty clever marketing imo.

  • jrockway 11 hours ago
    When I was there, way more than 25% of the code was copying one proto into another proto, or so people complained. What sort of memes are people making now that this task has been automated?
    • hn_throwaway_99 10 hours ago
      I am very interested in how this 25% number is calculated, and whether it's a lot of boilerplate that in the past would have just been big copy-paste jobs, like a lot of protobuf work. Would be curious if any Googlers could comment.

      Not that I'm really discounting the value of AI here. For example, I've found a ton of value and saved time getting AI to write CDKTF (basically, Terraform in Typescript) config scripts for me. I don't write Terraform that often, there are a ton of options I always forget, etc. So asking ChatGPT to write a Terraform config for, say, a new scheduled task for example saves me from a lot of manual lookup.

      But at the same time, the AI isn't really writing the complicated logic pieces for me. I think that comes down to the fact that when I do need to write complicated logic, I'm a decent enough programmer that it's probably faster for me to write it out in a high-level programming language than write it in English first.

    • dietr1ch 10 hours ago
      I miss old memegen, but it got ruined by HR :/
      • rcarmo 6 hours ago
        I am reliably told that it is alive and well, even if it’s changed a bit.
  • kev009 11 hours ago
    I would hope a CEO, especially a technical one, would have enough sense to couple that statement to some useful business metric, because in isolation it might be an announcement of public humiliation.
    • dmix 11 hours ago
      The elitism of programmers who think the boilerplate code they write for 25% of the job, which has already been written by 1000 other people before, is in fact a valuable use of company time to write by hand again.

      IMO it's only really an issue if a competent human wasn't involved in the process: basically a person who could have written it if needed, who then does the work of connecting it to the useful stuff and has appropriate QA/testing in place... the latter often taking far more effort than the actual writing-the-code time itself, even when a human does it.

      • marcosdumay 10 hours ago
        If 25% of your code is boilerplate, you have a serious architectural problem.

        That said, I've seen even higher ratios. But never in any place that survived for long.

        • TheNewsIsHere 5 hours ago
          To add: it’s been my experience that it’s the company that thinks the boilerplate code is some special, secret, proprietary thing that no other business could possibly have produced.

          Not the developer who has written the same effective stanza 10 times before.

        • hn_throwaway_99 10 hours ago
          Depends on how you define "boilerplate". E.g. Terraform configs account for a significant number of the total lines in one of my repos. It's not really "boilerplate" in that it's not the exact same everywhere, but it is boilerplate in the sense that setting up, say, a pretty standard Cloud SQL instance can take many, many lines of code just because there are so many config options.
        • 8note 9 hours ago
          Is it though? It seems to me like a team-ownership-boundary question rather than an architecture question.

          Architecturally, it sounds like architecture components map somewhere close to 1:1 to teams, rather than teams hacking components to be more tightly coupled to each other because they share ownership.

          I'd see too much boilerplate as an organization/management issue rather than a code-architecture issue.

        • dmix 10 hours ago
          You're probably thinking of just raw codebases - your company's source code repo. Programmers do far, far more boilerplate stuff than the raw code they commit with git: debugging, data processing, system scripts, writing SQL queries, etc.

          Combine that with generic functions, framework boilerplate, OS/browser stuff, or explicit x-y-z code, and your "boilerplate" (i.e. repetitive, easily reproducible code) easily gets to 25% of the code your programmers write every month. If your job is >75% pure human-cognition problem solving, you're probably in a higher tier of jobs than the vast majority of programmers on the planet.

        • cryptoz 10 hours ago
          Android mobile development has gotten so ...architectured that I would guess most apps have a much higher rate of "boilerplate" than you'd hope for.

          Everything is getting forced into a scalable, general-purpose shape, so most apps have to add a ridiculous amount of boilerplate.

      • kev009 9 hours ago
        Doing the same thing but faster might just mean you are masturbating more furiously. Show me the money, especially from a CEO.
      • mistrial9 10 hours ago
        you probably underestimate the endless miles of verbose code that are possible, by human or machine, but especially by machine.
    • dyauspitr 11 hours ago
      Or a statement of pride that the intelligence they created is capable of lofty tasks.
  • joeevans1000 10 hours ago
    I read these threads, with the usual "I have to fix the AI code for longer than it would have taken to write it from scratch," and can't help but feel folks are truly trying to downplay what is going to eat the software industry alive.
  • tylerchilds 11 hours ago
    if the golden rule is that code is a liability, what does this headline imply?
    • eddd-ddde 8 hours ago
      The code would be getting written anyway; it's an invariant. The difference is less time wasted typing keys (albeit a small amount of time) and, more importantly (in my experience), it helps A LOT with discoverability.

      With g3's immense amount of context, LLMs can vastly help you discover how other people are using existing libraries.

      • tylerchilds 2 hours ago
        my experience dabbling with AI and code is that it is terrible at coming up with new stuff unless it already exists

        in regards to how others are using libraries, that's where the technology will excel - re-writing code. once it has a stable AST to work with, the mathematical equation it is solving is a refactor.

        until it has an AST that solves the business need, the game is just prompt spaghetti, until it reaches enough altitude to be able to refactor.
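
        A toy sketch of that AST-level view (Python's ast module; ast.unparse needs 3.9+), doing a mechanical rename refactor:

          import ast

          class Rename(ast.NodeTransformer):
              def __init__(self, old, new):
                  self.old, self.new = old, new

              def visit_FunctionDef(self, node):
                  if node.name == self.old:
                      node.name = self.new
                  self.generic_visit(node)  # also rewrite calls in the body
                  return node

              def visit_Name(self, node):
                  if node.id == self.old:
                      node.id = self.new
                  return node

          tree = ast.parse("def f(x):\n    return f(x - 1) if x else 0")
          # With a stable AST, the rewrite is mechanical and behavior-preserving.
          print(ast.unparse(Rename("f", "g").visit(tree)))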

    • JimDabell 5 hours ago
      Nothing at all. The headline talks about the proportion of code written by AI. Contrary to what a lot of comments here are assuming, it does not say that the volume of code written has increased.

      Google could be writing the same amount of code with fewer developers (they have had multiple layoffs lately), or their developers could be focusing more of their time and attention on the code they do write.

    • danielmarkbruce 10 hours ago
      I'm sure Google won't pay you money to take all their code off their hands.
      • AlexandrB 10 hours ago
        But they would pay me money to audit it for security.
        • danielmarkbruce 10 hours ago
          yup, you can get paid all kinds of money to fix/guard/check billion/trillion-dollar assets.
  • croes 11 hours ago
    Related?

    > New tool bypasses Google Chrome’s new cookie encryption system

    https://news.ycombinator.com/item?id=41988648

  • an_d_rew 11 hours ago
    Huh.

    That may explain why Google search has, in the past couple of months, become so unusable for me that I switched (happily) to Kagi.

    • twarge 10 hours ago
      Which uses Google results?
  • hipadev23 11 hours ago
    Google is now mass-producing tech debt at rates not seen since Martin Fowler's first design-pattern blog posts.
    • joeevans1000 10 hours ago
      Not really technical debt when you will be able to regenerate 20K lines of code in a minute, then QA and deploy it automatically.
      • kibwen 9 hours ago
        So a fresh, new ledger of technical debt every morning, impossible to ever pay off?
      • 1attice 9 hours ago
        Assuming, of course:

        - You know which 20K lines need changing
        - You have perfect QA
        - Nothing ever goes wrong in deployment

        I think there's a tendency in our industry to only take the hypotenuse of curves at the steepest point

        • TheNewsIsHere 4 hours ago
          That is a fantastic way to put it. I’d argue that you’ve described a bubble, which fits perfectly with the topic and where _most_ of it will eventually end up.
  • Tier3r 9 hours ago
    Google is getting enshittified. It's already visible in many small ways. I was just using Google Maps, and on the route they called X (bus) Interchange "X International". I can only assume this happened because they are using AI to summarize routes now. Why in the world are they doing that? They have exact location names available.
  • pyuser583 10 hours ago
    [flagged]
    • YPPH 10 hours ago
      Actually 0%: assembly language is assembled into machine code, not compiled.
      • ndesaulniers 10 hours ago
        Inline asm has to go through the compiler to get wired up by the register allocator.
  • evbogue 10 hours ago
    I'd be turning off the autocomplete in my IDE if I were at Google. Seems to double as a keylogger.
  • 1oooqooq 9 hours ago
    this only means employees signed up to use the new toys and Google is paying for enough seats for all employees.

    it's like companies paying for all those todo-list and tutorial apps left running on AWS EC2 instances circa 2007.

    I'd be worried if I were a Google investor. lol.

    • fragmede 8 hours ago
      I'm not sure I get your point. Google created Gemini and whatever internal LLM their employees are using for code generation. Who are they paying, and for what seats? Not Microsoft or OpenAI or Anthropic...
  • ultra_nick 10 hours ago
    Why work at big businesses anymore? Let's just create more startups.
    • IAmGraydon 10 hours ago
      Risk appetite.