Copyright reform is necessary for national security

(annas-archive.org)

79 points | by artninja1988 17 hours ago

14 comments

  • Kim_Bruning 14 hours ago
    This wouldn't be the first time IP laws hurt Western innovation and security. By the time the US entered WW1 - despite the airplane having been invented in Dayton, Ohio - the country had fallen so far behind that it had to use French aircraft. Why? The Wright brothers' patent wars had effectively frozen US aviation development.

    Today it's not just one industry - Western IP laws are slowing progress across multiple tech frontiers. While companies navigate complex IP restrictions (in the EU and US), China's development is following a sharp exponential curve. You can already see it clearly in robotics, electric vehicles, and, these last few weeks, AI.

    While in the West you deal with predatory licensing (try talking with Siemens, Oracle, or Autodesk) and everyone keeps working on barriers and moats, other nations that allow a more collaborative approach (voluntary or not) are on an accelerating trajectory.

    IP law is clearly no longer fit for purpose - we need a system that encourages collaboration more directly. A complete free-for-all isn't ideal either - and I certainly don't advocate that - but even that appears to be better than what we have now.

    • absolutelastone 13 hours ago
      I think our IP laws are optimized to maximize benefits for the legal system.
      • JumpCrisscross 13 hours ago
        > our IP laws are optimized to maximize benefits for the legal system

        Versus the content owners?!?

    • _factor 9 hours ago
      Copyright is a form of censorship in an absolute sense. It leads to a less skilled/educated populace and consolidates wealth. For a capitalist nation, it's very anti-free-market.
    • wrtka 10 hours ago
      So many euphemisms here. Security isn't impacted at all; that is just the new talking point to get Trump tech-bro money flowing.

      What kind of collaboration are we talking about? Open source people work for free, big tech steals their works and launders them for a $100 subscription fee?

      Laundering is literally the only purpose of LLMs.

      If the Chinese steal from the West and internally, sanction them and bring the industries back.

  • TheAceOfHearts 15 hours ago
    I hope to see copyright duration come down to a reasonable length within my lifetime. There's tons of creative derivative work building on existing content that cannot be sold due to copyright.

    The way I think about IP is that if you grew up with something, by the time you're an adult it should be possible to remix it in any way you like, because it's part of your culture. Nobody should get to lock down an idea for their lifetime.

    • jazzyjackson 14 hours ago
      14 years, as originally intended, would be fine. All the classic movies up to 2010 would be free, yet I still have to pay a subscription to watch them, and lots of them are unavailable because the studios want to haggle over licenses for 50-year-old properties. It's the kind of thing that might drive a man to pirate.

      If your book hasn't made you rich in the first 14 years after publishing, I'm sorry, the audience just isn't that into you.

      (I am aware movie deals can languish for many years before finally landing, and that with 14 years studios could just wait you out, leaving no incentive to write a script anymore - but copyright should be 14 years after publishing, right? And movie scripts are not generally published before they get made into a movie, if ever.)

      • jasinjames 14 hours ago
        Not sure if this influences your thinking at all, but whatever duration copyright applies for starts once the work is created, whether it was published or not.

        > Copyright in a work created on or after January 1, 1978, subsists from its creation ... [1]

        And "creation" basically means "written down" [2]. IANAL.

        [1] https://www.law.cornell.edu/uscode/text/17/302 [2] https://www.law.cornell.edu/wex/fixed_in_a_tangible_medium_o...

      • EvanAnderson 13 hours ago
        The option for a single renewal for 14 more years doesn't strike me as onerous. It makes a concession for the "audience hasn't made you rich in the first 14 years" case and is still light years ahead of the current de facto infinite duration.

        It won't ever change, though. No chance ever. Other than to be actually made infinite. The nature of "intellectual property" and money being speech in US politics locks that in.

        • foobarbecue 13 hours ago
          Well, at some point our government will collapse entirely, and then all the laws can change. I'm starting to think that might not be far off.
      • JumpCrisscross 13 hours ago
        Maybe a middle ground is mandatory, standardised licensing after 14 years and then public domain after 28. (Where do the multiples of 7 come from?)
      • 8note 14 hours ago
        If you do suddenly become popular, you can write a second book and get involved in TV or movie adaptations of it.

        GRRM isn't included on ASOIAF content because he's the copyright holder, but because his influence makes people want to see the shows.

        • wisty 13 hours ago
          You could argue that 14 years is a little short, especially for books, which can get adaptations that pay more than the book itself. OTOH preventing adaptations stifles things. Imagine if Tolkien had managed to lock down his orcs and elves like Disney locked down the mouse.

          I think 20 or 30 years is fairer. But IIRC it's not about fairness: the Constitution says copyright should exist only to encourage innovation, and long copyright inhibits innovation.

          • JumpCrisscross 13 hours ago
            > 14 years is a little short, especially for books which can get adaptations that pay more than the book

            If you want to make an official adaptation, hire the author. If not, it's not official. This takes care of itself for authors who cultivate their fan base.

      • bsder 13 hours ago
        The original author (not an assignee) should get N years free and should be able to renew every N years for a fee that escalates upwards, with a single retroactive grace period if you haven't paid. If you hit a second period without paying, the copyright is dropped. If you aren't the original author, you get to renew once for N years - no more.

        This is a nice compromise in that it disincentivizes simply squatting on properties--you have to pay money to maintain later copyright so you have a strong push to make money on it.

    • Trung0246 9 hours ago
      One of my ideas was to establish a copyright renewal system in which the first year is free; for the second year the rights holder pays $1 to retain the rights, the third year $2, the fourth year $4, the fifth year $8, and so on, following powers of 2. If they cannot pay for the next year, the copyright falls into the public domain (or into a more restrictive non-commercial form). Most worthless stuff would expire at around 10-15 years. Big works that generate tons of revenue might last around 35-40 years, if they can pay that much.
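
      A quick sketch of that schedule (assuming, as an illustration, that the fee for year n is $2^(n-2) after a free first year):

          def renewal_fee(year: int) -> int:
              # Fee in dollars to keep the copyright for a given year.
              return 0 if year == 1 else 2 ** (year - 2)

          for year in (10, 15, 20, 30, 40):
              total = sum(renewal_fee(y) for y in range(1, year + 1))
              print(f"year {year}: fee ${renewal_fee(year):,}, total paid ${total:,}")

          # year 10: fee $256, total paid $511
          # year 15: fee $8,192, total paid $16,383
          # year 20: fee $262,144, total paid $524,287
          # year 30: fee $268,435,456, total paid $536,870,911
          # year 40: fee $274,877,906,944, total paid $549,755,813,887

      Renewal stays trivially cheap through the first decade and becomes absurd for anything but a blockbuster by year 30 or so, which matches the 10-15 and 35-40 year estimates above.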
  • wayathr0w 51 minutes ago
    Fuck national security & fuck "the west". These are not things we should seek to protect or defend. (And yes, fuck copyright too.)
  • abetusk 12 hours ago
    An exceptional post from people who have an ideological stake in knowledge preservation/dissemination and are at the center of it.

    From the post:

    """

    Our first recommendation is straightforward: shorten the copyright term. In the US, copyright is granted for 70 years after the author’s death. This is absurd. We can bring this in line with patents, which are granted for 20 years after filing. This should be more than enough time for authors of books, papers, music, art, and other creative works, to get fully compensated for their efforts (including longer-term projects such as movie adaptations).

    """

  • jimmydoe 10 hours ago
    > This should be more than enough time for authors of books, papers, music, art, and other creative works, to get fully compensated for their efforts (including longer-term projects such as movie adaptations).

    Not sure I agree. A lot of work only gets recognized broadly long after it's published.

  • can16358p 14 hours ago
    I honestly don't understand why this is not already the case. Actually, copyright should be even less enforceable.

    Information/access to data/works should be totally free and there should be other ways to support the creators.

    For example, I could easily download MP3s of music and MP4s of series/movies, but I don't, for two simple reasons:

    - I want to support the artist (to the extent possible)

    - Using Spotify/Apple Music/Netflix is much more convenient, with a totally acceptable monthly fee.

    I know the article is about a library, not entertainment, but the same rules should apply.

    And if one wants to train an LLM, let them: at its essence it's just a person who has read all the books (and access to information should be free); it's just that the person is a machine instead of a biological human being.

    • martin-t 13 hours ago
      > And if one wants to train an LLM, let them: at its essence it's just a person

      If I gzip a couple hundred thousand books and distribute them freely, can I also claim it's just a person who has read those books and avoid a massive lawsuit?

      Please, stop anthropomorphising machine learning models.

    • jprete 13 hours ago
      None of those services could exist today if copyright didn't exist, because streaming services wouldn't be able to compete with free downloads. I think Patreon and Kickstarter are how creative work is funded in that world.
      • Kim_Bruning 13 hours ago
        Piracy isn’t a legal problem—it’s a service problem [1].

        Netflix, Spotify, and Valve (Steam) didn’t succeed because of copyright enforcement. They won because they made paying for content easier, faster, and better than piracy.

        Piracy isn’t hard, but these services solved the friction: instant access, high quality, fair pricing, and features that free alternatives couldn’t match. That’s why they still thrive today.

        [1] https://www.escapistmagazine.com/valves-gabe-newell-says-pir...

        • JumpCrisscross 13 hours ago
          > They won because they made paying for content easier, faster, and better than piracy

          PopcornTime would eat their lunch if it were allowed to work [1].

          [1] https://en.wikipedia.org/wiki/Popcorn_Time

          • Kim_Bruning 13 hours ago
            Popcorn Time definitely set out to solve the service problem in a big way.
        • jprete 12 hours ago
          If it were legal to download movies and music, Netflix and Spotify would absolutely not exist.

          Steam is an unusual case, because games are executable software and can't be trivially reproduced in their unencoded form. The publishers can include copy protection, network connection requirements, or even run essential parts of the game logic on their own servers. So free downloads became a much worse experience over time.

        • thatcat 11 hours ago
          Alternative take: those services have a smaller selection, use annoying algorithms to promote IP they own, and are generally worse than piracy, but people don't like being hassled by their ISP, and initially the services cost about as much as a VPN.
    • JumpCrisscross 13 hours ago
      > copyright should be even less-enforceable

      It should also be de-criminalized.

  • RyanShook 14 hours ago
    I'm afraid LLMs are making copyright obsolete and unenforceable. If an author uses DeepSeek to write a book, piece of music, application, or patent, did they break copyright? Is the new work protected if this is disclosed?
    • jarsin 13 hours ago
      As for your second question, the Copyright Office came out this week with the following guidance:

      The new guidelines say that AI prompts currently don't offer enough control to “make users of an AI system the authors of the output.” (AI systems themselves can't hold copyrights.) That stands true whether the prompt is extremely simple or involves long strings of text and multiple iterations. “No matter how many times a prompt is revised and resubmitted, the final output reflects the user's acceptance of the AI system's interpretation, rather than authorship of the expression it contains.”

      They are supposed to come out with guidance regarding the first question in a month or so.

    • fsckboy 13 hours ago
      that does not make copyrights unenforceable
      • EnergyAmy 13 hours ago
        The sheer volume of uncopyrightable work will soon kill the system. Let us dance on its corpse.
    • ranger_danger 13 hours ago
      If I looked at a painting before making my own similar one, did I break copyright?
      • ben_w 13 hours ago
        While this pattern shows the inconsistency between how humans and AI are treated, there have been many examples throughout history where the ability to do something already familiar at greatly increased scale resulted in the law being changed.

        Shining a torch at a plane is usually fine; shining a laser at one is usually a crime.

        • martin-t 13 hours ago
          None of the people who claim LLMs are intelligent and "persons" argue for giving them legal personhood and human rights. Telling.
          • ben_w 4 hours ago
            You're one of today's lucky 10,000*:

            Blake Lemoine, 2.5 years ago, hired a lawyer to make this exact argument: https://www.businessinsider.com/suspended-google-engineer-sa...

            Myself, every time this topic comes up, I point to the fact that the philosophy of mind has 40 different definitions of "consciousness", which makes it really hard for any two people to even be sure they're arguing about the same thing when they argue over whether any given AI does or doesn't have it.

            (Also: They can be "persons" legally without being humans, cf. corporate personhood; and they can have rights independently of either personhood or humanity, cf. animal welfare).

            * https://xkcd.com/1053/

            • martin-t 2 hours ago
              I knew about that case (though I don't know the specifics of how sophisticated the LaMDA model was at the time; I don't know if it was ever available online so I could try it). AFAICT Blake Lemoine was not concerned with copyright at all; he just genuinely believed the model was sentient.

              What I meant ("none" is obviously hyperbole, though not by much) is that people argue that "AI" (they always use this term rather than the more descriptive ML, LLM, or generative models) is somehow special: either that the mixing of input material is sufficient to defeat copyright, or that copyright somehow magically doesn't apply, for reasons they either cannot describe or which include the word "intelligence".

      • martin-t 13 hours ago
        I am writing a master's thesis and notice somebody has written one containing a chapter that I also need to write. I copy-paste the chapter but replace every word with a synonym. Did I break copyright? Did I commit plagiarism?
        • ranger_danger 11 hours ago
          Exactly. I think the answer is always "it depends" and usually boils down to a judge's opinion on just how obvious of a copy it is.
          • martin-t 2 hours ago
            There are 2 separate metrics - obviousness (=provability) and reality.

            Courts operate on provability (and for good reasons).

            However, the reality is that I have used someone else's work and pretended it's my own. Now, ironically, there are cases where the act of masking it can be more time-consuming than writing it from scratch. It is still plagiarism, although it might not be provable.

  • foobarbecue 13 hours ago
    I tried to repost this on Facebook. It was very upset.
  • mettamage 13 hours ago
    This site is blocked in the Netherlands. Does anyone have an archive.is or archive.ph of it?
  • agnishom 9 hours ago
    Cory Doctorow's books "Chokepoint Capitalism" and "The Internet Con" discuss a good number of possible reforms, if anyone is interested.
  • qrwafn 13 hours ago
    It is interesting that since January 20th all pro-copyright posts are downvoted. Is this the user alignment that the AI broligarchs speak of?
    • artninja1988 13 hours ago
      On the contrary. It's only been since the proliferation of genAI that I've seen so many copyright-maximalist takes on Hacker News, reminiscent of the "you wouldn't steal a car" propaganda run by big corpos.
      • hnad8125 13 hours ago
        Both are true. There have been more pro-copyright posts, but these are getting downvoted recently. Last year they were popular.
    • martin-t 13 hours ago
      I haven't noticed a particular date, but yes, my posts describing a fair system of compensation get downvoted incredibly fast.

      At least I sometimes also get replies but many of them use fallacious arguments to the point of feeling like trolling. No idea if the same people commenting are also downvoting but I am starting to think that votes should not be anonymous.

  • thw1238129 14 hours ago
    [flagged]
  • farts_mckensy 13 hours ago
    This has no chance of passing. Reform is out of the question. This is just navel gazing. Get it through your skulls that reform is impossible at this point, and accept the implications of that.
  • martin-t 14 hours ago
    I was hoping the article would propose the opposite: if you train LLMs on copyrighted data, you owe the author a part of your income from it. How big a part should be determined by courts, but probably proportional to the amount of data.

    There's absolutely no reason rich people owning ML companies should be getting richer by stealing ordinary people's work.

    But practicality trumps morality. The west needs to beat China and China doesn't give a fuck about copyright or individual people's (intellectual) property.

    The ML algos demand to be fed so we gotta sink to their level.

    • pavpanchekha 14 hours ago
      Realistically, the copyrighted works that are most valuable for training machine learning models, at least if we take The Pile as typical of training data [1], are:

      - Web pages; hard to argue that royalties are due since these are publicly available for free

      - Scientific papers; these do cost money but the copyright is typically owned by scientific publishers

      - Github, Stack Exchange, HN (yes); these are freely available, sometimes by license, so hard to argue for royalties

      - Wikipedia, Project Gutenberg; these are also free by license

      So the actual consequence of what you're proposing (or at least the realistically-enactable version of it) is the big AI firms paying scientific publishers a lot of money. Is this actually good? Is Elsevier, a basically pure rent-seeker, really more worthy than AI labs, which maybe you don't like but at least do something valuable?

      [1]: https://en.wikipedia.org/wiki/The_Pile_(dataset)

      • kevingadd 14 hours ago
        Scientific papers are available on web pages, so they're publicly available for free too, right? I can download an ISO or installer of almost anything if I go to the right website, so all software is already free?

        If you're going to ignore the existence of copyright and licenses we should extend it to everything that's ever been posted on the internet, not just "web pages". Why shouldn't all books and films count as free too?

        I'm actually open to the idea of just abolishing copyright but it's kind of silly to act like it's only about Elsevier. Lots of creatives depend on copyright in order to earn a living, similarly to how patents fund a lot of important research despite how noxious the patent system has become.

        If we fixate on examples like Elsevier or Martin Shkreli in order to argue for completely abolishing the copyright or patent systems we risk destroying the framework that enables valuable creative works or new technologies to be developed in the first place. This is part of why people are so upset by AI companies arguing that they should just be able to ignore the whole framework in order to enrich themselves; once you allow the for-profit AI companies to do it, other groups are going to line up to also demand a free ride.

        • Kim_Bruning 13 hours ago
          > We risk destroying the framework that enables valuable creative works or new technologies to be developed in the first place

          I think we are starting to see not just historical empirical data but also strategic intelligence suggesting this assumption might be somewhat inaccurate - which is what Anna's Archive is pointing out here in the context of AI development and copyright restrictions.

        • martin-t 13 hours ago
          I am surprised by how quickly I get downvoted to hell any time I propose that people should be compensated fairly for their work. And I get replies like the above, full of fallacies, which basically feel like trolling. So first, thank you for a sane reply.

          Now, it boils down to the typical issue of a current system being bad and clearly not serving its stated purpose so the choice is whether to abolish it or whether to reform it (and how).

          And just like any other time people have to agree on something, the simplest-to-describe solution (abolishing) will get a huge following, because the fairer solutions (reform so it actually protects and rewards creators) are way, way more complex.

          I fundamentally think if you build on other people's work, they deserve 1) credit 2) compensation according to what percentage of your work is based on theirs.

          We could, for example, compare the performance of the same model depending on which parts of the training data we omit. If you omit all copyrighted work and the result is useless, we know the copyrighted work is responsible for a large part of the value.
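
          A toy sketch of that ablation (a trivial character-bigram "model" stands in for a real LLM, and the corpora are made up; a real study would retrain the actual model):

              import math
              from collections import Counter

              def train_bigram(corpus: str) -> Counter:
                  # Counting character bigrams stands in for "training a model".
                  return Counter(zip(corpus, corpus[1:]))

              def perplexity(model: Counter, text: str) -> float:
                  # Held-out perplexity; lower means the model explains the text better.
                  total = sum(model.values()) or 1
                  logp = sum(math.log((model[b] + 1) / (total + 256 * 256))
                             for b in zip(text, text[1:]))
                  return math.exp(-logp / max(len(text) - 1, 1))

              public_domain = "the quick brown fox jumps over the lazy dog " * 50
              copyrighted = "a very distinctive copyrighted style of text " * 50
              held_out = "a very distinctive copyrighted style of test "

              full = train_bigram(public_domain + copyrighted)
              ablated = train_bigram(public_domain)
              print("with copyrighted slice:   ", perplexity(full, held_out))
              print("without copyrighted slice:", perplexity(ablated, held_out))

          If dropping a slice of the training data makes held-out performance collapse, that slice carried a large share of the value.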

          • Kim_Bruning 12 hours ago
            > I fundamentally think if you build on other people's work, they deserve 1) credit 2) compensation according to what percentage of your work is based on theirs.

            That’s not a naturally occurring principle—it’s a legal and social construct that people have come to see as an inherent entitlement. If the current system isn’t working, maybe that's the actual bit that's broken and needs reform?

            • martin-t 12 hours ago
              What I describe isn't the current system; it's the reform I propose. The current system, sadly, is often that you can't self-publish because "lol no marketing", so you sell your rights for a one-time fixed payment and the new owner profits for years or decades.

              The new owner is also usually a large company, which has fundamentally more bargaining power than an individual, so even if you get a proportional payment, the ratio of effort to reward for the company vs. the individual is unfair (if you put 1000 man-days of work into your game and Steam takes 5%, do you think it's actually putting 50 man-days of work into selling it?).

              As for naturally occurring principles... there are some, ehm, methods of negotiating, which you could consider natural but the state you live in would generally not approve.

      • martin-t 14 hours ago
        I have trouble believing this argument is in good faith.

        1) (minor nitpick) I don't see how HN is in the same category as GitHub or Stack Overflow.

        2) "sometimes by license" or "free by license" imply you don't understand how copyright works. Code that is not accompanied by a license is proprietary. Period. [0] And if it has a license, then you have to follow it. If the license says that derivative works have to give credit and use the same license, then LLMs using that code for training and anything generated by those models is derivative and has to respect the license.

        3) Granted, I didn't say this in the OP, but the idea that the publisher owns the copyright is absolutely nuts and only possible through extensive lobbying and basically extortion. It should be illegal. Copyright should always belong to the person doing the actual work. Fuck rent-seekers.

        4) If western ML companies thought they could produce models of the same quality without, for example, stealing and laundering the entirety of GitHub, they would have. They don't, so clearly GH is a very important input, and its builders should either be compensated or the models and ML companies should only use the subset they can without breaking the license.

        Please don't get offended, but I've seen this argument multiple times from proponents of A"I", and their entire argument hinged on the idea that current ML models are so large that nobody can understand them and are therefore a form of "intelligence" - which they clearly are not (unless proven otherwise, at which point they should get their own personhood; the fact that no big ML company is arguing for that makes it obvious nobody really considers them intelligent).

        [0]: https://opensource.stackexchange.com/questions/1720/what-can...

    • chromanoid 14 hours ago
      Maybe enforcing open-sourcing of the models is the best route to go. At some point everything worthwhile ever created will have been processed. Models can be seen as the processed, collective cultural output of humanity. It seems fair to me to force publication of the models, in the vein of some kind of copyleft clause.
      • martin-t 14 hours ago
        Even if they are small enough to be run by individuals, this still doesn't solve the issue of profiting from someone's work for free.

        Many of us publish our open source work under GPL or AGPL with the intention that you can profit from it but you have to give back what you built on it under the same license. LLMs allow anyone to launder copyleft code and profit without giving anything back.

        If people who downvote bothered to reply, they'd probably say that by being "absorbed" into LLM weights, the code served that purpose and is available for everyone to use. That forgets 2 critical points:

        - LLMs give no attribution. I deserve to be credited for my fractional contribution to the collective output of humanity.

        - LLMs are not intelligences, they don't suddenly make intellectual work redundant. Using them still requires work to integrate their output and therefore companies build (for-profit) products on top of my work without compensating me, without crediting me and without giving anyone the freedom to modify the code.

        • liamwire 9 hours ago
          > LLMs give no attribution. I deserve to be credited for my fractional contribution to the collective output of humanity.

          I disagree, respectfully. You're unable to accurately credit the myriad people, sources, and works that have educated and influenced you along the way. And even supposing you could, you'd not be able to resolve their dependencies ad infinitum. Nor would it even make sense - arguably, as a species we're defined by the compounding and shared nature of the knowledge we learn, implicitly and explicitly. Shoulders of giants, and all that.

          If your contributions were a significant portion of the entire training corpus, then perhaps we'd have a separate discussion, but there's no single person for whom you can reasonably argue that's true. Maybe for authors of hyper-niche topics or cutting-edge research?

        • chromanoid 14 hours ago
          I am not sure profits are easy to gather if all professional users can rent hardware instead of renting the service. In the end the users would actually profit from the model, not the provider (who competes with other providers of the same open source model). And I am not sure whether we can see AI as some kind of cybernetic enhancement that lets one execute ideas on the shoulders of humanity, rather than just a scammy way to resell existing content.
          • martin-t 14 hours ago
            Exactly. It's impractical to distribute compensation (and credit) fairly, so it's very easy for those who profit to say "we can't", throw their hands up, and keep profiting. That doesn't make it right.

            It's just the rich getting richer by taking from everyone so little that no individual bothers to fight back.

            • chromanoid 14 hours ago
              Why do you think the rich get richer when operating an AI is a price competition and training new models gives only a short competitive advantage? I would hope that for each individual an ocean of opportunities opens up, sustained by all the humans who came before and poured their bucket of knowledge into it.
              • martin-t 13 hours ago
                1) For starters, ML companies clearly go through enormous amounts of money, and the rich people in control of them very obviously get "compensated quite generously", if you're into euphemisms, while the people who built the "training data" (= their copyrighted works) get nothing.

                2) It's not just ML companies but anyone using their products.

                A while ago everyone was upset that ChatGPT regurgitated the fast inverse square root from Quake's GPL code verbatim, including a comment, clearly violating the license unless the program it was generated into was also under GPL.

                I am sure since then they've made sure the copyrighted material used as training data gets mixed up a bit more thoroughly, so it doesn't show up in the output verbatim and is therefore harder to detect.

                So what if I spend a couple of days writing an algorithm, give it a nice doc comment and tests, and publish it under AGPL; it gets used as training data, and a random programmer in a random company working on a random for-profit product runs into the same problem, but instead of writing the algo himself he asks a generative model, and it produces my code, just mixed up a little so it's not immediately recognizable but clearly based on mine? I deserve to be paid, that's what happens.

                • chromanoid 12 hours ago
                  Algorithms should not be subject to copyright IMO. If you publish code, AFAIK under EU law only blatant copies are an infringement. In the case of art, copyright remains enforceable anyway, in no different form than before AI.

                  Yes, current AIs just remix their training data in a rather direct way. But in the end, how different is that from how humans create? I would suggest we should embrace this new way of creating things while finding laws that empower all creators, not only those hired by deep pockets.

                  • martin-t 12 hours ago
                    > AFAIK under EU law only blatant copies are an infringement

                    Laws generally don't encode what is right but a compromise between the state's interests, lobbyists, and the general population making enough of a ruckus when sufficiently unsatisfied.

                    > But in the end how different is that to how humans create?

                    1) Scale. Some strategies are socially acceptable when done by individuals but not when done at massive scale, partly because individuals have very limited time and can invest very limited effort. Looking at a website is perfectly OK; making thousands of requests a second might be considered an attack. Human memory is limited. Similar principles apply to humans looking at code.

                    2) Source of data. Much of human "input" is viewing the real world (not copyrighted material) through their senses. Much of learning is from teachers or documentation, both of which voluntarily give me information.

                    I don't know about you but when I wanna know how to use a particular function, I don't go looking through random GH repos to see how other people use it, I go to the docs.

                    > finding laws to empower all creators and not only those who were hired by deep pockets

                    That is not even the only issue. When I publish something under AGPL, my users have the right to modify the code, even if my code gets to them through some third party. LLMs allow laundering code and taking that right from (my) users.

                    • chromanoid 4 hours ago
                      > > AFAIK under EU law only blatant copies are an infringement

                      > Laws generally don't encode what is right but a compromise between the state's interests, lobbyists and the general population making enough ruckus if too unsatisfied.

                      Of course, but I actually think this is the correct moral stance. Patenting algorithms is like patenting thoughts.

                      > I don't know about you but when I wanna know how to use a particular function, I don't go looking through random GH repos to see how other people use it, I go to the docs.

                      I always look into sources. Usually I look into the code I want to call first. But this is probably also because I mainly use Java.

                      So if I read your AGPL code and implement something similar in another programming language, after also reading other implementations of the algorithm, is that something I have to attribute to you? Isn't code just executable documentation of an algorithm? Especially if the algorithm is well known, I don't see any injustice here - "Die Gedanken sind frei" ("thoughts are free"). If I copy your code via copy-and-paste, then that is an infringement, but just retelling a similar story should not be affected by copyright.

                      • martin-t 1 hour ago
                        > Patenting algorithms is like patenting thoughts.

                        OK, I agree there; I should have written "function" or "module" or something similar. Something that takes a nontrivial amount of work, and although it is based on some general principles which should not be patentable/copyrightable, its particular implementation is novel/unique enough that it would take a nontrivial amount of work to replicate the functionality without seeing the original.

                        > is that something I have to attribute you for

                        Depends how closely you follow my implementation.

                        If you use my code as the only reference and translate it verbatim (whether manually or using a tool), then you should credit me. If you look at many implementations, form an _understanding_ of the algorithm in general, then write your own implementation based on that understanding, then probably not.

                        The question is where LLMs stand. They mix enough sources that crediting all of them would be impractical, and in practice they end up crediting none. But their proponents (who always call them AI, sometimes even using pronouns like "he" to refer to the models) argue that the models also form an _understanding_ rather than just regurgitating a mix of inputs. And I have to disagree: what I see is an imitation of reasoning/understanding, sometimes convincing because of how complex the statistics used inside the models are. But they are still just statistical models of existing content, and we see that every time somebody releases a new model: HN upvotes it to the top, and a few hours later we inevitably see people giving it trivial questions which it fails to answer correctly.

                        My other two points:

                        - Even if an ML company made a model that is actually intelligent, the burden of proof should be on them; otherwise, or until then, it's just a remix of existing work. BTW, this reminds me: an interesting comparison is remixes vs. cover songs in music.

                        - Code is famously harder to read than write. If a human takes time to understand a piece of code and reimplement it not verbatim, then he generally does not get ahead by much. An LLM can do this at scale and speed unattainable by humans.

                        Let's say two products compete (purely on features and quality instead of marketing, for the sake of argument). One is written first, is novel, and is written fully by humans. The other is written by training a model on the first product's code and using the model to generate the same product, all within hours or days instead of months or years. The second puts in less actual work but gets the same result. It is clearly parasitizing the first, benefiting from their work without giving them credit or compensation.

                        ---

                        Bottom line: copyright is meant to protect authors who invest effort into creating. Whether it succeeds in that can sometimes be questionable. But using an algorithm (even a very complex one) to take a bit of everyone's work and redistribute it for free, without crediting or compensating them, does not benefit authors.

                        I hate analogies but if I write banking software and send 0.000000001% of every transaction to my account, none of the individuals thusly affected probably care that much but I am still going to prison.

                        • chromanoid 35 minutes ago
                          > But using an algorithm (even a very complex one) to take a bit of everyone's work and redistribute is for free without crediting or compensating them does not benefit authors.

                          I am not sure about that. As long as they can also benefit from it, it just accelerates the creation of new things.

                          > I hate analogies but if I write banking software and send 0.000000001% of every transaction to my account, none of the individuals thusly affected probably care that much but I am still going to prison.

                          As long as the amount goes to everybody, I don't think this "tax" would be a problem. Human beings can only survive in a collective after all.

      • sam_lowry_ 14 hours ago
        Publishing weights? Meh.

        Publishing code and data would lead to abolishing copyright.

        • chromanoid 14 hours ago
          Copyright is just not prepared for AI. Training with copyrighted material could become "officially legal" under copyleft terms, at least when the amount of training material exceeds a certain threshold.
          • martin-t 14 hours ago
            The issue is treating "AI" as something special. It is just derivative work.

            They are called large language _models_ for a reason. They are just statistical models of existing work.

            If anybody seriously thought they were intelligent, they'd be arguing for giving them personhood and we'd see massive protests akin to pro-life vs pro-choice. People (well, ML companies) only use the word "intelligence" as long as it suits their marketing purposes.

            • chromanoid 13 hours ago
              I agree, but it's derivative work that infringes almost indiscriminately on all publicly available cultural goods and can be very useful while doing so. That's why I think copyleft is somewhat a fitting consequence.
              • martin-t 13 hours ago
                So you think all work produced with the help of LLMs should be required to be open-sourced under a copyleft-like license?

                That is actually an intriguing idea and at least aligned with the reasons I use AGPL for my code.

                • chromanoid 13 hours ago
                  Honestly, I only thought about the models, assuming that work that uses them will also inevitably be incorporated into them. But maybe it is a good idea to actually include the produced works. It could create a nice incentive for rich corps to pay artists to create something AI-free / copyleft-free.

                  If we could extract correct attribution and even licensing out of work that was produced with AI, I don't think it would help that much. I would even assume that in this case especially, the rich would profit the most. They wouldn't mind having to pay thousands of artists for the pixels they provided to their AI-generated blockbuster movie. It would effectively exclude the poor from using AI for compliance reasons. Or, even worse, rich corps would monopolize the training data, and then they could create content practically for free, while indies could not use AI because they would have to pay traditional prices or give the rich corps money for indirectly using their training data.

                  • martin-t 13 hours ago
                    > pay artists to create something AI free / copyleft free

                    I still don't think that's enough to be fair. If their work is used to produce value ad infinitum, then any one-time payment is obviously less than what they deserve.

                    The payment should be fractional compared to the produced value. And that is very hard to do since you don't know how much money somebody made by using the model.

                    > It would effectively exclude the poor from using AI for compliance reasons.

                    Again, this is only an issue if you're thinking in terms of one-time fixed payments.

                    • chromanoid 13 hours ago
                      I believe you think in too-short time frames. In 70 years this becomes a futile discussion. AI is a way to directly benefit from the explosion of free content that the next decades will bring. The only way to counter this in an ethical way is to establish some kind of enforced liberation of AI models; otherwise, as you say, only the rich will profit from this.
                      • martin-t 12 hours ago
                        It's the author's life plus 70 years, if you meant that. And TBH I am mostly interested in the author's-life part anyway.

                        If it were possible to train AI models on just the public domain, then I am sure ML companies would have, because it's less effort than lobbying and risking lawsuits (though I am surprised how readily creators have accepted that their work is used by others to profit without any compensation; I expected way more outrage).

                        Virtually all code relevant to training code-completion LLMs is written by people still alive or dead for way less than 70 years. We can try to come up with a better system over the next decades but those people's rights are being violated right now.

                        • chromanoid 4 hours ago
                          I am curious why you are so critical, especially when it comes to code.

                          At least for Java there are search engines to look for code that calls libraries, etc. Models could probably be trained on free code and then fed the results of these search engines on demand, even through the client who calls the LLM:

                          Client -> LLM -> client automation API -> code search on the client machine to fill the context -> code generation with a model that was not trained on the code found through the search, but merely uses it as context.
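
                          Roughly, as a sketch (all names here are hypothetical; the completion call is a stub for whatever model endpoint the client uses):

                              from pathlib import Path

                              def search_local_code(root: str, query: str, limit: int = 3) -> list[str]:
                                  # Naive stand-in for a client-side code search engine.
                                  hits = []
                                  for path in Path(root).rglob("*.java"):
                                      text = path.read_text(errors="ignore")
                                      if query in text:
                                          hits.append(f"// {path}\n{text[:400]}")
                                          if len(hits) == limit:
                                              break
                                  return hits

                              def call_llm(prompt: str) -> str:
                                  # Hypothetical completion call; the model itself would be
                                  # trained only on freely given code.
                                  raise NotImplementedError("plug in a model endpoint here")

                              def generate(task: str, repo: str) -> str:
                                  context = "\n\n".join(search_local_code(repo, task))
                                  return call_llm(f"Context:\n{context}\n\nTask: {task}")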

                          Even if they only feed it code that is freely given, I think the quality of the output would approach current quality, and improve over time, especially when using RAG techniques like the above.

                          Companies can also buy code for feeding the model, after all. So besides the injustice you directly experience right now over your own code probably being fed into AI models, is there anything more you fear/despise about LLMs?

                          • martin-t 1 hour ago
                            What is "free code"? Most code is either proprietary (though sometimes public), under a permissive license or under a copyleft license. The only free code is code which is in the public domain.

                            You could make separate versions of an LLM depending on the license of its output. If it has to produce public-domain code, it can only be trained on the public domain. If it has to produce permissively licensed code (without attribution), then it can be trained on the public domain and permissive code. If copyleft, then those two plus copyleft (but that does not solve attribution).

                            > Companies can also buy code for feeding the model after all.

                            We've come a long way since the time slavery was common (at least in the west) but we still have a class system where rich people can pay a one time fee (buying a company) and extract value in perpetuity from people putting in continuous work while doing nothing of value themselves. This kind of passive income is fundamentally unjust but pervasive enough that people accept it, just like they accepted slavery as a fact of life back then.

                            Companies have a stronger bargaining position than individuals (one of the reasons, besides defense, why people form states: to represent a common interest against companies). This is gonna lead to companies (the rich) paying a one-time fee, then extracting value forever, while the individual has to invest effort into looking for another job. Compensation for buying code can only be fair if it's a percentage of the value generated by that code.

                            > So beside the injustice you directly experience right now over your own code probably being fed into AI models, do you fear/despise anything more than that from LLMs?

                            Umm, I think theft on a civilization-level scale is sufficient.

                            • chromanoid 37 minutes ago
                              > Umm, I think theft on a civilization-level scale is sufficient.

                              As long as everybody can also benefit from it, I see it as some kind of collective knowledge sharing.

                              As you stated in the paragraphs before, unless the wealth distribution changes, LLMs may lead to an escalating imbalance - unless the models are shared for free as soon as a critical mass of authors is involved, regardless of who owns the assets.

          • Kim_Bruning 13 hours ago
            EU Copyright law actually seems to have you covered already.

            The EU Digital Single Market Directive (2019/790) Arts. 3 and 4 allow text and data mining - Art. 3 for scientific purposes, Art. 4 more generally.

            Now, some people argue that AI models are somehow compressed databases of the data that was crawled, but that seems patently ridiculous to me - so this should be sufficient (at least mathematically). (IANAL) (famous last words)

            • martin-t 13 hours ago
              Not ridiculous at all. If LLMs (can we please stop calling them AI?) can produce correct factual statements (for example, about historical events), then the data is clearly present in the model in some (compressed) form.

              The only question then is if the models have some kind of additional value ("intelligence") beyond being compressed databases.

              My take is that either they don't, or the burden of proof is on those making the claim. Until they prove it, they are just databases, and therefore derivative works of their input, and their output is also derivative work.

              • Kim_Bruning 13 hours ago
                I think you're positing a false binary.

                * LLMs aren't databases: you can't query them for exact stored records, and they can't reconstruct (most of) their training data.

                * But they also don’t reason or understand exactly like humans do either.

                They're something else: to wit, Transformer models.

                • martin-t 12 hours ago
                  You have a point, but:

                  1) I don't think being able to query them and reconstruct input 1:1 are requirements. If I build a shitty DB with a buggy query language that retrieves incomplete data and occasionally mixes in data I didn't ask for, then it's still a DB, just a shitty one.

                  If I populate it with copyrighted material and put it online, whether I am gonna get sued likely depends on how shitty it is. If it's good enough that people get enough value from it that they don't buy the original works, then the original authors are not gonna be pleased.

                  2) Yes, comparisons to humans are not always useful, though I'd say they don't reason or understand at all.

                  Either way the discussion should be about justice and fairness. The fact is LLMs are trained on data which took human work and effort to create. LLMs would not be possible without this data (or ML companies would train on just the public domain and avoid the risk of a massive lawsuit). The people who created the original training data deserve a fair share of the value produced by using their work.

                  So the real question to me is: how much do they deserve?

                  • Kim_Bruning 12 hours ago
                    It sounds like you’re sort of starting from the position that AI is inherently unjust and then reasoning backward to justify it. But shouldn’t the argument start with actual harm rather than assumed unfairness?
                    • martin-t 12 hours ago
                      I wouldn't say that.

                      My point is that any situation where person A puts in a certain amount of work (normalized by skill, competence, etc.), person B uses person A's work, puts in some work of his own but less than A, then gets more reward than A, is fundamentally unfair.

                      LLMs are this, just at a massive scale.

                      But to be honest, this is where the discussion went only after thinking about it for a while. My real starting point was that when I publish my code under AGPL, I do it because I want anyone who builds on top of it to also have to release their code so users have the freedom to modify it.

                      LLMs are used to launder my code, deprive me of credit and deprive users of their rights.

                      Can we agree this is harm?

                      I also believe that unfairness is fundamentally the same as harm, just framed a bit differently.

                      • Kim_Bruning 11 hours ago
                        > LLMs are used to launder my code,

                        > deprive me of credit

                        > and deprive users of their rights.

                        > Can we agree this is harm?

                        I might consider it if any of those claims were true.

                        I think the opposite is true -- especially with Open-Weight models which expand user freedoms rather than restricting them. I wonder if we can get the FSF to come up with GPL compatible Open-Weight licenses.

                        At this point in time I'm not entirely convinced they even need to. But if future lawsuits turn out that way, it might solve issues with some models.

                        • martin-t 10 hours ago
                          > I might consider it if any of those claims were true.

                          Please step back and consider what you are replying to.

                          > LLMs are used to launder my code

                          If an LLM was trained only on AGPL code, would it have to be licensed under AGPL? Would its output?

                          > deprive me of credit

                          They _obviously_ deprive me of credit. Even if an LLM was trained entirely on my code, nobody using its output would know. Compare to using a library where my name is right there in the license.

                          > and deprive users of their rights.

                          I appeal to you again: re-read my comment. I am not talking about users of the model but users of the software that is in part based on my AGPL code. If my code got there traditionally, by being included in a library, the whole software would have to be AGPL and users would have the right to modify it. If my code is laundered through an LLM, users of my code lose that right.

                          > Can we agree this is harm?

                          So all of those things are true. And this is clearly harm.

                          Stealing a little from everyone is morally no different from stealing a lot from one person. Whenever you think about ML, consider extreme cases, such as training on data under a single license, and all the arguments pretending it's not copyright infringement fall apart. (And if you don't think extreme cases set precedent for real cases, then please point out where exactly you draw the line. Give me a number.)

                          Spreading the harm around means everyone is harmed similarly but that is not the kind of fairness I had in mind.

        • marssaxman 14 hours ago
          Well, that sounds good - what are we waiting for?