34 comments

  • rpcope1 2 days ago
    > Exploiting user-generated content.

    You know, if I've noticed anything in the past couple years, it's that even if you self-host your own site, it's still going to get hoovered up and used/exploited by things like AI training bots. I think between everyone's code getting trained on, even if it's AGPLv3 or something similarly restrictive, and generally everything public on the internet getting "trained" and "transformed" to basically launder it via "AI", I can absolutely see why someone rational would want to share a whole lot less, anywhere, in an open fashion, regardless of where it's hosted.

    I'd honestly rather see and think more about how to segment communities locally, and go back to the "fragmented" way things once were. It's easier to want to share with other real people than inadvertently working for free to enrich companies.

    • dend 2 days ago
      Nothing to disagree with in this statement, for sure. If it's on the open internet, it will almost surely be used for AI training, consent be damned. But it feels like even at a rudimentary level, if I post a picture on my site that is then used by a large publisher for ads, I would (at least in theory) have some recourse to pursue the matter and prevent them from using my content.

      In contrast, if I uploaded something to a social media site like Instagram, and then Meta "sublicensed" my image to someone else, I wouldn't have much to say there.

      Would love someone with actual legal knowledge to chime in here.

      • chii 2 days ago
        > Meta "sublicensed" my image to someone else, I wouldn't have much to say there.

        but you agreed to this, when agreeing to the TOS.

        > I post a picture on my site that is then used by a large publisher for ads, I would (at least in theory) have some recourse

        for which you didn't sign any contract, and therefore it is a violation of copyright.

        But the new AI training methods are currently, at least imho, not a violation of copyright - not any more than a human eye viewing it (which you've implicitly given permission to do so, by putting it up on the internet). On the other hand, if you put it behind a gate (no matter how trivial), then you could've at least legally protected yourself.

        • DrScientist 1 day ago
          > But the new AI training methods are currently, at least imho, not a violation of copyright - not any more than a human eye viewing it

          Interesting comparison - as if a human viewed something, memorized it and reproduced in a recognisable way to be pretty much the same, wouldn't that still breach copyright?

          ie in the human case it doesn't matter whether it went through an intermediate neural encoding - what matters is whether the output is sufficiently similar to be deemed a copy.

          Surely the same is the case of AI?

          • omnimus 1 day ago
            This whole AI learns like a human is trajectory of thought pushed by AI companies. At the same time they try to humanize AI (it learns like a human would) and dehumanize humans (humans are stochastic parrots anyway). It's a distraction, if not straight-up anti-human.

            But you are right that copyright is complex and in the end decided by humans (often in court). Consider how code infringement is not about code itself but about what it does. If you saw somewhat original implementation of something and then you rewrite it in different language by yourself there is high chance its still copyright infringement.

            On the other hand with images and art it's even more about cultural context. For example works of pop artists like Andy Warhol are for sure original works (even though some of it was disputed recently in court and lost). Nobody considers Andy Warhol's work unoriginal even if it often looks very similar to some output it was riffing off, because the essence is different to the original.

            Compare that to people prompting directly with name of artist they want to replicate. This is direct copyright infringement in both essence and intention no matter the resulting image. Also it's different to when a human would want to replicate some artist's style, because humans can't do it 100% even if they want to. There is still a piece of their "essence". There are many people who try to fake some famous artist's style and sell it as the real thing and simply can't do it. This is of course copyright infringement because of the intent, but it's more original work than anything coming from LLMs.

            • DrScientist 1 day ago
              It's both complex and extremely simple for the same reason - it's a human judgement in the end.

              Just because you can't define something mathematically, doesn't mean it isn't obvious to most people in 99% of cases.

              Reminds me of the endless games in tax law/avoidance/evasion and the almost pointless attempt to define something absolutely in words. To be honest you could simplify the whole thing by having a 'taking the piss' test - if the jury thinks you are obviously 'taking the piss' then you are guilty - and if you whine about the law not being clear and how it's unfair because you don't know whether or not you are breaking the law - well don't take the piss then - don't pretend you don't know whether something is an aggressive tax dodge or not.

              If you create some fake IP, and license it from some shell company in a low tax regime to nuke your profits in the country you are actually doing business in - let's not pretend we all can't see what you're doing there - you are taking the piss.

              Same goes for what some tech companies are doing right now - every reasonable person can see they are taking the piss - and high paid lawyers arguing technicalities isn't going to change that.

            • Kim_Bruning 1 day ago
              > Consider how code infringement is not about code itself but about what it does. If you saw somewhat original implementation of something and then you rewrite it in different language by yourself there is high chance its still copyright infringement.

              Actually if you rewrite it in a different language, you're well on your way to making it an independent expression (though beware Structure, Sequence and Organization, unless you're implementing an API: see Google v. Oracle). Copyright protects specific expressions, not functionality.

              > Compare that to people prompting directly with name of artist they want to replicate. This is direct copyright infringement in both essence and intention no matter the resulting image.

              As far as I'm aware an artist's style is not something that is protected by law; copyright protects specific works.

              If you did want to protect artistic styles, how would you go about legally defining them?

              • omnimus 23 hours ago
                The fact LLMs are generating any images is purely thanks to a database of source images that are copyright protected. It's a form of sophisticated automated photobashing. Photobashing is a gray zone but often legal because of the other artist doing the (often original) work.

                When you prompt for a Miyazaki image, this image can only exist thanks to his protected work being in the database (where he doesn't want to be); otherwise the user wouldn't get the Miyazaki image they wanted.

                We will see how that all plays out, but i think if Miyazaki took this to court there would be a solid case on the grounds that the resulting images breach the copyright of the source, are not original works and are created with bad intent that goes against the protections of the original author.

                What seems to be the current direction is at least that the resulting images cannot be copyrighted and are automatically in the public domain, making them difficult to use commercially.

                • Kim_Bruning 18 hours ago
                  Actually, while I just said "there is no database", maybe you're working from a very different mental model from mine...

                  What do you mean by "Database" in this context? What information do you think is being stored, (and how?)

                  • omnimus 15 hours ago
                    I understand what the model is and how you get to it. I know the training data is not stored. But as far as i understand, the model is closer to a derived intermediary from the training data, like a database index or, like you said, a form of compression.

                    That's why i on purpose tend to call training data + model the database. Because to non-programmers it makes more sense. To me there is an intentional sleight of hand in hiding the fact that the only reason LLMs can work as they do now is because of the source data. The way it's usually marketed, it seems like the model is a program that generalised principles of drawing from looking at other drawings, and that's why it can draw like Miyazaki when it wants to. Not that it can draw Miyazaki because it preprocessed every Miyazaki drawing, stemmed patterns out of it and can mash them with other patterns (from the database).

                    That's why i intentionally say database, to lead these discussions back to what i see as the core of these technologies.

                    • chii 5 hours ago
                      What you're describing as database would be what i call information.
                • Kim_Bruning 19 hours ago
                  There's no such database, AFAICT.

                  If you've ever worked with open source models (eg one of the stable diffusion models or models based on them, using tools such as AUTOMATIC1111 or ComfyUI); you can inspect them yourself and simply see. If you haven't done so already, see if you can figure out the installation instructions for one of the tools and try!

                  Meanwhile ...

                  Ok, fine, I've heard some crazy compression conspiracy theories, but they're a bit too crazy to be credible.

                  I've also heard stories about these models being intelligent - a little artist living in your computer. I think that's going a bit too far in another direction.

                  In reality, I think it's better to install the software and take your time to learn about the way these models are actually built and work.

                  [ btw: If Miyazaki were to take this to court with the argument you put forward, he wouldn't get very far. "Please remove my images from your systems in whatever form you are holding them". The response for the defense would simply be: "We don't actually have them, and you are quite welcome to inspect all our systems". ]

                  (Incidentally, I've been here before. I play with synths as a hobby! ;-)

              • omnimus 1 day ago
                I don't believe a rewrite in a different language is a specific expression.

                We will see, because we are well on our way to LLMs being able to translate whole codebases to a different stack without a hitch. If that's OK then any of the copyleft, open-core or leaked codebases are up for grabs.

                • Kim_Bruning 20 hours ago
                  A hand rewrite (or intelligent rewrite in general) will tend to become unique pretty quickly, especially when you start leaning into language features of the target language for improved efficiency. Your Structure and Organization will be different.

                  If you order an LLM (or a human) to do a straight 1:1 translation, you'll sort of pass one test (it's a completely different language after all!), but fail to show much difference wrt structure, sequence or organization. I'm also not entirely sure how good of an idea it is technically. If you start iterating on it you can probably get much better results anyway. But then you're doing real creative work!

            • Terr_ 1 day ago
              > This whole AI learns like a human is trajectory of thought pushed by AI companies.

              My retort towards the "it would be legal if a human did it" argument is that if the model gets personhood then those companies are guilty of enslaving children.

              > Compare that to people prompting directly with name of artist they want to replicate.

              In that case, I would emphasize that the infringement is being done by the model. It's not illegal or infringing to ask for an unlicensed, copyright-infringing work. (Although it might become that way, if big corporations start lobbying for it.)

          • Kim_Bruning 1 day ago
            > as if a human viewed something, memorized it and reproduced in a recognisable way to be pretty much the same, wouldn't that still breach copyright?

            > Surely the same is the case of AI?

            That's close to my position.

            Also, consider the case where you want to ask an image generator to not infringe copyright by eg saying "make the character look less like Donald Duck". In which case, the image generator still needs to know what Donald Duck looks like!

          • ToucanLoucan 1 day ago
            The difference is an image generation algorithm does not consume images the way a human does, nor reproduce them that way. If you show a human several Rembrandts and ask them to duplicate them, you won't get exact copies, no matter how brilliant the human is: the human doesn't know how Rembrandt painted, and especially if you don't permit them to keep references, you won't get the exact painting: you'll get the elements of the original that most stuck out to them, combined with an ethereal but detectable sense of their original tastes leaking through. That's how inspiration works.

            If on the other hand you ask an image generator for a Rembrandt, you'll get several usable images, and good odds a few of them will be outright copies, and decent odds a few of them will be configured into an Etsy or eBay product image despite you not asking for that. And the better the generator is, the better it's going to do at making really good Rembrandt style paintings, which ironically, increases the odds of it just copying a real one that appeared many times in its training data.

            People try and excuse this with explanations about how it doesn't store the images in its model, which is true, it doesn't. However if you have a famous painting by any artist, or any work really, it's going to show up in the training data many, many times, and the more popular the artist, the more times it's going to be averaged. So if the same piece appears in lots and lots of places, it creates a "rut" in the data if you will, where the algorithm is likely going to strike repeatedly. This is why it's possible to get full copied artworks out of image generators with the right prompts.

            • HanClinto 1 day ago
              We have the problem of too-perfect-recall with humans too -- even beyond artists with (near) photographic memory, there's the more common case of things like reverse-engineering.

              At times, developers on projects like WINE and ReactOS use "clean-room" reverse-engineering policies [0], where -- if Developer A reads a decompiled version of an undocumented routine in a Windows DLL (in order to figure out what it does), then they are now "contaminated" and not eligible to write the open-source replacement for this DLL, because we cannot trust them to not copy it verbatim (or enough to violate copyright).

              So we need to introduce a barrier of safety, where Developer A then writes a plaintext translation of the code, describing and documenting its functionality in complete detail. They are then free to pass this to someone else (Developer B) who is now free to implement an open-source replacement for that function -- unburdened by any fear of copyright violation or contamination.

              So your comment has me pondering -- what would the equivalent look like (mathematically) inside of an LLM? Is there a way to do clean-room reverse-engineering of images, text, videos, etc? Obviously one couldn't use clean-room training for _everything_ -- there must be a shared context of language at some point between the two Developers. But you have me wondering... could one build a system to train an LLM from copyrighted content in a way that doesn't violate copyright?

              [0]: https://en.wikipedia.org/wiki/Clean-room_design

            • chii 1 day ago
              > with the right prompts.

              that is doing a lot of pull. Just because you could "get the full copies" with the right prompts, doesn't mean the weights and the training is copyright infringement.

              I could also get a full copy of any works out of the digits of pi.

              The point i would like to emphasize is that using data to train the model is not copyright infringement in and of itself. If you use the resulting model to output a copy of an existing work, then this act constitutes copyright infringement - in the exact same way that using photoshop to reproduce some works is.

              What a lot of anti-ai arguments are trying to achieve is to make the act of training and model making the infringing act, and the claim is that the data is being copied while training is happening.

              • DrScientist 1 day ago
                >The point i would like to emphasize is that using data to train the model is not copyright infringement in and of itself.

                Interesting point - though the law can be strange in some cases - so for example in the UK, in court cases where people are effectively being charged for looking at illegal images, the actual crime can be 'making illegal images' - simply because a precedent has been set that, because any OS/browser has to 'copy' the data of any image in order for someone to be able to view it, any defendant has been deemed to have copied it.

                Here's an example. https://www.bbc.com/news/articles/cgm7dvv128ro

                So to ingest something into your training model (view it), you have by definition had to copy it to your computer.

                • xp84 7 hours ago
                  That seems to be an artifact of the whole copyright thing predating all forms of computing and memory, but if we don’t ignore that one, we’ve all been illegally copying copyrighted text, images and videos into our RAM every time we use the Internet. So i think the courts now basically acknowledge that that doesn’t count as a “copy.”

                  *Not a lawyer

          • mystified5016 1 day ago
            Imagine I have a shit ton of data on the books people read, down to their favorite passage in each chapter.

            I feed all of that into an algorithm that extracts the top n% of passages and uses NLP to string them into a semi-coherent new book. No AI or ML, just old fashioned statistics. Since my new book is comprised entirely of passages stolen wholesale from thousands of authors, clearly it's a transformative work that deserves its own copyright, and none of the original authors deserve a dime right? (/s)

            What if I then feed my book through some Markov chains to mix up the wording and phrasing. Is this a new work or am I still just stealing?

            AI is not magic, it does not learn. It is purely statistics extracting the top n% of other people's work.
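
            [ For readers who haven't met Markov chains: below is a minimal sketch of the "mix up the wording" step from the thought experiment above. It is a toy word-level chain over a made-up sentence, not a description of how any real image or language model works. The names build_chain, generate, source and remixed are hypothetical. ]

```python
import random
from collections import defaultdict

def build_chain(text, order=1):
    """Map each run of `order` words to the words that followed it in `text`."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain

def generate(chain, length=12, seed=0):
    """Walk the chain from a random starting state, emitting up to `length` words."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:  # dead end: this state was only seen at the very end of the text
            break
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)

# Made-up source text standing in for the "book" in the thought experiment.
source = "the cat sat on the mat and the dog sat on the rug"
chain = build_chain(source)
remixed = generate(chain)
print(remixed)
```

            [ With order=1, every adjacent pair of words in the output already appears as an adjacent pair somewhere in the source, even when the sentence as a whole is new. Whether that counts as a "new work" is the legal question in this thread, and the sketch does not settle it. ]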

        • entropi 2 days ago
          >But the new AI training methods are currently, at least imho, not a violation of copyright - not any more than a human eye viewing it (which you've implicitly given permission to do so, by putting it up on the internet).

          I don't understand how that matters. I thought that the whole idea of copyright and licences was that the holder of the rights can decide what is ok to do with the content and what is not. If the holder of the rights does not agree to a certain kind of use, what else is there to discuss?

          It sure does not matter if I think that downloading a torrent is not any more pirating than borrowing a media from my friend.

          • chii 2 days ago
            > If the holder of the rights does not agree to a certain kind of use, what else is there to discuss?

            the holder of content does not automatically get to prescribe how i would use said content, as long as i comply with the copyrights.

            The holder does not get to dictate anything beyond that - for example, i can learn from the content. Or i can berate it. Copyright is not a right that covers every single conceivable use - it is a limited set of uses that have been laid out in the law.

            So the current arguments center on the fact that it is unknown if existing copyright covers the use of said works in ML training.

            • Copyright means the holder does automatically get to prescribe how content can be copied. That's literally the definition of copyright.

              A typical copyright notice for a book says something like (to paraphrase...) "not to be stored, transmitted, or used by or on any electronic device without explicit permission."

              That clearly includes use for training, because you can't train without making a copy, even if the copy is subsequently thrown away.

              Any argument about this is trying to redefine copyright as the right to extract the semantic or cultural value of a document. In reality the definition is already clear - no copying of a document by any means for any purpose without explicit permission.

              This is even implicitly acknowledged in the CC definitions. CC would be meaningless and pointless without it.

              • chii 1 day ago
                > That clearly includes use for training, because you can't train without making a copy, even if the copy is subsequently thrown away.

                a copy for ingestion purposes - such as viewing in a browser, is not the same as a distribution copy that you make sending it to another person.

                > the right to extract the semantic or cultural value of a document.

                this right does not belong to the author - in fact, this is not an explicit right granted by the copyright act. Therefore, the extraction of information from a work is not something the author can (nor should) control. Otherwise, how would anyone learn off a textbook, music or art?

                In the future, when the courts finally decide what the limits of ML training are, maybe it will be a new right granted to authors. But it isn't one atm.

              • rpdillon 1 day ago
                > Any argument about this is trying to redefine copyright as the right to extract the semantic or cultural value of a document. In reality the definition is already clear - no copying of a document by any means for any purpose without explicit permission.

                I've studied copyright for over 20 years as an amateur, and I used to very much think this way.

                And then I started reading court decisions about copyright, and suddenly it became extremely clear that it's a very nuanced discussion about whether or not the document can be copied without explicit permission. There are tons of cases where it's perfectly permissible, even if the copyright holder demands that you request permission.

                I've covered this in other posts on Hacker News, but it is still my belief that we will ultimately find AI training to be fair use because it does not materially impact the market for the original work. Perhaps someone could bring a case that makes the case that it does, but courts have yet to see a claim that asserts this in a convincing way based on my reading of the cases over the past couple of years.

                • Terr_ 1 day ago
                  I assume the emphasis there is on training, whereas it's totally possible to infringe by running the model in certain ways later.
                  • rpdillon 1 day ago
                    Agreed! My take is that usages still can infringe if the output produced would otherwise infringe. I would take the fact that you use AI as the particular tool to accomplish the infringement as incidental.
              • rcxdude 1 day ago
                This is a particularly extreme interpretation of copyright, and not one that has seen that much support in the courts. You can put what you like in a copyright notice or license, but it doesn't mean it'll succeed, and the courts have generally taken a dim view of any argument which relied on the fact that electronic data is technically copied many times just to make it viewable to a user. Copyright is probably better understood as distribution rights.

                (Not saying training will necessarily fall in the same boat, just saying that the view 'copying to a screen or over the internet is necessarily a copy for the purposes of copyright' is reductive to the point of being outright incorrect)

            • chromanoid 2 days ago
              yeah, it is called _copy_ right. The question is, if AI is making obfuscated copies or not.

              interestingly in German it is not called copyright, but Urheberrecht, "author's rights". So there the word itself implies more things.

              BTW at least in Germany you can own image rights of your art piece or building that is placed in a public place.

          • Terr_ 1 day ago
            Not quite, it is (at least in the US) a limited privilege to control the copying and reproduction.

            If you make a movie poster, and it goes out into the market, and then someone picks it up from a garage sale, copyright still applies, they can't just make tons of duplicates.

            But you can't use copyright to force them to display it right side up instead of upside down, to not write on it, to not burn it, and to not make it into a bizarre pseudosexual shrine in their basement.

        • ehnto 2 days ago
          Strong disagree on the last paragraph. It's data online, your data, and it was used for commercial purposes without your consent.

          In fact, I never consented for anyone to access my server. Just because it has an IP address, does not make it a public service.

          Obviously in a practical sense that is a silly position to take, and in prior cases there is usually an extenuating factor that got the person charged, eg breaking through access controls, violating ToS, or intellectual property violations.

          But I don't rescind the prior statement. Just because I have an address doesn't mean you can come in through any unlocked doors.

          • ahtihn 2 days ago
            > In fact, I never consented for anyone to access my server. Just because it has an IP address, does not make it a public service.

            If you don't take any steps to make it clear that it's not public, like an auth wall or putting pages on unguessable paths, then it is public, because that is what everyone expects.

            Just like if you have a storefront: if the door is unlocked you'd expect people to just come in, and no one would take you seriously if you complain that people keep coming in if you don't somehow make it clear that they're not supposed to.

            • DrScientist 1 day ago
              Your shop might be open sure - but aren't we talking about people coming in and taking whatever they like for free?

              ie if you were an art gallery, the expectation would be people could come in and look, but you don't expect them to come in, photograph everything and then sell prints of everything online.

              • chii 1 day ago
                That's not what's happening.

                Instead, it's that there's some people coming into your gallery, studying the art and its style, and leaving with the learned information. They then replicate that style in their own gallery. Of course, none of the images are copies, or would be judged to be copies by a reasonable person.

                So now you, the gallery owner, want to forbid just those people who would come to learn the style. But you still want people to come and admire the art, and maybe buy a print.

                • DrScientist 1 day ago
                  > Of course, none of the images are copies, or would be judged to be copies by a reasonable person.

                  That's the fiction of course.

                  Tell me how something like ChatGPT can simultaneously claim to return accurate information while at the same time being completely independent from the sources of the information?

                  In terms of images - copyright isn't only for exact copies - if it was then humans would have been taking the piss by making minor changes for decades.

                  Sure you could argue some is fair use with genuinely original content being produced in the process, but I think you are also overlooking an important part of what's considered 'fair' - industrialised copying of source material isn't really the same in terms of fairness as one person getting inspiration.

                  Taking the Encyclopaedia Britannica and running it through an algorithm to change the wording, but not the meaning, and selling it on is really not the same as a student reading it and including those facts in their essay - the latter is considered fair use, the former is taking the piss.

                  • chii 1 day ago
                    > ChatGPT can simultaneously claim to return accurate information while at the same time being completely independent from the sources of the information?

                    why can't that be true? Information is not copyrightable. The expression of information is. If chatGPT extracted information from a source work, and represents that information back to you in a form that is not a copy of the original work, then this is completely fine to me. An example would be a recipe.

                    • DrScientist 1 day ago
                      So you think taking something like the Encyclopaedia Britannica, running it through a simple rewording algorithm, and selling it on is totally 'fair use'?

                      Taking all newspaper and proper journalistic output and rewording it automatically and selling it on is also 'fair use'?

                      Stand back from the detail ( of whether this pixel or word is the same or not ) and look at the bigger picture. You still telling me that's all fine and dandy?

                      I think it's obviously not 'fair use'.

                      It means the people doing the actual hard graft of gathering the news, or writing encyclopedias or textbooks won't be able to make a living so these important activities will cease.

                      This is exactly the scenario copyright etc exists to stop.

                      • chii 1 day ago
                        > Taking all newspaper and proper journalistic output and rewording it automatically and selling it on is also 'fair use'?

                        it would be, if the transformation is substantial. If you're just asking for snippets of existing written works, then those snippets are merely derivative works.

                        For example, if you asked an LLM to summarize the news and stories of 2024, i reckon the output is not infringing. Because the informational contents of the news is not itself copyrightable, only the article itself. A summary, which contains a precis of the information, but not the original expression, is surely uncopyrightable - esp. if it is a small minority of the source (e.g., chatGPT used millions of sources).

                        > won't be able to make a living so these important activities will cease.

                        this is irrelevant as far as i'm concerned. Whether they are able to make a living or not is orthogonal. If they can't, then they should stop.

          • yencabulator1 day ago
            Content is often publicly available and copyright protected. Paint a mural near a busy street. No locked door in that metaphor; locked door would be password protected site.
        • Terr_1 day ago
          If they aren't a violation of copyright, then I want to see what happens when people are trading around models and "prompts" that describe recently released movies and music sufficiently that it competes with the original.

          Not necessarily because I like either "we monetize public work" or "copyright robber-barons", but I'd like at least one of them to clearly lose so that the rest of us have clear and fair rules to work with.

        • PittleyDunkin2 days ago
          > but you agreed to this, when agreeing to the TOS

          The legal definition of agreement means basically zilch

        • immibis2 days ago
          > but you agreed to this

          Yes, that was the point? You agree to this by using Meta. So don't.

    • immibis2 days ago
      For images, there's Nightshade, which imperceptibly alters your images but makes them poison for AI (does anyone understand why?)

      I don't know if there's something similar for text. You could try writing nonsense with a color that doesn't contrast with the background.

      The evidence Nightshade works is that AI companies want to make it illegal.

      • rcxdude1 day ago
        Nightshade and glaze are basically adversarial attacks on various commonly used subcomponents of image generators, most notably the CLIP image captioner, which is both used to generate training data and as part of the generation process.

        Like most adversarial attacks, they get more perceptible as they try to be robust to more transformations of the data (in practice, when applied at a level that isn't trivially removable, both tend to make images look like slightly janky AI, ironically), and they are specific to the net(s) they are targeting, so it's more of a temporary defense against the current generation than a long-term protection.

      • kelseydh1 day ago
        Link to Nightshade: https://nightshade.cs.uchicago.edu/whatis.html

        This is fascinating. Would be great to have a web interface artists can use that doesn't require them to install the software locally.

    • matheusmoreira2 days ago
      > I can absolutely see why someone rational would want to share a whole lot less, anywhere, in an open fashion, regardless of where it's hosted.

      I've reached the same conclusion.

      All data is just bits. Numbers. Once it's out there, trying to control their spread and use is just delusional. People should just stop sharing things publicly. Even things like AGPLv3 are proving to be ineffective against their exploitation.

      I really didn't expect to live in this "copyright for me, not for thee" world. The same corporations that compare us mere mortals to high seas pirates when we infringe their copyrights are now getting caught shamelessly AI laundering the copyrights of others on an industrial scale.

      It's so demoralizing. I feel like giving up and just going private. Problem is I also want to share the things I made. To talk about my projects with real people. Programming is lonely enough as it is. Without sharing I'm not sure what the point even is. I have no idea what I'm supposed to do from now on. I just know I don't want to end up working for free to enrich trillion dollar corporations.

      • dend2 days ago
        I can relate to the sentiment. For what it's worth, I also know that if someone's personal site/repos/pictures are used to train AI, they have no recourse short of said person having TONS of money to go and fight the legal battles similar to how media companies do.

        But you know what, I grew up in a family of educators whose whole life mission was to help others by sharing their knowledge. That's what I am doing through my blog. I learned something? Blog about it. I built a reverse-engineered wrapper over some API? Share it openly. For every AI ingress job over this content there will be a few people that will read my code or blog post and either learn from it, be inspired, ignore it, or unblock themselves from a problem that they tried to solve. I think that makes the effort worth it to me.

        For what it's worth, even before AI emerged, I've seen sites that would shamelessly rip off my content and re-publish it on their own domains under a different author. One even tried charging people for it. On several occasions I fought it and won with the help of Google/Bing. Other times, nothing happened. And that's fine. Such is the fate of online content. If my content helped at least one person, it was worth sharing it in the open.

      • pjc501 day ago
        > I really didn't expect to live in this "copyright for me, not for thee" world

        Having been interested in copyright activism for two decades, that's exactly what I expected. Copyright is very much about power, and concentration of power.

      • bulatb2 days ago
        Yeah, I hear this. Anything I put online is feeding the machine that will replace me.

        Maybe I can carve myself a niche if I can find an audience, and maybe turn that into something kind of reward-shaped, but that's not happening without me feeding the machine. And almost certainly I won't succeed, and I'll just make it harder for myself and everyone like me to succeed in the future.

        It seems the only thing to do is do it anyway and try to be unique enough to make it work. And somehow just be fine with pulling up the ladder behind you.

        • matheusmoreira2 days ago
          Yeah, I'm trying too. Specifically, the GitHub Sponsors thing.

          I'm opposed to advertising and don't want to inflict it on others. So I don't generally advertise my work on sites like this one, I just participate in threads about it whenever I see them.

          Somehow people found my projects and posted them here. Just woke up one day and saw I had one sponsor. Not gonna lie, I'm still amazed about it. Not even close to providing for my family despite an incredibly favorable exchange rate, so I can't work full time on my projects. It's still the only thing that gives me hope right now. Really thankful to that person.

          > And somehow just be fine with pulling up the ladder behind you.

          Do you really think it will come to that? I mean, this AI situation has got to come to a head at some point. We can't have these corporations defending copyright and simultaneously pretending it doesn't exist while exploiting software developers. One of those things has got to go away.

          • notpushkin2 days ago
            > So I don't generally advertise my work on sites like this one

            Please do — I for one always love to hear about indie projects, if they are relevant to the topic discussed.

        • sneak1 day ago
          If a machine can replace a creative, the creative isn’t creative and should be replaced.
      • alisonatwork2 days ago
        Who cares? Information wants to be free. You put your stuff out there for free, it's hoovered up and sold back to you by capitalists, that sucks, but you've still made a real contribution to society. Meanwhile a select few will still find your stuff directly. Maybe what you shared will make just one person's life a little bit better, and that was your impact - you made a difference! Capitalists will never have that feeling, because anyone consuming their repackaged content is paying for the privilege - any benefit to society is just an incidental side-effect of their greed. Sucks to be them.

        The way I see it, this is exactly what life is about. Do you want to make a positive impact in society? Then share your knowledge, your experiences, your creations. People will try to capitalize on your work, and they might even get rich from it, but oh well. It doesn't take away from your own contribution to the ongoing story of humanity.

        I don't have or want kids, but I see my existence in society and free contributions to the "collective consciousness", such as it is, as my legacy. For me that's comforting. I'm choosing to be part of something bigger. If I just disappeared from society and lived like a hermit, or if I buried myself completely in my day job working for capitalists and not producing anything outside of that, I think I'd lose my sense of meaning.

        • throwaway23711 day ago
          Everyone who put their stuff out there, on the Internet, has contributed to the AI Leviathan. Maybe the end result will be a utopia, maybe it'll be a dystopia. It's definitely too soon to say producing content for the AI titans to consume is a positive impact on society.
      • immibis2 days ago
        There are two ways we can go from "copyright for me, no copyright for thee"

        We can force it to "copyright for me, copyright for thee" by injecting AI poison and by not sharing at all. See Nightshade.

        Or we can force it to "no copyright for me, no copyright for thee" by ignoring their copyright just like they ignore ours, and making sure they don't find us. See Anna's Archive.

        • notpushkin2 days ago
          We can also do “copyright for thee, no copyright for me”! It does sound a bit hypocritical, but until we see where the copyright needle goes this might be the safest option.
    • alibarber2 days ago
      Based on my experience, I've found that I like using AI (GitHub copilot) to do things like answer questions about a language that I could easily verify in the documentation. Almost basically 'yes/no' questions. To be honest if I were writing such documentation for a product/feature, I wouldn't mind the AI hoovering it up.

      I've found it to be pretty crap at doing things like actual algorithms or explaining 'science' - the kind of interesting work that I find on websites or blogs. It just throws out sensible looking code and nice sounding words that just don't quite work or misses out huge chunks of understanding / reasoning.

      Despite not having done it in ages, I enjoy writing and publishing online info that I would have found useful when I was trying to build / learn something. If people want to pay a company to mash that up and serve them garbage instead, then more fool them.

      • namaria1 day ago
        I argued years ago, based on how LLMs are built, that they would only ever amount to lossy and very memory-inefficient compression algorithms. The whole 'hallucination' thing misses the mark. LLMs are not just 'occasionally' wrong or hallucinating. They can only ever return lower-resolution versions of what was in their training data. I was mocked then but I feel vindicated now.
        • richardwhiuk1 day ago
          They can combine two things in a way that never appeared together in the source material.
          • namaria1 day ago
            Youtube compression algorithm also produces lots of artifacts that were never filmed by the video producers
            • wizzwizz41 day ago
              And datamoshing lets you produce effects that weren't in the source clips.
    • baxtr2 days ago
      Part of me viscerally agrees because large corporations have monetized UGC.

      Another part of me though thinks differently. We are a species that builds knowledge from generation to generation. From one person to another. Over years, over centuries.

      Philosophically this part tends to think that your thoughts and ideas belong to humanity and thus need to be shared with all of us.

      • friendzis2 days ago
        If you recall high school history, rapid, exponential "progress" happened once the knowledge was 1) written down (printing press) 2) archived for the future (libraries) 3) systematized (textbook/encyclopaedia) 4) proactively shared (public education), all on a massive scale.

        The fact that some knowledge exists and is even accessible does not really matter if it takes a scholar highly trained in a very narrow field to find that piece of information. You need a well established knowledge creation and distribution funnel in operation for humanity as a whole to reap the benefits of knowledge.

        There is undoubtedly a lot of useful knowledge on internet platforms, however, most of that knowledge remains unsystematized and largely undiscoverable, meaning that contribution to the totality of human knowledge by these platforms is infinitesimal, which is further drowned by cat and porn videos.

        • TeMPOraL1 day ago
          Now we have 5) aggregated and internalized as a whole by computational constructs such as LLMs, which are - 4) - proactively shared (open weights, but also freemium service and dirt-cheap API access to commercial SOTA models), still on a massive scale.

          > There is undoubtedly a lot of useful knowledge on internet platforms, however, most of that knowledge remains unsystematized and largely undiscoverable, meaning that contribution to the totality of human knowledge by these platforms is infinitesimal, which is further drowned by cat and porn videos.

          Precisely that. Which is why I often argue, that for 99%+ of the content in the training data, its marginal contribution to the training process - itself infinitesimal in isolation - is still by far the most value that content will ever bring to the world.

      • yencabulator1 day ago
        The compromise that was supposed to be in place was strong, short term, copyright protection, to help the author (a person) financially during their lifetime. That compromise was destroyed by rich people using corporations as owners and extending copyright duration.

        https://en.wikipedia.org/wiki/Copyright_Term_Extension_Act

      • Salgat2 days ago
        There's two decades worth of countless conversations on Reddit alone that would be buried into nothingness but instead ML has revived all that activity as useful data. ML is definitely a great way to bring back utility for a lot of old and unused data.
        • Terr_1 day ago
          > revived that activity as useful data

          Revived as compressed text associations, it is potentially useful data, but also potentially totally wrong in non-obvious ways. (Or, to riff on Futurama, "The worst kind of incorrect.")

          • Salgat1 day ago
            It is used to help train the LLMs on how to "talk" like normal people, even if the topic they're discussing isn't that useful or valuable.
        • tempestn2 days ago
          This seems like a reasonable take to me. I wish those downvoting you would explain where they disagree.
      • yowayb2 days ago
        Great take. Also agree with parent. I feel like some form of provenance would take us to the next level.
    • Terr_2 days ago
      IANAL, but lately I've had this quixotic daydream of a combination accept-cookies / agree-to-TOS page that comes up, and the Terms of Service says that by proceeding they agree to give the site-owner a perpetual, irrevocable, and royalty-free license to use and re-license any future content that they create using any generative AI that was trained using the website contents.

      Then you carefully log what LLM user-agents/IPs go past that agree, along with some very distinctive secretly crawlable pages which have contents that can be distinctively reproduced back out of the model if needed.

      Then whenever SomeShittyLLM posts "articles", everybody with that TOS that was crawled gets to duplicate it without ads for free. :P
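
      The logging-and-canary step above could be sketched roughly like this. It is only an illustration under loud assumptions: the user-agent substrings are examples of publicly documented crawler tokens and are nowhere near a complete list, and `canary_for`, `find_canaries`, and the secret key are hypothetical names made up for this sketch. Whether any of it would hold up legally is a separate question entirely.

```python
import hashlib
import hmac

# Example substrings only; real crawlers may use other tokens or hide them.
CRAWLER_MARKERS = ("GPTBot", "CCBot", "ClaudeBot")


def looks_like_ai_crawler(user_agent: str) -> bool:
    """Case-insensitive check of a User-Agent header against the marker list."""
    ua = user_agent.lower()
    return any(marker.lower() in ua for marker in CRAWLER_MARKERS)


def canary_for(secret: bytes, page_path: str) -> str:
    """Derive a distinctive, reproducible token to embed in one crawlable page."""
    digest = hmac.new(secret, page_path.encode("utf-8"), hashlib.sha256).hexdigest()
    return "canary-" + digest[:16]


def find_canaries(secret: bytes, page_paths: list[str], model_output: str) -> list[str]:
    """Return the pages whose canary token appears verbatim in some model output."""
    return [p for p in page_paths if canary_for(secret, p) in model_output]
```

      You would log every request where `looks_like_ai_crawler` is true and that got past the agreement page, then later run `find_canaries` over text the model produces. A verbatim match is suggestive, not proof, and a crawler that spoofs a browser user agent slips past the first check.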

      • w42 days ago
        This idea is reminiscent of the opening scene of Accelerando by Charlie Stross:

        "Are you saying you taught yourself the language just so you could talk to me?"

        "Da, was easy: Spawn billion-node neural network, and download Teletubbies and Sesame Street at maximum speed. Pardon excuse entropy overlay of bad grammar: Am afraid of digital fingerprints steganographically masked into my-our tutorials."

        "Uh, I'm not sure I got that. Let me get this straight, you claim to be some kind of AI, working for KGB dot RU, and you're afraid of a copyright infringement lawsuit over your translator semiotics?"

        "Am have been badly burned by viral end-user license agreements. Have no desire to experiment with patent shell companies held by Chechen infoterrorists. You are human, you must not worry cereal company repossess your small intestine because digest unlicensed food with it, right?”

        - https://www.antipope.org/charlie/blog-static/fiction/acceler...

        Amusing to also note that this excerpt predicted the current LLM training methodology quite well, in 2005.

        • Terr_1 day ago
          More inspired by the GPL, I think, although the sketch above doesn't force the writer to put things into the public domain.

          I'm imagining a separate declaration of: "Content I can sublicense from ShittyNewsLLM--which is everything made by their model--is now public-domain through me until further notice", without any need to identify specific items or rehost it myself.

          I suppose the counterstrike would be for them to try to transform their own work and argue what they finally released contains some human spark that wasn't covered by the ToS, in which case there may need to be some "and any derivative work" kinda clause.

          I wonder if some organization (similar to the Open Software Foundation) could get some lawyers and web-designers together to craft legally-sound site-design rules and terms-of-service, which anyone could use to protect their own blogs or web-forums.

        • TeMPOraL1 day ago
          Also amusing:

          > patent shell companies held by Chechen infoterrorists

          This perfectly captures what both patent trolls and MAFIAA look like in my mind.

      • spiritplumber2 days ago
        I love this, I did something like that with made-up-italian-sounding words a while ago (you used to be able to find my site if you looked for FANTACHIAVE).

        It's a bit like fake roads in map databases.

    • ehnto2 days ago
      I have decided not to put text online if I feel it has IP or personal ideas in it. Some exceptions, like posting here, and including stuff I want to get out there for commercial reasons eg: marketing of services. The one I struggle with is discord, but I am not too personal on discord servers so I suppose I'll just mesh into the soup of barely worthwhile chatter.

      I also started self hosting my git repos and knowledge base, both were trivial to set up.

    • blahblah232 days ago
      There's a neat little thing some discords I've seen use, where they honeypot spam bots into a channel-- if someone posts into it, their messages in the last 5 minutes get deleted and their account gets kicked.

      Is there a meaningful way for a website to serve a honeypot resource that automatically adds the IP address of whoever requests it to the site's blocklist? Knowing that you will lose X but hopefully you'll retain everyone who can read?
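
      A minimal sketch of what I mean, assuming the trap URL is linked somewhere humans won't click and is disallowed in robots.txt. `HoneypotBlocklist`, the trap path, and the ban length are all made up for illustration, and the bans expire on purpose so a recycled IP isn't locked out forever.

```python
import time
from typing import Dict, Iterable, Optional


class HoneypotBlocklist:
    """Temporarily block any client IP that requests a trap URL."""

    def __init__(self, trap_paths: Iterable[str], ban_seconds: float = 3600.0) -> None:
        self.trap_paths = set(trap_paths)
        self.ban_seconds = ban_seconds
        self._banned_until: Dict[str, float] = {}

    def is_blocked(self, ip: str, now: Optional[float] = None) -> bool:
        """True while the IP has an unexpired ban; expired bans are dropped."""
        now = time.time() if now is None else now
        expiry = self._banned_until.get(ip)
        if expiry is None:
            return False
        if now >= expiry:
            del self._banned_until[ip]
            return False
        return True

    def handle(self, ip: str, path: str, now: Optional[float] = None) -> int:
        """Return the HTTP status to send: 403 if banned or trapped, else 200."""
        now = time.time() if now is None else now
        if self.is_blocked(ip, now):
            return 403
        if path in self.trap_paths:
            self._banned_until[ip] = now + self.ban_seconds
            return 403
        return 200
```

      Something like `HoneypotBlocklist({"/do-not-crawl/"}).handle(ip, path)` would sit in front of the real request handler. It only inconveniences a scraper that keeps using the same address, so it's a speed bump at best.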

      • notpushkin2 days ago
        Most scrapers use residential IPs nowadays. They will just rotate their IP and go on, while the IP you banned would get assigned to an innocent user that won’t be able to access your site now.
      • I saw a Discord server use this but it never actually caught anything. Turns out all the spammers were just human idiots!
    • couscouspie1 day ago
      The low adoption rate is another big plus of the Gemini protocol and similar solutions for a Javascript-free and open internet.
    • foxglacier1 day ago
      What is your purpose of publishing where having the content used to train AI is a problem? Are you trying to gatekeep information that's not even protected by copyright anyway? Are you worried your potential audience will get the same thing (including your personal creativity) from AI that just copied your work so they don't recognize your name and develop some brand awareness? Or do you just not like AI and don't want to help it? Maybe you could build your own paywall or other technical access restriction instead of making it freely available? Even just a captcha should block AI training scrapers, shouldn't it?
      • asdff1 day ago
        No attribution with ai
        • foxglacier1 day ago
          But also no direct copying, typically. Would OP really be happy with an AI that reliably rewords everything so that attribution is not required but still reproduces the information?
          • rchaud15 hours ago
            Human educational systems penalize you for not citing sources (i.e. not showing your work, not crediting referenced works). Why should an AI system be exempt?
    • dehrmann2 days ago
      > used/exploited by things like AI training bots

      How is this worse than a human reading your blog/code, remembering the key parts of it, and creating something transformative from it?

      • dend2 days ago
        In the grand scheme of things and at this point, it probably doesn't matter. I know for me it certainly is not in any shape a discouragement to continue writing on my blog and contributing code to open source communities (my own and others).

        But if we're going to dig into this a bit, one person reading my code, internalizing it, processing it themselves, tweaking it and experimenting with it, and then shipping something transformative means that I've enhanced the knowledge of some individual with my work. It's a win. They got my content for free, as I intended it to be, and their life got a tiny bit better because of it (I hope).

        The opposite of that is some massively funded company taking my content, training a model off of it, and then reaping profits while the authors don't even get as much as an acknowledgement. You could theoretically argue that in the long-run, a LLM would likely help other people through my content that it trained on, but ethically this is most definitely a more-than-gray area.

        The (good/bad) news is that this ship has sailed and we now need to adjust to this new mode of operation.

        • dehrmann2 days ago
          > The opposite of that is some massively funded company taking my content, training a model off of it, and then reaping profits while the authors don't even get as much as an acknowledgement.

          Taking out the "training a model" part, the same thing could happen with a human at the company.

          • dend2 days ago
            Oh, 100%. I mentioned this in another comment (https://news.ycombinator.com/item?id=42582518) - I've dealt with a fair share of stolen content (thankfully nothing too important, just a random blog post here and there), and it definitely stings. The difference is that this is now done at a massive scale.

            But again - this doesn't stop me from continuing to write and publish in the open. I am writing for other people reading my content, and as a bouncing board for myself. There will always be some shape or form of actors that try to piggyback off of that effort, but that's the trade-off of the open web. I am certainly not planning to lock all my writing behind a paywall to stop that.

          • thieaway39472 days ago
            This is already a scenario that people generally accept as bad, could you elaborate the point you are making?
      • camgunz1 day ago
        One of the--admittedly many--things that puts me off AI is the pitch starts off as "you will have abilities you never had before and probably never would have had, be excited", then when critics are like, "woof it's a little worrying you can <generate a million deepfakes>, <send a million personalized phishing emails>, <scrape a million websites and synthesize new ones>, etc.", the pitch switches to "you could always have done this, calm down".

        The whole point of software engineering is to do stuff faster than you could before. It is THE feature. We could already add, we could already FMA, we could already do matrix math, etc. etc. Doing it billions of times faster than we could before at far less energy expenditure--even including what it takes to build and deliver computers--has led to an explosion of productivity, discovery, and prosperity. Scale is the point. It changes everything and we know it; we shouldn't pretend otherwise.

      • mrweasel2 days ago
        Attribution. If you read a book, blog or code and others ask you where you got your ideas/inspiration you can refer them back to the original author. This helps people build a reputation. Even if it just happens every once in a while, it still helps the original author.

        Once an AI has hoovered up your work and regurgitated it as its own, all links back to the original creator are lost.

      • ykonstant2 days ago
        Scale makes all the difference in the world.
      • sifar2 days ago
        Scale.
      • moron4hire2 days ago
        Seriously? How is rule utilitarianism different from act utilitarianism?
    • pixelmonkey1 day ago
      I think you're right, and I don't think it's just about public content being "exploited" to train AI models and the like. Rather, even before LLMs, there was a growing sense that publishing ideas or essays publicly is "risky" with very little reward for the very real risks.

      I wrote about this a little in "The Blog Chill":

      https://amontalenti.com/2023/12/28/the-blog-chill

      Speaking personally, among my social circle of "normie" college-educated millennials working in fields like finance, sales, hospitality, retail, IT, medicine, civil engineering, and law -- I am one of the few who runs a semi-active personal site. Thinking about it for a moment, out of a group of 50-or-so people like this, spread across several US states, I might be the only one who has a public essay archive or blog. Yet among this same group you'll find Instagram posters, TikTok'ers, and prolific DM authors in more private spaces like WhatsApp and Signal groups. A handful of them have admitted to being lurkers on Reddit or Twitter/X, but not one is a poster.

      It isn't just due to a lack of technical ability, although that's a (minor) contributing factor. If that were all, they'd all be publishing to Substack, but they're not. It's that engaging with "the public" via writing is seen as an exhausting proposition at odds with everyday middle class life.

      Why? My guesses: a) smartphones aren't designed for writing and editing, hardware-wise; b) long-form writing/editing is hard and most people aren't built for it; c) the dynamics of modern internet aggregation and agglomeration makes it hard to find independent sites/publishers anyway; and d) the risk of your developed view on anything being "out there" (whether professional risk or friendship risk) seems higher than any sort of potential reward.

      On the bright side, for people who fancy themselves public intellectuals or public writers, hosting your own censorship-resistant publishing infrastructure has never been easier or cheaper. And for amateur writers like me, I can take advantage of the same.

      But I think everyday internet users are falling into a lull of treating the modern internet as little more than a source of short-form video entertainment, streams for music/podcasts, and a personal assistant for the sundries of daily life. Aside from placating boredom, they just use their smartphones to make appointment reminders, send texts to a partner/spouse, place e-commerce orders, and check off family todo lists, etc. I expect LLMs will make this worse as a younger generation may view long-form writing not as a form of expression but instead as a chore to automate away.

    • sneak1 day ago
      Copying data isn’t exploitation.
    • TeMPOraL2 days ago
      > even if you self-host your own site, it's still going to get hoovered up and used/exploited by things like AI training bots. I think between everyone's code getting trained on, even if it's AGPLv3 or something similarly restrictive, and generally everything public on the internet getting "trained" and "transformed" to basically launder it via "AI", I can absolutely see why someone rational would want to share a whole lot less, anywhere, in an open fashion (...)

      > (...) share with other real people than inadvertently working for free to enrich companies.

      That attitude, quite commonly expressed on HN these days, strikes me as a peculiar form of selfishness - the same kind we routinely accuse companies of and attribute the sad state of society to.

      A person is not entitled to 100% of the value of everything they do, much less to the secondary value it subsequently generates. A person is not entitled to receive rent for any of their ideas just because they wrote them down and put them on display somewhere. Just because they touched something, and it exists, doesn't mean everyone else touching it owes them money.

      The society works best when people don't capture all the fruits of their labor for themselves. Conversely, striving to capture 100% (or more) of the value generated is a hallmark of the late stage capitalism and everything that's bad and wrong and Scrooge-y.

      Self-censoring on principle because some company (gasp!) will train an LLM model on it (gasp!!) and won't share the profit from it? That's just feeling entitled to way over 100% of the value of one's hypothetical output, and feeling offended the society hasn't already sent advance royalty cheques.

      Chill out. No matter what you do, someone else will somehow make money out of it, that's how it supposed to work - and AI in particular is, for better or worse, one of the most fundamentally transformative things to happen to humanity, somewhere between the Internet and the Industrial Revolution if it's just a bubble that pops, much more if it isn't. Assuming it all doesn't go to shit (let's entertain something more than maximum pessimism for a moment), everyone will benefit much more from it than from whatever they imagine they could get from their Internet comments.

      (Speaking of Industrial Revolution - I can understand this attitude from people who actually earn a living from the kind of IP that AI is trained on, only to turn around and compete with them. They're the modern Luddites, and I respect their struggle and that they have a real point. Everyone else, those complaining about "AI theft" the most, especially here? Are not them.)

      • johnklos1 day ago
        > The society works best when people don't capture all the fruits of their labor for themselves.

        Sure, but it sounds like you think people shouldn't be upset about businesses trying to capture all the fruits of people's labor, too.

        Capitalism is evil, and people thinking that normalizing exploitation is OK is either shortsighted or it's also evil. Are you simply unaware that this is what's happening and what people are upset about? Have you never thought about it? Or do you want businesses to succeed in exploiting people's work? It sounds like it, because you wrote, "that's how it supposed to work".

        I truly wonder if you're self-aware, or if you just think that you'll one day be on the side of the exploiters.

      • rpdillon1 day ago
        This post deserves more attention, I think. It's occurred to me as well.

        Over the holidays, my father gave my children a book that he had written. It was a photo essay that was 50 pages, and it was titled 'Sharks'. It's an unpublished labor of love that he spent about 500 hours on.

        It's a true story centered on Captain Frank Mundus, who operated the Cricket II. He was a renowned shark fisherman and would take people out to fish for enormous sharks. He did this for 40 or 50 years.

        An author by the name of Peter Benchley wrote a novel that was heavily inspired by many of Frank's traits, his mannerisms, his approach to shark fishing, the kind of boat he had, the kind of charters he ran. The novel was titled 'Jaws' and received little attention when it was first released. A while after, a director by the name of Steven Spielberg took notice of it and turned it into a multi-million dollar blockbuster movie.

        My father was a lawyer that Frank Mundus consulted with and asked, is there any way that he could get a payout for being the inspiration for this character?

        My family read the book over the holidays, and it was clearly my father's position that Steven Spielberg and Peter Benchley were maybe the sharks that the title of the book was talking about. The idea that they could make $100 million based on the work and life of this captain and give him literally nothing in return, not even attribution, seemed wrong to him.

        I was the lone detractor in the room. My take is that Captain Frank Mundus was just living his life. He was doing what he did to make money chartering fishing trips for sharks. He would have done this regardless of whether or not a writer had come along or a movie had come along. What Peter Benchley and Steven Spielberg did is they found value in his work that he didn't know existed and that he wasn't capable of extracting. I think this is generally true of artists. They wander the world and they create art that gives the viewer a new insight into the experiences the artist had. If artists had to give money back to every real-life inspiration, I think the whole system wouldn't work.

        I see parallels with the current attitudes toward AI. I think writers are a lot like Captain Mundus. They're living their life, they're writing their stories, or doing their research and publishing, and having people read their works. And copyright is helping them do all this.

        AI companies have come along and found value in their work that they didn't know existed and they were never capable of extracting. And that's OK: that's what innovation is, taking the work that others have done and building on it to create something new.

        I'm not unequivocally in favor of all applications of AI, but I do think there are tons of places where it can be super helpful and we should allow it to be helpful. One example: I'm drafting this on my phone using Futo keyboard entirely with my voice. Extremely useful, but no doubt trained on copyrighted content.

      • yokem551 day ago
        The dilemma here is that the incentive to capture value for yourself comes from the legitimate fear that someone else will try to capture all that residual value you leave on the table instead of allowing that value to be socialized in a healthy way. Which means enshittification becomes the default for everyone.
    • CamperBob22 days ago
      > You know, if I've noticed anything in the past couple years, it's that even if you self-host your own site, it's still going to get hoovered up and used/exploited by things like AI training bots.

      So? What do I care? If some stuff I posted to my website (with no requirement for attribution or remuneration, and also no guarantee that the information is true or valid) can improve the AI services that I use, great.

      • bulatb2 days ago
        Wouldn't you feel just a little bad if you worked really hard on something, gave it out for free in the spirit of sharing, and someone came along and said thanks, loser, and sold it for money? Would you want to go on making it for free for them to sell?
        • CaptainFever2 days ago
          No, not really? If others can get my stuff for free, then that means that whoever sells it for money must have done something to make it worth money. So they've earned it.
        • prmoustache1 day ago
          The ones who really lose are the ones who buy their stuff while yours stays free.
        • CamperBob21 day ago
          No. If I cared, I wouldn't have posted the information in the first place... or I would have erected a paywall.
      • liontwist2 days ago
        Even if no attribution etc is your personal policy that’s not everyone else’s.

        The end result is that any authors who care about copyright protection will become less accessible. It’s a gold rush for AI bots to capture the good will of early internet creators before the well runs dry.

        • dend2 days ago
          +1

          My content is still MY content, and I'd prefer that if an entity is going to make money off of it directly (i.e., it's not a person learning how to code from something I wrote but rather a well-funded company pulling my content for their gain), that I at least have some semblance of consent to it.

          That being said, I think there is no longer a point of crying over spilled milk. The LLM technology is out of the bag, and for every company that attempts to ethically manage content (are there any?) there will be ten that will disregard any kind of license/copyright notices and pull that content to train their models anyway.

          I write because I want to be a better writer, and I enjoy sharing my knowledge with others. That's the motivation. If it helps at least one person, that's a win in my book, especially in the modern internet where there's so much junk scattered around.

          • tonyedgecombe21 hours ago
            Pretty much all of my work has been published on the internet over the last twenty years. Some of it has been commercial, some open source and some that is just for myself.

            I’m pretty much done with that now, I doubt I will publish anything online again.

        • CamperBob21 day ago
          > Even if no attribution etc is your personal policy that’s not everyone else’s.

          That's up to the courts. As usual, we will all lose if the copyright maximalists win.

          • tonyedgecombe21 hours ago
            To me it looks like individual creators are the ones most likely to lose.

            I was watching an interview with John Warnock (one of the founders of Adobe) and he was proud of the fact that the US went from having 25,000 graphic designers to 2,500,000 largely thanks to software his company created.

            I do wonder if we are on the verge of reversing that shift.

            • CamperBob216 hours ago
              The question you should be asking is if we need 2,500,000 graphic designers. Humans have a higher purpose than doing a robot's job.
          • liontwist1 day ago
            Last I checked, creators of a work held copyright for it and that hasn’t changed. So no, this is not a new legal question.
            • CamperBob21 day ago
              That's not how copyright law works.

              That's not how anything works.

              • liontwist1 day ago
                Ok. Thanks for your contribution to the discussion.
      • AnthonyMouse2 days ago
        I think the source of the contrary sentiment goes something like this: AI stuff (especially image generation) is competition for artists. They don't much like competition that can easily undercut them on price, so they want to veto it somehow and lean on their go-to of accusing anybody who competes with them of theft.

        The problem in this case is that it doesn't matter. The AI stuff is going to exist, and compete with them, whether the AI companies have to pay some pittance for training data or not.

        But the chorus is made worse by two major factors.

        First, many of the AI companies themselves are closed-source profiteers. "OpenAI" stepping all over themselves to be the opposite of their own name etc. If all the models got trained and then published, people would be much more inclined to say "oh, this is neat, I can use this myself and it knows my own work". But when you have companies hoovering everything up for free and then trying to keep the result proprietary, they look like scumbags and that pisses people off.

        Second, then you get other opportunistic scumbags who try to turn that legitimate ire into their own profit by claiming that training for free should be prohibited so that only proprietary models can be created.

        Whereas the solution you actually want is that anybody can train a model on public data but then they have to publish the model/weights. Which is probably not going to happen because in practice the law is likely to end up being what favors one of the scumbags.

        • dend2 days ago
          I think that's an overly reductive way of looking at it. Artists are, by definition, creators of art. AI-generated "art" (it's not art at all in my eyes) is effectively a machine-based reproduction of actual art, but doesn't take the same skill level, time, and passion for the craft for a user to be able to generate an output, and certainly generates large profits for those that created the models.

          So, imagine the scenario where you, an artist, trained for years to develop a specific technique and style, only for a massively funded company to swoop in, train a model on your art, make bank off of your skill while you get nothing, and now some rando can also create look-alikes (and also potentially profit from them - I've seen AI-generated images for sale at physical print stores and Etsy that mimic art styles of modern artists), potentially destroying your livelihood. Very little to be happy about here, to be frank.

          It's less about competition and more about the ethical way to do it. If another artist would learn the same techniques and then managed to produce similar art, do you think there would be just as visceral of a reaction to them publishing their art? Likely not, because it still required skill to achieve what they did. Someone with a model and a prompt is nowhere near that same skill level, yet they now get to reap the benefits of the artist's developed craft. Is this "gatekeeping what's art"? I don't think so. Is this fair in any capacity? I don't think so either. Because we're comparing apples to pinecones.

          All that being said, I do agree that the ship has sailed - the models are there, the trend of training on art AND written content shared openly will continue, and we're yet to see what the consequences of that will be. Their presence certainly won't stop me from continuously writing, perfecting my craft, and sharing it with the world. My job is to help others with it.

          My hunch is that in the near-term we'll see a major devaluing of both written and image material, while a premium will be put on exceptional human skill. That is, would you pay to read a blog post written and thoroughly researched by Molly White (https://mastodon.social/@molly0xfff@hachyderm.io) or Cory Doctorow (https://pluralistic.net/), or some AI slop generated by an automated aggregator? My hunch is you'd pick the former. I know I would. As an anecdotal data point, and speaking just for myself, if I see now that someone uses AI-generated images in their blog post or site, I almost instantly close the tab. Same applies to videos on YouTube that have an AI-generated thumbnail or static art. It somehow carries a very negative connotation to me.

          • AnthonyMouse1 day ago
            > It's less about competition and more about the ethical way to do it. If another artist would learn the same techniques and then managed to produce similar art, do you think there would be just as visceral of a reaction to them publishing their art? Likely not, because it still required skill to achieve what they did.

            Now suppose that the other artist studies to learn the techniques -- several of them do -- and then Adobe offers them each two cents and a french fry to train a model on it, which many accept because the alternative is that the model exists anyway and they don't even get the french fry. Is this more ethical somehow? Even if you declined the pittance, you still have to compete with the model. Even if you accept it, it's only a pittance, and you still have to compete with the model. It hasn't improved your situation whatsoever.

            > My hunch is that in the near-term we'll see a major devaluing of both written and image material, while a premium will be put on exceptional human skill.

            AI slop is in the nature of "80% as good for 20% of the price" except that it's more like 40% as good for 0.0001% of the price. What that's going to do is put any artists below the 40th percentile out of work, make it a lot harder for the ones at the 60th percentile and hardly affect the ones at the 99th percentile at all.

            But the other thing it's going to do is cause there to be more "art". A lot of the sites with AI-generated images on them haven't replaced a paid artist, they've replaced a site without images on it. Which isn't necessarily a bad thing.

          • CamperBob21 day ago
            > AI-generated "art" (it's not art at all in my eyes) is effectively a machine-based reproduction of actual art, but doesn't take the same skill level, time, and passion for the craft for a user to be able to generate an output, and certainly generates large profits for those that created the models.

            (Shrug) Artists were wrong when they said the same thing about cameras at the dawn of photography, and they're wrong now.

            If you expect to coast through life while everything around you stays the same, neither art nor technology is a great career choice.

            • throwaway23711 day ago
              There is no great career choice when AI can do most intellectual work for a fraction of the cost.
    • panarky2 days ago
      > how to segment communities locally

      So it's not about owning vs. renting property on the internet, it's about controlling the roads that connect the properties so you can keep the world out of your community.

      • NitpickLawyer2 days ago
        Ha, I have the same feeling on the recent good_social_network vs. bad_social_network debate that kinda goes on in the US. Looking from outside, it always felt that the main problem is control, and wanting more of it. The details, principles and "politics" don't matter in the grand scheme of things, it's control that people want, even tho they paint it differently.

        bad_social_network was good 10 years ago, because it was controlled by "a friend of ours". Now it's controlled by someone who's perceived as "a friend of theirs" and it's therefore bad. So the politik aktivists move to good_social_network, and rave about the good there. Echo chambers be damned, we have control. Until the next "friend of theirs" buys it out, and rinse and repeat. So silly.

        • frabcus2 days ago
          One of those social networks has a protocol and lets end users make their own feed algorithm and moderation system.

          The other never has.

          There can be technical differences between networks as well as social.

    • cxr1 day ago
      It'd be great if you folks would stop showing up and derailing the comments with threads like this.
  • xyzzy_plugh2 days ago
    This all sounds like renting to me. Instead you should:

      - put a rack in your home
      - buy an IPv4 block
      - buy dark fibre
      - start your own ISP
      - advertise routes over BGP
      - host your own email
      - found a registrar and transfer your domains over
    
    All easily obtainable for less than a million dollars in capital. Though once your FTTH and undersea cable operations ramp up you'll need further access to capital.
    • greyface-2 days ago
      You're still renting your ASN and IP space from ARIN/RIPE/etc, and they'll be revoked if you stop paying yearly dues. For true independence, you need to establish a base in international waters and establish a new RIR in accordance with ICANN ICP-2. https://www.icann.org/resources/pages/new-rirs-criteria-2012...
    • dend2 days ago
      You forgot the steps of pressing your own silicon wafers in the garage, followed by soldering motherboards from zero, and of course, having a steel mill to build racks and computer cases. The only way to ensure true independence.

      In all seriousness, I am a big advocate for hosting your own infrastructure as much as you can, but that requires you to be REALLY into doing it. Otherwise, it's just a completely unnecessary chore.

      • xyzzy_plugh2 days ago
        No, that comes far later, after energy self-sufficiency, at which point your operational expenses can be minimized. Then we can start talking about going fully vertical to reduce our supply chain dependencies.

        If you're concerned about silicon dependencies, I'd recommend starting with some real estate purchases (mines) and forming a Special Economic Zone first, though.

        Remember this is about being an owner and not a renter. The only restriction is our recurring costs, but we have unlimited room upfront, like a Lincoln.

      • TeMPOraL2 days ago
        Focusing on the wrong problem. You'll also need to own everything that being a part of a nation state provides you, including markets and defense. If you can't do that, you can't live truly rent-free.
      • softwaredoug2 days ago
        Don’t forget creating a society capable of supporting industry and science. On a planet producing enough of the raw materials. In a Universe with appropriate fundamental physical constants etc etc.
        • The_Blade2 days ago
          If you wish to make an apple pie from scratch, you must first invent the universe.
    • richardw2 days ago
      The second you have a domain you can move it where you want. Change the software, look and feel. You keep subscribers. You don’t need to own every atom, just enough to give you options.

      You can’t move your X or FB account. They can block you anytime, or reduce traffic. Way fewer options.

      • jasonjayr2 days ago
        Recent story regarding itch.io: https://news.ycombinator.com/item?id=42363727

        Just owning your own domain, minding your own business, doesn't guarantee that it won't be taken down on a whim.

        • richardw2 days ago
          One of the top comments on BSky shows they got it back up fairly quickly. Try that when running on Google, say. If your business is on YouTube and they decide you’re dead, you’re dead.

          “But also, as someone who relies on your site, thank you so much for handling this in such a quick and effective way. Waking up and seeing that the site is already back up despite this all makes me proud and grateful to be on itchio.”

          Point is that you can choose your own adventure in a way that eclipses fully relying on another company. Beyond that, you can choose to own as many layers as you want and stop long before building your own hardware and fiber network.

        • dend2 days ago
          Not impossible, but the likelihood of a domain being taken away on a whim compared to a social network deciding to "shadowban" or just disable an account on their network is significantly lower. Also, with a domain you might have a legal recourse to pursue with the registrar/registry. With a social network account? You're on someone else's turf, and they can do what they want.
      • paxys2 days ago
        Having control over a domain name doesn't mean all that much when the data hosted on it can be hijacked or held hostage at any point.
        • dend2 days ago
          It offers significantly more flexibility and freedom compared to any social network. If your data is "hijacked" (not sure what that means in this context, but let's assume the host terminated your account), you can spin up another hosting account on one of the many hosting providers and point your domain to it. That's it (not to say that it's that trivial for large sites, but that's the gist of it).

          If your account on a major social network is terminated, if you had a large community there, you have quite literally no way to access them unless you had some kind of parallel presence somewhere else.

        • richardw2 days ago
          “Just enough to give you options” means you choose how much you need to own, and once you have the domain you can choose the rest. Back your data up, choose tech you can move. The point is that you don’t need to buy every single piece of infrastructure. Compare any of this to running your business purely on any platform whose domain you don’t own.
    • chilli_axe2 days ago
      To truly self-host anything, you must first invent the universe.
    • Suppafly2 days ago
      I really wouldn't mind doing steps 1-3 & 6-7 if it were easier. I don't really care about being an ISP though.
    • immibis2 days ago
      Don't let the perfect be the enemy of the good.

      There are also Tor hidden services, where the dependency on lower layers still exists but they can't find which is yours.

    • greenavocado2 days ago
      It's all fun and games until you need to deal with the right of way
      • chii2 days ago
        the way to deal with the right of way is "might makes right". As it always has.
      • dylan6042 days ago
        or boats with dragging anchors
    • naming_the_user2 days ago
      Step 1: Simply create your own Starlink.
      • sdwr2 days ago
        To host a raspberry pi from scratch, you must first create the internet
      • thfuran2 days ago
        Step 1 is always "Create the universe".
        • drewcoo2 days ago
          Alice: I have an urge.

          Bob: Oh yeah? I have a demiurge!

          Alice: I feel we're not really communicating.

      • nayuki2 days ago
        You need to constantly launch new satellites as old ones undergo orbital decay and run out of fuel.
        • sneak1 day ago
          To do this you need a cozy relationship with the satellite launch regulator in your country of operation.
    • RicoElectrico1 day ago
      The difference is that if you're not in a walled garden, all the elements of the stack could be replaced. Obviously one needs to avoid big3 clouds and adversarial domain registrars who make it difficult to transfer domains.
  • dusted2 days ago
    So this is a blog stating to be an owner and not a renter, and then proceeds to talk about how to rent hosting.. Sorry, but if it's not your hardware, on your property, then you're renting.

    Regulations have been way too loose on, especially, American ISPs, where I understand they are allowed to not only refuse you a publicly routable IP but also dictate what kind of traffic you're allowed to send and receive (for example, whether the traffic flowing is of a "commercial" character and therefore should be on a different subscription). This insanity should be illegal. The internet is a utility, and everyone should have the right to the same type of access, regardless of their need (those who do not need or want it can simply choose not to use it, but ISPs should not be allowed to differentiate).

    I've hosted my own web and other servers on my own hardware since I was 13 years old. When I bought my first domain, I had to use a fax machine for the first time in my life and fax my request form, along with my passport, to the agency responsible for the top-level domain of my country. It was kind of convoluted back then, but everyone was helpful, and it was not that difficult: the technology was well understood, support staff were competent, and it was expected that people were going to use the internet for internet things. Today is my 39th birthday, and while the server hosting my stuff is mostly still located 3 meters from me, the path to having it online has done nothing but degenerate. It's an uphill battle just to be on the internet these days. The mail stuff is the easier part (dkim, dmarc, spf, certificates). But the simple act of getting your f..king computer connected to the f..king internet like it was 1999, that's the real hassle. ISP NAT, support staff beyond incompetent, blocked ports, missing (or unknown) relay hosts. It's a joke.

    • dend2 days ago
      I generally agree with the sentiment (and yes, internet should be treated as a utility). However, the reality is that the vast majority of people will not be able to self-host on their own hardware for a myriad of reasons (lack of skill, lack of money, lack of interest, etc.) That's not a reason to gatekeep them from having their own corner and claim it as theirs.

      If you have a domain and your own site, even hosted on a colocated rack or in the cloud, you're already miles ahead of those that don't. And if you have a domain and can manage DNS records, then in the future that doesn't preclude you from "graduating" to your own hardware, if you so desire. The goal here is more or less self-sufficiency with web properties rather than a pure interpretation of "rent" vs. "own." Because at some point you have to rent something from someone (say, you're not running your own domain registry and registrar).

      • dusted2 days ago
        I don't want to gatekeep, I want to gate-unkeep! The way things are going, we're dividing the people and the companies into two classes, with the former having fewer rights and privileges than the latter. I want everyone to have the RIGHT to participate in the Internet, should they have the interest to learn how to do it. That right is under pressure when we accept this division, when we use the excuse that "most people don't know how to", to justify taking away everyone's right to even try.

        If only companies have the right to participate on the internet, they are empowered even more to choose who should be allowed to even run a website. It's a slippery slope that ends up in a very bad place, participation wise. It becomes like the airline industry, where the companies pushing hardest for more regulation and red tape are the oldest, those who made their fortunes back when it was easier and cheaper, and who now use their enormous wealth to make it harder for new players to enter their market.

        It's the same everywhere, when you start allowing power to concentrate.

      • BlueTemplar1 day ago
        There is a simple way though: have the ISPs provide all of this. If they can provide you a personal website, an email account and a NAS, they can also provide you a personal website and an email account ON that NAS. (Especially now, with IPv6.)

        (Which of course assumes that there are laws in place against lock-in, just like there are already laws in place against lock-in for your pick of ISPs and obligations for mobile carriers to transfer your phone number to another carrier.)

        • dusted1 day ago
          I think this only shifts the problem, the whole idea with the internet is a distributed network of computers that talk with each other, and if the computers at the edge (end users) can't do that, then it's no longer the internet, it's something else, more akin to cable-tv where there are "providers" and "consumers". The playing field stops being level.
          • BlueTemplar1 day ago
            Well, yes, I am specifically calling for ISPs to build this edge infrastructure in people's homes, so I don't understand your point ?
            • dusted1 day ago
              Ah, I took it as you suggesting the ISPs providing VPS services for people..

              Thing is, that edge infrastructure has been there from the beginning of broadband and is only recently beginning to slip away, with the advent of ISP NAT, aggressive IP rotations, blocking of ports and not providing public IPs at all.
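
              For anyone wondering whether they are behind the ISP NAT mentioned above, here is a rough sketch, not a definitive test. It assumes you have already read the WAN address off your router's status page and only checks which well-known range it falls in: 100.64.0.0/10 is the shared address space reserved for carrier-grade NAT, and the RFC 1918 blocks usually mean there is another router in front of yours. The function name and the labels it returns are made up for illustration.

```python
import ipaddress

# Shared address space reserved for carrier-grade NAT (RFC 6598).
CGNAT = ipaddress.ip_network("100.64.0.0/10")

# Private IPv4 ranges (RFC 1918).
RFC1918 = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def classify_wan_address(addr: str) -> str:
    """Return a rough label for the address a router reports on its WAN side.

    The labels ("ipv6", "cgnat", "private", "other") are hypothetical names
    chosen for this sketch.
    """
    ip = ipaddress.ip_address(addr)
    if ip.version != 4:
        return "ipv6"
    if ip in CGNAT:
        return "cgnat"
    if any(ip in block for block in RFC1918):
        return "private"
    # Not in a NAT-only range. This does not prove inbound ports are open.
    return "other"


if __name__ == "__main__":
    for sample in ("100.64.12.34", "192.168.1.1", "8.8.8.8"):
        print(sample, classify_wan_address(sample))
```

              An "other" result only says the address is not in a NAT-only range; blocked ports and IP rotation need separate checks.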

        • sneak1 day ago
          ISPs don’t want lots of customers sending lots of data. Their model is based on millions of dumb consumers downloading (only) from the same 20 ASes.
    • IMSAI80801 day ago
      I think the sentiment was to keep your content (and audience) portable, not specifically that you don't rely on anyone else's services. If you post everything on Twitter and Twitter decides they don't like you, then that's the end of you. If you host on a personal domain and your rented web host decides to block you, there are plenty more options and you can take your audience with you and they will never know the difference.
      • dusted1 day ago
        I know it was, but the click-baity headline makes it seem like you're becoming independent of the whims of private companies, which you are very much not if you're renting a host somewhere. You're definitely not a property owner, you might be a domain owner (renter still, I don't know of any domains which you can pay for and keep forever).
        • TheCoelacanth1 day ago
          As long as the services you're renting are commoditized, you are independent of their whims.

          You can easily replace a VPS provider with a different provider that will give you exactly the same service. You can't replace Facebook with a different Facebook.

    • franga20001 day ago
      There's a huge difference between being a renter on a VPS or on a social network platform.

      Youtube can demonetize or delete a channel and the creator is more or less fucked. They can find another platform, but they need to build their audience almost from scratch.

      By contrast, if my VPS provider kicks me out, I just clone it or restore from backup to any one of the thousands of competing providers, change a few DNS records and my audience (not that I have one) wouldn't even know that anything changed.

      Servers and domain names are transferable and neutral, platforms and usernames aren't.
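
      The "change a few DNS records" step above can be sanity-checked with a few lines. This is a minimal sketch under assumptions: the function names are invented for illustration, the addresses in the usage note are documentation placeholders, and the resolver is passed in as a parameter so the logic can be tried without touching the network. Cached DNS answers may still return the old address for a while after the change.

```python
import socket
from typing import Callable, Set


def resolve_addresses(hostname: str) -> Set[str]:
    """Look up the addresses a hostname currently resolves to via the system resolver."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return {info[4][0] for info in infos}


def points_at(
    hostname: str,
    expected_ip: str,
    resolver: Callable[[str], Set[str]] = resolve_addresses,
) -> bool:
    """Return True if the hostname resolves to the new server's address.

    `resolver` is injectable so the check can be exercised offline.
    """
    return expected_ip in resolver(hostname)
```

      For example, `points_at("example.org", "203.0.113.10")` would tell you whether your own resolver already sees the new host after a move.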

    • iamacyborg2 days ago
      I started to self-host my own personal website last year and found it relatively easy. I may have been somewhat lucky with my ISP and that I was already paying for a static IP, but the hardest part was setting up Cloudflare to mask my IP (and learning how to set up a Linux VM from scratch).
    • sneak1 day ago
      Hosting is a service, not a product.
  • muglug2 days ago
    This is good advice for people whose livelihood depends on the attention of anonymous strangers. For everyone else, it’s probably ok to ignore.
    • pcloadletter_2 days ago
      It's also just kind of cool and fun to hack together a personal website
      • dend2 days ago
        Totally. If I would tell 10-year-old me that I have my own website on my own domain, it would be seen as a shocking development. I find it really cool to be able to have a corner of the internet that is just mine.
        • arminiusreturns2 days ago
          I still get a rush with a new server on the internet!!! (especially my latest host: 12 core new gen epyc instances with 48G RAM.)
        • hackernewds2 days ago
          This is practically everyone's Facebook Instagram Whatsapp Airbnb anything at all profile already
      • anarwhal1 day ago
        Yes, and we're on _Hacker_ News after all - doing this just for fun should be sufficient reason in and of itself
      • Valord1 day ago
        Had the itch. Did it. No regrets
    • dend2 days ago
      You don't even need attention - it could be a public blog where you share about the things you learn. You never know who it's going to help. That's primarily my motivation with the blog.
    • dni02 days ago
      Indeed. If you only use social media to connect with friends and family... good luck getting them to visit literally any other website.
  • easterncalculus2 days ago
    In five letters: POSSE https://indieweb.org/POSSE
    • dend2 days ago
      Spot-on - I call it out at the end of the article. Principles are broadly the same.
  • indigodaddy2 days ago
    What this guy doesn't really get is that people don't want their own websites. What the hell would they do with a website? They love social media platforms. The why or where of how we got here doesn't really matter, it's the reality.

    I think the only way forward is better and less evil social networks, or perhaps some sort of cyclical sea change where the Internet sort of starts over again, à la how civilizations/humanity have every XX thousand years according to various esoteric Sikh/Hindu traditions.

    • dend2 days ago
      That's a fair call-out, but that's why in the post I mention this:

      > Most people are perfectly content with everything living inside their Facebook account because it’s convenient and their family and friends are already there. Telling everyone to learn how git and GitHub Pages work to host their blog is not an effective way to drive change. But that’s also not the point. As I mentioned earlier, the goal is to start with a small niche community of people who are comfortable with building their own digital corners.

      Not everyone needs to have their website. Not everybody wants to have their website. And that's fine. They can use the social platforms as-is. But for those that have the means and interest in building their own corners (say, they want to bootstrap a business), they should not limit themselves to the social networks as the only place for their community. There is a better way than being a sharecropper on someone else's land.

      • BlueTemplar1 day ago
        What I don't understand is why you would think that GitHub Pages is an acceptable alternative: it's kind of the equivalent of being a doctor and recommending Oxycodone to cure a hangover for someone with alcoholic tendencies...
        • Pooge1 day ago
          Because you can get your own domain in front of it and export your content (i.e. Markdown). Users would land on your domain name, but it doesn't really matter if it's hosted on your PC or GitHub. The point is that you would be able to change providers very easily without having to make people migrate to your new platform.
          • BlueTemplar1 day ago
            And it doesn't try to suck you into the Microsoft "ecosystem" ??
  • Toutouxc2 days ago
    Oh what a delight, the cartoon animal character in the infoboxes is “Krteček” [0], a popular (and quite old) character from my country.

    The website says that the guy is from Washington, but his name does sound vaguely Slavic. Interesting.

    [0] https://en.wikipedia.org/wiki/The_Little_Mole

    • cenamus2 days ago
      Delimarsky sounds more than just a bit Czech/Slovak :)
      • dend2 days ago
        Hah! Well that certainly is a good directional guess! I am of Eastern European descent.

        Also, big fan of Krtek - grew up on that cartoon, was quite an educational experience at that age.

        • cenamus1 day ago
          Ah yes, with transliteration that assumption goes out of the window haha
  • rednafi2 days ago
    Started my blog[1] around five years ago and never looked back. Before that, I got burned by Medium and decided: never again. Maintaining your own little corner on the internet is easier than ever, though I agree it still requires some technical know-how. Also, I’m not sure how much I trust these WYSIWYG blog engines like Squarespace or Weebly.

    [1]: https://rednafi.com

    • dend2 days ago
      Great resource, thank you for sharing! I should include it in the blog post footnotes.
  • xenodium2 days ago
    > the vibrant ecosystem of blogs, feeds, personal sites, and forums has been usurped by a few mega-concentrated players.

    I’m trying to do my bit for the web at https://lmno.lol

    Started a blogging service that doesn’t do things like the big players.

  • cyberax2 days ago
    Other notes: use projects that are either supported by a community, or have a motivated developer (buy them a Patreon sponsorship if you can).
    • dend2 days ago
      Absolutely. I am a big fan of supporting independent software and bootstrapped services.
  • veltas2 days ago
    This article recommends .eu domains.

    There is a caveat to this: in the unlikely event your state leaves the EU, you will be forced off. This happened to many UK entities after Brexit; they were forced to stop using their .eu domains.

    • wiether1 day ago
      Furthermore, as a French person, and thus a European citizen, I must say that the .eu TLD is not popular and usually bears a _political_ meaning.

      Like it's used by institutions or organizations linked to the EU by their activity, but it's rarely used by companies or individuals whose activity is not focussed on the EU.

      Companies are still going to buy the .eu domains associated with their brands, but they will communicate with another TLD like .com or will provide localized versions of their site under a country TLD like .fr for the French version and .de for the German one.

      What I see the most is:

      - country TLD for content that is localized

      - .com/.net or weird (like .dev) for content that is in English or in multiple languages

    • dend2 days ago
      Fair point - I will add that as a clarification, thank you for calling it out.
  • drewcoo2 days ago
    This used to be a forum for startup-folk, not wanna-be-FAANG-employees.

    There's a lot of sense in this post. There's not a lot of sense in the reaction here.

    • scarface_742 days ago
      What do you think the goal of “startup folks” was if not to get acquired by FAANG or other BigTech adjacent companies?

      Out of the literally thousands of companies that YC has invested in, only about a dozen have gone public.

      YC is not interested in “lifestyle companies” that are a profitable ongoing concern; it is interested in the “exit.”

      • moron4hire2 days ago
        Yeah, no, that's definitely not the story YC sells though.
    • pembrook2 days ago
      An accurate rebrand could be renaming this place from “hackernews” to “middle manager IT dad news.”
  • econ2 days ago
    We shouldn't complain as it is all our fault. (haha)

    I think OPML is the technology. Go build one and share it. Writing or recording your own stuff is a lot of work. Help promote all the cool things you've found outside the walled garden of plastic plants.

    Build a website for someone, teach them html for 20 minutes and set up a domain, hosting and an ftp client for them. They can always call you if they get stuck.
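
    For anyone who wants to try the OPML idea, here is a minimal sketch of generating a shareable blogroll, assuming Python's standard library and the usual OPML 2.0 subscription-list attributes (`type`, `text`, `xmlUrl`, `htmlUrl`). The function name `build_opml` and the example.com feed are placeholders made up for illustration, not anything from the article.

```python
import xml.etree.ElementTree as ET


def build_opml(title, feeds):
    """Return an OPML 2.0 subscription list as a string.

    `feeds` is a list of (name, feed_url, site_url) tuples.
    build_opml is a hypothetical helper name used only for this sketch.
    """
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, feed_url, site_url in feeds:
        # One <outline> per feed; feed readers look at xmlUrl when importing.
        ET.SubElement(
            body,
            "outline",
            type="rss",
            text=name,
            xmlUrl=feed_url,
            htmlUrl=site_url,
        )
    # Prepend the XML declaration by hand so the output is the same
    # regardless of locale or Python version.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(
        opml, encoding="unicode"
    )


if __name__ == "__main__":
    # Placeholder feed; replace with the sites you actually want to promote.
    print(
        build_opml(
            "Cool things outside the walled garden",
            [("Example Blog", "https://example.com/feed.xml", "https://example.com/")],
        )
    )
```

    Save the output as something like blogroll.opml, put it on your site, and most feed readers should be able to import it.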

    • dend2 days ago
      That's the strategy I have with this blog. Instead of sitting and lamenting over the state of the internet, it's better to spread the word and encourage action. There are actual things we can do to reclaim parts of the web that died down. It doesn't need to be at the same scale as all the social networks out there, but even if it serves a small niche, that's a fantastic achievement.
      • econ1 day ago
        Domain names and hosting are really more of the same garbage.

        People should be able to host things on their computer and without having to learn anything. Friends should archive and mirror it and without having to check boxes.

        Only with file sharing the establishment went to war. Maybe ipfs will grow to that level.

  • sshine2 days ago
    tl;dr:

      - Don't depend on other people's software services.
      - Buy a domain and host your own website.
      - Don't pick a sketchy TLD or registrar.
      - Mailing lists beat social media accounts.
      - It's okay to depend on a cloud.
    
    I had the belief that the article was going to say the exact opposite wrt. cloud hosting. You're literally renting space, and if your stuff gets any heat, your cloud provider may simply shut you down without a trial.

    Even if you host your own server on your own legal property, most people don't have AS-numbers and peering agreements, so ultimately on the internet most people rent something.

    • Retr0id2 days ago
      My interpretation of their words on the cloud front was more like, don't depend on or become locked into any specific service. If you're using Azure to host a VPS, you could easily move it to AWS, Hetzner, etc.

      Likewise, if you're using Cloudflare as a CDN (and only as a CDN!) there are other CDN providers available that you could switch to with relative ease.

    • andrewon2 days ago
      > - Don't depend on other people's software services.
      > - Buy a domain and host your own website.

      I had the exact same chain of thought, only to find that traffic to the site I built is at the mercy of how Google decides to rank webpages, putting AI > YouTube > Reddit in front of everything else.

      > - Mailing lists beat social media accounts.

      Similarly, Google sets the metric for what counts as spam. Your emails can all go to the spam folder if their AI decides they should.

      • dend2 days ago
        On (1), yes that is true to an extent - the domain discoverability is indeed mostly at the mercy of Google, and the whole "AI overview" is a garbage experience. However, looking at my own search console data (both Google and Bing), there are still quite a few folks landing on my pages through search, often for some obscure terms that I somehow documented, so it's still possible to get traffic that way. But again - the goal is less about "drive traffic ASAP" but rather point people from other networks to something you personally own.

        On (2), they can, but if you use a more established provider like Buttondown or Mailchimp, and you are not actually sending spam, a lot of folks have quite a bit of success building an audience that way. I've used Buttondown (not affiliated with them in any capacity) personally before and haven't had subscribers complain about deliverability. I am planning on rebooting that this year to see how it goes. I've heard most deliverability issues arise when folks try to roll out their own email server.

    • ghaff2 days ago
      Ultimately you're on the Internet and you don't own the Internet. At that point, you're making decisions about the level of control you want to have and the types of events you may be subject to and the answer is almost certainly "it depends."
      • dend2 days ago
        Exactly this. At some point, you need to delegate. Make it portable, don't get locked into a proprietary architecture, and you'll be good. Not everyone will be able (or has any desire) to run their own rack.
    • lylejantzi3rd2 days ago
      I expected him to mention colocation. Kids these days. shakes his cane.
      • rmoriz2 days ago
        A homelab with a usable uplink can be sufficient for many services like blog, DNS, mail. I have 3 Lenovo ThinkCentre Mini PCs running Proxmox VE in HA mode out of my basement. Picture at https://devops.science/
        • sobkas2 days ago
          > A homelab with a usable uplink can be sufficient for many services like blog, DNS, mail.

          I always felt like you are painting a target on your homelab when you allow outside access.

          • rmoriz2 days ago
            You are. I'm tunneling a /23 which I let Vultr announce via BGP over WireGuard to a local router VM. I have a nftables firewall in place before routing the traffic through the tunnel. I block everything except for exposed IPs and ports/protocols just to keep my limited bandwidth free of noise.
          • dend2 days ago
            You do. That's why I wouldn't recommend it to anyone unless they absolutely know what they're doing. Can't tell you how many friends I had to have a talk with who had plain vanilla port forwarding done on their home router, exposing their entire home network to the web.

            Nowadays, I recommend they use Tailscale as an out-of-the-box WireGuard-based VPN to safely connect to their home servers from remote locations.

    • immibis2 days ago
      The insidious part of networking is that you cannot be on a network without agreeing with everyone else on the network. It's simply not possible.
  • lxgr1 day ago
    > Social media accounts that just post out-of-context links are of no interest to me and vast majority of people you probably care about. If I see an account do just that, it’s an instant unfollow. If I wanted just links I’d subscribe to a RSS feed.

    I strongly disagree. I am subscribed to several "link farm" accounts specifically because I don't use RSS anymore, and because it allows for an immediate discussion if enough people are subscribed to said account.

  • xixixao2 days ago
    Good advice, and pretty standard setup I’d say, even for companies that are trying to build a community.

    The one thing I wonder about is whether younger generations will use mailing lists. I never did and I’m already in my mid-30s.

    • dend2 days ago
      Shockingly, they do! Quite a few folks that I've talked to recently expressed that they are subscribed to more than one email newsletter and read them fairly consistently.
  • anarwhal2 days ago
    One other way to look at some of this (especially with the other comments around whether anything is "truly" owned) is in terms of "redundancy". TFA touches on some of the control/portability side but the thought experiment here is something like: how many people am I depending on? If you owned den.dev and den.xyz as mirrors on separate providers that's one less failure point where someone else has the power to disappear you (etc, all the way through to redundant underwater sea cables, I guess!)
  • psychoslave2 days ago
    Well, we literally can’t buy a domain name, only rent it, and owning a TLD or possessing a static numerical address is not an option that is accessible to mere mortals either.
    • seanw4442 days ago
      Would be nice if something like Namecoin could gain traction.
  • paweladamczuk2 days ago
    > e-mail is a universal protocol

    I wonder how long that will be true, considering how difficult hosting your own server for your own domain is these days.

  • BSOhealth2 days ago
    I have vr.dev and want to do something meaningful with it, additive to the community, and not just a trash content blog. any ideas? who’s down?
    • gavinhoward2 days ago
      I just beat a video game for the first time in my life. It was such a confidence boost and exactly what I needed because I bought the headset to fight depression (exercise).

      I realized that VR, as a medium, is uniquely tuned to people, and I think you would have a great blog if you focused on just that: people.

      What are some ideas for how VR can help people? Can it help people fight depression? Can it be used for physical or emotional therapy? Can it help them safely build skills that could improve their lives?

      On that last one, I am reading a book about ship handling. It was a Christmas present because I will never be a captain in my life. But ship handling would work so well in VR. Even the commands are so standardized that players could give voice commands, like on a real bridge.

      How many Make-a-Wish kids or others could have a wonderful time doing that? And that isn't even considering the idea of extending that game into space with Star Trek USS Enterprise or Star Wars Star Destroyers.

      Your site could be the nexus of those great ideas. You could even have guest posts. I'd write one.

      Anyway, sorry for the novel. VR has helped me feel better than I have in a long time.

      • BSOhealth1 day ago
        I totally agree with basically all that. I’m at an age where video games quickly bore me (totally respecting people who enjoy them), but I also have had some profound VR experiences that make me believe there is a moral imperative to create experiences for people to see what life is like in other people’s shoes, and travel to places they otherwise couldn’t, and to learn new things and concepts in ways we couldn’t before.

        Your comment made me think something like, “People of VR”

      • Thanks for sharing this. Really neat. Glad you are feeling better than you have been!

        > a book about ship handling

        Mind sharing the title? I'm very curious.

    • liontwist2 days ago
      Your question is kind of backwards, as an interesting website idea is more of a project than a good domain name. But here goes:

      Document every VR headset released with technical details and references

    • vonunov2 days ago
      slap up a mediawiki and let people contribute VR dev documentation
    • dijit2 days ago
      sounds like it would be an amazing domain for a forum related to development of VR projects.
  • fitsumbelay2 days ago
    this should be standard for every human being not just techies

    the LOE to self hosting and adding infra on demand should also be push button easy

    the good news is that it seems to get easier and cheaper as time passes, which makes it feel inevitable but obstacles remain because of corporate business interests of course

  • p3rls10 hours ago
    Eh, no. Be a renter.

    I'm a property owner on the internet, built the best website in my niche by far (really, look it up and compare), but Google rewards ugly dogshit WordPress sites created by people with no expertise in my niche.

    When you're an owner you have to deal with things like taking another SEO hit for the holidays and have to face the dilemma of whether to fire people now, or when you have no money in two months and no reserves left to maneuver with.

  • Almondsetat2 days ago
    A bit funny to see "owner" and "buying a domain" next to each other
    • rednafi2 days ago
      Yeah, I wish there was a way to buy domains; not just rent them. It’s not like they’re limited in supply. For example, you kind of own your email addresses.
    • dend2 days ago
      Yeah, unfortunately, c'est la vie. Renting a domain is still better than not having it.
  • mtsolitary2 days ago
    Unfortunately it’s literally impossible to own and not rent your domain :(
    • dend2 days ago
      It's true, and yet it's still a better alternative to not having your own domain at all (at least IMO).
  • wavewrangler2 days ago
    Rules to live by…

    Although I swear this was posted on here just recently, was it not?

    • dend2 days ago
      Honestly, while I put this together, I know that the ideas on this are not unique to me by any stretch. It's just another arrow in the bunch.

      I added a few extra reading materials/references at the bottom of the article.

  • bithead1 day ago
    Social media makes me miss USENet
  • chromanoid1 day ago
    Ever heard of cooperatives?

    I think digital coops are the only really feasible way away from platform capitalism.

    e.g. https://www.hostsharing.net/

  • johnklos1 day ago
    The author makes some good points, and the overall message is good, but:

    Why would we care so much about being a property owner and not renter if we don't care about whether a hosting company can just turn us off?

    Why would we care so little about privacy that we'd willingly use services that negate privacy and introduce tracking? For instance, the author suggests using Cloudflare, Azure, AWS and more, and all of these are abusive. Funnily enough, the Cloudflare hosted images on the page didn't even load for me ;)

    Does the author not know that Mastodon is software and not a social media network? Sure, it's a minor thing, but when people use terms incorrectly, it makes me wonder if they really know what they're talking about (people who'd rather fight about whether it's a term "everyone uses" instead of whether it's correct seem to not get this point).

    The author writes (or quotes) somewhat derisively:

    > Well of course it’s better to host your own blog! Also, while you’re at it, put your Mastodon server in a DigitalOcean droplet, throw some Cloudflare CDN in front of it, run your own Raspberry Pi to monitor uptime, and you’re golden! Oh, and don’t forget to also make sure to log into the droplet every once in a while to update the container, do an occasional database migration, and ensure that you check the logs for intrusions.

    This all-or-nothing attempt to make the idea of self-hosting seem ridiculous is by itself ridiculous. Nobody needs to do all that. On the other hand, the audience for what the author is advocating shouldn't have a problem setting up a simple machine or VM instance to host a web site, blog, perhaps even DNS server themselves. All or nothing is silly, so even making this example reduces credibility.

    All in all, I'd love to see people take ownership of their own things on the Internet. Nobody needs to self-host, but people should if they could, and they should ignore people who say it's too complex because most of the too-complex argument is the suggestion that it needs to be much more than it does. A single VM, a single small computer, even a Pi, can host most things.

    But whether people self host or use someone else, choosing where to host matters. Don't use companies that negate the benefits of owning your own things, whether by lock-in or by letting them do all the tracking that Facebook would normally do. Don't use companies that'll disable your account because some idiot wrote a letter. Don't use companies that're so big that you can't talk to a human!

    This article makes for an odd juxtaposition between doing thoughtful things and doing things that negate some or much of that thoughtfulness. I'd have a hard time recommending it to others without qualifications.

  • epa2 days ago
    Ctrl+F "Bitcoin", 0 results.
    • Terr_2 days ago
      Good!

      Beyond the many usual critiques, in this context "owning bitcoin" doesn't mean much, barely a step above having a hoard of hidden gold bars.

      While it is indeed more "on the internet" than precious metals, possessing that speculative asset does not provide any special niche, ownership, or control over the broader, er, cybernetic means of production.

    • cyberax2 days ago
      There are also zero NFTs.

      It's a solid guide, in other words.

    • kinakomochidayo2 days ago
      Why would anybody own an asset with unresolved security budget issues?