417 points | by todsacerdoti 3 days ago
For residential usage, unless you're in an apartment tower where all your neighbors are software engineers and you're all behind a CGNAT, you can still do a pull here and there for learning and other hobbyist purposes, which for Docker is a marketing expense to encourage uptake in commercial settings.
If you're in an office, you have an employer, and you're using the registry for commercial purposes, you should be paying to help keep your dependencies running. If you don't expect your power plant to give you electricity for free, why would you expect a commercial company to give you containers for free?
My main complaint is:
They built open source tools used all over the tech world. And within those tools they privileged their own container registry, and provided a decade or more of endless and free pulls. Countless other tools and workflows and experiences have been built on that free assumption of availability. Similarly, Linux distros have had built-in package management with free pulling for longer than I’ve been alive. To get that rug-pull for open-source software is deeply disappointing.
Not only that, but the actual software hosted on the platform is other people’s software. Being distributed for free. And now they’re rent-seeking on top of it and limiting access to it.
I assume most offices and large commercial businesses have caches and other tooling built into their workflows, but for indie developers and small businesses, storage of a ton of binary blobs starts to add up. That’s IF they can even get the blobs the first time, since I imagine they could experience contention and queuing if they use many packages.
And many people use docker who aren’t even really aware of what they’re doing - plenty of people (myself included) have a NAS or similar system with docker-wrapping GUI pre-installed. My NAS doesn’t even give me the opportunity to login to docker hub when pulling packages. It’s effectively broken now if I’m on a CGNAT.
Cannot help but notice that, had Microsoft offered such a sweet deal, this place would've been ablaze with cries of "Embrace, extend, extinguish" and suchlike. (This still regularly happens, e.g., when new Github features are announced). Perhaps even justifiably so, but the community has failed to apply that kind of critical thinking to any other company involved in open source. If your workflow is not agnostic wrt where you pull images from, it is kind of silly to blame it on Docker Inc.
Having said that, it is definitely a problem for many. I work at a technical university and I am sure colleges/research institutes will hit the limit repeatedly and easily.
And I am not saying Docker is wrong to try and monetize. People have built entire business models on top of way more mundane things than the Docker Hub.
That would be more reasonable if they didn't go out of their way to make doing so painful: https://github.com/moby/moby/issues/7203
The reality is, DockerHub (though originally called the Docker Index), was the first Docker image registry to even exist, and it was the only one to exist when image references were created.
Now, I would say there are definitely some issues you could have referenced here that would be more relevant (e.g. mirrors only working for DockerHub).
* https://www.aquasec.com/blog/a-brief-history-of-containers-f...
For some reason it reminded me of the WAC model from WebAssembly component model https://component-model.bytecodealliance.org/creating-and-co... No particular comparison, but I'd like to understand how constructing a container image might compare to constructing a wasm module from components.
Docker wasn't stolen from anything. It built on top of existing things, even worked on new things, and provided a nice abstraction to everything.
> provide something like Dockerfile that's less golang-inspired and more linux-inspired
What? Is the thing that's golang inspired the image references? OK...
The Dockerfile takes from golang IMO, it's intentionally very low on syntax. Just like go's text/template and html/template.
Also note, Docker can build whatever format you want, it just defaults to the Dockerfile format, but you can give it whatever syntax parser you want.
Sure you can’t expect things to be available free forever, except many software package repositories have been available, free, forever. We have countless software package managers (apt, brew, pacman), countless language library managers (npm, maven, CPAN) and countless other tools that have had free and relatively unmetered access for literally a generation.
If anything, “Software X will always be a CLI installation away, free, forever” is an old Linux expectation that’s existed for over 30 years and not some “cloud” mentality.
My startup pays Docker for their registry hosting services, for our private registry. However, some of our production machines are not set up to authenticate towards our account, because they are only running public containers.
Because of this change, we now need to either make sure that every machine is authenticated, or take the risk of a production outage in case we do too many pulls at once.
If we had instead simply mirrored everything into a registry at a big cloud provider, we would never have paid docker a cent for the privilege of having unplanned work foisted upon us.
However, if you are using docker's registry without authentication and you don't want to go through the effort of adding the credentials you already have, you are essentially relying on a free service for production already, which may be pulled any time without prior notice. You are already taking the risk of a production outage. Now it's just formalized that your limit is 10 pulls per IP per hour. I don't really get how this can shift your evaluation from using (and paying for) docker's registry to paying for your own registry. It seems orthogonal to the evaluation itself.
This is by design, according to docker.
I’ve never encountered anyone at any of my employers that wanted to use docker hub for anything other than a one-time download of a base image like Ubuntu or Alpine.
I’ve also never seen a CD deployment that doesn’t repeatedly accidentally pull in a docker hub dependency, and then occasionally have outages because of it.
It’s also a massive security hole.
Fork it.
I have a vague memory of reading something to that effect on their bug tracker, but I always thought the reasoning was ok. IIRC it was something to the effect that the goal was to keep things simple for first time users. I think that's a disservice to users, because you end up with many refusing to learn how things actually work, but I get the sentiment.
> I’ve also never seen a CD deployment that doesn’t repeatedly accidentally pull in a docker hub dependency, and then occasionally have outages because of it.
There's a point where developers need to take responsibility for some of those issues. The core systems don't prevent anyone from setting up durable build pipelines. Structure the build like this [1]. Set up a local container registry for any images that are required by the build and pull/push those images into a hosted repo. Use a pull through cache so you aren't pulling the same image over the internet 1000 times.
Basically, gate all registry access through something like Nexus. Don't set up the pull through cache as a mirror on local clients. Use a dedicated host name. I use 'xxcr.io' for my local Nexus and set up subdomains for different pull-through upstreams; 'hub.xxcr.io/ubuntu', 'ghcr.xxcr.io/group/project', etc..
Beyond having control over all the build infrastructure, it's also something that would have been considered good netiquette, at least 15-20 years ago. I'm always surprised to see people shocked that free services disappear when the status quo seems to be to ignore efficiency as long as the cost of inefficiency is externalized to a free service somewhere.
Same. The “I don’t pay for it, why do I care” attitude is abundant, and it drives me nuts. Don’t bite the hand that feeds you, and make sure, regularly, that you’re not doing that by mistake. Else, you might find the hand biting you back.
This is really not complicated and you're not entitled to unlimited anonymous usage of any service.
You would put this as a separate registry and storage from your actual self-hosted registry of explicitly pushed example.com/ images.
It's an extremely common use-case and well-documented if you try to RTFM instead of just throwing your hands in the air before speculating and posting about how hard or impossible this supposedly is.
You could fall back to DNS rewrite and front with your own trusted CA but I don't think that particular approach is generally advisable given how straightforward a pull-through cache is to set up and operate.
All the large objects in the OCI world are identified by their cryptographic hash. When you’re pulling things when building a Dockerfile or preparing to run a container, you are doing one of two things:
a) resolving a name (like ubuntu:latest or whatever)
b) downloading an object, possibly a quite large object, by hash
Part b may recurse in the sense that an object can reference other objects by hash.
In a sensible universe, we would describe the things we want to pull by name, pin hashes via a lock file, and download the objects. And the only part that requires any sort of authentication of the server is the resolution of a name that is not in the lockfile to the corresponding hash.
Of course, the tooling doesn’t work like this, there usually aren’t lockfiles, and there is no effort made AFAICT to allow pulling an object with a known hash without dealing with the almost entirely pointless authentication of the source server.
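To make the name-vs-hash split above concrete, here is a minimal Python sketch. The lock file format and the `resolve` and `fetch_verified` helpers are made up for illustration; no existing container tool is claimed to work this way. The point is only that the name lookup is the one step that needs a trusted server, while a blob requested by digest could come from any mirror, because the client can check the hash itself.

```python
import hashlib


def digest_of(blob: bytes) -> str:
    """Content digest in the 'sha256:<hex>' form used for OCI objects."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def resolve(name: str, lockfile: dict, trusted_resolver) -> str:
    """Return the pinned digest for a name.

    The trusted resolver (a hypothetical callable) is consulted only when
    the name is not pinned yet; the result is recorded in the lock file.
    """
    if name not in lockfile:
        lockfile[name] = trusted_resolver(name)
    return lockfile[name]


def fetch_verified(digest: str, untrusted_fetch) -> bytes:
    """Fetch a blob from any source and accept it only if its hash matches."""
    blob = untrusted_fetch(digest)
    if digest_of(blob) != digest:
        raise ValueError(f"digest mismatch for {digest}")
    return blob
```

With something like this, only the first `resolve` call for a new name would need to reach an authenticated registry; every later pull could be served by whatever cache happens to be closest.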
If you rewrite DNS, you should of course also have a custom CA trusted by your container engine as well as appropriate certificates and host configurations for your registry.
You'll always need to take these steps if you want to go the rewrite-DNS path for isolation from external services because some proprietary tool forces you to use those services.
Artifactory and Nexus are the two I've used for work. Harbor is also popular.
I can't think of the name right now, but there are some cool projects doing a p2p/distributed type of cache on the nodes directly too.
Announcing a new limitation that requires rolling out changes to prod with 1 week notice should absolutely shift your evaluation of whether you should pay for this company's services.
https://www.docker.com/blog/november-2024-updated-plans-anno...
At Docker, our mission is to empower development teams by providing the tools they need to ship secure, high-quality apps — FAST. Over the past few years, we’ve continually added value for our customers, responding to the evolving needs of individual developers and organizations alike. Today, we’re excited to announce significant updates to our Docker subscription plans that will deliver even more value, flexibility, and power to your development workflows.
We’ve listened closely to our community, and the message is clear: Developers want tools that meet their current needs and evolve with new capabilities to meet their future needs.
That’s why we’ve revamped our plans to include access to ALL the tools our most successful customers are leveraging — Docker Desktop, Docker Hub, Docker Build Cloud, Docker Scout, and Testcontainers Cloud. Our new unified suite makes it easier for development teams to access everything they need under one subscription with included consumption for each new product and the ability to add more as they need it. This gives every paid user full access, including consumption-based options, allowing developers to scale resources as their needs evolve. Whether customers are individual developers, members of small teams, or work in large enterprises, the refreshed Docker Personal, Docker Pro, Docker Team, and Docker Business plans ensure developers have the right tools at their fingertips.
These changes increase access to Docker Hub across the board, bring more value into Docker Desktop, and grant access to the additional value and new capabilities we’ve delivered to development teams over the past few years. From Docker Scout’s advanced security and software supply chain insights to Docker Build Cloud’s productivity-generating cloud build capabilities, Docker provides developers with the tools to build, deploy, and verify applications faster and more efficiently.
Sorry, where in this hyped up marketingspeak walloftext does it say "WARNING we are rugging your pulls per IPv4"?
Right at the top of the page it says:
> consumption limits are coming March 1st, 2025.
Then further in the article it says:
> We’re introducing image pull and storage limits for Docker Hub.
Then at the bottom in the summary it says again:
> The Docker Hub plan limits will take effect on March 1, 2025
I think like everyone else is saying here, if you rely on a service for your production environments it is your responsibility to stay up to date on upcoming changes and plan for them appropriately.
If I were using a critical service, paid or otherwise, that said "limits are coming on this date" and it wasn't clear to me what those limits were, I certainly would not sit around waiting to find out. I would proactively investigate and plan for it.
I mean just starting with the title:
> Announcing Upgraded Docker Plans: Simpler, More Value, Better Development and Productivity
Wow great it's simpler, more value, better development and productivity!
Then somewhere in the middle of the 1500-word (!) PR fluff there is a paragraph with bullet points:
> With the rollout of our unified suites, we’re also updating our pricing to reflect the additional value. Here’s what’s changing at a high level:
> • Docker Business pricing stays the same but gains the additional value and features announced today.
> • Docker Personal remains — and will always remain — free. This plan will continue to be improved upon as we work to grant access to a container-first approach to software development for all developers.
> • Docker Pro will increase from $5/month to $9/month and Docker Team prices will increase from $9/user/month to $15/user/mo (annual discounts). Docker Business pricing remains the same.
And at that point if you're still reading this bullet point is coming:
> We’re introducing image pull and storage limits for Docker Hub. This will impact less than 3% of accounts, the highest commercial consumers.
Ah cool I guess we'll need to be careful how much storage we use for images pushed to our private registry on Docker Hub and how much we pull them.
Well it's an utter and complete lie because even non-commercial users are affected.
————
This super long article (1500 words) intentionally buries the lede because they are afraid of a backlash. But you can't reasonably say “I told u so” when you only mentioned in a bullet point somewhere in a PR article that there will be limits that impact the top 3% of commercial users, then 4 months later give a one week notice that image pulls will be capped to 10 pulls per hour LOL.
The least they could do is to introduce random pull failures with an increasing probability rate over time until it finally entirely fails. That's what everyone does with deprecated APIs. Some people are in for a big surprise when a production incident will cause all their images to be pulled again which will cascade in an even bigger failure.
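For anyone who hasn't seen that kind of brownout, a rough Python sketch of the idea follows. The linear ramp and the function names are my own assumptions for illustration; nothing here reflects how Docker or any particular API provider actually does it.

```python
import random
from datetime import datetime


def failure_probability(now: datetime, start: datetime, end: datetime) -> float:
    """Probability of rejecting a request, ramping linearly from 0 to 1.

    Before `start` nothing fails; from `end` onward everything fails.
    """
    if now <= start:
        return 0.0
    if now >= end:
        return 1.0
    return (now - start) / (end - start)


def should_fail(now: datetime, start: datetime, end: datetime,
                rng=random.random) -> bool:
    """Decide whether this particular request is rejected.

    `rng` is injectable so the behaviour can be tested deterministically.
    """
    return rng() < failure_probability(now, start, end)
```

Halfway through the ramp, about half of all pulls would fail, which is usually enough to get someone's attention well before the hard cutoff.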
If the PR stuff isn't for you, fine, ignore that. Take notes on the parts that do matter to you, and then validate those in whatever way you need to in order to assure the continuity of your business based on how you rely on Docker Hub.
Simply the phrase "consumption limits" should be a pretty clear indicator that you need to dig into that and find out more, if you rely on Docker in production.
I don't get everyone's refusal here to be responsible for their own shit, like Docker owes you some bespoke explanation or solution, when you are using their free tier.
How you chose to interpret the facts they shared, and what assumptions you made, and if you just sat around waiting for these additional details to come out, is on you.
They also link to an FAQ (to be fair we don't know when that was published or updated) with more of a Q&A format and the same information.
The snippets about rate limiting give the impression that they're going to be at rates that don't affect most normal use. Lots of docker images have 15 layers; doesn't this mean you can't even pull one of these? In effect, there's not really an unauthenticated service at all anymore.
> “But the plans were on display…”
> “On display? I eventually had to go down to the cellar to find them.”
> “That’s the display department.”
> “With a flashlight.”
> “Ah, well, the lights had probably gone.”
> “So had the stairs.”
> “But look, you found the notice, didn’t you?”
> “Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard.”
I am saying that when change is coming, particularly ambiguous or unclear change like many people feel this is, it's no one's responsibility but yours to make sure your production systems are not negatively affected by the change.
That can mean everything from confirming data with the platform vendor, to changing platforms if you can't get the assurances you need.
Y'all seem to be fixated on complaining about Docker's motives and behaviour, but none of that fixes a production system that's built on the assumption that these changes aren't happening.
Somebody's going to have the same excuse when Google graveyards GCP. Till this change, was it obvious to anyone that you had to audit every PR fluff piece for major changes to the way Docker does business?
You seem(?) to be assuming this PR piece, that first announced the change back in Sept 2024, is the only communication they put out until this latest one?
That's not an assumption I would make, but to each their own.
This isn't exactly the same lesson, but I swore off Docker and friends ages ago, and I'm a bit allergic to all not-in-house dependencies for reasons like this. They always cost more than you think, so I like to think carefully before adopting them.
“Oh yes, well as soon as I heard I went straight round to see them, yesterday afternoon. You hadn’t exactly gone out of your way to call attention to them, had you? I mean, like actually telling anybody or anything.”
“But the plans were on display …”
“On display? I eventually had to go down to the cellar to find them.”
“That’s the display department.”
“With a flashlight.”
“Ah, well the lights had probably gone.”
“So had the stairs.”
“But look, you found the notice didn’t you?”
“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard’.”
No kidding. Clashes with the “gotta hustle always” culture, I guess.
Or it means that they can’t hide their four full-time jobs from each of the four employers as easily while they fix this at all four places at the same time.
The “I am owed free services” mentality needs to be shot in the face at close range.
https://web.archive.org/web/20241213195423/https://docs.dock...
Here's the January 21st 2025 copy that includes the 10/HR limit.
https://web.archive.org/web/20250122190034/https://docs.dock...
The Pricing FAQ goes back further to December 12th 2024 and includes the 10/HR limit.
https://web.archive.org/web/20241212102929/https://www.docke...
I haven't gone through my emails, but I assume there was email communication somewhere along the way. It's safe to assume there's been a good 2-3 months of communication, though it may not have been as granular or targeted as some would have liked.
Announced in September 2024: https://www.docker.com/blog/november-2024-updated-plans-anno...
At least 6 months of notice.
I have a blog, do I have to give my readers notice before I turn off the service because I can't afford the next hosting charge?
Isn't this almost exclusively going to affect engineers? Isn't it more of the engineer's responsibility not to allow their mission critical software to have such a fragile single point of failure?
> Probably because they don't want people to have enough notice to force their hand.
He says without evidence, assuming bad faith.
If your production service is relying on a free tier someone else provides, you must have some business continuity built in. These are not philanthropic organisations.
Not an acceptable interaction. This will be the end of Docker Hub if they don't walk back.
Docker doesn't know how to monetize.
And the exact time you have some production emergency is probably the exact time you have a lot of containers being pulled as every node rolls forward/back rapidly...
And then docker.io rate limits you and suddenly your 10 minute outage becomes a 1 hour outage whilst someone plays a wild goose chase trying to track down every docker hub reference and point it at some local mirror/cache.
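A back-of-the-envelope sketch of that scenario in Python. The assumptions are mine and deliberately pessimistic: every node sits behind one egress IP, nothing is cached locally, each image counts as exactly one pull, and the cap is a strict 10 pulls per hour.

```python
import math


def windows_needed(nodes: int, images_per_node: int,
                   limit_per_hour: int = 10) -> int:
    """Number of one-hour windows needed to finish all pulls from one IP."""
    total_pulls = nodes * images_per_node
    return math.ceil(total_pulls / limit_per_hour)
```

Under those assumptions, `windows_needed(20, 3)` comes out at 6, i.e. a 20-node cluster re-pulling 3 images each would need six separate hourly windows to recover.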
And yes, you’re still using the free tier even if you pay them, if your usage doesn’t have any connection to your paid account.
I just don’t build my environment to rely on unstable companies.
That’s sort of the comedy of second order effects as by reducing the amount of free stuff, I think Docker will end up reducing their paid customers.
Indeed, you’d be paying the big cloud provider instead, most likely more than you pay today. Go figure.
It's not fair, people shout. Neither are second homes when people don't even have their first but that doesn't seem to be a popular opinion on here.
It's busy-work that provides no business benefit, but-for our supplier's problems.
> specific outbound IP addresses that they can then whitelist
And then we have an on-going burden of making sure the list is kept up to date. Too risky, IMO.
I dunno, if I were paying for a particular quality-of-service I'd want my requests authenticated so I can make claims if that QoS is breached. Relying on public pulls negates that.
Making sure you can hold your suppliers to contract terms is basic due diligence.
> We’re introducing image pull and storage limits for Docker Hub. This will impact less than 3% of accounts, the highest commercial consumers. For many of our Docker Team and Docker Business customers with Service Accounts, the new higher image pull limits will eliminate previously incurred fees.
This isn't a counterpoint; it's rewrapping the same point: free services for commercial enterprises are a counterproductive business plan.
You would have had to authenticate to access that repo as well.
https://aws.amazon.com/ecr/pricing/?nc1=h_ls
> For unauthenticated customers, Amazon ECR Public supports up to 500GB of data per month. https://docs.aws.amazon.com/AmazonECR/latest/public/public-s...
I don't see how it's better.
> If we had instead simply mirrored everything into a registry at a big cloud provider, we would never have paid docker a cent for the privilege of having unplanned work foisted upon us.
I mean, if one is unwilling to bother to login to docker on their boxes, is this really even an actual option? Hm.
https://cloud.google.com/artifact-registry/docs/pull-cached-...
I have to say though, 90% of the dockers I use aren't on docker hub anymore. Most of them reside on the github docker repo now (ghcr.io). I don't know where the above playbook pulls from though as it's all automated in ansible.
And really docker is so popular because of its ecosystem. There are many other container management platforms. I think that they are undermining their own value this way. Hobbyists will never pay for docker pulls but they do generate a lot of goodwill as most of us also work in IT. This works the other way around too. If we get frustrated with docker and start finding alternatives it's only a matter of time until we adopt them at work too.
If they have an issue with bandwidth costs they could just use the infrastructure of the many public mirrors available that also host most Linux distros etc. I'm sure they'd be happy to add publicly available dockers.
Ironically, it's the paid rates that are being reduced more (though they don't have hourly limits still, so more flexibility, but the fair use thing might come up), as they were infinite previously, now Pro is 34 pulls/hour (on average, which is less than authenticated), Team is 138 pulls/hour (or 4 times Pro) and Business 1380 pulls/hour (40 times pro, 10 times team).
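The hourly averages above can be sanity-checked with a few lines of Python. The 25,000 pulls/month figure is the one cited elsewhere in this thread for the $9 account; the 30-day month is my assumption, and the Team and Business monthly quotas are back-computed from the quoted hourly numbers rather than taken from any official source.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month


def hourly_average(pulls_per_month: float) -> float:
    """Average pulls per hour implied by a monthly quota."""
    return pulls_per_month / HOURS_PER_MONTH


def implied_monthly(pulls_per_hour: float) -> float:
    """Monthly quota implied by an hourly average (inverse of the above)."""
    return pulls_per_hour * HOURS_PER_MONTH


# Hourly averages as quoted in the comment above.
quoted = {"Pro": 34, "Team": 138, "Business": 1380}

if __name__ == "__main__":
    print("Pro from 25,000/month:", int(hourly_average(25_000)))
    for plan, rate in quoted.items():
        ratio = round(rate / quoted["Pro"], 1)
        print(plan, "implied monthly:", implied_monthly(rate), "x Pro:", ratio)
```

The Pro number matches 25,000 divided by 720 hours, and the quoted ratios (roughly 4x and 40x Pro, exactly 10x between Team and Business) hold up.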
My feeling is that this is trying to get more people to create docker accounts, so the upsell can be more targeted.
They're entitled to do what they want and implement any business model they want. They're not entitled to any business, to my data, nor their business model working.
I couldn't give 2 shits whatever Docker does. They're a service, if I wanna use it I'll pay, if not then I'll use something else. Ez pz
I don't use Docker so I genuinely don't know this...
Is the Docker Library built on the back of volunteers which is then used to sell paid subscriptions?
Does this commercial company expect volunteers to give them images for free which give their paid subscriptions value?
Yes, to an extent, because it costs money to store and serve data, no matter what kind of data it is or its associated IP rights/licensing/ownership. Regardless, this isn't requiring people to buy a subscription or otherwise charging anyone to access the data. It's not even preventing unauthenticated users from accessing the data. It's reducing the rate at which that data can be ingested without ID/Auth to reduce the operational expense of making that data freely (as in money) and publicly available. Given the explosion in traffic (demand) and the ability to make those demands thanks to automation and AI relative to the operational expense of supplying it, rate limiting access to free and public data egress is not in and of itself unreasonable. Especially if those that are responsible for that increased OpEx aren't respecting fair use (legally or conceptually) and even potentially abusing the IP rights/licensing of "images [given] for free" to the "Library built on the back of volunteers".
To what extent that's happening, how relevant it is to docker, and how effective/reasonable Docker's response to it are all perfectly reasonable discussions to have. The entitlement is referring to those that explicitly or implicitly expect or demand such a service should be provided for free.
Note: you mentioned you don't use Docker. A single docker pull can easily be hundreds of MB (the official psql image is ~150MB, for example) or even, in some cases, over a GB of network transfer depending on the image. Additionally, there is no restriction by Docker/Docker Hub that prevents or discourages people from linking to source code or alternative hosts of the data. Furthermore, you don't have to do a pull every time you wish to use an image, and caching/redistributing them within your LAN/cluster is easy. It should also be mentioned that Docker Hub is more than just a publicly accessible storage endpoint for a specific kind of data, and their subscription services provide more than just hosting/serving that data.
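To put a rough number on the bandwidth side, here is a small Python sketch using the ~150MB image size from the note above and the 10 pulls/hour limit discussed in this thread. It assumes every pull transfers the full image (no layers already cached on the client) and a 30-day month, so it is an upper bound, not a measurement.

```python
def max_monthly_transfer_gb(image_mb: float, pulls_per_hour: int = 10,
                            days: int = 30) -> float:
    """Upper bound on data one IP could pull per month, in decimal GB."""
    return image_mb * pulls_per_hour * 24 * days / 1000
```

For a 150MB image that works out to about 1,080 GB per month for a single anonymous IP that keeps pulling at the limit around the clock.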
Yes.
> Does this commercial company expect volunteers to give them images for free which give their paid subscriptions value?
Yes.
If you're only looking at Docker Hub as a host of public images, you're only seeing the tip of the iceberg.
Docker Hub subscriptions are primarily for hosting private images, which you can't see from the outside.
IMO, hosting public images with 10 pulls per hour is plenty generous, given how much bandwidth it uses.
So, yeah, they're kind of taking advantage of people putting their work on DH to try to sell subs.
But nobody has to put their images on DH. And to be honest, I don't think the discoverability factor is as important on DH as it is on GitHub.
So if people want to pay for their own registry to make it available for free for everyone, it's less of an issue than hosting your repo on your own GitLab/Gitea instance.
Someone (maybe the podman folks?) should do what every Linux distribution has done, and set up a network of signed mirrors that can be rsynced.
Debian is 5TB.
Five years ago when Docker changed a storage policy they said it would save 5PB. I can't find the current size of Docker Hub.
That's a huge cost to expect from a free mirror service, especially when a large fraction is of very limited interest, and unlike a Linux distribution Docker Hub isn't organized. (It's easy to only mirror the AMD64 packages for Debian, for example.)
The Docker client also isn't able to work with a partial mirror.
Limits per IPv4 address are really, really annoying. All I can do is flick on a VPN... which likely won't work either
https://www.docker.com/blog/docker-hub-registry-ipv6-support...
I don't know enough about IPv6, is this potentially its own problem?
LetsEncrypt AWS Azure GCP Github Actions
Failing to see how they don't mix well.
You could also, you know, pay Docker for the resources you're using.
If I'm an infrequent tinkerer that occasionally needs docker images, I'm not going to pay a monthly cost to download e.g. 1 image/month that happens to be hosted on Docker's registry.
(It sounds like you can create an account and do authenticated pulls; which is fine and pretty workable for a large subset of my above scenario; I'm just pointing out a reason paying dollars for occasional one-off downloads is unpopular)
The ire is because of the rug pull. (I presume) you know that. It’s predatory behavior to build an entire ecosystem around your free offering (on the backs of OSS developers) then do the good old switcheroo.
I’ve also seen plenty of docker-compose files which pull this number of images (typically small images).
I’m not saying that Docker Inc should provide free bandwidth, but let’s not also pretend that this won’t be an issue for a lot of users.
Replace "apartment tower" with "CS department at a university", and you have a relatively common situation.
If Docker explicitly offers a service for free, then users are well within their rights to use it for free. That’s not entitlement, that’s simply accepting an offer as it stands.
Of course, Docker has every right to change their pricing model at any time. But until they do, users are not wrong for expecting to continue using the service as advertised.
I've seen this "sense of entitlement" argument come up before, and to be clear: users expecting a company to honor its own offer isn’t entitlement, it’s just reasonable.
I started using explicit repository names for everything including Docker Hub 5+ years ago and I don't regret it. I haven't thought about mirrors since, and I find it easier to reason about everything. I use pull-through caches with dedicated namespaces for popular upstream registries.
- hub.example.com/ubuntu --> ubuntu from Docker Hub
- ghcr.example.com/org/projectA --> project from GHCR
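A small Python sketch of the naming scheme above, for illustration only. The `CACHE_HOSTS` mapping and the `via_cache` helper are hypothetical. The registry detection follows the usual Docker convention as I understand it (the first path component is treated as a registry if it contains a dot or a colon, or is `localhost`; otherwise Docker Hub is assumed). Whether official images need a `library/` prefix on the cache side depends on how the cache is set up, so the path is passed through unchanged here.

```python
# Hypothetical mapping from upstream registry to its pull-through cache host.
CACHE_HOSTS = {
    "docker.io": "hub.example.com",
    "ghcr.io": "ghcr.example.com",
}


def split_reference(ref: str) -> tuple:
    """Split an image reference into (registry, remainder).

    Shorthand references without a registry default to docker.io.
    """
    first, sep, rest = ref.partition("/")
    if sep and ("." in first or ":" in first or first == "localhost"):
        return first, rest
    return "docker.io", ref


def via_cache(ref: str) -> str:
    """Rewrite a reference to an explicit pull-through cache hostname."""
    registry, remainder = split_reference(ref)
    host = CACHE_HOSTS.get(registry)
    if host is None:
        raise KeyError(f"no pull-through cache configured for {registry}")
    return f"{host}/{remainder}"
```

So `via_cache("ubuntu")` gives `hub.example.com/ubuntu` and `via_cache("ghcr.io/org/projectA")` gives `ghcr.example.com/org/projectA`, and the upstream is always visible in the name itself.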
I tried using mirrors at first, but it was a disaster with the shorthand notation because you can have namespace collisions. Consider:
- docker.io/org/projectA (owner 1)
- ghcr.io/org/projectA (owner 2)
What happens below? What do you get? How do you know where the mirror admin is pulling from?
- docker pull org/projectA
That only works if you have a single source of truth or if you keep a mapping somewhere. Ex:
- org/projectA --> docker.io
- org/projectB --> ghcr.io
That's not useful because your definitions are still ambiguous unless you go look at the mappings, so all you've done is add external config vs explicitly declaring the namespace.
Plus, you can set up a pull-through cache everywhere it makes sense.
- locationA - hub.example.com = 192.0.2.1
- locationB - hub.example.com = 192.0.2.2
I'd be interested to hear about scenarios where mirrors are more than a workaround for failing to understand the power of Docker's namespacing and defaulting to the shorthand notation for everything.
If the power company gave me free energy for 15 years, I would also be pissed. Rightly? No, but hey, that's not the issue.
Also with docker being the status quo for so long, it does hurt the ecosystem / beginners quite a lot.
Because they did. But you're right—they have no obligation to continue doing so. Now that you mention it, it also reminds me that GitHub has no such obligation either.
In a way, expecting free container images is similar to how we can download packages from non-profit Linux distributions or how those distributions retrieve the kernel tarball from its official website. So, I’m not sure whether it’s better for everyone to start paying Docker Hub for bandwidth individually or for container images to be hosted by a non-profit, supported by donations from those willing to contribute.
10 per hour is slightly lower than 100 per 6 hours, but not in any meaningful way from a bandwidth perspective, especially since image size isn't factored into these rate limits in any way.
If bandwidth is the real concern, why change to a more inconvenient time period for the rate limit rather than just lowering the existing rate limit to 60 per 6 hours?
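The arithmetic behind that comparison, as a small Python sketch. The two quotas (100 per 6 hours, 10 per hour) are the ones quoted above; the helper name `pulls_in_window` is just for illustration.

```python
def pulls_in_window(limit: int, period_hours: float, window_hours: float) -> float:
    """Sustained pulls allowed over window_hours for a limit-per-period quota."""
    return limit * window_hours / period_hours


old_sustained = pulls_in_window(100, 6, 6)  # old quota: 100 pulls per 6 hours
new_sustained = pulls_in_window(10, 1, 6)   # new quota: 10 pulls per hour

# Worst case for a burst inside a single hour (fresh window, nothing used yet).
old_burst, new_burst = 100, 10

print(f"sustained over 6h: {old_sustained:.0f} -> {new_sustained:.0f}")
print(f"burst within 1h:   {old_burst} -> {new_burst}")
```

So the sustained rate drops by less than half, but the burst a single IP can do in one hour drops by a factor of ten, which is the part that hurts CI jobs and classrooms.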
It’s basically your apartment building example (esp. something like the STEM dorms)
When this stuff breaks in the hours leading up to a homework assignment being due, it’s going to discourage the next generation of engineers from using it.
There is always a commercial subscription. You need only a single $9/mo account and you get 25,000 pulls/month.
And if you are not willing to pay $9/mo, then you should be OK with using free personal account for experiments, or to spread out your experiments over longer timeline.
Bandwidth is cheap, especially at scale, unless you're in one of the large clouds that make a shitload of money gouging their customers on egress fees.
I don't say that Docker Inc should foot the bill for other multibillion dollar companies, but the fact that even after 8 years it still is impossible to use authentication in the registry-mirrors option is mind-boggling.
I think Docker started the bloated image mess. Have you ever seen a project with <100MB in size?
I guess packing everything with gzip isn't a good idea when size matters.
Docker Hub has a traffic problem, and so does every intranet image registry. It's slow. The culprit is Docker (and maybe people who won't bother to optimize).
Then again, good that this measure forces fixing this bad behaviour, but as a user of buildpack you are not always in the know how to fix it.
Adding auth to pulls is easy. Mirroring images internally is easy. Anyone who says otherwise is lazy.
Bandwidth is super cheap if you don't use any fancy public cloud services.
They aren't likely able to go for peering arrangements ("free" bandwidth) because their traffic is likely very asymmetric, and that doesn't save the management/storage/compute costs.
I don't know what Docker's financials are, but I can imagine, as a business owner myself, situations where it was lean enough that that sort of cost could mean the difference between running the service and not.
People understand that bandwidth costs money but that seems to have been priced in to their previous strategy or they did it knowingly as a loss leader to gain market share. If they knew this was a fundamental limitation they should have addressed it years ago.
Perhaps they should have started by putting "we will enforce limits soon" in all documentation.. and in a few years, starting enforcement but with pretty high limits? and then slowly dialing limits down over a few years?
That's exactly what they did. I remember setting up the docker proxy 4 years ago when we started getting first "rate limit" errors. And if someone was ignoring the news for all that time.. Well, tough for them, there was definitely enough notice.
But idk, maybe Docker shouldn't have pulled a bait-and-switch, which is also classically known as a "dick move".
Or if you want it a little more colorfully: capturing a market with free candy to get us into your van.
Or more accurately, this “free until we need something from you” is the moral equivalent of a free meal or ski trip but you are locked into a room to watch a timeshare marketing deck.
Open Source is built on a gift economy. When you start charging you break the rules of that socioeconomic model. So most of us make tools that we have a hope of keeping running the way they were designed. Some of us stay at the edges because we know we won’t still be interested in this tool in two years and we don’t want to occupy other people’s head space unfairly or dishonestly. Some of us believe we can persevere and then we find there’s a new programming language that we like so much more than this one that we fuck off and leave a vacuum behind us that others have to scramble to fill (I’m talking about you, TJ).
And then there’s these kinds of people who hoover up giant sections of the mindshare using VC money and don’t ever find a new revenue stream, like Mozilla managed. And it’s getting really fucking old.
One of the problems with XML is that the schemas aren’t cached by default, and so a build system or a test harness that scans dozens or hundreds of these files per run, or twenty devs running their sandboxes in a tight loop, can eat up ridiculous amounts of bandwidth. But the important difference is that they architected a way to prevent that, it’s just that it is fiddly to get set up and nobody reads the instructions. I found an email chain with the W3c.org webmaster complaining about it, and myself and a couple other people tried to convince him that they needed to add a 100ms delay to all responses.
My reasoning was that a human loading a single XML file would never notice this increase, but a human running 100s of unit tests definitely would want to know why they suddenly got slower, and doing something about it wouldn’t just get back that extra 20 seconds, they’d get back 40 (or in our case 5-10minutes) by making the call one more time and putting it into a PR. We only noticed we were doing it because our builds stopped working one day when the internet went down.
There’s no build tool I know of that will deal with being 429’d at 10 requests. Especially if you know anything about how layers work. There are tons that would work just fine being traffic shaped to a server with a lower SLA. Dedicate half of your cluster to paying customers and half to the free tier. Or add 250ms delay. It’ll sort itself out. People will install Artifactory or some other caching proxy and you’ll still have the mindshare at a lower cost per use.
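For reference, this is roughly what "dealing with being 429'd" would have to look like on the client side. It is a hedged Python sketch, not something any particular build tool ships: `fetch_with_backoff` and the `fetch` callable (returning a status code and an optional Retry-After value in seconds) are hypothetical.

```python
import time
from typing import Callable, Optional, Tuple


def fetch_with_backoff(
    fetch: Callable[[], Tuple[int, Optional[float]]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch() until it stops returning 429 or attempts run out.

    fetch is a hypothetical callable returning (status_code, retry_after_seconds);
    retry_after_seconds is None when the server sent no Retry-After hint.
    """
    status = 429
    for attempt in range(max_attempts):
        status, retry_after = fetch()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            # Prefer the server's hint; otherwise back off exponentially.
            sleep(retry_after if retry_after is not None else base_delay * 2 ** attempt)
    return status


# Simulated registry: two rate-limited responses, then success.
responses = iter([(429, None), (429, 3.0), (200, None)])
waits = []
print(fetch_with_backoff(lambda: next(responses), sleep=waits.append), waits)
```

With an hourly quota the honest wait can be most of an hour, so retrying only papers over the problem. A server-side delay or a caching proxy changes the economics; a retry loop does not.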
Without dockerhub you would have to host your own repository, which would cost money.
This should help people understand a bit better why this feels a bit underhanded. The images are free, and I and many other OSS devs have used docker hub in partnership to provide access to software, often paying for the ability to publish there. In this case, any burden of extra cost was on the producer side.
Turning this into a way to "know" every user and extract some value from them is their prerogative, but it does not feel like it is good faith. It also feels a bit creepy in the sense of "the user is the product".
Part of it is perhaps by definition, “spreading” already assumes success. Still, I’d welcome some regulation; or at least awareness; e.g. a neologism for companies in that stage, growing at cost and only getting ready to develop symptoms.
[1]: The American Dialect Society selected “Enshittification” as its 2023 word of the year, source: https://en.m.wikipedia.org/wiki/Enshittification
One can't rely on library updates being done, thus one has to have a build chain for many images.
Upstream has blocked it. A fork over this one little feature is long overdue.
There's already a widely available way to specify exactly which repo you'd prefer in the docker client...
`docker pull [repo]/[image]:[tag]`
And that address format works across basically all of the tooling.
Changing the defaults doesn't actually make sense, because there's no guarantee that repo1/secure_image === repo2/secure_image.
Seems like a HUGE security issue to default named public images to a repo where the images may not be provided by the same owner.
Giving people the option to configure a default repository via the daemon.json would alleviate that issue, but I'm not sure if that's really enough to fork.
It's just not that hard to go fully qualified.
With these changes, I can imagine “intro to docker” tutorials breaking.
I suspect that’ll be enough to let a fork/competitor gain significant market share.
I doubt it.
Remember, tags are mutable. `latest` today can be different tomorrow.
And it expands to all other tags. Nothing prevents me from pushing a new 'version' of a container to an existing `v1.0.1` tag.
Tags are not a way of uniquely identifying a container.
The top level image hashes tend to change between repos (because the image name changes in the manifest and is included in the hash).
So you'd have to go through and verify each layer SHA.
Good tool for selecting an exact image in a repo, not a replacement for trust at the naming level (it's a bit like the difference between owning a domain and cert pinning with hpkp).
Ex - lots of refs are to "multi-arch" images, Except... there's no such thing as a multi-arch image, the entire identifier is just a reference to a manifest that then points to a list of images (or other manifests) by arch, and the actual resolved artifact is a single entry in that list.
But it means the manifest needs to be able to reference and resolve other names, and that means including... names.
For a more concrete example, just check https://github.com/moby/moby/issues/44144#issuecomment-12578...
Basically - the digests weren't intended to support image verification across repos, and the tool doesn't treat them that way. The digest was intended to allow tighter specification than a tag (precisely because a publisher might push a different image to the same tag later).
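A toy Python illustration of the tag-versus-digest distinction. The manifests here are made-up JSON blobs, not real image manifests, and `manifest_digest` is an illustrative helper. The only real fact relied on is that a digest is a SHA-256 over the manifest bytes, so it changes whenever the content changes, while a tag is just a mutable pointer.

```python
import hashlib
import json


def manifest_digest(manifest_bytes: bytes) -> str:
    """Content address of a manifest: sha256 over its exact bytes."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


# Made-up manifests standing in for two different builds.
build_1 = json.dumps({"layers": ["sha256:aaa"]}).encode()
build_2 = json.dumps({"layers": ["sha256:bbb"]}).encode()

tags = {"v1.0.1": manifest_digest(build_1)}  # tag -> digest, held by the registry
pinned = tags["v1.0.1"]                      # what a consumer pins today

tags["v1.0.1"] = manifest_digest(build_2)    # publisher re-pushes the same tag

print("tag still points at pinned content:", tags["v1.0.1"] == pinned)
print("pinned digest still identifies build_1:", pinned == manifest_digest(build_1))
```

Pinning by digest protects against the re-push, but only within the repository you pulled from. It says nothing about whether a same-named image in another registry comes from the same publisher.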
One way to get around this is to just not use `latest` at all, and only push docker tags that perfectly mirror the corresponding git branches/tags.
So the source of the image can be decided on pull. Some more on this https://www.redhat.com/en/blog/manage-container-registries
It looks like it's ordered by priority.
unqualified-search-registries = ['registry.fedoraproject.org', 'registry.access.redhat.com', 'registry.centos.org', 'docker.io']
So you get a concatenation of all those registries and transient network failures are going to change the behavior. I'll take a pass on that one.
(podman is a docker compatible replacement with a number of other nice features besides being able to configure the registry)
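The worry about the ordered search list can be modelled in a few lines of Python. This is a deliberately simplified sketch, assuming a plain "first registry that answers wins" rule; real clients may add prompts or alias tables on top, and `resolve`, `hosted`, and `org/tool` are made up for the example.

```python
from typing import Callable, List, Optional

SEARCH = [
    "registry.fedoraproject.org",
    "registry.access.redhat.com",
    "registry.centos.org",
    "docker.io",
]


def resolve(name: str, registries: List[str], reachable: Callable[[str], bool]) -> Optional[str]:
    """Return the first candidate that can be fetched, trying registries in order."""
    for registry in registries:
        candidate = f"{registry}/{name}"
        if reachable(candidate):
            return candidate
    return None


# Hypothetical state of the world: two registries host an image called org/tool.
hosted = {"registry.fedoraproject.org/org/tool", "docker.io/org/tool"}

normal = resolve("org/tool", SEARCH, lambda c: c in hosted)
# Same pull while the first registry is having a transient outage.
outage = resolve("org/tool", SEARCH, lambda c: c in hosted and not c.startswith("registry.fedoraproject.org/"))

print(normal)
print(outage)
```

Under this model the same short name resolves to a different registry, and possibly a different publisher, depending on which hosts happen to be reachable.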
that said, you can configure "registry-mirrors" in /etc/docker/daemon.json although it is not the same thing
I can see the use case for base images: they're the canonical, trusted source of the image.
But for apps that are packaged? Not as much. I mean, if I'm using a PaaS, why can't I just upload my Docker image to them, and then they store it off somewhere and deploy it to N nodes? Why do I have to pay (or stay within a free tier) to host the blob? Many PaaS providers I've seen are happy to charge a few more bucks a month just to host Docker images.
I'm not seeing any sort of value added here (and maybe that's the point).
But obviously the real problem is that you're asking the wrong question. We don't "need" a centralized image repository. We WANT one, because the feature that Docker provides that "just use a tarball" doesn't (in addition to general ease-of-use, of course) is authentication, validation and security. And that's valuable, which is why people here are so pissed off that it's being locked behind a paywall.
But given that it has value... sorry folks, someone's got to pay for it. You can duplicate it yourself, but that is obviously an engineering problem with costs.
Just write the check if you're a heavy user. It's an obvious service with an obvious value proposition. It just sucks if it wasn't part of your earlier accounting.
I generally agree with you, but to be fair to the complainers, what sucks is that Docker didn't make it clear up front that it should be part of your accounting. I don't know if they always intended to monetize this way (if so, we'd call that a bait and switch) or if they sincerely had other plans that just didn't pan out, but either way the problem is the same: There's a trend across all of technology of giving your stuff away for free until you become the obvious choice for everything, then suddenly altering the deal and raising prices.
That kind of behavior has in the past been deemed anticompetitive and outlawed, because it prevents fair competition between solutions on their merits and turns it into a competition for who has the deepest war chest to spend on customer-acquisition-by-free-stuff.
At one point Docker may have had an authentic intent for a free service, but costs along the way changed the reality of operations, and long-term cash flow and the success of the business required making changes. Maybe the cash saved from bandwidth is what makes the next project possible that helps them grow the bottom line.
Further what was once a positive value proposition 18 months ago can turn into a losing proposition today, and a company should be allowed to adapt to new circumstances and be allowed to make new decisions without being anchored and held back by historical decisions (unless under contract).
As fun as it is to hold executives to unrealistic standards, they're not fortune-tellers who can predict the future, and they're making the best possible decisions they can given the constraints they're under. And I don't begrudge them if those decisions are in their own best interest, as is their responsibility.
I'll give Docker the benefit of the doubt that this wasn't a bait-and-switch, that they never expected it to become so successful, and that costs outpaced their ability to monetize the success and were eating into cash reserves faster than planned. I think the current outcome isn't so bad, and that we're still getting a considerable amount of value for free. It's unfortunate that some people are only finding out now, and are now under pressure to address an issue they didn't sign up for.
Standing up your own registry is trivial at the kind of scales (dozens-to-hundreds of image pulls per day!) that we're talking about. It's just expensive, so people want Docker, Inc. to do it for free. Well...
Those are generally solved using SSL, no need for centralized storage.
1. Set up a pull-through mirror. Google Artifact Registry has decent limits and good coverage for public images. This requires just one config change and can be very useful to mitigate rate limits if you're using popular images cached in GAR.[1]
2. Set up a private pull-through image registry for private images. This will require renaming all the images in your build and deployment scripts and can get very cumbersome.
3. Get your IPs allowlisted by Docker, especially if you can't have docker auth on the servers. The pricing for this can be very high. Rough numbers: $20,000/year for 5 IPs, and it usually goes upwards of $50k/year.
4. Set up a transparent docker hub mirror. This is great because no changes need to be made to pipelines except one minor config change (similar to 1). We wrote a blog about how this can be done using the official docker registry image and AWS.[2] It is very important to NOT use the official docker registry image [3] as that itself can get throttled and lead to hairy issues. Host your own fork of the registry image and use that instead.
We spent a lot of time researching this for certain use cases while building infrastructure for serving Github actions at WarpBuild.
Hope this helps.
[1] https://cloud.google.com/artifact-registry/docs/pull-cached-...
by default anything you need from helm charts will be pulled from docker hub. and its normal to have a storage daemon, networking agents, loggers on every node so if you launch enough at once during an autoscale event, you'd trigger this limit.
A giving person could also set one of these up publicly facing and share it out.
{
"registry-mirrors": ["https://mirror.gcr.io"]
}
% whois gcr.io | grep 'Creation Date'
Creation Date: 2014-11-17T19:32:25Z
% whois ghcr.io | grep 'Creation Date'
Creation Date: 2020-04-16T16:48:05Z
Also it still takes some gymnastics to optionally support docker creds in a workflow https://github.com/orgs/community/discussions/131321
Very hard to find anything definitive still left on the web. This is all I could find...
https://github.com/actions/runner-images/issues/1445#issueco...
https://github.com/actions/runner-images/issues/1445#issueco...
> Very hard to find anything definitive still left on the web
Probably a lot happened behind closed doors so there probably wasn’t much to begin with.
I.e. do Docker's terms of service restrict distribution in this way?
Are there any technical restraints?
I.e. does Docker specify no-cache?
I expect Docker doesn't want their images cached and would want you to use their service, turning you into a paying subscriber through limitations on the free tier.
My feeling is the way the naming scheme was defined (and subsequent issues around modifying the default registry), docker wanted to try to lock people into using docker hub over allowing public mirrors to be set up easily. This failed, so they've needed to pivot somewhat to reduce their load.
We at Depot [0] work around this by guaranteeing that a new runner brought online has a unique public IP address. Thus avoiding the need to login to Docker to pull anything.
Subsequently, we also do the same unique public IP address for our Docker image build product as well. Which helps with doing image builds where you're pulling from base images, etc.
If your project can’t afford to pay for servers and someone to maintain them, I think we should stick with local shell scripts and precommit hooks.
My blog post on the same at https://avilpage.com/2025/02/free-dockerhub-alternative-ecr-...
Edit: Not exactly, it looks like ECR mirrors docker-library (a.k.a. images on docker hub not preceded by a namespace), not all of Docker Hub.
Edit 2: I think the example you give there is misleading, as Ubuntu has its own namespace in ECR. If you want to highlight that ECR mirrors docker-library, a more appropriate example might be `docker pull public.ecr.aws/docker/library/ubuntu`.
The rate limit for unauthenticated pulls is 1/second/IP, source: https://docs.aws.amazon.com/general/latest/gr/ecr-public.htm...
Not something we'd encountered before but seems earlier than these changes are meant to come into effect.
We've cloned the base image into ECR now and are deriving from there. This is all for internal authenticated stuff though.
With a competent caching strategy (the sort of thing you'd set up with nix or bazel) it's often faster to send the git SHA and build the image on the other end than it is to move built images around. This is because 99% of that image you're downloading or pushing is probably already on the target machine, but the images don't contain enough metadata to tell you where that 1% is. A build tool, by contrast, understands inputs and outputs. If the inputs haven't changed, it can just use the outputs which are still lying around from last time.
Does it have to? It seems it should be possible to diff the layers and only invalidate if there are conflicts.
RUN echo 1 > A
RUN echo "$(cat A) + 1" | bc > B
So that's two layers each with one file.
If, in a later version, the first command changes to `echo 3 > A` then the contents of B should become "4", even though the second command didn't change. That is, neither layer can be reused because the layers depend on each other.
But maybe there's no dependency. If your Dockerfile is like this:
RUN echo 1 > A
RUN echo 2 > B
Then the second layer could in theory be re-used when the first layer changes, and not built/pushed/downloaded a second time.
RUN echo 3 > A # new
RUN echo 2 > B # no dependency on layer 1, can be reused
But docker doesn't do this. It plays it safe and unnecessarily rebuilds both layers anyway. And since these files end up with timestamps, the hashes of the layers differ, so both layers are consequently reuploaded and redownloaded.
Build tools like nix and bazel require more of the user. You can't just run commands all willy nilly, you have to tell them more info about which things depend on which other things. But the consequence is that instead of a list of layers you end up with a richer representation of how dependency works in your project (I guess it's a DAG). Armed with this, when you try to build the next version of something, you only have to rebuild the parts that actually depend on the changes.
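Here is a toy Python model of the difference, using the two-step example from above. It is a sketch under stated assumptions, not how Docker, nix, or bazel actually compute cache keys: `chained_keys` stands in for a linear layer list where each key covers everything before it, and `dag_keys` stands in for a build graph where each step declares its inputs.

```python
import hashlib
from typing import Dict, List, Tuple


def chained_keys(steps: List[str]) -> List[str]:
    """Linear model: each layer's key covers its command and its parent's key."""
    keys, parent = [], ""
    for cmd in steps:
        parent = hashlib.sha256((parent + "\n" + cmd).encode()).hexdigest()
        keys.append(parent)
    return keys


def dag_keys(steps: Dict[str, Tuple[str, List[str]]]) -> Dict[str, str]:
    """DAG model: each target's key covers its command and only its declared inputs."""
    keys: Dict[str, str] = {}

    def key(target: str) -> str:
        if target not in keys:
            cmd, deps = steps[target]
            material = cmd + "".join(key(d) for d in sorted(deps))
            keys[target] = hashlib.sha256(material.encode()).hexdigest()
        return keys[target]

    for target in steps:
        key(target)
    return keys


# Linear list: changing the first step changes the key of the second one too.
old = chained_keys(["RUN echo 1 > A", "RUN echo 2 > B"])
new = chained_keys(["RUN echo 3 > A", "RUN echo 2 > B"])
print("layer B reusable (linear):", old[1] == new[1])

# Declared dependencies: B has no inputs, so its key survives the change to A.
old_dag = dag_keys({"A": ("echo 1 > A", []), "B": ("echo 2 > B", [])})
new_dag = dag_keys({"A": ("echo 3 > A", []), "B": ("echo 2 > B", [])})
print("target B reusable (DAG):", old_dag["B"] == new_dag["B"])
```

If B is declared as depending on A (the `cat A` case), its key changes along with A's, which is the behavior you want. The extra work is in having to declare those edges at all.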
Whether the juice is worth the squeeze is an open question. I think it is.
I can't stress enough how much I dislike Rancher. I know we moved to it as a cost saving measure as I am assuming we would have to buy subs for Docker.
Yet there is nothing I found easier to use than Docker proper. Rancher has a Docker compatible mode and it falls down in various ways.
Now that this has happened, I wonder if Rancher is pulling by default from the Docker Hub registry, in which case now we'll need to setup our own registry for images we use, keep them up to date etc. Which feels like it would be more costly than paying up to Docker to begin with.
All this makes me almost miss Vagrant boxes.
Reasonable price for better dev efficiency. Free for personal use.
Neither are drop in replacements for Docker Desktop, that much I am certain about, thus far.
The team that will have to do this won't have it as a priority, and unfortunately that means it'll always lag behind.
Some of this I realize is company quirk specific, but even if we had our own mirror it doesn't negate the problem entirely.
The interface is very basic. I had to get plugins for very basic functionality that has been built into Docker Desktop for years, like Logs Explorer.
It seemingly always prompts for Admin Access on the computer, even though Docker long ago stopped doing this and has worked without admin access for some time.
The prompt for enabling admin access is funny. If you don't have it already, it will prompt you to enable it; if you have it enabled, it will pop up another window, very similar, and the wording will say "Startup Rancher Desktop without administrator access", but it's easy to miss the wording difference because the font is small.
I've had stability issues, containers randomly crashing or the daemon going down out of nowhere. Happened more than once.
It claims to be a drop-in for the Docker CLI, but while I don't have the list handy at the moment, I know this isn't true, particularly with docker-compose.
I could go on, but it's still really rough around the edges.
The unauthenticated limit doesn't bother me as much, though I was a little upset when I first saw it. Many businesses don't bother setting up their own registry, even though they should, nor do they care to pay for the service. I suspect that many don't even know that Docker can be used without Docker Hub. These are the freeloaders Docker will be targeting. I've never worked for a company that was serious about Docker/Kubernetes and didn't run their own registry.
One major issue for Docker is that they've always run a publicly available registry, which is the default and just works. So people have just assumed that this was how Docker works and they've never bothered setting up accounts for developers or production systems.
Like, I get it, but it adds considerable work and headaches to thousands (millions?) of people.
Not Docker, but I worked on a project that used certain Python libraries, where the author would yank the older versions of the library every time they felt like rewriting everything; this happened multiple times. After it happened the second time we just started running our own Python package registry. That way we were in control of upgrades.
Nexus is very easy to set up.
You should also run your own apt/yum, npm, pypi, maven, whatever else you use, for the same reasons. At a certain scale it's just prudent engineering.
Own your dependency chain.
Finally, a use for IPv6!
I assume so anyway, as I think ISPs that support ipv6 will give you multiple IPv6 /64 spaces if requested.
The pull limits have also been delayed at least a month.
Vote with your feet and your wallets.
Can one of the big tech companies please use their petty cash account to acquire what remains of docker.com? Maybe OSS any key assets and donate docker hub, trademarks, etc. to some responsible place like the Linux Foundation which would be a good fit. This stuff is too widely used to be left hostage to an otherwise unimportant company like Docker. And the drama around this is getting annoying.
MS, Google, AWS, anyone?
Alternatively, let's just stop treating docker.io as a default place where containers live. That's convenient for Docker Inc. but not really necessary otherwise. Docker Inc is overly dependent on everybody just defaulting to fetching things without an explicit registry host from there. And with these changes, you wouldn't want any of your production environments to be dependent on that anyway because those 429 errors could really ruin your day. So, any implied defaults should be treated like what they are: a user error.
If most OSS projects stop pushing their docker containers to docker hub and instead spin up independent registries, most of the value of docker hub evaporates. Mostly the whole point of putting containers there was hassle free usage for users. It seems that Docker is breaking that intentionally. It's not hassle free anymore. So, why bother with it at all? Plenty of alternative ways to publish docker containers.
We fixed the problem by using a pull through registry
Will need to find a way to kick docker.io to the curb. Ridiculous
I think a lot of people have misconceptions about how much bandwidth really costs.
It’s been almost a decade so it’s possible things have slowed considerably, or demand has outstripped supply, but given how much data Steam seems to be willing to throw at me, I know pricing is likely nowhere near what it was last I looked (it’s the only metered thing I regularly see and it’s downloading 10’s of GB daily for a couple games in my collection).
Using egress pricing is also the wrong metric. You’d be better off looking at data costs between regions/datacenters to get a better idea about wholesale costs, since high egress costs are likely a form of vendor lock-in, while looking at cross-region pricing avoids any “free” data costs through patch cables skewing the numbers.
Not sure about bandwidth between countries; there’s different economics there. I’d expect some self-similarity there, but laying trunks might be so costly that finding ways to utilize fiber better is the only real way to increase supply.
If bandwidth costs are important, there are plenty of options that will let you cut the cost by 10x (or more). Either with a caching layer like an external CDN (if that works for your application), or by moving to any of the mid-tier clouds (if bandwidth costs are an important factor, and caching won’t work for your application).
AWS, GCP, and Azure are the modern embodiment of the phrase “nobody ever got fired for buying IBM.”
Most companies don’t benefit from those big 3 mega clouds nearly as much as they think they do.
So, sure, send a note to your Azure rep complaining about the cost of bandwidth… nothing will change, of course, because companies aren’t willing to switch away from the mega clouds.
> and other providers
Other providers, like Hetzner, OVH, Scaleway, DigitalOcean, Vultr, etc., do not charge anywhere near the same for bandwidth as Azure. I think they are all about 8x to 10x cheaper.
Eg Fastly prices: US/Europe $0.10/GB India $0.28/GB
Not all bandwidth is equal. eg Hetzner will pay for fast traffic into Europe but don't pay the premium that others like AWS do to ensure it gets into Asia uncongested.
I didn’t say all CDNs are cheaper. Some CDNs see an opportunity to charge a premium, and they do!
Fastly sees themselves as far more than just a CDN. They call themselves an “edge cloud platform”, not a CDN.
> Not all bandwidth is equal. eg Hetzner will pay for fast traffic into Europe but don't pay the premium that others like AWS do to ensure it gets into Asia uncongested.
Sure… there are sometimes tradeoffs, but for bandwidth-intensive apps, you’re sometimes (often?) better off deploying regional instances that are closer to your customers, rather than paying a huge premium to have better connectivity at a distance. Or, for CDN-compatible content, you’re probably better off using an affordable CDN that will bring your content closer to your users.
If you absolutely need to use AWS’s backbone for customers in certain geographic regions, there’s nothing stopping you from proxying those users through AWS to your application hosted elsewhere, by choosing the AWS region closest to your application and putting a proxy there. You’ll be paying AWS bandwidth plus your other provider’s bandwidth, but you’ll still be saving tons of money to route the traffic that way if those geographic regions only represent a small percentage of your users… and if they represent a large percentage, then you can host something more directly in their region to make the experience even better.
For many types of applications, having higher latency / lower bandwidth connectivity isn’t even a problem if the data transfer is cheaper and saves money… the application just needs to do better caching on the client side, which is a beneficial thing to do even for clients that are well-connected to the server.
It depends, and I am not convinced there is a one-size-fits-all solution, even if you were to pay through the nose for one of the hyperscalers.
I have plenty of professional experience with AWS and GCP, but I also have professional experience with different degrees of bare metal deployment, and experience with mid-tier clouds. If costs don’t matter, then sure, do whatever.
egress in the cloud is deliberately expensive as an anti-competitive measure to lock you in and stop you using competitors services
I love how everyone is arguing about networking costs inside the tiny prison cell that is "the cloud". Because obviously the only way to push bits over the wire is through an AWS Internet Gateway, which was the very first packet-switched routing ever.
Docker can't really market to machines doing most of the downloads autonomously and probably can't monetize download data well either, so they want you to start paying them... or go use something else.
If I read these limits correctly, looks like lots of things are going to break on March 1st
Teaching people to use Docker is not uncommon. The entire class pulling an image at (roughly) the same time is not uncommon either.
Yes, you can ask people to set up an account (provided you don't have policies against requiring students to sign up for unvetted US-based third-party services and provide personal data to them), but that complicates things.
"Docker Desktop is free for small businesses (fewer than 250 employees AND less than $10 million in annual revenue), personal use, education, and non-commercial open source projects."
I think that's reasonable, but it's hard for me to believe everyone's paying when they should be. I set up podman instead and I haven't had any major issues.
https://dev.to/shohams/5-alternatives-to-docker-desktop-46am
I don’t use Windows, but presumably you can just use their built-in Linux environment and the docker CLI.
Docker just needs to be open source software, there's no real revenue model that makes sense, but damn they're trying. Now I guess dockerhub is also just off the table.
There is a huge difference between images that are carefully curated, with separate build layers and shipped layers, and the ones that dump in the codebase, install a whole compiler toolchain needed to build the application / wheels / (whatever it's called in Node.JS), package it, and then ship off the image.
Clearing your apt cache and removing extraneous packages is peeing in the wind when faced with GB worth of shared objects.
A quick Google search resulted in this [1]: then I realized that the author of this project is pretty much the company I work for. Wow, such a small world.
[1]: https://github.com/Sqooba/k8s-mutate-image-and-policy-webhoo...
That's understandable, but if the claim would be that this is primarily related to the costs of bandwidth, shouldn't the instructions to deploy an image caching solution (e.g. Sonatype Nexus or anything else) be at the forefront?
Like, if the same image gets pulled for some CI process that doesn't have a cache for whatever reason or gets redeployed often, having a self-hosted proxy between the user and Docker Hub would solve it really well with quite limited risks.
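To make the effect concrete, here is a minimal in-memory sketch of the pull-through idea: the first request for an image goes upstream, every later one is served locally. `PullThroughCache` and `fake_upstream` are made-up names for illustration, and a real proxy (Nexus, the `registry` image, Harbor) does far more than this.

```python
from typing import Callable, Dict


class PullThroughCache:
    """Toy model of a caching proxy sitting in front of an upstream registry."""

    def __init__(self, fetch_upstream: Callable[[str], bytes]) -> None:
        self._fetch_upstream = fetch_upstream
        self._store: Dict[str, bytes] = {}
        self.upstream_pulls = 0  # how many requests actually reached upstream

    def pull(self, image_ref: str) -> bytes:
        # Only a cache miss counts against the upstream rate limit.
        if image_ref not in self._store:
            self.upstream_pulls += 1
            self._store[image_ref] = self._fetch_upstream(image_ref)
        return self._store[image_ref]


def fake_upstream(image_ref: str) -> bytes:
    # Hypothetical stand-in for a real registry request.
    return f"layers-of-{image_ref}".encode()


cache = PullThroughCache(fake_upstream)
for _ in range(50):  # e.g. 50 CI jobs pulling the same image
    cache.pull("library/postgres:16")

print(cache.upstream_pulls)
```

In this toy model the 50 CI pulls cost a single upstream request, which is the whole argument for putting a proxy in between.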
I'd like to find something that:
- Can pull and serve private images
- Has UI to show a list of downloaded images, and some statistics on how much storage and bandwidth they use
- Can run periodic GC to delete unused images
- (maybe) Can be set up to pre-download new tags
IIRC Artifactory has some support for Docker images, but that seems like a big hammer for this problem. [1]
[0] https://docs.docker.com/docker-hub/image-library/mirror/
It... does not have a UI or the GC/pre-download stuff, but it absolutely works for private images (see: https://distribution.github.io/distribution/recipes/mirror/#...)
I've been using it as a cache for a while locally and it's a solid choice.
---
I guess an edit - it does also have basic TTL, which might cover your GC case, but it's not very configurable or customizable. It's literally just a TTL flag on the proxied image.
They already set up a URL in harbor that mirrors docker.io containers.
Well, that's ominous. No mention of what they consider excessive or how much they might charge. They're essentially saying they can send you whatever bill they want.
edit: Oh, per hour. I thought that was per MONTH. Okay, I can survive with this, but it still puts me on notice. Need to leave dockerhub sooner rather than later.
This forces pretty much everyone to move to a Pro subscription or to put a cache in front of docker.io.
Still doable though.
{
  "registry-mirrors": [
    "https://pt-dh.int.xeserv.us"
  ]
}
Where the URL points to your pull-through docker hub cache.
I switched to podman during their last stunt in 2020 and have been a happy user since.
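For anyone rolling the `registry-mirrors` setting out to more than one machine, a rough sketch of merging it into an existing daemon.json instead of overwriting the file. The mirror URL and the `add_registry_mirror` helper are placeholders, not anything Docker ships; on Linux the file usually lives at /etc/docker/daemon.json, and the daemon typically needs a restart or reload to pick up the change.

```python
import json


def add_registry_mirror(daemon_json_text: str, mirror_url: str) -> str:
    """Return daemon.json content with mirror_url listed under registry-mirrors."""
    # An empty or missing file is treated as an empty config.
    config = json.loads(daemon_json_text) if daemon_json_text.strip() else {}
    mirrors = config.setdefault("registry-mirrors", [])
    if mirror_url not in mirrors:  # keep the operation idempotent
        mirrors.append(mirror_url)
    return json.dumps(config, indent=2)


# Existing settings are preserved; only the mirror list is touched.
existing = '{"log-driver": "json-file"}'
updated = add_registry_mirror(existing, "https://mirror.example.internal")
print(updated)
```

Because this is set at the daemon level, individual container definitions keep their normal image names and do not need to be edited one by one.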
Going forward, the cheapest (free) container hub today is probably github.
I know that places like Circle already do a lot to automatically set up local caches where they can, to avoid redownloading the same thing over and over from the outside world, and I hope that becomes more of the norm.
This timeline is kinda wild though.
I would be happy to give back to the community by hosting a container p2p host.
would that be even possible out of the box?
You can run your own following Docker's own guide here[0] if you'd like. It's not peer-to-peer in the sense that the lines between clients and servers are blurred, as with torrenting, but it allows for a distributed registry architecture, which I think is the part that matters here.
[0] https://www.docker.com/blog/how-to-use-your-own-registry-2/
They could just give X Budget to public images and create a status code for 'server overloaded, pls consider buying premium' or whatever.
It would create the same response: either paying or mirroring it yourself, but it wouldn't harm the reputation that much.
It’s much healthier for the ecosystem to have lots of small registries rather than all depend on a single central one.
I get that bandwidth is expensive, but this feels a bit like the usual "make it free to get lots of users, and then start charging when everyone is locked in" plan.
If they really just want to reduce their own costs, they should be evangelizing the use of a caching proxy, and providing a super easy way for people to set one up, both on the server and client side. (Maybe they already do this; I haven't looked.)
If everybody made fair use of Docker Hub, maybe we wouldn't have the rate limits in the first place? But I think we've all learned that won't happen on the open Internet.
Setting up a pull-through cache is pretty straightforward; you can find the instructions in Docker's documentation: https://docs.docker.com/docker-hub/image-library/mirror/
Something that doesn't require me to go through 50+ container setups and manually move every one of them to use my custom proxy?
If that's not enough, you could tunnel through HE's tunnelbroker and get a /48 which has 65,536 separate subnets for 655,360 pulls per hour.
Though, honestly, for the effort involved you're probably better off just mirroring the images.
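The arithmetic behind that figure, as a quick sketch. It assumes the limit is counted per /64 and that the unauthenticated allowance is 10 pulls per hour; both are assumptions taken from this discussion, not confirmed behaviour. The prefix below is from the IPv6 documentation range, not a real allocation.

```python
import ipaddress

# Hypothetical /48 allocation (2001:db8::/32 is reserved for documentation).
allocation = ipaddress.ip_network("2001:db8:1234::/48")

# Assumption: the rate limiter treats each /64 as one client
# and allows 10 pulls per hour per client.
PULLS_PER_HOUR_PER_SUBNET = 10

# A /48 leaves 64 - 48 = 16 bits to number its /64 subnets.
subnet_count = 2 ** (64 - allocation.prefixlen)
total_pulls_per_hour = subnet_count * PULLS_PER_HOUR_PER_SUBNET

print(subnet_count, total_pulls_per_hour)
```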
Really, all this networking expertise floating around, and Docker artifacts already being content-addressable, there should be a way to torrent them.
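Content addressing is what would make that safe in principle: a blob is named by the hash of its bytes, so it can be checked no matter which peer served it. A minimal sketch using the `sha256:<hex>` digest form; the helper names are made up, and this is not a working torrent client.

```python
import hashlib


def blob_digest(content: bytes) -> str:
    """Digest in the sha256:<hex> form used to name image blobs."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def verify_blob(expected_digest: str, content: bytes) -> bool:
    """True only if the bytes received match the digest that was asked for."""
    return blob_digest(content) == expected_digest


# Stand-in bytes; a real layer would be a (usually compressed) tarball.
layer = b"pretend this is a compressed layer tarball"
digest = blob_digest(layer)

# A peer that returns the right bytes passes; a tampered copy does not.
print(verify_blob(digest, layer))
print(verify_blob(digest, layer + b"tampered"))
```

The digest itself still has to come from somewhere you trust, such as a manifest fetched from the registry; only the bulk transfer could be handed off to untrusted peers.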
> 2.4 You may not access or use the Service for the purpose of bringing an intellectual property infringement claim against Docker or for the purpose of creating a product or service competitive with the Service.
Which is a great reason to default to / publish on other registries.
I understand Docker is paying for the bandwidth, but it's relatively cheap for them at the scale they operate. ghcr.io doesn't impose any rate limit at all (although it isn't really GitHub's main product), which I'd say proves that it's sustainable. In any case, 100 to 10 and 200 to 40 are both huge decreases and are unjustifiable for me.
If you don't want to host an OSS repository, just decide to not do that. And this is the first I've heard of it so now it's an emergency to work around this rug pull.
Now for every image I'm going to have to try to find a trustworthy alternative source (for things like postgres, redis, nginx) or copy and rehost everything.
I'm biased (i.e., co-founder of Depot [0]) and don't have the business context around internal Docker things. So this is just my view of the world as we see it today. There are solutions to the egress problem that negate the need to push that down to your users. So, this feels like an attempt to get even more people onto their Docker Desktop business model and not explicitly related to egress costs.
This is why when we release our registry offering, we won't have this kind of rate limiting. There are also solutions to avoiding the rate limits in CI. For example, our GitHub Actions runners come online with a public unique IP address for every job you run, avoiding the need to log in to Docker at all.
Please do elaborate on what those are!
There are always lots of comments like this providing extremely vague prescriptions for other people's business needs. I'd love to hear details if you have them, otherwise you're just saying "other companies have found ways to get someone else besides their customers to pay for egress costs" without any context for why those people are willing to pay the costs in those contexts.
I figured some kind of smart download manager and caching system would save the day but frankly I saw Docker as a step backward because I had been doing a really good job of installing 100+ web services on a single server since 2003 or so. [1] [2]
Looking back at it, I'm sure that a short timeout was a deliberate decision by the people running Docker Hub, as people with slow internet connections because telcos choose not to serve us with something better are unpeople.
[1] Nothing screams "enterprise feature, call sales for pricing" like being able to run your own local hub
[2] My experience with docker is roughly: if you can write a bash script to build your environment, you can write a Dockerfile; the Dockerfile is the gateway to a system that will download 5GB of images when you really want to install 50MB of files, so what's the point? Sure, Docker accelerates your ability to have 7 versions of the JVM and 35 different versions of Python, but is that something to be proud of, really?
I agree.
> Sure, Docker accelerates your ability to have 7 versions of the JVM and 35 different versions of Python, but is that something to be proud of, really?
No, but it's not my fault that the python packaging ecosystem is broken and requires isolation, and that every Java project relies on a brittle toolchain. At least docker means that nonsense is isolated and doesn't affect the stuff I write.
The bigger problem is when projects only officially ship as docker images for some banal reason.
As opposed to what? SystemD?
The only folks likely to feel pain from this change were those either deliberately abusing Docker’s prior generosity or using bad development and deployment practices to begin with. I suspect that for 99% of us regular users, we won’t see or feel a thing.