Here are the notes I made at the time: https://simonwillison.net/2020/Jul/30/fun-binary-data-and-sq...
I built https://datasette.io/plugins/datasette-media as a plugin for serving static files from SQLite via Datasette and it works fine, but honestly I've not used it much since I built it.
A related concept is using SQLite to serve map tiles - this plugin https://datasette.io/plugins/datasette-tiles does that, using the MBTiles format which it turns out is a SQLite database full of PNGs.
If you want to experiment with SQLite to serve files you may find my "sqlite-utils insert-files" CLI tool useful for bootstrapping the database: https://sqlite-utils.datasette.io/en/stable/cli.html#inserti...
Your JS bundler (if you use one!) might be effectively rewriting imports anyway so it can be convenient to do the content-hash-named-files rewrites in the same place. But not everyone wants to use a bundler for that.
Clace can also run containers, where the UI is served by the container. For example, Streamlit/Gradio based apps. In that case, Clace acts like an app server and reverse proxy to the container, no file name rewrites are done.
Clace includes esbuild for being able to import npm packages as ES modules https://clace.io/docs/develop/javascript/. There is no support for any JS bundling.
One major downside I see would be backing up the database. You can do granular backups on a filesytem. Even rsync would work fine. You’d need to basically snapshot the entire thing here.
requests-cache caches requests in SQLite by (date,URI) IIRC. https://github.com/requests-cache/requests-cache/blob/main/r...
/? pyfilesystem SQLite: https://www.google.com/search?q=pyfilesystem+sqlite
/? sendfile mmap SQLite: https://www.google.com/search?q=sendfile+mmap+sqlite
https://github.com/adamobeng/wddbfs :
> webdavfs provider which can read the contents of sqlite databases
There's probably a good way to implement a filesystem with Unix file permissions and xattrs extended file attribute permissions atop SQLite?
Would SQLite be faster or more convenient than e.g. ngx_http_memcached_module.c ? Does SQLite have per-cell ACLs either?
Your server on its own, whether it uses SQLite or the filesystem, cannot save you from having broken webapps during an update. Each page in the browser is a _tree_ of resources, which are fetched with separate HTTP requests and therefore not subject to your server-side transaction/atomic update system. You can change all resources transactionally on the server side but the browser can still observe a combination of old and new resources.
The common solution here is to ensure that all sub-resources of a page (javascript bundle(s), stylesheet(s), media, etc) are named (ie, in the URL) using a content hash or version. All resources (from the root HTML document down) need to refer to the specific content-hash or versioned name, so that if the browser loads version X of the root HTML document then it will load the _corresponding version_ of all sub-resources. Not only that, but if you're updating a page from version X to version Y, all the version-X sub-resources need to remain available after you start serving page version Y, until you're confident that no browser could reasonably be still loading version X of the page, otherwise you can break page version X while someone is still loading it.
This means you actually specifically don't want to put sub-resources for a page into one atomically-switched-out bundle along with the root HTML document, because if you do that you'll be removing the previous versions of sub-resources while they may still be referenced.
Also of course in some cases there might be some sub-resources (e.g., perhaps some media files) that you want to be versioned "separately" from the HTML document that contains them, so that you can update them without busting caches for all of the page/app structural elements like your javascript blobs and stylesheets and so on, and you _might_ need to take that into account in your page build system as well.
During experiments of this stuff at a big company (which saw a large portion of the web during this time):
- we saw most (>80%) of users staying on a web app ~2-3 days (most likely skewed from people leaving tabs open over the weekend).
- 95% was ~2 weeks
- 100% was about 600 day (yes, apparently we had users with a tab open for nearly 2 years)
If you are seeking 100%, you are going to be waiting a while.
: this is 100% from memory, I don't work there anymore.
I'm impressed you kept the measurements going long enough to track the 600-day users though!
I agree generally with your comment. Transactional updates prevent only one category of update related issues. There can be other issues at the app level which can cause a broken experience. Continuing to serve older versions of static content when referenced by content hash is possible, it is not currently implemented by Clace.
The main trick is upload all of the non-html changes before the html changes, so that no file is referenced before it exists. If you want to make the app as complex as possible you do a depth first search for upload. But if you value your sanity you relax the problem and have assets-first in your app.
It was hella fast how quickly we could load our assets (this was for a mobile game so only a handful of assets were not in the db). It's been neat seeing people adopt it further.
One aspect someone might also not realize is you can store near infinite metadata alongside your content, so you can always just query the database to find "similar" files. We threw so much metadata into the DB and I think in the end the pak file was 200MB, and the database was something like 20MB. Again, this was a mobile game. I think the worst thing we had happen on the client side was a single double inner join that we couldn't figure out how to reduce due to some server side complexity (we weren't able to implement the server, which was frustrating as the people we worked with were very bad™ at software development so our builds would break out of nowhere when they would change the entire backend spec without alerting us )
We also used a separate sqlite3 database for game replays so you could (post match completion) replay an entire game and see what each opponent was doing during said game. This was super nice for automated testing as well.
- File locking, concurrency, etc., are problems that SQLite solves
- Using SQLite allows drumroll querying files with SQL instead of platform-dependent fs APIs
- Using SQL queries is typesafe with Kysely https://kysely.dev/ (without the need for an ORM)
What are your reasons for advocating for SQLite as a filesystem abstraction?
Wow, this is even better than what I've seen people do with F# type providers. Cool cool cool.
[0] https://github.com/opral/monorepo/blob/99356e577f558f4442a95...
diff --git a/packages/lix-sdk/src/query-utilities/is-in-simulated-branch.ts b/packages/lix-sdk/src/query-utilities/is-in-simulated-branch.ts
index 7d677477e..39502f245 100644
--- a/packages/lix-sdk/src/query-utilities/is-in-simulated-branch.ts
+++ b/packages/lix-sdk/src/query-utilities/is-in-simulated-branch.ts
@@ -21,10 +21,9 @@ export function isInSimulatedCurrentBranch(
// change is not in a conflict
eb("change.id", "not in", (subquery) =>
subquery.selectFrom("conflict").select("conflict.change_id").unionAll(
- // @ts-expect-error - no idea why
subquery
.selectFrom("conflict")
- .select("conflict.conflicting_change_id"),
+ .select("conflict.conflicting_change_id as change_id"),
),
),
// change is in a conflict that has not been resolved
- target the browser
- supports any file format, not just text files
- understands changes, not just versioning of files
- suited to build web apps on top
- ultimately enable 1000's of artists, designers, lawyers, civil engineers, etc. to collaborate
we are 2 years into the journey. dec 16, we'll release a public preview
Modern web servers like Nginx employ optimal strategies for handling static files, starting with sendfile and extending to io_uring and splice operations, all within well-designed thread pools based on epoll, kqueue, or eventport (whichever suits your needs).
Meanwhile, the best SQLite can offer (at least natively) is its memory-mapped I/O support (https://www.sqlite.org/mmap.html). This might work well for single-client services, like locally hosted web apps (see also https://github.com/electron/asar). However, for large websites, as mentioned in other comments, it attempts to solve a problem that doesn't exist.
Source code is available here if you wanna take a look: https://code.up8.edu/pablo/cigala
A warning: the code is not written to be pretty, it's written to be simple and effective. The initial goal being to have the whole software in a single PHP file that could even be used offline to produce a fully static website.
A friend of mine in astronomy once observed that a lot of people in the sciences should get comfortable working with databases because otherwise they just end up inadvertently spending a huge amount of effort rolling their own really crappy database.
> The amount of web traffic that SQLite can handle depends on how heavily the website uses its database. Generally speaking, any site that gets fewer than 100K hits/day should work fine with SQLite. The 100K hits/day figure is a conservative estimate, not a hard upper bound. SQLite has been demonstrated to work with 10 times that amount of traffic ... The SQLite website (https://www.sqlite.org/) uses SQLite itself, of course, and as of this writing (2015) it handles about 400K to 500K HTTP requests per day, about 15-20% of which are dynamic pages touching the database. Dynamic content uses about 200 SQL statements per webpage. This setup runs on a single VM that shares a physical server with 23 others and yet still keeps the load average below 0.1 most of the time. See also: https://news.ycombinator.com/item?id=33975635
Clace is built for use cases where a team wants to deploy web tools for internal use. Tens or hundreds of (small) apps can be hosted on one machine. Clace provides blue-green staged deployment, preview env, OAuth access control, secrets management etc for the apps. Apps can be updated atomically, i.e. a reload command can update multiple apps in one go. If any of them fail to update, the whole operation is aborted.
Clace apps can be of three types 1) Backend actions (defined in Starlark) which have an auto generated form UI 2) Hypermedia based apps where Go Templates and HTMX are used to create a custom UI 3) Containerized apps, where an container image is created and started. Clace can be used to just proxy the container APIs or again build a hypermedia based UI for those APIs.
A SQLite database is used for all app metadata (version info, config etc). The same database is used to store all the files for the app. This would be all the files for the hypermedia app, templates and static files, or the files to build the container image. Basically the source code is uploaded from GitHub or local disk to the database.
So a command like
clace app create --approve https://github.com/claceio/apps/utils/bookmarks /bookmarks
will get the source code for the app from GitHub and write it into the SQLite database. Later, running clace app reload --promote /bookmarks
will fetch the latest source code from GitHub and update the app to use that. The older version is retained in the database. A command like clace version switch previous /bookmarks
will change back to the previous version. No call to GitHub is required for this switch, since the versioning is done in the database. This works even if git is not being used (local disk was used for initial app creation).The versioning, de-duplication and other file metadata handling are much easier because of using the database. It would have been possible to do it on the file system, but I believe it would have required more effort.
Because file system excels at handling files. Need an atomic update? Checkout into a new directory and switch a symlink.
I've seen various versions of using a database as a filesystem. They have a nice side to them and then a nightmare side to them when the shit hits the fan.
You can then also use something like btrfs to dedup at the filesystem layer as well.
but....the SQLite file is locked against reads while writing in order to achieve serializable isolation. which sort of indicates you're better off doing your database work in an offline file, then just swapping the new file for the old one that's in production. which sort of indicates....just use a tar file, or a separate directory that you swap in to update the new content.
it is so much easier to serve static files statically rather than from some program that's trying to manage live SQLite connections and achieve some weird "update concurrency" magic when this problem is not at all that hard to solve. It's fine to manage your CMS in a sqlite database but when you serve the content live and it's static, use static files.
Not with WAL: https://www.sqlite.org/wal.html
Used to be hackily done by overwriting symlinks, but now there's https://manpages.debian.org/testing/manpages-dev/renameat2.2...
using sqlite for static content might have other benefits, but i don't think that this is the main one
I've tried various solutions like litestream and even xdelta3 (which generates patches in excess of the actual changeset size), but I haven't found a solution I'm confident in other than backing up complete snapshots.
I can see how you'd use it for replication though.
Seeing as you need some kind of layer between web and sqlite, you might as well keep a layer between web and FS who nets you most or all of the benefits.
Use git to manage versions of the static files.
Use [git worktree][1] to have multiple separate working trees in different directories.
Use [openat()][2] in your app whenever serving a request for a static file. On each request, open the directory, i.e. `/var/www/mysite.com`, and then use `openat()` with the directory's file descriptor when looking up any files under that directory.
`/var/www/mysite.com` is a symbolic link to some working tree.
When modifying the site, make modifications to the git repository. Then check out the modified working tree into a new directory.
Create a symbolic link to the new directory and `mv` (rename) it to `/var/www/mysite.com`.
Then some time later, `rm -r` the old working tree.
[1]: https://git-scm.com/docs/git-worktree
[2]: https://pubs.opengroup.org/onlinepubs/9799919799/functions/o...
bad match -- git stores every version in the history forever. Do you really need every revision of a binary file, completely duplicated? big files, more bad
[1]: https://www.reddit.com/r/git/comments/wk2kqy/delete_files_un...
1. I should make blog engine using flat files. That way I can diff the plain text.
2. But flat files are hard to read/write structured data from, so I should use SQLite instead.
3. But SQLite is hard to diff. GOTO 1.
There's no reason you can't do the same but including the content too.
That way you're always committing a text friendly version of the database rather than the database binary encoding. And your diffs will work great.
diff <(sqlite3 db 'select text from posts where id = 1') <(sqlite3 db 'select text from posts where id = 2')
On the other hand, the state of art when it comes to performance is zero-copy from disk to NIC using io_uring and DMA, combined with NICs that support TLS & checksumming offloading and separated header and payload that can be filled independently by the OS and user-space.
I wonder if the authors of these projects ask themselves: what reasons are there to not do it like this? This thread has a few answers.
I am not claiming a SQLite database is always the right approach for serving static files. In my particular use case for Clace, I found a database to work better.
Otherwise the file system is far superior for every "benefit" they list. And far easier to work with other software and more robust over time (nothing to change/break).
From what I remember, this was an atomic operation, and if there was a download in progress it would continue using the old file, because the data was still on disk and the filename was simply a pointer to the node.
This may behave differently on other file systems. This was done on an old Linux server with ext3
Seems like a simpler solution than using a db
I think the bigger benefit though is the increased read performance on many small files (saving system call overhead). To which amount that applies to static files that a smart server might keep in cache, I don't know.
Isn't this pointless when those files are transferred via multiple requests? What does it matter if you atomically update files A and B together when the browser requests A before your update and B after it.
And if you don't care about that then rsync with --delay-updates is just as good.
Something I want to try is using sqlite as my filesystem but just storing content hashes that would point to an S3-compatible object store, so you get the atomicity and rollbackability and all but you also get massive parallelization and multiregionality of an object store
Edit: I googled it again and find that multiple processes can read sqlite concurrently so shouldn't be a problem
https://stackoverflow.com/questions/4060772/sqlite-concurren...
As far as the results go, though, I don't see any realistic scenario where this is a net win vs a symlink. :)
There is a plugin for that but it's super expensive.
----
I don't really "get" this. A lot of the comments here are focused on SQLite, but the page clearly says:
> Clace currently runs on a single node. When multi-node support is added later, the plan is to use a shared Postgres database instead of using local SQLite for metadata and file data storage.
Let's start with some basics:
1: What problem is Clance trying to solve? (IE, did the authors encounter problems hosting / generating static sites, and Clance is designed to solve a specific problem they had?)
2: Why can't Clance be multi-node today? (What potential problems would happen if I had multiple Clance nodes pointed to the same database, or github repo?)
Naively, I thought Clance wasn't serving truly static sites; IE, I thought Clance would be serving sites that allow querying static / infrequently changing data. I thought it would be for sites that are heavy on charts, searches, or other forms of data analysis.
In the authors use case, it might also be easier to drop the executable and one big database file on a random shared file store in the company network instead of many small files and a database file for metadata.
And it might be faster to read files from db than from fs https://sqlite.org/fasterthanfs.html
TL;DR: No real difference in low throughput environments, 2.3x slower on high throughput
So let's go over the claims about SQLite:
>Transactional Updates : This is the main benefit. Updating multiple files can be done in one transaction. Isolation ensures that there are no broken webapps during the update.
Yep, big one
>Deployment Rollbacks: Another of the transactional benefits is the ability to roll back deployment in case of errors. If multiple apps are being updated, all of them can be rolled back in one go. Rolling back a database transaction is much easier than cleaning up files on the file system.
I guess if you don't keep old artifact of what you are serving up. We zip up our entire static site as part of pipeline and dump into Azure Storage before deployment.
>File De-duplication Across Versions: Clace automatically versions all updates. This can lead to lots of duplicate files. The file data is stored in a table with the schema
INTERNAL SCREAMING Deduplication gives me nightmares from cache issues and so forth. Show me SQLite level of tests here to save a few kilobytes.
>De-duplication Across Apps : Each production app in Clace has an staging app. Apps can have multiple previews apps. This can lead to lots of duplication of files. Using the database helps avoid all the duplication. Even across apps, there are files which have the same contents. Files are de-duplicated across apps also.
INTERNAL SCREAMING Oh god, I hope this section of the code is extremely battle tested because de-duplication at this level gives me SRE nightmares.
>Easy Backups: Using SQLite means that backups are easy. The state of the whole system, metadata and files can be backed up easily using SQLite backup tools like Litestream.
tar directory is difficult?
>Content Hashing: For content caching on the browser, web servers return a ETag header. Using the database for files makes it easy to save the content SHA once during file upload without having to recompute it later.
Most web servers handle this for you.
>Compression: The file contents are saved Brotli compressed in the SQLite table. The database approach has the advantage that contents can be saved in multiple formats easily. GZip compressed data and uncompressed data can be added by just adding a column in the files table.
Cool, now you have to uncompress it every time you want to serve it? Is there massive shortage on disk space and I missed the memo?
I mean, I don't know what Clace does, seriously OP, reading your front page and I have no idea but disk space/speed is rarely a factor. Hell, most stuff probably gets loaded into RAM and is served from there at the end. Meanwhile, your content is being served out this proprietary file system and you are kind of locked in.
Clace is built for use cases where teams want to be able to easily deploy Streamlit/Gradio/FastAPI type internal apps (python is supported out of the box, other languages work if they have a Dockerfile present). Clace adds the image building, GitOps, OAuth access control, blue-green staged deployment, secrets management etc on top. So use cases where Kubernetes and an IDP would be overkill for.
If this person is anything like me, it's this "feature" that is giving them nightmares. This code has to be rock solid and impossible for anyone to mess up. It only takes one little bug and EVERY SINGLE APP GOES DOWN. Congrats on inventing a new single-point-of-failure!
Not necessarily! If the client supports Brotli encoded data (Accept-Encoding: br), you can pass it straight through without decompressing. Ditto for gzip, or zstd for newer browsers.