IMO, this kind of thing is what AoC is good for - you get to play with weird/obscure stuff without affecting your day job code.
https://blog.tracefunc.com/2011/11/19/parsing-json-in-sql-ht...
Luckily it’s not a handwritten XML parser though: https://learn.microsoft.com/en-us/sql/t-sql/functions/openxm...
For me, the biggest problem was memory. Recursive CTEs are meant to generate tables, so if you are doing some maze traversal, you have to keep every step in memory, until you are done.
I really would like Postgres to have proper support for writing Datalog queries, and with better and more efficient incremental evaluation algorithms as opposed to the iterative semi-naive algorithm that is only supported now.
(I’m a fan of MS SQL but it’s Microsoft and also hard to financially justify for many companies. But if you ever get to use it, it is a very solid RDBMS, even if the rest of your stack is open source.)
The model I am most familiar with is a 10-20 employee B2B SaaS startup running one "big" instance on a single vm in the cloud somewhere. If this is approximately all you require, then the cost should not be a dominating factor in your decision.
I think "because Microsoft" is also really poor justification if we are being serious about the technological capabilities and pursuing high quality business outcomes. If your business is fundamentally about open source advocacy and you are operating as a non profit, I totally get it. But, this is probably not your business model.
This sentiment against it really always comes only from people who have not used it or have never touched the enterprise version which is a very mature ecosystem with lotta features available for ages now.
Most of my career I’ve been dealing with DBs and MSSql is the easiest to admin perhaps being also tightly integrated with all scripting in the platform. It also runs Linux and is doing it better than the rest can say for running Windows.
Which makes sense as both are declarative logic-based languages. IMHO, SQL and Prolog fundamentally have much in common.
I laughed aloud at "It's not as bad as you might think if you know what you are doing."
... because that pretty much describes all human activity :-)
Suspicious. Need to investigate if taco bell has interesting ANSI SQL flavored chicken nuggets or I've been taken for a fool!
are we to feel Taco Bell menu about it all, what am I missing?
The cycle in tech will shift from one side to another; last decade was a lot of no-sql where all this logic and implementation had to reside on the application side. I'm seeing a shift back to DBs again (e.g. supabase), and the reality is that it's probably a continuum somewhere in-between, depending on your requirements (as always, the answer is "Yes, but it depends").
Remember, these are all "just" tools.
A small fraction of people bothered to learn how they work and know how and when to use them properly.
Everyone else just perpetuates a giant in-joke of “it’s black magic!”
It seems to me that we have the abstraction inverted when it comes to ORMs, which is why so many experienced devs dont like ORMs (but can't really articulate why).
Here's my take, in the context of business logic:
The schema represents the business rules. The ORM papers over the DB so that the code doesn't have to match the business rules. Then the dev implements those rules again in code, but poorly.
If you simply skip the programming language using something like Postgrest you end up with a declarative structure of the business rules.
The first problem is that most developers would rather hunt a bug or perform a modification on a 30k Java or C# or python or ruby program than in 5k of SQL.
The next problem is that tooling for SQL is crap. It's declarative so good luck stepping through the logic. The dialects are all different so editors aren't great at spotting errors.
The last major problem is that changing it is hard - you can't simply change the schema and redeploy because migrations have to be made on the existing data.
All of these are surmountable in some ways, but they certainly ain't easy.
Doing LoB apps have opened my eyes a lot: maybe 99% of logic is shorter when expressed in SQL. Of course, that means that the only use of an ORM in this case is connecting to the DB, sending the query, and sending back results.
Mystery triggers, unclear or badly managed stored procedures, opaque invariants. It’s not to say that all of these things are bad, but a certain level of DB complexity will naturally set off alarm bells for some devs.
Our Industry pretends solutions must be immediately understandable with common sense, without study or explanation. But nothing that actually matters does this (relational databases in the first place).
No other engineering discipline has this confusion. The world has complexity and it’s our job to understand it.
It is tedious to specify all that information (access, constraints, foreign key, cascading, views), but it’s all essential information you would end up specifying in another way.
The problem being complex is the issue. SQL is a standard, condensed, extremely performant, actually testable, and logical language. Sure, not anybody can instantly maintain it but that would be the same as if it was a lot of lines and functions in Java. The more lines the more risk for bugs.
I also love how deep SQL goes. And that makes totally sense. It powers the world of data since 40+ years so of course people asked for niche features. One of my favorite is the model clause in Oracle with which you can implement multidimensional arrays. A friend implemented Conway's game of life with it in way less lines you expect.
Ended up rewriting it in native code, and it run in less than a second. Most of the work was proving it produced same results... and writing and documenting test cases, so next person wouldn't have to go through that.
After that experience, I have generally avoided putting a lot of business logic in SQL...
But if both are fine there is no way that any native code will be faster than a database. No network latency, set operations, index usage, etc. DBs have all the the info to make data access fast.
Only if, ONLY IF, you have a lot of people that are well versed in SQL. It's very easy to write bad SQL. It's difficult to unravel thousands of lines of bad SQL spread across hundreds of procedures / views / functions. Ask me how I know...
It seems crazy at first, and then I continue thinking about it and it still seems crazy for wanting complexity to be in SQL.
Personally, I want anything complex to ideally easily testable, both manually and automatically. SQLs is easy to test manually, but harder to test automatically, especially compared to lines of code in a programming language. At least you can somewhat untangle balls of spaghetti code into less dense, then attack parts of it. How would you deal with a tangled ball of SQL spaghetti?
> The more lines the more risk for bugs.
I don't fully agree with this either, not all lines are equal. One big fat line of a 400 character long SQL query has higher chance of containing issues not easy to glance compared to 400 lines of Java code, and I say this even as a person who despises Java for various reasons.
I only made it to Day 6, and not even both stars every day. I'm pretty confident my Day 7 solution is correct, but I hit per-cell character limits on longer inputs.
Enjoy :) oh but don't open it on mobile, some sheets crash the app
https://docs.google.com/spreadsheets/d/10FY-89y19tnRM_EAAnCd...
Does it use Google App Script? Would be a way to gain extra power I think.
Isn't that just straight up JavaScript at that point? Feels kind of like cheating if the goal is to complete AoC with a spreadsheet.
Code, frameworks, ORMs, "best practices", patterns, et. al. are ultimately a distraction. There are a million ways to get the data in & out of the database. Moving the bits around is not valuable. There are many overblown software solutions out there that could have been a simple merge statement or CSV import job.
I think that a lot of the misconceptions and bad moods about SQL come out of being forced to work with nasty schemas. The language itself is really domain specific. Consider that one wouldn't complain as much about a super fucked up nested query (and resulting SQL syntax woes) if it wasn't necessary to write such a query in the first place. Aligning tuples and relations to the way the business typically talks about them means you will be less likely to be fighting these things over time. Often, it isn't possible to refactor the schema from zero, but you can put replicas/views around a "bad" schema and target it with your new development & refactors.
And in "modern" SQL this is solved with CTEs. Use them to unwrap the schema the way you want it first, before implementing the business logic.
This dis not arise as a need to solve business mess, but as a need to skip DDL for views/mviews.
You can have arbitrary dimensions sliced in CTEs which does not immediately imply a goal of business schema alignment.
Besides the top SQL devs I’ve met don’t lose time to align schema at all, but write the DB table names and columns as they are, because well… because they were usually the people who created the mess this way.
Many reports such as month-to-month increase of sales or other statistical stuff is much more readable when implemented with CTes. Besides - older versions of DB software didn’t always have window functions or proper ranking, so problems such as top-n were not (and in some occasions still are not) trivial to write unless view/cte is used. We talking pages of single query here, not textbook examples.
So is really about dice and slice in a convenient way, but less about business schema or the ideal E/R
In my experience, normalizing tables is mostly for simplifying inserts. When you write business logic it is rather the opposite, you want non-normalized data because this is where the data is all in one place. And this de-normalization is a great use of CTEs. As well as a general cleanup of the input data, which was my original point.
I wish more people would see the beauty. After a session of SQL, when I take a step back and think.
"Hold on. What I have been doing lately is just pure logic. No library dependency resolution, no concurrency problems (even though massive concurrency is certainly under the hood). No mutability issues. Just logic."
SQL obviously has its warts, some of them serious, like testability. But at the end of the day, I wish all programming was like that. Let the computer decide how to do stuff under the hood. And let the human focus on the logic.
I have somewhat half-assed tried to read up on Prolog for trying to take it to the next level, but failed sofar unfortunately. (It was also a goal to try to unlearn some SQL to avoid getting stuck in some local optimum). Maybe somewhere between SQL and Prolog is the future of programming.
Must be Datalog then ;)
I'm sure I'm wrong and missing something but that's where I stop.
I'm not sure what database technology/data storage that doesn't involve doing those two things to get started.
I haven't done any Prolog, but Datalog which is similar and focused on querying data, and the benefits of Datalog for me is that you can write complex/medium-complex queries a lot easier than in SQL, at least for me. Simpler queries are just less characters, but pretty much the same beyond that.
Probably helps that the Clojure ecosystem embraced Datalog so it mostly feels like writing Clojure code although it's really Datalog. You don't get that same feeling regardless of what SQL library/DSL you use.
it was, it most probably is
I dunno... I've written a tremendous amount of SQL, and I still have to think imperatively (iteratively) in order to write queries that are actually performant, and to know which indexes need to exist.
It would be wonderful if I could just think in terms of set operations, but that tends to result in queries that take 5 minutes to execute rather than 5 milliseconds.
My entire thought process is basically -- what table do I start with, what rows in what order, joining to what, under what conditions, aggregating how, rinse and repeat... It's entirely a mental model of loops and aggregation, never of set operations.
But BigQuery allows you to scale it to 100s of CPUs without having to worry about indexes.
I would do almost any amount of iteration and index tuning to keep the query on a single machine rather than deal with a networked distributed system.
When you get slow queries the real problem is algorithmic complexity and linear workers only can do so much,
But even with BigQuery, you've still got to worry about partioning and clustering, and yes they've even added indexes now.
The only time you really just get to think in sets, is when performance doesn't matter at all and you don't mind if your query takes hours. Which maybe is your case.
But also -- the issue isn't generally CPU, but rather communications/bandwidth. If you're joining 10 million rows to 10 million rows, the two biggest things that matter are whether those 10 million rows are on the same machine, and whether you're joining on an index. The problem isn't CPU-bound, and more CPU's isn't going to help much.
But to me those optimizations are not imperative in nature.
(And BQ will probable eat the 10 million to 10 million join for breakfast...)
People skip ahead to all kinds of nonsense; but most of software engineering is putting the right data into the right format, and moving it around reliably.
I just did a major refactor of a complex distributed code base. I pretty much only count the work I did on schema re-design as the actual “job”, the rest was many hours of coding, but that’s really just implementation.
There are other ways to define schema than SQL of course, but it’s really the perfect way to learn true systems engineering.
I often talk to developers who hate that idea though. They want me to instead move all the data to the backend, for a massive IO hit, just so that the computations can be expressed in a "real" programming language.
I think SQL the concept is good but SQL the language is the problem. There are so many awkward things in it -- not strange as it has had no competition in 40(?) years!! The mental model of the program I write is fine but I really need to overlook the syntax and to the program I am really writing to see any elegance...
What we need I think is a properly designed programming language, designed for existing databases (Postgres, MSSQL) compiling to SQL dialects. I see some contenders but they all have some attachment to a special domain, such as not allowing data modifications (PreQL) or being associated with other databases.
Itching to do it myself, but it's a lot of work and a long long road to any adoption with no guarantee of success, and no revenue stream that I can think of.
The most popular backend languages were made by large companies, but I think coding in SQL is in a catch-22 where it will be frowned upon until there is a better language and no better language until it is more popular..
There's a lot that is very right about SQL, but a few clunky bits around the edges.
CTEs made a lot of difference, and window functions - which can be a bit head bending - made difficult things a tiny bit easier.
I'm using BigQuery, which supports structs and arrays, but only recently allowed arrays to be grouped by, although there is still no equality check etc.
BigQuery is slowly adding more sugar, like aggregate UDFs and polymorphic UDFs using ANY TYPE parameters etc, and I find myself putting more reused logic into tidy functions, but my pet want is for temporary functions to be declared and scoped like CTEs so they integrate a lot better into tooling like DBT that wants everything to be in one statement.
And the one most productive thing they could add? Allowing you to specify null behaviour on JOIN USING. (Having to spell out foo.bar IS NOT DISTINCT FROM bar.bar on a join is unobvious and ugly. Something like USING (bar RESPECT NULLS) would be so much nicer.)
Some of my tweets (I should write a blog post):
That wasn't a solution the data scientist in me (which I'm not) was ok with. I ran my high level thoughts about the problem past a data scientist buddy of mine and he said "sounds like you're on the right track", but not being a data scientist it was hard for me to put together a soludion.
I told them that if I was asked in my work capacity to do something like this, I'd probably be reaching for a reporting package like Crystal Reports, but I haven't touched it in ~30 years. "Sure, I get that, just write some code to generate the report".
I had written all the Ansible and related goodies to spin up an instance, get MySQL set up and configured, and figured the "right" solution was there in the SQL arcana. I played with it and played with it, mostly writing off the job because what kind of company judges a sys admin based on building a reporting package? They had set the expectation that the homework should take 2-4 hours, and I chewed on it for longer than that and finally said "thanks but no thanks, this is outside my wheelhouse".
But I kept chewing on the problem. A couple weeks later I had the right set of materialized views and queries and joins that not only solved the problem entirely in SQL, but solved it QUICK.
SQL is amazing, but there are a log of tricks of the trade you've got to have at your fingertips to make it fly. I'm still convinced that a lot of it is just throwing a bunch of things at the wall until something sticks.
But the problem actually mapped well to SQL's capabilities in some ways. The basic algorithm was:
1-Loop through items in descending size order
2-Test X different positions and orientations such that one corner is butted up against an existing corner (of container, or previously placed items, etc.)
3-Choose the winner based on some sort of point systems that encapsulates some set of rules (e.g. preferred orientation, or minimizes small gaps between sides of items, etc.)
These were some aspects that lined up well with SQL:
A-Testing of X number of positions and orientations all with a relative simple statement, using as input: existing corner points from previously placed items, some control table with the variations in X,Y,Z orientations.
B-The ability to maintain a set of previous options for each iteration (e.g. item #2 was tested in these positions with these stats and all are temporarily being considered reasonable options), add new item to each one of those previous tests and accumulate stats. It was pretty easy to configure a "window" of how many different open options were being tracked to optimize the result without overwhelming memory. The SQL to include the options was almost the same as if there was only one previous chosen option.
Some aspects were a bit painful to shift the model mentally to fit in with SQL.
I teach an introductory coding course and I was surprised when one student asked me to give them some reasons why learning to code is useful instead of just using a platform like Wix. A similar question came up again (concerning a different website-building platform) in a different discussion with a family member.
I was kind of shocked that the distinction between a website and a data-driven application seems to have faded out of many people's consciousness. I'm guessing that you can probably add dynamic widgets to Wix and other similar platforms which make it seem like you can build complete apps but they can rarely get you all the way to your goal for a long-term project... And when you hit that wall, you almost need to learn the entire field of computer science from scratch just to get that last 10% of requirements implemented. You go from not having to know anything at all so literally understanding everything about computer science just to get the last 10%; or hire someone (and hope that they have the skills you need).
The most powerful primitive that the dialect has is (mutual) recursion, which is effectively a mechanism for outputting a fixed-point of a query.
https://clickhouse.com/docs/en/integrations/data-formats/tem...
I agree with all the points in the post. It's really not that bad. If I had more time to devote to it, I think I could have reasonably completed more.
Here's my writeup, contains a link to the repo if you want to see some of the soutions.
https://github.com/ty-porter/advent-of-code/tree/master/2024...