Advent of Code 2024 in pure SQL

(databasearchitects.blogspot.com)

473 points | by greghn2 days ago

28 comments

scop2 days ago
I reacted to this title the way I react a new menu item at Taco Bell: a strange mixture of desire, shame, and admiration for human ingenuity.
- qsort2 days ago
  I work a lot with databases and I've seen... stuff. It's not as bad as you might think if you know what you are doing. Most RDBMSs support recursive CTEs, it feels like writing Prolog with a slightly sadistic syntax. For something like AoC the most difficult part is probably parsing the input.
  dunham2 days ago
  Speaking of parsing, back around y2k we were building an app that used XML everywhere, which was the style at the time, and our DBA wanted to write an xml parser in SQL (the api would involve sending XML to the database). That got vetoed.
  IMO, this kind of thing is what AoC is good for - you get to play with weird/obscure stuff without affecting your day job code.
  jamie_ca2 days ago
  I did something with JSON back before there was reasonable native support - it's certainly not robust, but it handled a few syntax variants for a use case where we had an extra attribute column that serialized JSON, and wanted to surface one of the fields as a proper column on the table.
  https://blog.tracefunc.com/2011/11/19/parsing-json-in-sql-ht...
  jmg_2 days ago
  Funnily, I’m actively working on rewriting a stored procedure which parses an XML snippet as one of its arguments
  Luckily it’s not a handwritten XML parser though: https://learn.microsoft.com/en-us/sql/t-sql/functions/openxm...
  yazmeya1 day ago
  Just around the same time I was working at a place that used Oracle's web app extension, with CGI endpoints written completely in PL/SQL. I did end up writing an XML parser/serializer for it.
  youerbt1 day ago
  I do AoC in SQL, I wish it was true. With Postgres, you have lots of regex/string manipulation functions that make it easy.
  For me, the biggest problem was memory. Recursive CTEs are meant to generate tables, so if you are doing some maze traversal, you have to keep every step in memory, until you are done.
  ulrikrasmussen2 days ago
  It is closer to Datalog I think, or can you express cut? CTEs are fairly restricted compared to logic programmming languages though, at least for Postgres. In particular, relations cannot be mutually recursive and your rules may only be linearly recursive in themselves (i.e. can contain only one instance of themselves in the right hand side). Postgres is overly restrictive in the latter and requires at most once recursive reference over all subqueries in the UNION even though it would be safe to only restrict the number of recursive calls for each subquery (each corresponding to a separate Datalog rule for the same relation). It is possible to work around that restriction using a local WITH expression (a hack really), but then you are also on your own since it disables all checks and allows you to write rules which uses actual nonlinear recursion and will give incorrect result sets when evaluated.
  I really would like Postgres to have proper support for writing Datalog queries, and with better and more efficient incremental evaluation algorithms as opposed to the iterative semi-naive algorithm that is only supported now.
  harrall2 days ago
  Haven’t written SQL in a while (and I used to write a lot) but I think SQL Server recursive CTEs are fairly unbounded so it’s just a Postgres limitation unfortunately.
  (I’m a fan of MS SQL but it’s Microsoft and also hard to financially justify for many companies. But if you ever get to use it, it is a very solid RDBMS, even if the rest of your stack is open source.)
  bob10291 day ago
  The cost of MSSQL is largely controlled by how the system is designed and the complexity of the business.
  The model I am most familiar with is a 10-20 employee B2B SaaS startup running one "big" instance on a single vm in the cloud somewhere. If this is approximately all you require, then the cost should not be a dominating factor in your decision.
  I think "because Microsoft" is also really poor justification if we are being serious about the technological capabilities and pursuing high quality business outcomes. If your business is fundamentally about open source advocacy and you are operating as a non profit, I totally get it. But, this is probably not your business model.
  larodi1 day ago
  MS SQL is one fascinating piece of software and much closer to commercial big bro offering such as Oracle and DB2, yet much more user friendly and convenient.
  This sentiment against it really always comes only from people who have not used it or have never touched the enterprise version which is a very mature ecosystem with lotta features available for ages now.
  Most of my career I’ve been dealing with DBs and MSSql is the easiest to admin perhaps being also tightly integrated with all scripting in the platform. It also runs Linux and is doing it better than the rest can say for running Windows.
  UltraSane2 days ago
  A company I worked for uses Syteline ERP which heavily relies on SQL Server. But the DBA was constantly complaining about how slow the Syteline SQL was. One major issue was long running transactions taking 10 minutes locking rows/tables for too long and using a lot of memory. You would think very expensive ERP systems would have decent SQL.
  AdieuToLogic2 days ago
  > Most RDBMSs support recursive CTEs, it feels like writing Prolog with a slightly sadistic syntax.
  Which makes sense as both are declarative logic-based languages. IMHO, SQL and Prolog fundamentally have much in common.
  marxisttemp1 day ago
  I did a semester at the university of Edinburgh and took database systems and logic programming at the same time, and I definitely felt the synergy between them.
  xdavidliu1 day ago
  parsing is most difficult for probably the first third of the problems. when you get to day 19 or so, the input is still just a grid or a bunch of ints just like day 1, but the algorithms required are considerably more challenging than the parsing part. (I've done all 25 problems in all years)
  emmelaich2 days ago
  Thanks for that comment.
  I laughed aloud at "It's not as bad as you might think if you know what you are doing."
  ... because that pretty much describes all human activity :-)
- brettgriffin2 days ago
  I'm as equally amazed by the solutions in this post's github repo as I am with Taco Bell's new chicken nuggets.
  upghost1 day ago
  ⊂ •͡˘∠•͡˘ ⊃
  Suspicious. Need to investigate if taco bell has interesting ANSI SQL flavored chicken nuggets or I've been taken for a fool!
- 2 days ago
  undefined
- larodi2 days ago
  but why? what would make you react at human ingenuity with shame and desire? is this something about you or something about them in particular? isnt the whole of HN about human ingenuity...?
  are we to feel Taco Bell menu about it all, what am I missing?
  knicholes2 days ago
  This is just a guess, but if the OP's reason is similar to mine, DBMSs should be reserved for managing databases and not implementing complex logic.
  xarope2 days ago
  In the past (why do I feel so old when I say this?), DBMSs WERE used to implement complex logic. Not just complex business logic, but even authentication, authorization, etc were implement with stored procs, embedded sql, native DB features.
  The cycle in tech will shift from one side to another; last decade was a lot of no-sql where all this logic and implementation had to reside on the application side. I'm seeing a shift back to DBs again (e.g. supabase), and the reality is that it's probably a continuum somewhere in-between, depending on your requirements (as always, the answer is "Yes, but it depends").
  Remember, these are all "just" tools.
  harrall2 days ago
  SQL is like regex.
  A small fraction of people bothered to learn how they work and know how and when to use them properly.
  Everyone else just perpetuates a giant in-joke of “it’s black magic!”
  lelanthran1 day ago
  I do line of business apps right now.
  It seems to me that we have the abstraction inverted when it comes to ORMs, which is why so many experienced devs dont like ORMs (but can't really articulate why).
  Here's my take, in the context of business logic:
  The schema represents the business rules. The ORM papers over the DB so that the code doesn't have to match the business rules. Then the dev implements those rules again in code, but poorly.
  If you simply skip the programming language using something like Postgrest you end up with a declarative structure of the business rules.
  The first problem is that most developers would rather hunt a bug or perform a modification on a 30k Java or C# or python or ruby program than in 5k of SQL.
  The next problem is that tooling for SQL is crap. It's declarative so good luck stepping through the logic. The dialects are all different so editors aren't great at spotting errors.
  The last major problem is that changing it is hard - you can't simply change the schema and redeploy because migrations have to be made on the existing data.
  All of these are surmountable in some ways, but they certainly ain't easy.
  Doing LoB apps have opened my eyes a lot: maybe 99% of logic is shorter when expressed in SQL. Of course, that means that the only use of an ORM in this case is connecting to the DB, sending the query, and sending back results.
  liontwist2 days ago
  Why not? I think the main barrier is the programmers are not comfortable with recursion, rather than a technical limitation.
  FridgeSeal2 days ago
  I’d hazard a guess that quite a few devs have at one point, been lumped with a db where someone pushed too much logic into it. Probably great for the one person who wrote it, and awful for everyone else.
  Mystery triggers, unclear or badly managed stored procedures, opaque invariants. It’s not to say that all of these things are bad, but a certain level of DB complexity will naturally set off alarm bells for some devs.
  scop2 days ago
  One of the most eye opening moments in my junior dev career was when I found a really nifty way to achieve an operation using SQL alone and brought it to our VP (start up, small company size). It wasn’t esoteric, but used some SQL features not well-trodden by web devs. I thought I was the bees knees and this would be a clear demonstration of my sizable brain. He complimented the thought behind it, but immediately vetoed it as too complex for the majority of developers who would have to touch the code down the road. In hindsight he was completely right.
  liontwist2 days ago
  I don’t think this is a win. Cautioning against adding new technology to learn and maintain is one thing. But not using an effective solution in your toolset because it’s too hard is absurd.
  Our Industry pretends solutions must be immediately understandable with common sense, without study or explanation. But nothing that actually matters does this (relational databases in the first place).
  No other engineering discipline has this confusion. The world has complexity and it’s our job to understand it.
  liontwist2 days ago
  SQL tends to be very dense (more thinking than typing).
  It is tedious to specify all that information (access, constraints, foreign key, cascading, views), but it’s all essential information you would end up specifying in another way.
phartenfeller2 days ago
Nicely done. I know this seems crazy at first but in my opinion big SQLs are one of the best ways to store complexity.
The problem being complex is the issue. SQL is a standard, condensed, extremely performant, actually testable, and logical language. Sure, not anybody can instantly maintain it but that would be the same as if it was a lot of lines and functions in Java. The more lines the more risk for bugs.
I also love how deep SQL goes. And that makes totally sense. It powers the world of data since 40+ years so of course people asked for niche features. One of my favorite is the model clause in Oracle with which you can implement multidimensional arrays. A friend implemented Conway's game of life with it in way less lines you expect.
- sharpy1 day ago
  Once upon a time, as an intern, I had the 'fun' task of optimizing the performance of a stored procedure written by someone with a math phd. It was more than 6 pages when printed, and took more than 30 minutes to run (it was used in billing), and had no tests.
  Ended up rewriting it in native code, and it run in less than a second. Most of the work was proving it produced same results... and writing and documenting test cases, so next person wouldn't have to go through that.
  After that experience, I have generally avoided putting a lot of business logic in SQL...
  phartenfeller23 hours ago
  I definitely see how this happens. SQL is easy to start with but sup hard to master. And it requires good data models.
  But if both are fine there is no way that any native code will be faster than a database. No network latency, set operations, index usage, etc. DBs have all the the info to make data access fast.
  sitkack1 day ago
  I wrote a 120 line sql procedure that replaced a 8k line java royalty payment processor.
- haolez1 day ago
  I get this feeling as well, but then again debugging large SQL queries can be very opaque. Things like pl/pgsql help, but then it starts becoming more and more like a normal programming language.
- diggan1 day ago
  > but in my opinion big SQLs are one of the best ways to store complexity.
  It seems crazy at first, and then I continue thinking about it and it still seems crazy for wanting complexity to be in SQL.
  Personally, I want anything complex to ideally easily testable, both manually and automatically. SQLs is easy to test manually, but harder to test automatically, especially compared to lines of code in a programming language. At least you can somewhat untangle balls of spaghetti code into less dense, then attack parts of it. How would you deal with a tangled ball of SQL spaghetti?
  > The more lines the more risk for bugs.
  I don't fully agree with this either, not all lines are equal. One big fat line of a 400 character long SQL query has higher chance of containing issues not easy to glance compared to 400 lines of Java code, and I say this even as a person who despises Java for various reasons.
- NDizzle1 day ago
  > in my opinion big SQLs are one of the best ways to store complexity.
  Only if, ONLY IF, you have a lot of people that are well versed in SQL. It's very easy to write bad SQL. It's difficult to unravel thousands of lines of bad SQL spread across hundreds of procedures / views / functions. Ask me how I know...
trjordan2 days ago
If you like this kind of degeneracy, I tried AoC in Google Sheets this year.
I only made it to Day 6, and not even both stars every day. I'm pretty confident my Day 7 solution is correct, but I hit per-cell character limits on longer inputs.
Enjoy :) oh but don't open it on mobile, some sheets crash the app
https://docs.google.com/spreadsheets/d/10FY-89y19tnRM_EAAnCd...
- hu31 day ago
  I'm on my phone so I can't open it now.
  Does it use Google App Script? Would be a way to gain extra power I think.
  diggan1 day ago
  > Does it use Google App Script? Would be a way to gain extra power I think.
  Isn't that just straight up JavaScript at that point? Feels kind of like cheating if the goal is to complete AoC with a spreadsheet.
- tnvmadhav2 days ago
  holy sh*t, I kneel.
SoftTalker2 days ago
Over my career I've certainly written more SQL than any other type of code. Not so much in the last five years so I'm sure I've lost some of it, but I used to really enjoy it. Once you stop thinking iteratively and start thinking in set operations it becomes quite natural and powerful.
- bob10292 days ago
  Over the years I have been pushing more and more responsibilities into the RDBMS. I now see things mostly in terms of ETL, SQL and schema. Virtually every conversation I've ever had about the application of technology to the business could be expressed in these terms. Business logic defined as SQL queries can be quite intuitive when the schema is structured well and aligned to the business stakeholders' perspectives.
  Code, frameworks, ORMs, "best practices", patterns, et. al. are ultimately a distraction. There are a million ways to get the data in & out of the database. Moving the bits around is not valuable. There are many overblown software solutions out there that could have been a simple merge statement or CSV import job.
  I think that a lot of the misconceptions and bad moods about SQL come out of being forced to work with nasty schemas. The language itself is really domain specific. Consider that one wouldn't complain as much about a super fucked up nested query (and resulting SQL syntax woes) if it wasn't necessary to write such a query in the first place. Aligning tuples and relations to the way the business typically talks about them means you will be less likely to be fighting these things over time. Often, it isn't possible to refactor the schema from zero, but you can put replicas/views around a "bad" schema and target it with your new development & refactors.
  fifilura2 days ago
  > Consider that one wouldn't complain as much about a super fucked up nested query (and resulting SQL syntax woes) if it wasn't necessary to write such a query in the first place.
  And in "modern" SQL this is solved with CTEs. Use them to unwrap the schema the way you want it first, before implementing the business logic.
  larodi2 days ago
  this is poor man's SQL to unwrap for business logic. SQL is such more more about everything else and so litte about the trouble with mapping business logic into storage.
  bbkane2 days ago
  Could you please add some examples? I'm still living in poor man's SQL land
  solumunus2 days ago
  Please elaborate.
  larodi1 day ago
  CTEs can indeed align mismatched and denormalised storage to some E/R which follows some business logic. That much you can say. But CTEs allow for recursion and save time when you need complexity but you want to follow a precise step-by-step reasoning.
  This dis not arise as a need to solve business mess, but as a need to skip DDL for views/mviews.
  You can have arbitrary dimensions sliced in CTEs which does not immediately imply a goal of business schema alignment.
  Besides the top SQL devs I’ve met don’t lose time to align schema at all, but write the DB table names and columns as they are, because well… because they were usually the people who created the mess this way.
  Many reports such as month-to-month increase of sales or other statistical stuff is much more readable when implemented with CTes. Besides - older versions of DB software didn’t always have window functions or proper ranking, so problems such as top-n were not (and in some occasions still are not) trivial to write unless view/cte is used. We talking pages of single query here, not textbook examples.
  So is really about dice and slice in a convenient way, but less about business schema or the ideal E/R
  fifilura1 day ago
  Maybe I misunderstood, and if I did it makes for an interesting discussion, but your explanation is mainly about what CTEs can be used for not what they can not be used for.
  In my experience, normalizing tables is mostly for simplifying inserts. When you write business logic it is rather the opposite, you want non-normalized data because this is where the data is all in one place. And this de-normalization is a great use of CTEs. As well as a general cleanup of the input data, which was my original point.
  solumunus19 hours ago
  That makes sense. I misunderstood your OP.
  fifilura17 hours ago
  I am happy you understood! I still feel this discussion is above my head, so if you are still there i would love an explanation!
- fifilura2 days ago
  I second all of that!
  I wish more people would see the beauty. After a session of SQL, when I take a step back and think.
  "Hold on. What I have been doing lately is just pure logic. No library dependency resolution, no concurrency problems (even though massive concurrency is certainly under the hood). No mutability issues. Just logic."
  SQL obviously has its warts, some of them serious, like testability. But at the end of the day, I wish all programming was like that. Let the computer decide how to do stuff under the hood. And let the human focus on the logic.
  I have somewhat half-assed tried to read up on Prolog for trying to take it to the next level, but failed sofar unfortunately. (It was also a goal to try to unlearn some SQL to avoid getting stuck in some local optimum). Maybe somewhere between SQL and Prolog is the future of programming.
  Rendello2 days ago
  > Maybe somewhere between SQL and Prolog is the future of programming.
  Must be Datalog then ;)
  coliveira2 days ago
  Prolog is very powerful, if you see what professionals can do with it it's eye opening. Unfortunately, it takes a complete relearning of programming to achieve that proficiency level. And after you reach it, you probably cannot use it in your day job...
  arkh1 day ago
  My problem with Prolog is every time I want to start using it I feel like I'm populating a database then doing queries on it. So it feels like SQL with another syntax and less power.
  I'm sure I'm wrong and missing something but that's where I stop.
  coliveira1 day ago
  Prolog is in fact a kind of database. However it is untrue that it has less power than SQL, just the opposite. Probably it feels harder to use because you're already so used to how SQL works.
  diggan1 day ago
  > I feel like I'm populating a database then doing queries on it. So it feels like SQL
  I'm not sure what database technology/data storage that doesn't involve doing those two things to get started.
  I haven't done any Prolog, but Datalog which is similar and focused on querying data, and the benefits of Datalog for me is that you can write complex/medium-complex queries a lot easier than in SQL, at least for me. Simpler queries are just less characters, but pretty much the same beyond that.
  Probably helps that the Clojure ecosystem embraced Datalog so it mostly feels like writing Clojure code although it's really Datalog. You don't get that same feeling regardless of what SQL library/DSL you use.
  larodi2 days ago
  > Maybe somewhere between SQL and Prolog is the future of programming.
  it was, it most probably is
- crazygringo2 days ago
  > Once you stop thinking iteratively and start thinking in set operations it becomes quite natural and powerful.
  I dunno... I've written a tremendous amount of SQL, and I still have to think imperatively (iteratively) in order to write queries that are actually performant, and to know which indexes need to exist.
  It would be wonderful if I could just think in terms of set operations, but that tends to result in queries that take 5 minutes to execute rather than 5 milliseconds.
  My entire thought process is basically -- what table do I start with, what rows in what order, joining to what, under what conditions, aggregating how, rinse and repeat... It's entirely a mental model of loops and aggregation, never of set operations.
  fifilura2 days ago
  It may be true, until you do your ETL in an index-less database such as BigQuery or Trino. Postgres will always be faster for optimized, end user serving, queries.
  But BigQuery allows you to scale it to 100s of CPUs without having to worry about indexes.
  liontwist2 days ago
  This sounds awful.
  I would do almost any amount of iteration and index tuning to keep the query on a single machine rather than deal with a networked distributed system.
  When you get slow queries the real problem is algorithmic complexity and linear workers only can do so much,
  RobinL2 days ago
  You can use duckdb on a single machine. It's also indexless (or more accurately, you don't have to explicitly create indexes)
  fifilura2 days ago
  You are rightfully proud of your skills!
  crazygringo2 days ago
  Yes, I'm talking about end user queries. Not reports that take 2 hours to run.
  But even with BigQuery, you've still got to worry about partioning and clustering, and yes they've even added indexes now.
  The only time you really just get to think in sets, is when performance doesn't matter at all and you don't mind if your query takes hours. Which maybe is your case.
  But also -- the issue isn't generally CPU, but rather communications/bandwidth. If you're joining 10 million rows to 10 million rows, the two biggest things that matter are whether those 10 million rows are on the same machine, and whether you're joining on an index. The problem isn't CPU-bound, and more CPU's isn't going to help much.
  fifilura2 days ago
  Of course there are optimizations to be made, such as not joining on the raw data or saving the order by to last. And avoid outer joins between two large sized partitioned tables.
  But to me those optimizations are not imperative in nature.
  (And BQ will probable eat the 10 million to 10 million join for breakfast...)
- griomnib2 days ago
  Being able to master the theoretical, practical, and skill-based components of designing a good database schema is the absolute truest test of understanding any systems design.
  People skip ahead to all kinds of nonsense; but most of software engineering is putting the right data into the right format, and moving it around reliably.
  I just did a major refactor of a complex distributed code base. I pretty much only count the work I did on schema re-design as the actual “job”, the rest was many hours of coding, but that’s really just implementation.
  There are other ways to define schema than SQL of course, but it’s really the perfect way to learn true systems engineering.
  SoftTalker2 days ago
  Very true. My manager at one of my first jobs liked to say "get the data model right and everything else will be easy" and that has largely been proven true in my experience (and it even applies if you're not using an RDBMS).
- maCDzP2 days ago
  SQL clicked once I read the original paper and it explained it terms of sets.
- pjmlp2 days ago
  Yeah, I also kind of like coding in SQL, with PL/SQL being my favourite extension language, which is kind of heresy in HN, but whatever.
dagss2 days ago
I have been writing a ton of SQL -- implementing a lot of the business logic of a (stream processing) application in it. I really really like it, especially that I bring the computation to the data instead of the data to the computation.
I often talk to developers who hate that idea though. They want me to instead move all the data to the backend, for a massive IO hit, just so that the computations can be expressed in a "real" programming language.
I think SQL the concept is good but SQL the language is the problem. There are so many awkward things in it -- not strange as it has had no competition in 40(?) years!! The mental model of the program I write is fine but I really need to overlook the syntax and to the program I am really writing to see any elegance...
What we need I think is a properly designed programming language, designed for existing databases (Postgres, MSSQL) compiling to SQL dialects. I see some contenders but they all have some attachment to a special domain, such as not allowing data modifications (PreQL) or being associated with other databases.
Itching to do it myself, but it's a lot of work and a long long road to any adoption with no guarantee of success, and no revenue stream that I can think of.
The most popular backend languages were made by large companies, but I think coding in SQL is in a catch-22 where it will be frowned upon until there is a better language and no better language until it is more popular..
- willvarfar2 days ago
  Me too me too :D
  There's a lot that is very right about SQL, but a few clunky bits around the edges.
  CTEs made a lot of difference, and window functions - which can be a bit head bending - made difficult things a tiny bit easier.
  I'm using BigQuery, which supports structs and arrays, but only recently allowed arrays to be grouped by, although there is still no equality check etc.
  BigQuery is slowly adding more sugar, like aggregate UDFs and polymorphic UDFs using ANY TYPE parameters etc, and I find myself putting more reused logic into tidy functions, but my pet want is for temporary functions to be declared and scoped like CTEs so they integrate a lot better into tooling like DBT that wants everything to be in one statement.
  And the one most productive thing they could add? Allowing you to specify null behaviour on JOIN USING. (Having to spell out foo.bar IS NOT DISTINCT FROM bar.bar on a join is unobvious and ugly. Something like USING (bar RESPECT NULLS) would be so much nicer.)
- wink1 day ago
  I can't put my finger on it but I think many people see this as two operating modes, for lack of a better term. The more monolithic and enterprisey your solution (and with bespoke DBMS), the more it leans towards anything more complex than an index and maybe a couple triggers - and the more microservices-y where every small service owns its own database (and only half of them are RDBMS) the less complex code is desired in the DB itself, because you're also migrating off of single instances/clusters a lot (just taking a relatively dumb data dump with you, or even just adding new replicas, ship-of-theseus-like).
- 3921 day ago
  PRQL is awesome. There's a similar competitor whose name I can't find now.
pearjuice2 days ago
Doing it in pure SQL is really impressive but I think the real tell-tale sign of peak "cracked engineer energy" is the maintained, decade-old blogspot site. Can't exactly put my finger on it, but really gives off "niche mastery". I don't even know the authors but I'm sure in the right circles a few dudes maintaining a blogspot site called "database architects" for a decade probably don't need an introduction.
- Moosturm2 days ago
  Those authors are the elite of DB mastery.
- 2 days ago
  undefined
1st12 days ago
FWIW I tried doing AoC for a few days with EdgeQL, and the experience proved to be quite interesting.
Some of my tweets (I should write a blog post):
- https://x.com/1st1/status/1864069589245858083
- vs SQL https://x.com/1st1/status/1864412869108092997
mbowcut22 days ago
Absolutely horrifying. Well done.
- teeray2 days ago
  "Thanks, I hate it"
sceadu2 days ago
for those who don't know, the author is one of the best DB researchers in the world
- beoberha2 days ago
  Thomas Neumann doing Thomas Neumann things
linsomniac2 days ago
Long ago I was interviewing for an operations job, and their "leetcode" interview question was to create an invoice report of some fairly large public data-set. Because of the size of the dataset and the way they wanted it sliced and diced, it wasn't just a straightforward set of JOINS, and they clearly wanted you to do some simple SQL queries and then have some code loops that sliced and diced it all.
That wasn't a solution the data scientist in me (which I'm not) was ok with. I ran my high level thoughts about the problem past a data scientist buddy of mine and he said "sounds like you're on the right track", but not being a data scientist it was hard for me to put together a soludion.
I told them that if I was asked in my work capacity to do something like this, I'd probably be reaching for a reporting package like Crystal Reports, but I haven't touched it in ~30 years. "Sure, I get that, just write some code to generate the report".
I had written all the Ansible and related goodies to spin up an instance, get MySQL set up and configured, and figured the "right" solution was there in the SQL arcana. I played with it and played with it, mostly writing off the job because what kind of company judges a sys admin based on building a reporting package? They had set the expectation that the homework should take 2-4 hours, and I chewed on it for longer than that and finally said "thanks but no thanks, this is outside my wheelhouse".
But I kept chewing on the problem. A couple weeks later I had the right set of materialized views and queries and joins that not only solved the problem entirely in SQL, but solved it QUICK.
SQL is amazing, but there are a log of tricks of the trade you've got to have at your fingertips to make it fly. I'm still convinced that a lot of it is just throwing a bunch of things at the wall until something sticks.
RaftPeople2 days ago
I wrote a cubing/containerization system in SQL, but it did use the additional capabilities in a sproc for looping etc., so not just a single SQL stmt.
But the problem actually mapped well to SQL's capabilities in some ways. The basic algorithm was:
1-Loop through items in descending size order
2-Test X different positions and orientations such that one corner is butted up against an existing corner (of container, or previously placed items, etc.)
3-Choose the winner based on some sort of point systems that encapsulates some set of rules (e.g. preferred orientation, or minimizes small gaps between sides of items, etc.)
These were some aspects that lined up well with SQL:
A-Testing of X number of positions and orientations all with a relative simple statement, using as input: existing corner points from previously placed items, some control table with the variations in X,Y,Z orientations.
B-The ability to maintain a set of previous options for each iteration (e.g. item #2 was tested in these positions with these stats and all are temporarily being considered reasonable options), add new item to each one of those previous tests and accumulate stats. It was pretty easy to configure a "window" of how many different open options were being tracked to optimize the result without overwhelming memory. The SQL to include the options was almost the same as if there was only one previous chosen option.
Some aspects were a bit painful to shift the model mentally to fit in with SQL.
ilya_m1 day ago
I recommend Frank McSherry's real-time log of working through AoC 2023 using a SQL dialect:: https://github.com/frankmcsherry/blog/blob/master/posts/2024...
The most powerful primitive that the dialect has is (mutual) recursion, which is effectively a mechanism for outputting a fixed-point of a query.
tveita2 days ago
I found Clickhouse easier to handle than pure SQL for this - as an analytics database it has conveniences like parsing input files with regexp, user defined functions and array functions. It's often possible to treat it more like an array programming language. But I still sometimes couldn't avoid the tuple explosion problem.
https://clickhouse.com/docs/en/integrations/data-formats/tem...
- zX41ZdbW1 day ago
  I'd like to check what is the tuple explosion problem.
pawptart2 days ago
I was crazy enough to try this as well this year. It would be an extreme stretch to consider me a SQL expert, but I did make it to Day 14.
I agree with all the points in the post. It's really not that bad. If I had more time to devote to it, I think I could have reasonably completed more.
Here's my writeup, contains a link to the repo if you want to see some of the soutions.
https://github.com/ty-porter/advent-of-code/tree/master/2024...
neriymus2 days ago
Slightly related, here's one I saw being solved using Cypher: https://www.linkedin.com/posts/halfterpierre_adventofcode-di...
pavel_lishin2 days ago
I've been slowly working through AoC2024 in Google Sheets. I should really write up what I've been up to.
CommanderData2 days ago
I'm angry but in a good way.
datadrivenangel2 days ago
The data parsing must have been painful. High wizardry.
- taberiand2 days ago
  I feel like it'd be entirely acceptable to load the inputs into tables first and still qualify as pure sql, because string parsing in sql is so blergh
  fifilura2 days ago
  Input in AoC is always well-formed. And you can always use a regexp? Seems like the smallest problem to me. As soon as you get past that recursive one row per line trick in the posted solutions
  taberiand2 days ago
  Obviously it's possible, particularly in this context, but generally speaking I'd rather avoid it.
- fifilura2 days ago
  The data parsing is just one recursive query to read one line at a time, and it is done pretty much the same way in each problem.
hobs2 days ago
When you realize recursive SQL allows iterative calculation then you can pretty much do whatever you want, no matter how bad.
thehours2 days ago
Does 'pure SQL' have a specific definition (or reference spec)? The author doesn't mention it, except to say he tested it across Umbra, Postgres, and DuckDB. Even then, some days weren't 'supported by DuckDB due to a missing xor operator'.
- grumple2 days ago
  There is an ISO spec (you have to pay for access): https://www.iso.org/standard/76583.html
forgot_old_user2 days ago
haha I couldn't help but silently mouth "BUT WHHYY" before I clicked this link
yakshaving_jgt2 days ago
A former colleague did the same with most of AoC 2020 a few years ago.
https://github.com/Jell/advent2020
drewbitt2 days ago
Insanity. I wonder how much cleaner it could look in EdgeQL.
- 1st12 days ago
  Somewhat, depending on what you appreciate in languages. See my other comment in this thread: https://news.ycombinator.com/item?id=42578562
superdogsniffer1 day ago
Day 19 fails the actual test input. I guess nobody checked his work.
- forgotpwd161 day ago
  Tried both test & full input and worked (using duckdb v0.10.2) fine.
nostradumbasp2 days ago
It takes an incredible human to do something like this. Pure art. Not enough of that in the programming world.
- apavlo2 days ago
  Thomas is the greatest database systems researcher in the world. He is an incredible human.
spiralshape2 days ago
Great work showcasing SQL’s versatility!
ikiris2 days ago
This is impressive. Well done. I hate it
jongjong1 day ago
The problem with simple silver bullet solutions is that the best ones can handle maybe 80% of reasonable requirements... But it's absolutely impossible to solve any problem which happens to fall in the remaining 20% of problems. This fact is as true as ever and yet marketing of silver bullets is at an all-time high.
I teach an introductory coding course and I was surprised when one student asked me to give them some reasons why learning to code is useful instead of just using a platform like Wix. A similar question came up again (concerning a different website-building platform) in a different discussion with a family member.
I was kind of shocked that the distinction between a website and a data-driven application seems to have faded out of many people's consciousness. I'm guessing that you can probably add dynamic widgets to Wix and other similar platforms which make it seem like you can build complete apps but they can rarely get you all the way to your goal for a long-term project... And when you hit that wall, you almost need to learn the entire field of computer science from scratch just to get that last 10% of requirements implemented. You go from not having to know anything at all so literally understanding everything about computer science just to get the last 10%; or hire someone (and hope that they have the skills you need).