EDIT: 10 years ago... lol it was over 20! http://download.nust.na/pub6/mysql/doc/refman/5.0/en/join.ht... search for 5.0.12 on this page.
EDIT: I realized I completely misread the question. I thought it was asking the difference of using a WHERE clause or putting conditionals in the JOIN (see my reply above for an example). The original SO questions is about an old-school CROSS JOIN `table1,table2` (which is a cartesian product in relational algebra) and a JOIN.
Edit 2: "old school" in the sense of joining tables before JOINs existed. I think it was System R the first engine to introduce efficient JOINs vs cross products.
Here are the two tested queries:
Query 1:
SELECT
t.passenger_name,
t.ticket_no,
bp.seat_no
FROM
Flights f
JOIN
Ticket_flights tf ON f.flight_id = tf.flight_id
JOIN
Tickets t ON tf.ticket_no = t.ticket_no
JOIN
Boarding_passes bp ON t.ticket_no = bp.ticket_no AND tf.flight_id = bp.flight_id
WHERE
f.arrival_airport = 'OVB';
Query 2: SELECT
t.passenger_name,
t.ticket_no,
bp.seat_no
FROM
Flights f
JOIN
Ticket_flights tf ON (f.flight_id = tf.flight_id AND f.arrival_airport = 'OVB')
JOIN
Tickets t ON tf.ticket_no = t.ticket_no
JOIN
Boarding_passes bp ON t.ticket_no = bp.ticket_no AND tf.flight_id = bp.flight_id
Then I ran EXPLAIN for both of them and the query plan is THE same. So there's not a big difference at least in Postgres.Here's the GPT conversation: https://i.imgur.com/dIzcfnc.jpeg
It doesn't let me share it because it contains an image
SELECT
t.passenger_name,
t.ticket_no,
bp.seat_no
FROM
Flights f,
Ticket_flights tf,
Tickets t,
Boarding_passes bp
WHERE
f.flight_id = tf.flight_id
AND f.arrival_airport = 'OVB'
AND tf.ticket_no = t.ticket_no
AND t.ticket_no = bp.ticket_no
AND tf.flight_id = bp.flight_id;
SELECT
t.passenger_name,
t.ticket_no,
bp.seat_no
FROM Flights f
JOIN Ticket_flights tf USING (flight_id),
JOIN Tickets t USING (ticket_no),
JOIN Boarding_pass bp USING (ticket_no, flight_id)
WHERE f.arrival_airport = 'OVB';
Additionally, putting joins in the where clause breaks the separation of concerns:
FROM: specify tables and their relationships
WHERE: filter rows
SELECT: filter columns
*according to a LLM. Did you verify this?
Postgres is incredibly easy to spin up in a container to test things like this, and the mentioned schema (Postgres Air) is also freely available.
But if you can't infer from the column name which table they will come from, I find having the option to check far more preferable to that of having no way of knowing.
Not only does writing your code in such a way that it states your intent make it easier to read for other humans, it also makes it easier for compilers/query planners to understand what you're trying to do and turn it into a more efficient process at run-time. Now query planners are usually pretty good at distilling joins from WHERE clauses, but that form does also make it easier for mistakes to creep in that can murder your query performance in subtle and hard-to-debug ways.
Hopefully that's not true. SQL's a declarative language, where you describe what you want and the system figures out how to do it. If you describe exactly the same thing using two different syntaxes, the system shouldn't really do anything different. That just makes the programmer's job harder.
Also if you work a lot with databases you often need to do outer joins, a full cartesian product is almost never what you want. The join syntax is more practical if you need to change what type of join you are performing, especially in big queries.
I'd be curious to know how many SQL people nowadays know what that does.
This is assuming that your SQL variant supports it.
as in
join orders returns on returns.original_order_id = orders.order_id
WHERE <-------- Should order type be in the WHERE clause?
returns.order_type = 'RETURN'
vs join orders returns on returns.original_order_id = orders.order_id
AND <-------- Should order type be in the JOIN clause?
returns.order_type = 'RETURN'
One argument was that JOINs are specifically for tickling indexes and only for structural joins between tables, not for describing the table itself, so it should go in the WHERE clause. ("ALL context should be in the where clause.")One argument was that sometimes a definition is limited by context, so strategically it makes sense to add to the JOIN clause. ("the `returns` alias MUST be limited to return orders, so it should be in the JOIN"; you'd have to adjust your indexes to account for the "order_type".)
I try to keep JOIN logic to only on the keys. I like to think this makes reading the query more linear: now I have all of these tables, what data gets retained/removed?
Imagine you're using SQL fragments. Using JOIN clauses make your code extremely clean:
<sql id="getRecentReturns">
<fragment id="ReturnOrderFragment">
where returns.order_date between sysdate - 30 and sysdate <------- Nice, clean code. No chance bugs from forgetting to add `returns.order_type = "Return"` or `employee.customer_type = "Employee"` or whatever.
</sql>
That said - if the guy making the table didn't make an index that includes your attribute (order_type and customer_type above), don't use it."> ON table1.foreignkey = table2.primarykey
The calculated dataset is the result, after the ON clause.
In other words it's not just that this particular case returns the same result, but that in all cases the result must be the same.
The first one says 'put these two tables together then apply the where conditions that link them together and other stuff'. The second one says 'put these together but pay attention to this on condition when doing so and some other stuff'.
Now the set theory algebra lets the optimizers pick out parts of the where condition and move them up to the matching part. You see that in your query plan. The optimizer will also add other coditions such as what you see with the index pick and the type of matching/filtering it is using.
I personally have used/abbused the on condition to make things more clear things that really only apply to that table and matching.
Now he does ask is it the same with MySQL. Probably but someone would need to look at the query plan it spits out. But my guess is yes.
The question is sort of like asking is x + y = z the same as y + x = z. Well yeah sorta mostly. You can get weird cases where doing the order of operations differently gets you different speed profile results though. But the output is the same. But that is the fun of playing with optimizers and your knowing your platform characteristics.
(Saved you a click)
In 2024.
Really?