Which is faster: INNER JOIN or LEFT JOIN

Which is faster: data from a subquery or from a LEFT JOIN

I am implementing a two-stage search over a large MySQL database: first a quick pass by hash, then a detailed pass over all fields. Which of the following variants of the SQL query will run faster? The table has a composite index on ID and the hash (although every ID is unique; the hash is there only for processing speed). 1) Pass the data directly from the nested query:

SELECT AS matched_2
FROM (
    SELECT ID, , AS matched_1
    FROM table
    HAVING matched_1 > 0.5
) AS basic
HAVING matched_2 > 0.7
ORDER BY matched_2

2) Or attach a LEFT JOIN to the outer query:

SELECT AS matched_2
FROM (
    SELECT ID, AS matched_1
    FROM table
    HAVING matched_1 > 0.5
) AS t1
LEFT JOIN (
    SELECT ID,
    FROM table
) AS t2
    ON t1.ID = t2.ID
HAVING matched_2 > 0.7
ORDER BY matched_2

Any other optimization advice is very welcome.

LEFT JOIN Significantly faster than INNER JOIN

I have a table ( MainTable ) with a bit over 600,000 records. It joins onto itself via a 2nd table ( JoinTable ) in a parent/child type relationship:

SELECT Child.ID, Parent.ID
FROM MainTable AS Child
JOIN JoinTable
    ON Child.ID = JoinTable.ID
JOIN MainTable AS Parent
    ON Parent.ID = JoinTable.ParentID
    AND Parent.SomeOtherData = Child.SomeOtherData

I know that every child record has a parent record and that the data in JoinTable is accurate. When I run this query, it takes literally minutes to run. However, if I join to Parent using a LEFT JOIN, it takes < 1 second to run:

SELECT Child.ID, Parent.ID
FROM MainTable AS Child
JOIN JoinTable
    ON Child.ID = JoinTable.ID
LEFT JOIN MainTable AS Parent
    ON Parent.ID = JoinTable.ParentID
    AND Parent.SomeOtherData = Child.SomeOtherData
WHERE . [some info to make sure we don't select parent records in the child dataset].

I understand the difference in the results between an INNER JOIN and a LEFT JOIN. In this case it is returning exactly the same result, as every child has a parent. If I let both queries run, I can compare the datasets and they are exactly the same. Why is it that a LEFT JOIN runs so much faster than an INNER JOIN?

UPDATE: Checked the query plans, and when using an inner join it starts with the Parent dataset. When doing a left join it starts with the child dataset. The indexes it uses are all the same. Can I force it to always start with the child? Using a left join works, it just feels wrong.

Similar questions have been asked here before, but none seem to answer my question. For example, the selected answer in "INNER JOIN vs LEFT JOIN performance in SQL Server" says that left joins are always slower than inner joins. The argument makes sense, but it’s not what I’m seeing.

INNER JOIN vs LEFT JOIN performance in SQL Server

I’ve created a SQL command that uses INNER JOIN on 9 tables, but it takes a very long time to run (more than five minutes). My folks suggested that I change INNER JOIN to LEFT JOIN because LEFT JOIN performs better, contrary to what I know. After I changed it, the query got significantly faster. I would like to know why LEFT JOIN is faster than INNER JOIN.

My SQL command looks like this: SELECT * FROM A INNER JOIN B ON . INNER JOIN C ON . INNER JOIN D and so on

Update: This is a brief outline of my schema.

FROM sidisaleshdrmly a -- NOT HAVE PK AND FK
INNER JOIN sidisalesdetmly b -- THIS TABLE ALSO HAVE NO PK AND FK
    ON a.CompanyCd = b.CompanyCd
    AND a.SPRNo = b.SPRNo
    AND a.SuffixNo = b.SuffixNo
    AND a.dnno = b.dnno
INNER JOIN exFSlipDet h -- PK = CompanyCd, FSlipNo, FSlipSuffix, FSlipLine
    ON a.CompanyCd = h.CompanyCd
    AND a.sprno = h.AcctSPRNo
INNER JOIN exFSlipHdr c -- PK = CompanyCd, FSlipNo, FSlipSuffix
    ON c.CompanyCd = h.CompanyCd
    AND c.FSlipNo = h.FSlipNo
    AND c.FSlipSuffix = h.FSlipSuffix
INNER JOIN coMappingExpParty d -- NO PK AND FK
    ON c.CompanyCd = d.CompanyCd
    AND c.CountryCd = d.CountryCd
INNER JOIN coProduct e -- PK = CompanyCd, ProductSalesCd
    ON b.CompanyCd = e.CompanyCd
    AND b.ProductSalesCd = e.ProductSalesCd
LEFT JOIN coUOM i -- PK = UOMId
    ON h.UOMId = i.UOMId
INNER JOIN coProductOldInformation j -- PK = CompanyCd, BFStatus, SpecCd
    ON a.CompanyCd = j.CompanyCd
    AND b.BFStatus = j.BFStatus
    AND b.ProductSalesCd = j.ProductSalesCd
INNER JOIN coProductGroup1 g1 -- PK = CompanyCd, ProductCategoryCd, UsedDepartment, ProductGroup1Cd
    ON e.ProductGroup1Cd = g1.ProductGroup1Cd
INNER JOIN coProductGroup2 g2 -- PK = CompanyCd, ProductCategoryCd, UsedDepartment, ProductGroup2Cd
    ON e.ProductGroup1Cd = g2.ProductGroup1Cd

asked Apr 28, 2010 at 3:36

Do you project any attribute from coUOM ? If not you may be able to use a semi join. If yes, you would be able to use UNION as an alternative. Posting just your FROM clause is inadequate information here.

Dec 15, 2011 at 9:44
I’ve wondered this so often (because I see all the time).
Mar 16, 2013 at 22:44

Did you leave out an ORDER BY in your brief schema? I recently faced an issue where changing an INNER JOIN to a LEFT OUTER JOIN sped up the query from 3 minutes to 10 seconds. If you really have an ORDER BY in your query, I will explain further as an answer. It looks like none of the answers really explains the case that I faced.

Oct 12, 2015 at 10:55

9 Answers

A LEFT JOIN is absolutely not faster than an INNER JOIN . In fact, it’s slower; by definition, an outer join ( LEFT JOIN or RIGHT JOIN ) has to do all the work of an INNER JOIN plus the extra work of null-extending the results. It would also be expected to return more rows, further increasing the total execution time simply due to the larger size of the result set.

(And even if a LEFT JOIN were faster in specific situations due to some difficult-to-imagine confluence of factors, it is not functionally equivalent to an INNER JOIN , so you cannot simply go replacing all instances of one with the other!)
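The difference in semantics can be shown outside a database. The following is a minimal Python sketch, not how any engine implements joins; the rows and the helper names `inner_join` and `left_join` are invented for illustration. It shows that the left join does the same matching work and then additionally null-extends unmatched rows, so its result can be larger.

```python
# Toy rows, invented for illustration: (id, name) pairs.
left_rows = [(1, "One"), (2, "Two"), (3, "Three")]
right_rows = [(1, "Uno"), (3, "Tres")]


def inner_join(left, right):
    """Emit a combined row for every pair of rows whose ids match."""
    return [(l, r) for l in left for r in right if l[0] == r[0]]


def left_join(left, right):
    """Do the same matching as inner_join, then null-extend:
    a left row with no match is still emitted, paired with None."""
    result = []
    for l in left:
        matches = [r for r in right if l[0] == r[0]]
        if matches:
            result.extend((l, r) for r in matches)
        else:
            result.append((l, None))  # the extra null-extension step
    return result


inner = inner_join(left_rows, right_rows)
outer = left_join(left_rows, right_rows)
print(len(inner), len(outer))
```

Every row of `inner` also appears in `outer`; the row for id 2 appears only in `outer`, which is why the two joins cannot be swapped freely unless every left row is known to have a match.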

Most likely your performance problems lie elsewhere, such as not having a candidate key or foreign key indexed properly. 9 tables is quite a lot to be joining so the slowdown could literally be almost anywhere. If you post your schema, we might be able to provide more details.

Edit:

Reflecting further on this, I could think of one circumstance under which a LEFT JOIN might be faster than an INNER JOIN , and that is when:

  • Some of the tables are very small (say, under 10 rows);
  • The tables do not have sufficient indexes to cover the query.

Consider this example:

CREATE TABLE #Test1
(
    ID int NOT NULL PRIMARY KEY,
    Name varchar(50) NOT NULL
)
INSERT #Test1 (ID, Name) VALUES (1, 'One')
INSERT #Test1 (ID, Name) VALUES (2, 'Two')
INSERT #Test1 (ID, Name) VALUES (3, 'Three')
INSERT #Test1 (ID, Name) VALUES (4, 'Four')
INSERT #Test1 (ID, Name) VALUES (5, 'Five')

CREATE TABLE #Test2
(
    ID int NOT NULL PRIMARY KEY,
    Name varchar(50) NOT NULL
)
INSERT #Test2 (ID, Name) VALUES (1, 'One')
INSERT #Test2 (ID, Name) VALUES (2, 'Two')
INSERT #Test2 (ID, Name) VALUES (3, 'Three')
INSERT #Test2 (ID, Name) VALUES (4, 'Four')
INSERT #Test2 (ID, Name) VALUES (5, 'Five')

SELECT *
FROM #Test1 t1
INNER JOIN #Test2 t2
    ON t2.Name = t1.Name

SELECT *
FROM #Test1 t1
LEFT JOIN #Test2 t2
    ON t2.Name = t1.Name

DROP TABLE #Test1
DROP TABLE #Test2

If you run this and view the execution plan, you’ll see that the INNER JOIN query does indeed cost more than the LEFT JOIN , because it satisfies the two criteria above. It’s because SQL Server wants to do a hash match for the INNER JOIN , but does nested loops for the LEFT JOIN ; the former is normally much faster, but since the number of rows is so tiny and there’s no index to use, the hashing operation turns out to be the most expensive part of the query.
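To make the two strategies concrete, here is a rough Python sketch of a nested loops join and a hash join over rows shaped like #Test1 and #Test2. It is a simplification under my own assumptions (in-memory tuples, hypothetical function names), not SQL Server's actual operators. The point is that the hash join pays a fixed build cost before it returns anything, while nested loops have no setup cost but compare every pair.

```python
from collections import defaultdict

# Same shape as the #Test1 / #Test2 tables: (ID, Name) rows.
test1 = [(1, "One"), (2, "Two"), (3, "Three"), (4, "Four"), (5, "Five")]
test2 = list(test1)


def nested_loops_join(outer, inner):
    """Compare every outer row with every inner row on Name.
    No setup cost, but len(outer) * len(inner) comparisons."""
    return [(o, i) for o in outer for i in inner if o[1] == i[1]]


def hash_join(build, probe):
    """Build a hash table on Name from one input, then probe it
    once for each row of the other input."""
    buckets = defaultdict(list)
    for b in build:  # build phase: the up-front cost
        buckets[b[1]].append(b)
    # probe phase: one hash lookup per probe row
    return [(p, b) for p in probe for b in buckets.get(p[1], [])]


nl_result = nested_loops_join(test1, test2)
hj_result = hash_join(test2, test1)
print(sorted(nl_result) == sorted(hj_result))
```

With 5 rows on each side the nested loops version does only 25 comparisons, so the build phase of the hash join can easily dominate; with thousands of rows the balance reverses.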

You can see the same effect by writing a program in your favourite programming language to perform a large number of lookups on a list with 5 elements, vs. a hash table with 5 elements. Because of the size, the hash table version is actually slower. But increase it to 50 elements, or 5000 elements, and the list version slows to a crawl, because it’s O(N) vs. O(1) for the hashtable.
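The experiment described above could be sketched in Python roughly as follows. The function name `time_lookups`, the key format, and the repeat count are my own choices. The absolute timings, and the size at which the hash-based structure starts to win, depend on the language and runtime, so no particular numbers are claimed here; in CPython the set may already be competitive at 5 elements.

```python
import timeit


def time_lookups(n, repeats=2_000):
    """Return (list_seconds, set_seconds) for `repeats` membership
    tests against a list and a hash set holding the same n keys."""
    keys = [f"key{i}" for i in range(n)]
    as_list = list(keys)
    as_set = set(keys)
    probe = keys[-1]  # worst case for the list: the last element
    list_time = timeit.timeit(lambda: probe in as_list, number=repeats)
    set_time = timeit.timeit(lambda: probe in as_set, number=repeats)
    return list_time, set_time


for n in (5, 50, 5000):
    list_time, set_time = time_lookups(n)
    print(f"n={n:>4}: list {list_time:.4f}s, hash set {set_time:.4f}s")
```

The list timings should grow roughly in proportion to n, while the hash set timings should stay about flat, which is the O(N) versus O(1) behaviour the paragraph refers to.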

But change this query to be on the ID column instead of Name and you’ll see a very different story. In that case, it does nested loops for both queries, but the INNER JOIN version is able to replace one of the clustered index scans with a seek — meaning that this will literally be an order of magnitude faster with a large number of rows.
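As an analogy only, the gap between a scan and a seek resembles the gap between a linear search and a binary search over sorted keys. A real clustered index is a B-tree rather than a sorted Python list, and the names `scan` and `seek` below are hypothetical, so treat this as a sketch of the idea rather than of SQL Server's behaviour.

```python
from bisect import bisect_left


def scan(rows, wanted_id):
    """Scan analogue: visit rows one by one until the id is found.
    Returns the row (or None) and the number of rows visited."""
    steps = 0
    for row in rows:
        steps += 1
        if row[0] == wanted_id:
            return row, steps
    return None, steps


def seek(rows, ids, wanted_id):
    """Seek analogue: binary search over the sorted ids, which needs
    about log2(len(ids)) comparisons instead of up to len(ids)."""
    pos = bisect_left(ids, wanted_id)
    if pos < len(ids) and ids[pos] == wanted_id:
        return rows[pos]
    return None


# Rows kept sorted by id, loosely like a clustered index on ID.
rows = [(i, f"name{i}") for i in range(1, 100_001)]
ids = [row[0] for row in rows]

found_by_scan, steps = scan(rows, 100_000)
found_by_seek = seek(rows, ids, 100_000)
print(found_by_scan == found_by_seek, steps)
```

Both calls return the same row, but the scan visits every row to reach the last id, while the binary search needs only a handful of comparisons. A join repeats this lookup once per outer row, so the difference compounds as the tables grow.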

So the conclusion is more or less what I mentioned several paragraphs above; this is almost certainly an indexing or index coverage problem, possibly combined with one or more very small tables. Those are the only circumstances under which SQL Server might sometimes choose a worse execution plan for an INNER JOIN than a LEFT JOIN .
