Monday, March 12, 2012

I smell a big, fat compiler bug

For the last two or three years I've been spending a lot of time doing
SQL Server performance tuning. The last couple of weeks I've been
doing this at a place that uses a whole lot of inline table-valued UDFs,
an issue both in itself and because the database does a LOT of
joining, and in this place I've been finding one technique works
for me just a little too often.
Time and again, there is some complex query that looks like it should
be OK, but is running 100k's of logical reads and taking ten, twenty,
thirty seconds to return a handful of records. No doubt someone who
can really, really read execution plans might be able to glance at the
plan and see what's wrong, but I haven't reached quite that point yet.
Time and again, what I find is that if I break out a chunk of code and
run it stand-alone in QA, suddenly it runs "correctly", that is, with
99% fewer reads and 80-99% faster execution. OK, we've all seen that,
we understand how much more the optimizer knows with explicit
constants and such, and how SPs have to live with "parameter sniffing"
and such. But I've gone the next step, wrapped ten or twenty or fifty
lines in a relatively small SP or UDF, and darned if it doesn't still
run fast, with all the same parameters and variables that it had in
the original big SP.
So, I accuse the compiler/optimizer of SQL Server 2000 SP3x: given a
large SP, it is not properly resetting the flags or counters or other
internal doodads it uses to compute optimizations.
Suggestion to speed freaks: break apart large queries inside the SP,
often that works, but when that doesn't work, try moving chunks into
separately compiled SPs, and you may find significantly better
performance falls your way.
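A minimal sketch of the technique, with all object and parameter names invented for illustration: the expensive chunk is moved out of the large procedure into a small one, which the optimizer then compiles and plans on its own.

```sql
-- Hypothetical sketch; all names are invented for illustration.
-- The expensive join is moved out of the large procedure into its own
-- small procedure, which gets compiled and optimized separately.
CREATE PROCEDURE dbo.GetOrderChunk
    @CustomerID int,
    @StartDate  datetime
AS
SELECT o.OrderID, o.OrderDate, d.ProductID, d.Quantity
FROM dbo.Orders o
JOIN dbo.OrderDetails d ON d.OrderID = o.OrderID
WHERE o.CustomerID = @CustomerID
  AND o.OrderDate >= @StartDate
GO

-- The original large procedure then just calls it, passing along the
-- same parameters and variables it already had:
-- EXEC dbo.GetOrderChunk @CustomerID, @StartDate
```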
Microsoft, take a look, wouldya?
J.
Hi
Without seeing the procedures it is hard to comment on this. You may
want to read Kalen's articles on Reuse of Query Plans and Conditional
Recompilation in SQL Server Magazine; also
http://www.sql-server-performance.co...erver_udfs.asp may be useful
reading.
John
"JXStern" wrote:

> [original post quoted in full; snipped]
|||Hi
Well, can you post some data so that we will be able to reproduce the
problem/bug?
Also, at the least, can you show an execution plan of the query?
"JXStern" <JXSternChangeX2R@.gte.net> wrote in message
news:aieof11bdp2vfid400cslukgm94ivkqnjh@.4ax.com...
> [original post quoted in full; snipped]
|||On Thu, 11 Aug 2005 23:31:16 -0700, John Bell
<jbellnewsposts@.hotmail.com> wrote:
>Without seeing the procedures then it is hard to comment on this. You may
>want to read Kalens articles on Reuse of Query Plans and Conditional
>Recompilation in SQL Server Magazine also
>http://www.sql-server-performance.co...erver_udfs.asp may be useful
>reading.
Yes, I'm aware of the threat and menace of scalar UDFs in where
clauses and the like, but that isn't what I'm talking about here.
I'm not going to try to post actual code and plans from the app, cuz
(even if the code were not proprietary) it would take the schema and
stats as well to make any sense out of it. What I'm fishing for here
are similar stories and suspicions, or any official or unofficial
further descriptions of the problem or workarounds.
If I find myself with lots of time, I may try to synthesize an entire
reproducible scenario, but it's liable to require a bunch of tables
and data to get the thing off the ground.
J.
|||Yes, I have also experienced this. I have an SP 1,000 lines long; running
it took too much time, and when I divided it into 4 smaller SPs it ran about
150% faster.
Even though the execution plan is almost the same, I cannot find what causes
the SP to run faster when it is split into smaller pieces.
Maybe locks are not released when they should be?
"Uri Dimant" <urid@.iscar.co.il> wrote in message
news:eZssQiwnFHA.1948@.TK2MSFTNGP12.phx.gbl...
> [previous message snipped]
|||Hi
You may want to look at
http://www.sql-server-performance.co...recompiles.asp and
subsequent links if you are getting recompiles.
John
"AM" wrote:

> [previous message snipped]
|||I had much the same problem on a project a couple of years ago. It
usually turned out that a less than optimal plan was cached for the
original SP - mostly due to the SP being run the first time with
non-typical params. I used DBCC FREEPROCCACHE and DBCC
DROPCLEANBUFFERS to get a more accurate benchmark.
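For reference, Payson's benchmarking setup looks roughly like this (the procedure name is a placeholder; run it only on a test server, since both DBCC commands affect the whole instance):

```sql
-- Flush cached plans and clean data pages so each timed run compiles
-- fresh and reads from disk. Test servers only: this hits the whole
-- instance, not just one database.
CHECKPOINT                -- write dirty pages so they can be dropped
DBCC DROPCLEANBUFFERS     -- empty the buffer pool of clean pages
DBCC FREEPROCCACHE        -- discard all cached execution plans

SET STATISTICS IO ON      -- report logical/physical reads per statement
SET STATISTICS TIME ON    -- report compile and execution times

-- EXEC dbo.ProcedureUnderTest @SomeParam = 42   -- placeholder name
```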
Good luck.
Payson
|||Thanks for your feedback.
I will try to give you a few pieces of information that may help explain
what you are seeing (which, as others have mentioned, would be more
definitive with the plan + additional information in question) - so, I am
guessing.
User latency for ad hoc queries is generally a function of compilation time
+ execution time. If your query is cached and run multiple times, it is
likely a function of execution time alone.
In either case, there are a number of algorithmic problems that can impact
plan choice and thus user latency, even if the query is run multiple times.
For example, if you join N tables together, there are a large number of ways
to actually execute this query based on the join order picked. Once you
have a sufficiently large number of tables, it actually becomes
algorithmically intractable to search the complete set of possible cases (in
a reasonable time - however, eventually it just becomes impossible). So,
heuristics are used to try to find a good join order relatively quickly.
This may or may not be the exact, optimal plan for a particular machine.
Alternatives are considered from this "spot" in the universe of possible
join orders. In some cases, it may be that the initial heuristic order is
not "close" to the optimal choice and significant searching is required to
get to the right plan. Additionally, it may be that the optimal plan is so
far away from the initial heuristic choice as to make it impossible to consider
that choice in a reasonable amount of time using the heuristics I mentioned.
If your query is one of these cases, manually rewriting the query to remove
some of the joins could possibly make things run faster for your query +
data. It's not impossible to have such a case. I'll suggest that you can
also use query hints, in many cases, to force the join order into one that
may be more optimal for your query.
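As a sketch of the hint approach Conor mentions (table names invented): OPTION (FORCE ORDER) tells the optimizer to keep the join order exactly as written in the FROM clause, so a join order you believe is better can be spelled out in the query text.

```sql
-- Hypothetical example; table names are invented. With FORCE ORDER the
-- tables are joined in the order they appear, so the most selective
-- input can be listed first to drive the rest of the joins.
SELECT o.OrderID, c.CompanyName, p.ProductName
FROM dbo.OpenOrders o                      -- most selective input first
JOIN dbo.Customers c ON c.CustomerID = o.CustomerID
JOIN dbo.OrderDetails d ON d.OrderID = o.OrderID
JOIN dbo.Products p ON p.ProductID = d.ProductID
OPTION (FORCE ORDER)
```

Per-join hints such as INNER LOOP JOIN work similarly; either way a hint freezes part of the plan, so it should be re-checked as the data changes.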
Root causes for picking an initial, heuristic order that is suboptimal
include missing statistics, statistics with a sample rate that is too low,
physical grouping of values on disk that causes problems in sampled
statistics interpolation, correlations between columns that cause the
optimizer to over- or under-estimate cardinality for a join or filter, or
others. In some cases, the technique you describe (moving part of the query
into a temporary result) can correct errors because part of the query
becomes materialized - the cardinality of that intermediate result thus does
not contain some of the errors and the rest of the query may have enough
information to pick a good join order.
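The materialization workaround described above can be sketched like this (names invented; imagine it inside a procedure that already has @StartDate): the error-prone part of the query is computed into a temp table first, and the temp table's true row count then stands in for the bad estimate when the rest of the query is planned.

```sql
-- Hypothetical sketch. Materialize the misestimated part of the query;
-- after the INSERT the optimizer knows the temp table's actual row
-- count, so the remaining join is planned with accurate cardinality.
CREATE TABLE #Recent (OrderID int PRIMARY KEY, CustomerID int)

INSERT INTO #Recent (OrderID, CustomerID)
SELECT o.OrderID, o.CustomerID
FROM dbo.Orders o
WHERE o.OrderDate >= @StartDate      -- the predicate that was misestimated
  AND o.Status = 'OPEN'

SELECT r.OrderID, c.CompanyName
FROM #Recent r
JOIN dbo.Customers c ON c.CustomerID = r.CustomerID

DROP TABLE #Recent
```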
Join order is one way in which this condition can happen - others also
exist, but the basic impact is generally the same.
I hope that this gives you some insight into the complexities involved in
this process.
Sincerely,
Conor Cunningham
SQL Server Query Optimization Development Lead
"JXStern" <JXSternChangeX2R@.gte.net> wrote in message
news:aieof11bdp2vfid400cslukgm94ivkqnjh@.4ax.com...
> [original post quoted in full; snipped]
|||Conor,
Thank you for your response. I'll comment on a few of your notes
below. However, my accusation remains in place: I am seeing too
many cases where simply moving a lump of code into a separate SP gets
a totally different (and better) result from the compiler/optimizer,
such that I suspect it is not properly finding cutpoints in the text
of large SPs.
Joshua Stern
On Fri, 12 Aug 2005 13:01:35 -0700, "Conor Cunningham [MS]"
<conorc_removeme@.online.microsoft.com> wrote:

>Thanks for your feedback.
>I will try to give you a few pieces of information that may help explain
>what you are seeing (which, as others have mentioned, would be more
>definitive with the plan + additional information in question) - so, I am
>guessing.
>User latency for ad hoc queries is generally a function of compilation time
>+ execution time. If your query is cached and run multiple times, it is
>likely a function of execution time alone.
I make certain to null that out through reruns, which also generally
ends up caching the data and getting down to zero physical reads.
Most of the SPs I'm talking about here have runtimes of at least a
minute when I begin to work on them, and still take five or ten
seconds after improvements are in place, so the compilation is not
likely a huge factor in any case (well, except where there are a lot
of temp tables involved, but even there it's a fairly minor point,
which I realize I could check further with the profiler and such).

>In either case, there are a number of algorithmic problems that can impact
>plan choice and thus user latency, even if the query is run multiple times.
>For example, if you join N tables together, there are a large number of ways
>to actually execute this query based on the join order picked. Once you
>have a sufficiently large number of tables, it actually becomes
>algorithmically intractable to search the complete set of possible cases (in
>a reasonable time - however, eventually it just becomes impossible). So,
>heuristics are used to try to find a good join order relatively quickly.
>This may or may not be the exact, optimal plan for a particular machine.
>Alternatives are considered from this "spot" in the universe of possible
>join orders. In some cases, it may be that the initial heuristic order is
>not "close" to the optimal choice and significant searching is required to
>get to the right plan. Additionally, it may be that the optimal plan is so
>far away from the initial heuristic choice as to make it impossible to consider
>that choice in a reasonable amount of time using the heuristics I mentioned.
All true in theory and practice, yet I wonder if there isn't some room
for improvement here. Where I have had success breaking a join of six
or sixteen tables into multiple queries, it hasn't seemed all that
difficult for me to guess how to decompose them best, and I do wonder
at how much better the heuristics could be.

>If your query is one of these cases, manually rewriting the query to remove
>some of the joins could possibly make things run faster for your query +
>data. It's not impossible to have such a case. I'll suggest that you can
>also use query hints, in many cases, to force the join order into one that
>may be more optimal for your query.
Never had much luck with hints.
What I'd like are optimizer hints, like "hey, this temp table is going
to hold only ten rows, so please use it as the root of your plan!"

>Root causes for picking an initial, heuristic order that is suboptimal
>include missing statistics, statistics with a sample rate that is too low,
>physical grouping of values on disk that causes problems in sampled
>statistics interpolation, correlations between columns that cause the
>optimizer to over- or under-estimate cardinality for a join or filter, or
>others. In some cases, the technique you describe (moving part of the query
>into a temporary result) can correct errors because part of the query
>becomes materialized - the cardinality of that intermediate result thus does
>not contain some of the errors and the rest of the query may have enough
>information to pick a good join order.
I don't believe I've had much of the cardinality issue, fwiw.

>Join order is one way in which this condition can happen - others also
>exist, but the basic impact is generally the same.
Any hints here, should the major tables be listed *last*? I think
most developers tend to list them first. I recall back in 6.x days
there was some order-dependency in where clauses (?) that I believe
went away as of SQL7.

>I hope that this gives you some insight into the complexities involved in
>this process.
Yes, it does. Until I started to specialize in this area, I had no
proper appreciation for just what it takes to turn a couple of lines
of SQL into an appropriate plan. I guess I would now wish for some
additional "transparency" on just WHY the optimizer chooses what it
does, so that I could use that as a basis for reorganizing code as
necessary. No doubt there are all sorts of additional topics to
explore in the area. But all in all, I *still* think there's a big,
fat bug in the compiler right now that is preventing it from doing
proper optimization of (logically independent) chunks of code in large
SPs.
Joshua Stern
