Magento Business Intelligence Help Center

Optimizing your SQL queries

BROUGHT TO YOU BY

Avatar
Avatar

Aside from allowing you to grow with Magento BI as you step up your analysis game, we built the SQL Report Builder to allow you query and iterate on those queries at any given. Your business moves at break-neck speed and you can’t always wait for an update cycle to finish before finding out a column or report you created needs tweaking.

Before a query is executed, we estimate its cost. Cost takes into consideration the length of time and number of resources required to execute a query. If that cost is deemed to be too high or if the number of returned rows exceeds our limits, the query won’t run. We put together a list of recommendations for querying your data warehouse, which will ensure you’re writing the most streamlined queries possible.

Regarding SELECT * or selecting all columns…

Selecting all columns doesn’t make for a timely, easily executed query. Queries that use SELECT * can take quite a bit of time to run, especially if your table has a large number of columns.

For this reason, we recommend you avoid using SELECT * wherever possible and only include the columns you need:

Instead of this... Try this!

Regarding full outer joins…

Outer joins select the entirety of both tables being joined, which will increase the computational cost of the query. This means that your query will take longer to run and is more likely to fail, as it may take longer than the execution limit to return the results.

Instead of using this type of join, consider using an inner or left join. Inner joins return results only where there’s a columnar match between tables (for example, order_id exists in both a typical customers and orders table); left joins will return all results from the left (first) table along with the matching results in the right (second) table.

Let’s take a look at how we can rewrite a FULL OUTER JOIN query:

Instead of this... Try this!

As you can see, these queries are identical in every way except for the type of JOIN they use. Pretty easy to remember, right?

Regarding multiple joins...

While you can include multiple joins in your query, remember that it may drive the query’s cost up. To keep from hitting the cost threshold, we recommend avoiding multiple joins where possible.

Regarding filters…

In short: use them whenever possible! WHERE and HAVING clauses will filter your results and give you only the data you really want.

Regarding filters in JOIN clauses…

If you’re using a filter when performing a join, be sure to apply it to both tables in the join. Even if it’s redundant, this will reduce the computational cost of the query and speed up the execution time.

Instead of this... Try this!

Regarding operators…

When writing queries, consider using the ‘least expensive’ operators possible. What do we mean by this? Every query has a computational cost, which is determined by the functions, operators, and filters that make up the query. Some operators require less computational effort, which makes them less expensive than other operators.

Comparison operators (>, <, =, and so on) are the least expensive, followed by LIKE. SIMILAR TO and POSIX operators bring up the rear as the most expensive of the operators.

Using EXISTS versus IN…

Using EXISTS versus IN depends on the type of results you’re trying to return. If you’re only interested in a single value, use the EXISTS clause instead of IN. IN is used in conjunction with lists of comma-separated values, which will increase the computational cost of the query.

Why is this? When IN queries are run, the system must first process the subquery (the IN statement), then the entire query based on the relationship specified in the IN statement. EXISTS is far more efficient because the query doesn’t have to be run through multiple times - a true/false value is returned while checking the relationship specified in the query.

To put it simply: the system doesn’t have to process as much when using EXISTS.

Instead of this... Try this!

Regarding ORDER BY...

ORDER BY is an expensive function in SQL and can significantly raise the cost of a query. If you receive an error message saying that the EXPLAIN cost of your query is too high, try eliminating any ORDER BYs from your query unless absolutely required.

This isn’t to say that ORDER BY can’t be used - just that it should only be used when necessary.

Regarding GROUP BY and ORDER BY...

While there may be a few situations where this approach doesn’t jibe with what you’re trying to do, the general rule is that if you’re using a GROUP BY and ORDER BY, you should put the columns in both clauses in the same order. For example:

Instead of this... Try this!

Wrapping up

The best way to learn to write SQL - and do so efficiently - is through trial and error. To find what works best for you, try to recreate a few reports using only the SQL editor. You can also check out the list of resources we put together to expand your knowledge and improve your skills.

Have another recommendation that we didn’t list? Let us know!

Was this article helpful?
0 out of 0 found this helpful

Comments

Powered by Zendesk