SQL Optimization

Use ‘regexp_like’ to replace ‘LIKE’ clauses

SQL ❌

SELECT *
FROM
    table1
WHERE
    lower(item_name) LIKE '%samsung%' OR
    lower(item_name) LIKE '%xiaomi%' OR
    lower(item_name) LIKE '%iphone%' OR
    lower(item_name) LIKE '%huawei%'
    --and so on

SQL ✅

SELECT *
FROM
    table1
WHERE
    REGEXP_LIKE(lower(item_name),
    'samsung|xiaomi|iphone|huawei')

Use ‘regexp_extract’ to replace ‘Case-when Like’

SQL ❌

SELECT
CASE
    WHEN concat(' ',item_name,' ') LIKE '%acer%' then 'Acer'
    WHEN concat(' ',item_name,' ') LIKE '%advance%' then 'Advance'
    WHEN concat(' ',item_name,' ') LIKE '%alfalink%' then 'Alfalink'
...
AS brand
FROM item_list

SQL ✅

SELECT
    regexp_extract(item_name,'(asus|lenovo|hp|acer|dell|zyrex|...)')
AS brand
FROM item_list

Convert long list of IN clause into a temporary table

SQL ❌

SELECT *
FROM Table1 as t1
WHERE
    itemid in (3363134,5189076, ..., 4062349)

SQL ✅

SELECT *
FROM Table1 as t1
JOIN (
    SELECT
        itemid
    FROM (
        SELECT
            split('3363134, 5189076, ...,', ', ') as bar
    )
    CROSS JOIN
        UNNEST(bar) AS t(itemid)
) AS Table2 as t2
ON
    t1.itemid = t2.itemid

Always order your JOINs from largest tables to smallest tables

SQL ❌

SELECT
    *
FROM
    small_table
JOIN
    large_table
ON small_table.id = large_table.id

SQL ✅

SELECT
    *
FROM
    large_table
JOIN
    small_table
ON small_table.id = large_table.id

Use simple equi-joins

Two tables with date string e.g., ‘2020-09-01’, but one of the tables only has columns for year, month, day values

SQL ❌

SELECT *
FROM
    table1 a
JOIN
    table2 b
ON a.date = CONCAT(b.year, '-', b.month, '-', b.day)

SQL ✅

SELECT *
FROM
    table1 a
JOIN (
    select
        name, CONCAT(b.year, '-', b.month, '-', b.day) as date
    from
        table2 b
) new
ON a.date = new.date

Always “GROUP BY” by the attribute/column with the largest number of unique entities/values

SQL ❌

select
    main_category,
    sub_category,
    itemid,
    sum(price)
from
    table1
group by
    main_category, sub_category, itemid

SQL ✅

select
    main_category,
    sub_category,
    itemid,
    sum(price)
from
    table1
group by
    itemid, sub_category, main_category

Avoid subqueries in WHERE clause

SQL ❌

select
    sum(price)
from
    table1
where
    itemid in (
        select itemid
        from table2
)

SQL ✅

with t2 as (
    select itemid
    from table2
)

select
    sum(price)
from
    table1 as t1
join
    t2
on t1.itemid = t2.itemid

Use Max instead of Rank

SQL ❌

SELECT *
from (
    select
        userid,
        rank() over (order by prdate desc) as rank
    from table1
)
where ranking = 1

SQL ✅

SELECT userid, max(prdate)
from table1
group by 1

Other Tips

  • Use approx_distinct() instead of count(distinct) for very large datasets
  • Use approx_percentile(metric, 0.5) for median
  • Avoid UNIONs where possible
  • Use WITH statements vs. nested subqueries

References:

  • “SQL Performance Explained” by Markus Winand - This book provides a comprehensive guide to SQL optimization and covers topics such as indexing, execution plans, and database design.

  • “SQL Tuning” by Dan Tow - This book covers advanced techniques for tuning SQL queries and provides practical examples and case studies.

  • “The Art of SQL” by Stephane Faroult and Peter Robson - This book provides insights into the design and implementation of high-performance SQL systems and covers topics such as query optimization, indexing, and database schema design.

  • “Query Optimization and Execution in Oracle Database 12c” - This whitepaper by Oracle provides an overview of query optimization and execution in Oracle Database 12c, and includes best practices for optimizing SQL performance.

  • “SQL Optimization Techniques” - This article by Microsoft provides an overview of SQL optimization techniques, including indexing, query rewriting, and data normalization.

  • “MySQL Query Optimization” - This article by the MySQL documentation provides tips and best practices for optimizing SQL queries in MySQL, including indexing, query profiling, and schema design.

  • Pradnyana, K. D. (2022). SQL Optimization.

Previous