热门标签 | HotTags
当前位置:  开发笔记 > 数据库 > 正文

Oracle中的半联结和反联结详解

这篇文章主要介绍了Oracle中的半联结和反联结详解,也称半连接和反连接,其实就是in、exists,需要的朋友可以参考下

当两张表进行联结的时候,如果表1中的数据行是否出现在结果集中需要根据表2中出现或不出现至少一个相匹配的数据行来判断,这种情况就会发生半联结;而反联结便是半联结的补集,它们会作为数据库中常见的联结方法如NESTED LOOPS,MERGE SORT JOIN,HASH JOIN的选项出现。

实际上半联结和反联结本身也可以被认同是两种联结方法;在CBO优化模式下,优化器能够根据实际情况灵活的转换执行语句从而实现半联结和反联结方法,毕竟没有什么SQL语法可以显式的调用半联结和反联结,它们只是SQL语句满足某些条件时优化器可以选择的选项而已,不过仍然有必要深入这两种选项在特定情况下带来的性能优势。

半联结

半联结通常都发生在使用含有IN和EXISTS的相关子查询的时候,=ANY的用法与IN相同,所以也会出现发生半联结的情况;不过也是有例外的,在11gR2版本中,优化器不会为任何包含在OR分支中的子查询选择半联结,这也是现在官档中唯一明确标识的限制条件,来看几种发生场景:

代码如下:

-- 使用IN关键字的相关子查询 => 发生NESTED LOOPS半联结
SQL> select department_name
  2  from hr.departments dept
  3  where department_id in (select department_id from hr.employees emp);

11 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2605691773

----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |    10 |   190 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI |                   |    10 |   190 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS       |    27 |   432 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | EMP_DEPARTMENT_IX |    41 |   123 |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("DEPARTMENT_ID"="DEPARTMENT_ID")

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
        742  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         11  rows processed

-- 使用EXISTS关键字的相关子查询 => 发生NESTED LOOPS半联结
SQL> select department_name
  2  from hr.departments dept where exists
  3  (select null from hr.employees emp where emp.department_id = dept.department_id);

11 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2605691773

----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |    10 |   190 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI |                   |    10 |   190 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS       |    27 |   432 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | EMP_DEPARTMENT_IX |    41 |   123 |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("EMP"."DEPARTMENT_ID"="DEPT"."DEPARTMENT_ID")

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
        742  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         11  rows processed

-- 谓语中使用了OR分支中的EXISTS子查询 => 禁用半联结
SQL> select department_name
  2  from hr.departments dept
  3  where 1=2 OR exists
  4  (select null from hr.employees emp where emp.department_id = dept.department_id);

11 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 440241596

----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |    27 |   432 |     4   (0)| 00:00:01 |
|*  1 |  FILTER            |                   |       |       |            |          |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS       |    27 |   432 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | EMP_DEPARTMENT_IX |     2 |     6 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( EXISTS (SELECT 0 FROM "HR"."EMPLOYEES" "EMP" WHERE
              "EMP"."DEPARTMENT_ID"=:B1))
   3 - access("EMP"."DEPARTMENT_ID"=:B1)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         35  consistent gets
          0  physical reads
          0  redo size
        742  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         11  rows processed

从结果集来看,我们很容易联想到内联结,那为什么半联结通常来说会获得更高的性能呢?这实际也是半联结优化的关键,拿NESTED LOOPS来举例,在NESTED LOOPS联结中,驱动表被读取后需要逐个的进入内层循环来进行匹配工作,并且只有当外层循环的数据行与内层循环中的每一行数据匹配运算完成后才会结束一个结果集的获取;而相对而言,半联结的区别在于数据集1中的每一条记录只返回一次,而不管数据集2中有几条匹配的记录,因此,半联结会在找到子查询中匹配到的第一条数据后立即结束处理从而提高性能。

对于某些需要利用半联结来提高性能的场景,可以通过手动的方式控制半联结的执行计划,使用SEMIJOIN和NO_SEMIJOIN提示分别可以指定优化器使用和禁用半联结。

代码如下:

-- 使用NO_SEMIJOIN提示禁用半联结
SQL> select department_name
  2  from hr.departments dept
  3  where department_id in (select /*+ no_semijoin */department_id from hr.employees emp);

11 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3372191744

------------------------------------------------------------------------------------------
| Id  | Operation            | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |                   |   106 |  1802 |     4  (25)| 00:00:01 |
|   1 |  VIEW                | VM_NWVW_2         |   106 |  1802 |     4  (25)| 00:00:01 |
|   2 |   HASH UNIQUE        |                   |   106 |  2544 |     4  (25)| 00:00:01 |
|   3 |    NESTED LOOPS      |                   |   106 |  2544 |     3   (0)| 00:00:01 |
|   4 |     TABLE ACCESS FULL| DEPARTMENTS       |    27 |   567 |     3   (0)| 00:00:01 |
|*  5 |     INDEX RANGE SCAN | EMP_DEPARTMENT_IX |     4 |    12 |     0   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("DEPARTMENT_ID"="DEPARTMENT_ID")

Statistics
----------------------------------------------------------
        506  recursive calls
          0  db block gets
        188  consistent gets
          7  physical reads
          0  redo size
        742  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
         10  sorts (memory)
          0  sorts (disk)
         11  rows processed

除此之外,我们还可以使用_always_semi_join隐藏参数选择半联结的联结类型,有关_always_semi_join参数的可选值:

代码如下:

SQL> SELECT
  2           PARNO_KSPVLD_VALUES     pvalid_par#,
  3           NAME_KSPVLD_VALUES      pvalid_name,
  4           VALUE_KSPVLD_VALUES     pvalid_value,
  5           DECODE(ISDEFAULT_KSPVLD_VALUES, 'FALSE', '', 'DEFAULT' ) pvalid_default
  6   FROM
  7           X$KSPVLD_VALUES
  8   WHERE
  9           LOWER(NAME_KSPVLD_VALUES) LIKE '%'||LOWER(nvl('&pname',name_kspvld_values))||'%'
 10   ORDER BY
 11           pvalid_par#,
 12           pvalid_default,
 13           pvalid_Value
 14   /

  PAR# PARAMETER                                          VALUE                          DEFAULT
------ -------------------------------------------------- ------------------------------ -------
  1705 _always_semi_join                                  CHOOSE
       _always_semi_join                                  HASH
       _always_semi_join                                  MERGE
       _always_semi_join                                  NESTED_LOOPS
       _always_semi_join                                  OFF

该参数的默认值为choose,表示选用半联结的类型由优化器来决定,下面来使用_always_semi_join参数将上面的NESTED LOOPS半联结改变为HASH JOIN半联结:

代码如下:

-- 默认发生NESTED LOOPS SEMI
SQL> select department_name
  2  from hr.departments dept
  3  where department_id in (select department_id from hr.employees emp);

11 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2605691773

----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |    10 |   190 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS SEMI |                   |    10 |   190 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS       |    27 |   432 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | EMP_DEPARTMENT_IX |    41 |   123 |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("DEPARTMENT_ID"="DEPARTMENT_ID")

Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         11  consistent gets
          0  physical reads
          0  redo size
        742  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         11  rows processed

-- session级别修改参数
SQL> alter session set "_always_semi_join"=merge;

Session altered.

-- 发生MERGE JOIN SEMI
SQL> select department_name
  2  from hr.departments dept
  3  where department_id in (select department_id from hr.employees emp);

11 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 954076352

--------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |                   |    10 |   190 |     4  (25)| 00:00:01 |
|   1 |  MERGE JOIN SEMI             |                   |    10 |   190 |     4  (25)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| DEPARTMENTS       |    27 |   432 |     2   (0)| 00:00:01 |
|   3 |    INDEX FULL SCAN           | DEPT_ID_PK        |    27 |       |     1   (0)| 00:00:01 |
|*  4 |   SORT UNIQUE                |                   |   107 |   321 |     2  (50)| 00:00:01 |
|   5 |    INDEX FULL SCAN           | EMP_DEPARTMENT_IX |   107 |   321 |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("DEPARTMENT_ID"="DEPARTMENT_ID")
       filter("DEPARTMENT_ID"="DEPARTMENT_ID")

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          5  consistent gets
          1  physical reads
          0  redo size
        742  bytes sent via SQL*Net to client
        523  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         11  rows processed
-- 从trace来看优化器的选择还是非常可靠的。。

反联结

从本质上来说,反联结和半联结很多相似的因素,反联结的发生通常是在使用含有NOT IN,NOT EXISTS的相关子查询的时候,同样,如果子查询谓语OR分支中,反联结也会被禁用,它和半联结主要的不同点还是在返回数据的匹配方式上,它是会返回在子查询中没有匹配到的数据行,不过其优化的原理是一致的,通过在子查询中找到第一条匹配记录而立即停止处理来提高效率,一下是发生的集中场景:

代码如下:

SQL> set autotrace traceonly
-- NOT IN 触发反联结
SQL> select department_name
  2  from hr.departments
  3  where department_id not in
  4  (select department_id from hr.employees where department_id is not null);

16 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3082375452

----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |    17 |   323 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS ANTI |                   |    17 |   323 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS       |    27 |   432 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | EMP_DEPARTMENT_IX |    41 |   123 |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("DEPARTMENT_ID"="DEPARTMENT_ID")
       filter("DEPARTMENT_ID" IS NOT NULL)

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
         12  consistent gets
          6  physical reads
          0  redo size
        985  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         16  rows processed

-- EXISTS触发反联结
SQL> select department_name
  2  from hr.departments dept
  3  where not exists
  4  (select null from hr.employees emp where emp.department_id = dept.department_id);

16 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3082375452

----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |    17 |   323 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS ANTI |                   |    17 |   323 |     3   (0)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS       |    27 |   432 |     3   (0)| 00:00:01 |
|*  3 |   INDEX RANGE SCAN | EMP_DEPARTMENT_IX |    41 |   123 |     0   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("EMP"."DEPARTMENT_ID"="DEPT"."DEPARTMENT_ID")

Statistics
----------------------------------------------------------
          3  recursive calls
          0  db block gets
         13  consistent gets
          0  physical reads
          0  redo size
        985  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         16  rows processed


从上面的例子可以看出minus和outer join操作都可以巧妙的实现同样的结果,不过从执行计划来看,minus操作显然没有反联结操作优化,而使用outer join虽然发生了反联结优化,但是由于使用了带空值的虚拟记录来匹配数据行,不便于理解,因此实际还是不建议使用的。

如果想要手动控制反联结的执行计划,这里也有一些hint和参数可以使用,常用的hint有:

1.ANTIJOIN-进行反联结,优化器决定联结类型
2.USE_ANTI-老版本的提示,和ANTIJOIN功能一致
3.[NL_AJ] | [HASH_AJ] | [MERGE_AJ]-指定发生反联结的类型(10g开始被弃用,不过仍然可以生效)

在参数控制方面,也有个和_always_semi_join非常相同的_always_anti_join参数,用法完全一致;还有参数_optimizer_null_aware_antijoin,_optimizer_outer_to_anti_enable用于控制对含空值和外联结的反联结转换。

代码如下:

-- 使用hint显式指定反联结类型
SQL> select department_name
  2  from hr.departments dept
  3  where not exists (select /*+ hash_aj */ null from hr.employees emp
  4  where emp.department_id = dept.department_id);

16 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 3587451639

----------------------------------------------------------------------------------------
| Id  | Operation          | Name              | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                   |    17 |   323 |     5  (20)| 00:00:01 |
|*  1 |  HASH JOIN ANTI    |                   |    17 |   323 |     5  (20)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS       |    27 |   432 |     3   (0)| 00:00:01 |
|   3 |   INDEX FULL SCAN  | EMP_DEPARTMENT_IX |   107 |   321 |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("EMP"."DEPARTMENT_ID"="DEPT"."DEPARTMENT_ID")

Statistics
----------------------------------------------------------
        566  recursive calls
          0  db block gets
        193  consistent gets
          0  physical reads
          0  redo size
        985  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
         12  sorts (memory)
          0  sorts (disk)
         16  rows processed

-- 使用_optimizer_null_antijoin参数关闭反联结中的空值考虑选项-即返回空值的情况不使用反联结
SQL> alter session set "_optimizer_null_aware_antijoin"=false;

Session altered.

SQL> select department_name
  2  from hr.departments
  3  where department_id not in (select department_id from hr.employees);

no rows selected

Execution Plan
----------------------------------------------------------
Plan hash value: 3416340233

----------------------------------------------------------------------------------
| Id  | Operation          | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |             |    26 |   416 |    30   (0)| 00:00:01 |
|*  1 |  FILTER            |             |       |       |            |          |
|   2 |   TABLE ACCESS FULL| DEPARTMENTS |    27 |   432 |     3   (0)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| EMPLOYEES   |     2 |     6 |     2   (0)| 00:00:01 |
----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( NOT EXISTS (SELECT 0 FROM "HR"."EMPLOYEES" "EMPLOYEES"
              WHERE LNNVL("DEPARTMENT_ID"<>:B1)))
   3 - filter(LNNVL("DEPARTMENT_ID"<>:B1))

Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
        172  consistent gets
          0  physical reads
          0  redo size
        343  bytes sent via SQL*Net to client
        513  bytes received via SQL*Net from client
          1  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          0  rows processed


推荐阅读
  • 本文介绍了在 SQL Server 2012 客户端中格式化 SQL 查询语句的多种方法,包括内置功能和第三方工具,帮助用户提高代码可读性和维护性。 ... [详细]
  • 本文探讨了在 SQL Server 中使用 JDBC 插入数据时遇到的问题。通过详细分析代码和数据库配置,提供了解决方案并解释了潜在的原因。 ... [详细]
  • 本文详细介绍了如何解决 Microsoft SQL Server 中用户 'sa' 登录失败的问题。错误代码为 18470,提示该帐户已被禁用。我们将通过 Windows 身份验证方式登录,并启用 'sa' 帐户以恢复其访问权限。 ... [详细]
  • 配置PHPStudy环境并使用DVWA进行Web安全测试
    本文详细介绍了如何在PHPStudy环境下配置DVWA( Damn Vulnerable Web Application ),并利用该平台进行SQL注入和XSS攻击的练习。通过此过程,读者可以熟悉常见的Web漏洞及其利用方法。 ... [详细]
  • 本文详细介绍如何使用 Apache Spark 执行基本任务,包括启动 Spark Shell、运行示例程序以及编写简单的 WordCount 程序。同时提供了参数配置的注意事项和优化建议。 ... [详细]
  • docker镜像重启_docker怎么启动镜像dock ... [详细]
  • openGauss每日一练第 12 天 |学习openGauss定义数据类型
    自己安装的openGauss环境启动openGaussgsctlDgaussdatadb1start登录openGaussgsqldpostgresp26000r1.创建一 ... [详细]
  • 软件工程课堂测试2
    要做一个简单的保存网页界面,首先用jsp写出保存界面,本次界面比较简单,首先是三个提示语,后面是三个输入框,然 ... [详细]
  • 本文档介绍了如何在Visual Studio 2010环境下,利用C#语言连接SQL Server 2008数据库,并实现基本的数据操作,如增删改查等功能。通过构建一个面向对象的数据库工具类,简化了数据库操作流程。 ... [详细]
  • 数据排序、无限滚动与分页加载及子查询的使用
    本文介绍了数据排序的基本方法,包括升序和降序排列。同时探讨了瀑布流布局(无限滚动)和传统分页技术在Web应用中的应用,并详细解释了子查询的概念及其替代方案。 ... [详细]
  • 本文详细介绍了如何在Linux系统中创建和管理DB2数据库,包括用户切换、数据库创建、错误处理、连接与断开、表空间和缓冲池的创建,以及用户权限管理和数据导入导出等操作。 ... [详细]
  • 本文介绍了如何通过在数据库表中增加一个字段来记录文章的访问次数,并提供了一个示例方法用于更新该字段值。 ... [详细]
  • 在寻找轻量级Ruby Web框架的过程中,您可能会遇到Sinatra和Ramaze。两者都以简洁、轻便著称,但它们之间存在一些关键区别。本文将探讨这些差异,并提供详细的分析,帮助您做出最佳选择。 ... [详细]
  • 本文介绍了两款提高Android开发效率的插件:Lombok和SQLScout。Lombok可以帮助开发者自动处理实体类的Getter和Setter方法,使代码更加简洁;而SQLScout则提供了直观的SQLite数据库查看功能,极大地简化了数据库调试过程。 ... [详细]
  • 当unique验证运到图片上传时
    2019独角兽企业重金招聘Python工程师标准model:public$imageFile;publicfunctionrules(){return[[[na ... [详细]
author-avatar
Bd专业8楼nv
这个家伙很懒,什么也没留下!
PHP1.CN | 中国最专业的PHP中文社区 | DevBox开发工具箱 | json解析格式化 |PHP资讯 | PHP教程 | 数据库技术 | 服务器技术 | 前端开发技术 | PHP框架 | 开发工具 | 在线工具
Copyright © 1998 - 2020 PHP1.CN. All Rights Reserved | 京公网安备 11010802041100号 | 京ICP备19059560号-4 | PHP1.CN 第一PHP社区 版权所有