每隔一段时间,我都会遇到一个神秘的NATURAL JOIN
SQL操作符的用例,当我能把它变成NATURAL FULL JOIN
,我就更高兴了。曩昔的几篇关于这个主题的博文包含:
- 在SQL中运用NATURAL FULL JOIN来比较两个表
- 用不可思议的NATURAL FULL OUTER JOIN打动你的搭档!
最近,我在Reddit上偶然发现了一个问题:
有没有这样一种东西,能够连接两个彻底不同的表,没有任何关系,但却像一个联盟一样?
起初我想到了UNION CORRESPONDING
语法,这在大多数SQL方言中并不存在,即便它是一个规范功用。但后来,我想起这又是一个关于NATURAL FULL JOIN
的完美用例,这次与上面的比如稍有不同,即两个表的内容要进行比较。这一次,咱们要保证两个连接的表永远没有匹配的行,以获得类似UNION
的行为。
考虑一下Sakila数据库,在该数据库中,咱们有3个包含人的表,包含:
ACTOR
CUSTOMER
STAFF
这些表的界说如下:
CREATE TABLE actor (
actor_id integer NOT NULL PRIMARY KEY,
first_name varchar(45) NOT NULL,
last_name varchar(45) NOT NULL,
last_update timestamp NOT NULL
);
CREATE TABLE customer (
customer_id integer NOT NULL PRIMARY KEY,
store_id integer NOT NULL,
first_name varchar(45) NOT NULL,
last_name varchar(45) NOT NULL,
email varchar(50),
address_id integer NOT NULL,
active boolean NOT NULL,
create_date date NOT NULL,
last_update timestamp
);
CREATE TABLE staff (
staff_id integer NOT NULL,
first_name varchar(45) NOT NULL,
last_name varchar(45) NOT NULL,
address_id integer NOT NULL,
email varchar(50),
store_id integer NOT NULL,
active boolean NOT NULL,
username varchar(16) NOT NULL,
password varchar(40),
last_update timestamp NOT NULL,
picture bytea
);
能够看出,常见的列实际上只要(FIRST_NAME, LAST_NAME, LAST_UPDATE)
,一切其他的列都是针对表的。运用下面的查询,咱们能够将一切的数据连接起来:
SELECT *
FROM (SELECT 'actor' AS source, * FROM actor) AS a
NATURAL FULL JOIN (SELECT 'customer' AS source, * FROM customer) AS c
NATURAL FULL JOIN (SELECT 'staff' AS source, * FROM staff) AS s;
结果看起来是这样的:
|source |first_name|last_name|last_update |actor_id|...|customer_id|...|staff_id|...|
|--------|----------|---------|-----------------------|--------|---|-----------|---|--------|---|
|actor |PENELOPE |GUINESS |2006-02-15 04:34:33.000|1 |...| | | | |
|actor |NICK |WAHLBERG |2006-02-15 04:34:33.000|2 |...| | | | |
|actor |ED |CHASE |2006-02-15 04:34:33.000|3 |...| | | | |
|customer|MARY |SMITH |2006-02-15 04:57:20.000| | |1 |...| | |
|customer|PATRICIA |JOHNSON |2006-02-15 04:57:20.000| | |2 |...| | |
|customer|LINDA |WILLIAMS |2006-02-15 04:57:20.000| | |3 |...| | |
|staff |Mike |Hillyer |2006-02-15 04:57:16.000| | | | |1 |...|
|staff |Jon |Stephens |2006-02-15 04:57:16.000| | | | |2 |...|
一些观察结果:
-
NATURAL JOIN
的匹配列(即具有相同称号的列)位于开头。它们包含组成的SOURCE
列,这对每个连接源来说都是不同的,所以咱们从来没有匹配过,这是咱们想要的。咱们想要UNION
语义(即连接3个表),而不是匹配它们。 - 之后列出了每个表所特有的列。它们只包含属于相关的数据。
SOURCE
这种技能显然不适合每天运用,但偶然也会有用。所以,不要轻视了 “数据 “的力气。NATURAL FULL JOIN
注意事项
我作了点小弊,实际的Sakila数据库形式在CUSTOMER
和STAFF
表之间有一个冲突:
CREATE TABLE customer (
customer_id integer NOT NULL PRIMARY KEY,
store_id integer NOT NULL,
first_name varchar(45) NOT NULL,
last_name varchar(45) NOT NULL,
email varchar(50),
address_id integer NOT NULL,
activebool boolean NOT NULL,
create_date date NOT NULL,
last_update timestamp
active integer NOT NULL, -- This is an integer
);
有了这个表的界说,我的查询就出现了这个过错:
SQL过错[42804]。ERROR:JOIN/USING类型整数和布尔值不能匹配
因此,为了处理这个问题,我修补了CUSTOMER
表的界说:
-- Patch
WITH customer AS (
SELECT
customer_id,
store_id,
first_name,
last_name,
email,
address_id,
activebool as active,
create_date,
last_update
FROM customer
)
-- Original query
SELECT *
FROM (SELECT 'actor' AS source, * FROM actor) AS a
NATURAL FULL JOIN (SELECT 'customer' AS source, * FROM customer) AS c
NATURAL FULL JOIN (SELECT 'staff' AS source, * FROM staff) AS s;
期望BigQuery的有用 [* REPLACE (...)](https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#select_replace)
语法能够更广泛地运用。