Subqueries in FROM and SELECT, The Join family of operators

컴퓨터/DB, SQL 2025. 6. 3. 14:46

728x90

출처

https://www.edx.org/learn/relational-databases/stanford-university-databases-relational-databases-and-sql

StanfordOnline: Databases: Relational Databases and SQL | edX

This course is one of five self-paced courses on the topic of Databases, originating as one of Stanford's three inaugural massive open online courses released in the fall of 2011. The original "Databases" courses are now all available on edx.org. This cour

www.edx.org

select sID, sName, GPA, GPA*(sizeHS/1000.0) as scaledGPA
from Student
where GPA*(sizeHS/1000.0) - GPA > 1.0
	or GPA - GPA*(sizeHS/1000.0) > 1.0;

다음과 같이 바꿀 수 있다.

(1) abs (absolute function) 활용하기

(2) from 절에 sub query 이용하기

select *
from(select sID, sName, GPA, GPA*(sizeHS/1000.0) as scaledGPA
from Student) G
where abs(G.scaledGPA - GPA) > 1.0;

* Colleges paired with the highest GPA of their applicants

select distinct College.cName, state, GPA
from College, Apply, Student
where College.cName = Apply.cName
	and Apply.sID = Student.sID
    and GPA >= all
    	(select GPA from Student, Apply where Student.sID = Apply.sID and Apply.cName = College.cName);

select clause에 sub query 활용하는 쿼리로 바꾸기

select cName, state,
(select distinct GPA
from Apply, Student
where College.cName = Apply.cName
	and Apply.sID = Student.sID
    and GPA >= all (select GPA from Student, Apply where Student.sID = Apply.sID and Apply.cName = College.cName)) as GPA
from College;

* 대학교와 대학교에 지원한 학생들 나열하기

select cName, state,
(select distinct sName from Apply, student where College.cName = Apply.cName and Apply.sID = Student.sID) as sName
from College;

❗️에러 발생

select 문에 있는 sub query에서 두 개 이상의 row를 리턴하면 에러가 발생한다.

from R1, R2, R3... => implicitly a cross product of those tables

Explicit Join of tables 4가지

(1) inner join: 관계대수의 세타 조인과 같다.

cross product을 하고 condition을 만족하는 튜플만 남긴다

ex.

',' => join (comma 대신 join, comma being the cross product)

where => on

select distinct sName, major
from Student inner join Apply
on Student.sID = Apply.sID;

# 동일
select distinct sName, major
from Student, Apply
where Student.sID = Apply.sID;

inner join은 디폴트 join이기 때문에 'inner join' 대신 'join'이라고만 써도 된다.

join is an abbreviation for inner join.

ex 2.

select sName, GPA
from Student join Apply
on Student.sID = Apply.sID
where sizeHS < 1000 and major = 'CS' and cName = 'stanford';

# 동일
select sName, GPA
from Student join Apply
on Student.sID = Apply.sID
	and sizeHS < 1000 and major = 'CS' and cName = 'stanford';

# 동일
select sName, GPA
from Student, Apply
where Student.sID = Apply.sID
	and sizeHS < 1000 and major = 'CS' and cName = 'stanford';

postgre에서는 3개 테이블을 조인하려고 하면 에러가 난다

requires all join operators to be binary, explicity just joining two relations.

# ❌ postgreSql에서는 3개 테이블을 조인하려고 하면 에러가 발생
select Apply.sID, sName, GPA, Apply.cName, enrollment
from Apply join Student join College
on Apply.sID = Student.sID and Apply.cName = College.cName;

# ✅ 이렇게 바꿔서 실행하면 된다
select Apply.sID, sName, GPA, Apply.cName, enrollment
from (Apply join Student on Apply.sID = Student.sID) join College
on Apply.cName = College.cName;

SQL 쿼리 작성 방식이 실제 처리 순서에 영향을 줄 수 있다.

SQL은 선언형 언어이기 때문에, 무엇을 할지는 말하지만 어떻게 할지는 명시하지 않는다고 알려져 있다.

하지만 실제로는, 작성한 쿼리의 구조(특히 join 순서나 괄호의 위치)가 DBMS의 실행 계획에 영향을 줄 수 있다.

위 코드에서 Apply와 Student를 조인한 다음, 그 결과를 College와 조인하자고 썼다면

대부분의 SQL 시스템은 그 순서를 따라 실제로 실행한다.

작은 테이블끼리 먼저 JOIN하면 성능이 빠를 수 있고,

큰 테이블부터 JOIN하면 메모리를 많이 쓰고 느릴 수 있다.

그래서 SQL을 튜닝할 때 JOIN 순서나 괄호를 바꾸면서 성능을 비교해보는 게 일반적이다.

(2) Natural Join: 관계대수의 natural join과 같다

automatically equating the same name columes(equates columns across tables of the same name)

it requires the values in those columns to be the same to keep the tuple in the cross product.

중복 칼럼을 제거한다.(it eliminates those duplicate attribute names in it's resut)

select distinct sName, major
from Student natural join Apply;

# 동일
select distinct sName, major
from Student inner join Apply
on Student.sID = Apply.sID;

natural join + 조건 추가하기

select sName, GPA
from Student natural join Apply
where sizeHS < 1000 and major = 'CS' and cName='stanford';

any relation joined with itself gives you back the original relation

select *
from Student S1 natural join Student S2; # Student 테이블과 동일한 결과 반환

(3) Inner join Using(attrs)

natural join과 비슷하지만 동등 비교할 attr을 직접 나열한다.

putting explicitly the equated attributes in the column

natural join은 implicitly combines columns that have the same name

반면 이 방식은 명시적으로 컬럼 이름을 적기 때문에 better practice다

현실 세계 테이블에서는 40,50, 수백개 컬럼이 있는 릴레이션이 있을 수 있고
값이 같지 않은데 칼럼명이 같을 수 있기 때문에 더 권장하는 방법.

select sName, GPA
from Student join Apply using(sID)
where sizeHS < 1000 and major = 'CS' and cName='Stanford';

select S1.sID, S1.sName, S1.GPA, S2.sID, S2.sName, S2.GPA
from Student S1, Student S2
where S1.GPA = S2.GPA and S1.sID < S2.sID;

# 동일
# 대부분의 SQL system에서 using clause와 on clause in combination with the join을 허용하지 않는다
select S1.sID, S1.sName, S1.GPA, S2.sID, S2.sName, S2.GPA
from Student S1 join Student S2 using(GPA)
where S1.sID < S2.sID; # on 쓰면 에러 발생함

(4) outer join

Left | Right | Full Outer Join이 있다

combining tuples similar to the theta join except when tuples don't match the theta condition,

they're still added to the result and padded with no values

select sName, sID, cName, major
from Student left outer join Apply using(sID); # outer word는 빼도 된다, left join만 써도 된다

# 같은 결과를 반환하는 쿼리, 실제 사용은 비추
select sName, sID, cName, major
from Student natural left outer join Apply;

# out join을 사용하지 않고 동일한 outer join 결과 도출하는 쿼리
select sName, Student.sID, cName, major
from Student, Apply
where Student.sID = Apply.sID
UNION
select sName, sID, null, null
from Student
where sID not in (select sID from Apply);

blank is actually a null value

Left inner join

왼쪽에 있는 릴레이션의 튜플이라면 뭐든지 결과에 포함시킨다.

만약 오른쪽 릴레이션에 매칭되는 튜플이 없다면, dangling tuple이라고 부르고

dangling tuple with no right matching tuple이어도 결과에 포함시킨다. null value로 패딩해서.

select sName, sID, cName, major
from Student full outer join Apply using(sID);

# 동일
select sName, sID, cName, major
from Student left outer join Apply using(sID)
union # 중복 제거함
select sName, sID, cName, major
from Student right outer join Apply using(sID)

# 동일
select sName, Student.sID, cName, major
from Student, Apply
where Student.sID = Apply.sID
union
select sName, sID, NULL, NULL
from Student
where sID not in (select sID from Apply)
union
select NULL, sID, cName, major
from Apply
where sID not in (select sID from Student);

* 연산의 성질

Commutativity(교환법칙)

(A op B) = (B op A)

join, cross product 등 SQL 혹은 관계대수 대부분이 commutative하다.

하지만 left outer join, right outer join이 예외다.

full outer join은 commutative하다.

Associativity(결합법칙)

(A op B) op C = A op (B op C)

outer join은 associative하지 않다.

full outer join, left/right outer join 모두 not associative

따라서 연산의 순서를 잘 생각해서 쿼리를 짜야 한다.

수행 순서:

T1 ⋈ T2 via B → T1(1,2) + T2(2,3) → 결과: (1,2,3)
위 결과와 T3를 NATURAL FULL OUTER JOIN → A와 C 공통 컬럼 사용
- T3에는 A=4가 있으나 T1-T2 결과엔 없음 → (4,NULL,5) 포함

natural join은 공통된 컬럼 이름을 기준으로 자동 조인

공통된 컬럼이 모두 일치할 때에만 병합

T2 NATURAL FULL OUTER JOIN T3 결과와 T1의 공통 컬럼은 A,B
T1의 (1,2)가 있는 row는 없기 때문에 아무것도 매칭되지 않는다.

저작자표시 (새창열림)

'컴퓨터 > DB, SQL' 카테고리의 다른 글

Null values (edX StanfordOnline Databases: Relational Databases and SQL) (1)	2025.06.15
Quiz (edX StanfordOnline Databases: Relational Databases and SQL) (1)	2025.06.15
Table Variables and Set Operators, Subqueries in WHERE clause (0)	2025.06.02
(edX stanfordOnline Databases: Relational Databases and SQL) Introduction, Basic Select (0)	2025.05.25
(edX stanfordOnline Databases: Modeling and Theory) UML to Relations (0)	2025.05.21

ABOUT ME

수제 녹차 수제 녹차

수행 순서:

'컴퓨터 > DB, SQL' 카테고리의 다른 글

티스토리툴바

ABOUT ME

수행 순서:

'컴퓨터 > DB, SQL' 카테고리의 다른 글

관련글 관련글 더보기

티스토리툴바