[PostgreSQL] Basic Data Types - Character types

지과쌤 2020. 8. 17.

Character types / why should we use char instead of varchar?
Numeric types
Boolean types
Temporal types
UUID for storing Universally Unique Identifiers
Array for storing arrays of strings, numbers, etc.
JSON for storing JSON data
hstore for storing key-value pairs
User-defined data types

Character types

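character varying(n), varchar(n) - variable length. -> The stored length varies with the input, within the maximum of n characters.

(n counts characters, not bytes. The 8000-byte cap is SQL Server's limit for varchar(n); in PostgreSQL a single value can be up to about 1 GB.)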

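character(n), char(n) - fixed length. -> If the input is shorter than the maximum (n), the remainder is padded with spaces. (n counts characters here too; the 8000-byte cap is SQL Server's limit, not PostgreSQL's.)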

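text - variable length with no declared limit.

(Up to about 1 GB per value -> for Hangul at 3 bytes per character in UTF-8, roughly 1024 * 1024 * 1024 / 3, i.e. about 350 million characters.)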
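The character-count estimate for text can be checked with a little arithmetic. This is only a rough sketch: it assumes PostgreSQL's limit of about 1 GB per value and UTF-8 encoding, and the names MAX_VALUE_BYTES and max_chars are made up for illustration.

```python
# Rough upper bound on how many characters fit in one text value.
# Assumptions: about 1 GB per value, UTF-8 encoding.
MAX_VALUE_BYTES = 1024 ** 3


def max_chars(bytes_per_char: int) -> int:
    """Whole characters of the given width that fit in MAX_VALUE_BYTES."""
    return MAX_VALUE_BYTES // bytes_per_char


hangul_bytes = len("한".encode("utf-8"))  # one Hangul syllable in UTF-8
ascii_bytes = len("a".encode("utf-8"))    # one ASCII letter in UTF-8

print(hangul_bytes, max_chars(hangul_bytes))  # 3 357913941
print(ascii_bytes, max_chars(ascii_bytes))    # 1 1073741824
```

The real ceiling is a little lower because of per-value header bytes, so treat these numbers as an order of magnitude.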

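When using varchar(n) and char(n), the (n) is sometimes left out. In that case varchar behaves the same as text, and char is treated as char(1).

In practice, text and varchar are rarely distinguished from each other.

That said, varchar(n) gives a rough idea of the range of values, whereas text gives none at all, so it seems like a good idea to leave a note about what kind of values the column will hold.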

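Aside - why use char in some cases and varchar in others?

When choosing a type, these are the things we would consider:

How much storage it takes, what the I/O performance is like.....

First we need to know how a row is structured on disk. Note that the article below describes SQL Server's record format, so the byte counts in this aside are SQL Server's, not PostgreSQL's.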

https://www.red-gate.com/simple-talk/sql/database-administration/sql-server-storage-internals-101/

SQL Server Storage Internals 101 - Simple Talk


I'll keep the description brief, on the assumption that you have read that page.

[Figure: structure of a data record at the byte level]

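Every row starts with 2 bytes of status bits related to the record type.

The next 2 bytes store the total length of the fixed-length data. (They point to the end of the fixed-length data.)

After that comes the fixed-length data [char(n)]. Since the length of this data is fixed, the total size of these columns can be calculated.

So, because the total size is fixed, each fixed-length column can be placed up front at a known offset without any trouble.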

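The next two areas are the 2-byte count of columns in the record and the null bitmap, which tracks which columns hold a null value.

(Fixed-length data always occupies its allocated space, so something has to say whether the value is null or not, and for variable-length data something has to tell "" apart from null. That is why this area exists.)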
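To make the null bitmap idea concrete, here is a minimal sketch in Python. It is not SQL Server's actual code: the function name null_bitmap is hypothetical, and the bit order (column 0 in the lowest bit of the first byte) is an assumption made for the illustration.

```python
def null_bitmap(values):
    """Build a bitmap with one bit per column; a set bit means NULL (None).

    Assumed bit order: column i lives in byte i // 8, bit i % 8.
    """
    bitmap = bytearray((len(values) + 7) // 8)  # one bit per column, rounded up
    for i, value in enumerate(values):
        if value is None:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


# Column 2 holds an empty string, which is not the same thing as NULL.
row = ["abc", None, "", 42, None]
bitmap = null_bitmap(row)
print(bitmap.hex())  # 12  (bits 1 and 4 set)
```

Five columns fit in a single byte here, and the empty string in column 2 leaves its bit cleared, which is exactly the "" versus null distinction described above.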

The next 2 bytes store the number of variable-length columns, followed by the variable column offset array and the actual variable-length column data. This part deserves a closer look.

[Figure: variable-length data portion of a data record]

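First, 0x0200 is 2 bytes of data holding the number of variable-length columns (stored little-endian, so the value is 2).

Next come the 2-byte offset array entries (you can think of them as pointers), each pointing to the end of one varchar(n) column, and after those comes the actual variable-length data.

That is:

varchar(4): 2-byte offset array entry pointing to its end, '0x1400' (offset 20)

varchar(3): 2-byte offset array entry pointing to its end, '0x1700' (offset 23)

So an overhead of (number of varchar(n) columns) x 2 bytes is incurred. (char(n) has no offset array.)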
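The layout above can be decoded by hand with Python's struct module. This is a sketch under assumptions: the end offsets are taken to be measured from the start of the record, the 10 bytes before the variable-length portion are a zero-filled placeholder, and the column values "abcd" and "xyz" are invented to match the lengths 4 and 3. The function name parse_variable_portion is hypothetical.

```python
import struct


def parse_variable_portion(record: bytes, start: int):
    """Decode the variable-length portion that begins at byte `start`.

    Layout assumed: 2-byte column count, then one 2-byte end offset per
    column (little-endian, relative to the record start), then the data.
    """
    (count,) = struct.unpack_from("<H", record, start)
    ends = struct.unpack_from(f"<{count}H", record, start + 2)
    begin = start + 2 + 2 * count  # data starts right after the offset array
    values = []
    for end in ends:
        values.append(record[begin:end])
        begin = end
    return count, ends, values


prefix = bytes(10)  # placeholder for the status, fixed-length and null bitmap parts
variable = bytes.fromhex("020014001700") + b"abcd" + b"xyz"
record = prefix + variable

count, ends, values = parse_variable_portion(record, len(prefix))
print(count)   # 2
print(ends)    # (20, 23)
print(values)  # [b'abcd', b'xyz']
```

Each column's length is never stored directly; it falls out as the difference between consecutive end offsets (23 - 20 = 3 for the second column), which is why every variable-length column costs exactly one extra 2-byte entry.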

So how does this play into the choice between char and varchar?

1. Storing a single character

char(1) = 1 byte

varchar(1) = 3 bytes (2 bytes of overhead)

In this case char always looks like the better choice. There is no overhead!

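2. Storing the word "apple" in a generously sized column

char(10) = 10 bytes (5 bytes of overhead)

varchar(10) = 7 bytes (2 bytes of overhead)

Here char incurs more overhead than varchar. char(n) only takes less space when n is the data length or the data length + 1 (at data length + 2 they tie); beyond that, varchar takes less space, and less data to read should help I/O as well!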
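The two cases can be turned into a tiny calculator to see where the break-even point sits. This is a simplified model, assuming single-byte characters and only the 2-byte offset entry as varchar overhead; the function names are made up for this sketch.

```python
VARCHAR_OFFSET_BYTES = 2  # one offset array entry per varchar column


def char_bytes(n: int) -> int:
    """char(n) always occupies n bytes (single-byte characters assumed)."""
    return n


def varchar_bytes(value: str) -> int:
    """varchar stores the data itself plus its 2-byte offset entry."""
    return len(value) + VARCHAR_OFFSET_BYTES


def cheaper(n: int, value: str) -> str:
    """Which type takes less space for `value` in a column declared with n."""
    c, v = char_bytes(n), varchar_bytes(value)
    if c == v:
        return "tie"
    return "char" if c < v else "varchar"


for n, value in [(1, "a"), (5, "apple"), (6, "apple"), (7, "apple"), (10, "apple")]:
    print(n, repr(value), char_bytes(n), varchar_bytes(value), cheaper(n, value))
# 1 'a' 1 3 char
# 5 'apple' 5 7 char
# 6 'apple' 6 7 char
# 7 'apple' 7 7 tie
# 10 'apple' 10 7 varchar
```

In this model char only wins while n is at most one more than the actual data length, so it suits columns whose values really are all the same width.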

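+@ When using UTF8

UTF8 is a variable-width encoding: ASCII letters take 1 byte, most accented Latin letters 2 bytes, Hangul 3 bytes, and some characters 4 bytes. MySQL's utf8 charset stores at most 3 bytes per character, and a fixed-width char(n) column there can reserve 3 bytes for every character.

As an extreme example, suppose we store a single 3-byte character (one Hangul syllable, say):

char(10) = 30 bytes (27 bytes of overhead)

varchar(10) = 5 bytes (2 bytes of overhead)

The gap becomes enormous.... So when using UTF8, varchar seems like the better choice in most cases.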
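The byte widths are easy to verify in Python, along with the 30 versus 5 comparison. This is a sketch of the model used in this post, assuming a fixed-width column reserves 3 bytes per declared character and varchar pays a 2-byte overhead; the function names are hypothetical, and a given engine or row format may behave differently.

```python
def utf8_width(ch: str) -> int:
    """Number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def reserved_char_bytes(n: int, max_bytes_per_char: int = 3) -> int:
    """Assumed model: char(n) reserves the maximum width for every character."""
    return n * max_bytes_per_char


def varchar_bytes(value: str, overhead: int = 2) -> int:
    """Assumed model: actual UTF-8 bytes plus a 2-byte overhead."""
    return len(value.encode("utf-8")) + overhead


for ch in ["a", "é", "한", "😀"]:
    print(ch, utf8_width(ch))
# a 1
# é 2
# 한 3
# 😀 4

print(reserved_char_bytes(10))  # 30
print(varchar_bytes("한"))       # 5
```

With an ASCII letter instead of a Hangul syllable, varchar_bytes drops to 3 while the reserved size stays at 30, so the gap only grows.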

<References>

https://stackoverflow.com/questions/59667/what-are-the-use-cases-for-selecting-char-over-varchar-in-sql

What are the use cases for selecting CHAR over VARCHAR in SQL?


https://www.sqlservercentral.com/forums/topic/overhead-when-using-datatype-varchar

Overhead when using datatype varchar ? – SQLServerCentral


https://stackoverflow.com/questions/1885630/whats-the-difference-between-varchar-and-char/15553059

What's the difference between VARCHAR and CHAR?

