03: data representation

Computer / Concepts 2019. 10. 20. 23:22


- Distinguish between analog and digital information

Computers are finite! How do we represent an infinite world?

digitize : breaking data into pieces and representing those pieces separately

All data is stored as binary digits (bits)

Why do we use binary to represent digitized data? Binary hardware is easy to improve, efficient, and easy to convert to other number bases.

It is efficient and reliable.

- data compression and calculating compression ratios

data compression : Reduction in the amount of space needed to store a piece of data, or in the bandwidth needed to transmit it

As the amount of data we have to handle grows, compression becomes necessary.

compression ratio : The size of the compressed data divided by the size of the original data

= compressed size/original size
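The ratio above can be checked with a minimal sketch. The helper name `compression_ratio` and the sample data are assumptions for illustration; `zlib` is just a convenient lossless compressor from the Python standard library, not something the notes prescribe.

```python
import zlib


def compression_ratio(compressed_size, original_size):
    """Return compressed size / original size (smaller means better compression)."""
    if original_size <= 0:
        raise ValueError("original size must be positive")
    return compressed_size / original_size


# Hypothetical sample data: highly repetitive, so it should compress well.
original = b"abcabcabc" * 200
compressed = zlib.compress(original)  # zlib is a lossless compressor

ratio = compression_ratio(len(compressed), len(original))
print(f"{len(original)} bytes -> {len(compressed)} bytes, ratio = {ratio:.3f}")
```

A ratio of 0.25 means the compressed data takes a quarter of the original space; a ratio above 1 means the "compressed" form is actually larger.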

A data compression technique can be

lossless : the data can be retrieved without any loss of original information

lossy : some information may be lost in the process of compression

Recent compression has been moving from lossless toward lossy. Very high-resolution images and video are so finely detailed that our eyes do not notice when part of the data is changed, so loss is sometimes introduced on purpose. JPEG and MPEG are examples.

* Electronic Signals

An analog signal continually fluctuates in voltage up and down

A digital signal has only a high or low state, corresponding to the two binary digits

All electronic signals degrade as they move down a line

The voltage of the signal fluctuates due to environmental effects

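Once noise gets into an analog signal, the original signal cannot be recovered.

A digital signal, on the other hand, is always either 0 or 1, so it is easy to recover.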
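Why a degraded digital signal is recoverable can be sketched as follows. The 0 V / 5 V levels, the 2.5 V threshold, and the sample voltages are assumptions chosen for illustration only.

```python
def to_bits(voltages, threshold=2.5):
    """Read each noisy voltage as 1 if it is at or above the threshold, else 0."""
    return [1 if v >= threshold else 0 for v in voltages]


def regenerate(voltages, low=0.0, high=5.0, threshold=2.5):
    """Rebuild a clean signal: every sample snaps back to the low or high level."""
    return [high if bit else low for bit in to_bits(voltages, threshold)]


# Hypothetical received samples: sent as 5, 0, 5, 0, 0, 5 volts, then degraded.
noisy = [4.6, 0.4, 5.3, 0.9, 0.2, 4.1]
print(to_bits(noisy))
print(regenerate(noisy))
```

As long as the noise never pushes a sample across the threshold, the original bits come back exactly. An analog signal has no such threshold, because every voltage in between is a legitimate value.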

- binary formats for negative and floating-point values

natural numbers : usually represented in 1 byte (8 bits).

All 8 bits are stored, so 0 -> 00000000

Computers store data in fixed-size chunks, so we have leading zeroes.

To move from natural numbers to integers, we need a sign.

* representing negative values

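signed-magnitude number representation

Of the 8 bits, the leading (sign) bit is 0 for positive and 1 for negative

-> there is both a +0 and a -0 (leading bit 0 and leading bit 1) : not often used inside computers.

solution : use complements. Half the natural numbers will represent themselves

The other half will represent negative integers

The values 0 to 49 : represented by 0 to 49

The values -1 to -50 : the 50 negative numbers take the codes that are left over, 50 to 99 (99 is -1, 50 is -50)

* Ten's Complement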

ten's complement representation

Negative(I) = 10^k - I, where k is the number of digits

ex. -3 = Negative(3) = 100 - 3 = 97 (using two digits)

To perform addition, add the numbers and discard any carry into the hundreds digit.

ex. 5+94=99 (5+(-6)=-1), 98+96=194 -> 94 (-2+(-4)=-6), 96+6=102 -> 2 (-4+6=2)
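The two-digit scheme can be sketched in a few lines. The function names `negative`, `add`, and `to_signed` are hypothetical helpers, and `k=2` matches the two-digit examples in these notes.

```python
def negative(i, k=2):
    """Ten's complement code of -i using k decimal digits: 10^k - i."""
    return (10**k - i) % 10**k


def add(a, b, k=2):
    """Add two codes and discard any carry out of the k digits."""
    return (a + b) % 10**k


def to_signed(code, k=2):
    """Interpret a code: the lower half is itself, the upper half is negative."""
    return code if code < 10**k // 2 else code - 10**k


print(negative(3))                     # code for -3
print(add(5, 94), to_signed(add(5, 94)))
print(add(98, 96), to_signed(add(98, 96)))
print(add(96, 6), to_signed(add(96, 6)))
```

Discarding the carry is just arithmetic modulo 10^k, which is why ordinary addition keeps working on the codes.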

* Two's Complement

01111111(127) ~ 10000000(-128)

The binary number line is easier to read when written vertically.

neg(n) + n = 2^k, where k is the number of bits

The binary representation of -27 (using 8 bits) is

neg(27) = 256-27

= (1 0000 0000)_2 - (0001 1011)_2

= (1)_2 + (1111 1111)_2 - (0001 1011)_2

= (1110 0101)_2

-> In the binary representation, change every 0 to 1 and every 1 to 0, then add 1.
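The flip-and-add-one rule can be checked against the 2^k - n definition with a minimal sketch; `twos_complement` is a hypothetical helper name and 8 bits is assumed, as in the example.

```python
def twos_complement(n, bits=8):
    """Negate n in two's complement: flip every bit, add 1, keep only `bits` bits."""
    mask = (1 << bits) - 1          # 1111 1111 for 8 bits
    flipped = n ^ mask              # 0 -> 1 and 1 -> 0
    return (flipped + 1) & mask


code = twos_complement(27)
print(format(27, "08b"), "->", format(code, "08b"))   # 00011011 -> 11100101
print(code == 2**8 - 27)                              # same as the 2^k - n definition
```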

* Number Overflow

The largest positive number we can use is 127 (using 8 bits)

127 + 3 = -126 (01111111 + 00000011 = 10000010)
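Python integers do not overflow on their own, so the sketch below simulates an 8-bit register by masking the result. `add_8bit` is a hypothetical helper written for this illustration.

```python
def add_8bit(a, b):
    """Add two integers as an 8-bit two's complement machine would."""
    raw = (a + b) & 0xFF                       # keep only the low 8 bits
    return raw - 256 if raw >= 128 else raw    # leading bit 1 means negative


print(format((127 + 3) & 0xFF, "08b"))   # 10000010
print(add_8bit(127, 3))                  # -126, not 130
```

The result wraps around because 130 does not fit in the range -128 to 127, so the bit pattern 10000010 is read as a negative number.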

* Representing Real Numbers

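mantissa : the significand, i.e. the digits of the value

sign * mantissa * 10^exp (computers use a power of 2 instead of 10)

floating point : the radix point (decimal point) "floats"

mantissa - having a fixed number of digits

standard : IEEE 754 format (a way of storing real numbers in binary)

Writing a power of 10 as a superscript is awkward on a computer,

so we write numberE+N to mean number * 10^N (scientific notation)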
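To make the sign / mantissa / exponent split concrete, here is a minimal sketch that pulls the three fields out of an IEEE 754 single-precision value using the standard `struct` module. The helper name `decompose` is an assumption, and the reconstruction formula shown only holds for normal numbers (not zero, subnormals, infinity, or NaN).

```python
import struct


def decompose(x):
    """Split a 32-bit IEEE 754 float into (sign, biased exponent, fraction bits)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                  # 1 bit
    exponent = (bits >> 23) & 0xFF     # 8 bits, stored with a bias of 127
    fraction = bits & 0x7FFFFF         # 23 bits after the implicit leading 1
    return sign, exponent, fraction


sign, exponent, fraction = decompose(-6.25)
# -6.25 = -(1.1001 in binary) * 2^2
value = (-1) ** sign * (1 + fraction / 2**23) * 2 ** (exponent - 127)
print(sign, exponent - 127, format(fraction, "023b"), value)

# E notation in source text: 1.5E+3 means 1.5 * 10^3
print(float("1.5E+3"))
```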

* Representing Text

What must be provided to represent text?

text = a sequence of characters

- the characteristics of the ASCII and Unicode character sets

ASCII(American Standard Code for Information Interchange)

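Uses seven bits to represent each character

Upper- and lowercase letters = 52, 10 digits, a dozen or so punctuation marks

About 70 to 80 characters are needed -> covered by 7 bits (128 values)

However, the accented letters used in Europe also need to be supported -> extended ASCII (8 bits) -> one character is defined as 1 byte

ASCII also includes control characters such as

NUL (null character) - placed at the end of a string to mark that it has ended

backspace

line feed - moves to the next line

carriage return - returns to the very start of the current line, as on a typewriter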
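The codes can be inspected directly with Python's built-in `ord`. The dictionary names below are only for this illustration.

```python
# Printable characters: code value and its 7-bit pattern.
printable = {ch: ord(ch) for ch in "Aa0"}
for ch, code in printable.items():
    print(ch, code, format(code, "07b"))

# Control characters mentioned in the notes.
control = {
    "NUL": ord("\0"),
    "backspace": ord("\b"),
    "line feed": ord("\n"),
    "carriage return": ord("\r"),
}
print(control)

# Every code fits in 7 bits (0 to 127).
print(all(code < 128 for code in list(printable.values()) + list(control.values())))
```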

* Unicode Character Set

Extended ASCII is not enough for international use

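The original design used 16 bits per character (2^16 = 65536 characters); current Unicode defines code points up to U+10FFFF, stored using encodings such as UTF-8 and UTF-16

Includes the world's writing systems, such as Hangul

The first 256 characters correspond exactly to the extended ASCII code (specifically ISO 8859-1, Latin-1)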
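A short sketch of how code points and encodings relate, using only built-in string methods. The sample characters are chosen for illustration.

```python
hangul = "한"
print(hex(ord(hangul)))                    # code point U+D55C
print(len(hangul.encode("utf-16-be")))     # 2 bytes (16 bits) in UTF-16
print(len(hangul.encode("utf-8")))         # 3 bytes in UTF-8

# ASCII characters keep their ASCII code and take 1 byte in UTF-8.
print(ord("A"), len("A".encode("utf-8")))

# The first 256 code points match Latin-1: same number in both.
print(ord("é") == "é".encode("latin-1")[0])
```

UTF-8 is variable-length, so plain English text costs no more than ASCII, while characters outside the ASCII range take two to four bytes.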

- text compression

How can we store and transmit data more efficiently?

Lossless compression techniques include keyword encoding, run-length encoding, and Huffman encoding

* Keyword encoding(early years)

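Replace frequently used words, such as prepositions and articles, with punctuation symbols

these -> #, as -> ^, the -> ~

The problem : the codes cannot be told apart from real # and ^ characters in the text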
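A minimal sketch of keyword encoding with the three substitutions from the notes. The function names are hypothetical, and matching is case-sensitive and whole-word only, which is a simplification made for this example.

```python
import re

KEYWORDS = {"these": "#", "as": "^", "the": "~"}
PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, KEYWORDS)) + r")\b")


def keyword_encode(text):
    """Replace each whole-word keyword with its one-character symbol."""
    return PATTERN.sub(lambda m: KEYWORDS[m.group(1)], text)


def keyword_decode(encoded):
    """Expand every symbol back into its keyword."""
    reverse = {symbol: word for word, symbol in KEYWORDS.items()}
    return "".join(reverse.get(ch, ch) for ch in encoded)


text = "the cat is as big as these dogs"
encoded = keyword_encode(text)
print(encoded, len(encoded) / len(text))
print(keyword_decode(encoded) == text)

# The weakness: a literal '#' in the input is wrongly expanded on decode.
print(keyword_decode(keyword_encode("item #1")))
```

The last line shows the ambiguity: text that already contains one of the symbols does not survive a round trip unless those symbols are escaped or excluded.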

* Run-length Encoding

Replace a repeated sequence with

- a flag

- the repeated value

- the number of repetitions

ex. *n8 <-> nnnnnnnn

- * is the flag

- n is the repeated value

- 8 is the number of times n is repeated

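jjj -> *j3 is the same length, so a run of three is not encoded.

However, this type of repetition rarely occurs in English text.

used in : image formats such as BMP and PCX (GIF itself uses LZW compression)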
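The flag / value / count scheme can be sketched as below. This is an illustration under stated assumptions: the flag is `*`, the count is a single digit (so runs longer than 9 are split), only runs of 4 or more are encoded, and the input is assumed not to contain the flag character itself.

```python
from itertools import groupby

FLAG = "*"


def rle_encode(text, min_run=4, max_run=9):
    """Replace each run of min_run..max_run equal characters with FLAG + char + count."""
    out = []
    for ch, group in groupby(text):
        remaining = len(list(group))
        while remaining > 0:
            chunk = min(remaining, max_run)
            # Runs shorter than min_run would not get shorter, so copy them as-is.
            out.append(f"{FLAG}{ch}{chunk}" if chunk >= min_run else ch * chunk)
            remaining -= chunk
    return "".join(out)


def rle_decode(encoded):
    """Expand every FLAG + char + digit triple back into the run."""
    out, i = [], 0
    while i < len(encoded):
        if encoded[i] == FLAG:
            out.append(encoded[i + 1] * int(encoded[i + 2]))
            i += 3
        else:
            out.append(encoded[i])
            i += 1
    return "".join(out)


for sample in ("nnnnnnnn", "jjj", "AAAAAAAAAAAABBBBC"):
    packed = rle_encode(sample)
    print(sample, "->", packed, "ratio", len(packed) / len(sample))
```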

* Huffman Encoding(example of prefix coding)

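Based on how often each character is used, rarely used characters are given long bit strings and frequently used characters are given short bit strings.

100 -> L

so neither 10 nor 1001 is assigned as a code: no code is the prefix of another code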

00 : A

01 : E

100 : L

110 : O

111 : R

1010 : B

1011 : D
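Using the code table above, encoding and decoding can be sketched as follows. The table is taken as given (building it from character frequencies is not shown), and the function names and sample word are assumptions for illustration.

```python
CODES = {
    "A": "00", "E": "01", "L": "100", "O": "110",
    "R": "111", "B": "1010", "D": "1011",
}


def huffman_encode(text):
    """Concatenate the bit string of every character."""
    return "".join(CODES[ch] for ch in text)


def huffman_decode(bits):
    """Read bits left to right; emit a character as soon as a code matches.

    This works without separators because no code is a prefix of another.
    """
    lookup = {code: ch for ch, code in CODES.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    if current:
        raise ValueError("leftover bits do not form a complete code")
    return "".join(out)


word = "DOORBELL"
bits = huffman_encode(word)
print(bits, len(bits), "bits instead of", 8 * len(word))
print(huffman_decode(bits))
```

With one byte per character the word takes 64 bits; with this table it takes 25 bits, a compression ratio of 25/64.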

- Representing Audio Information

We perceive sound when a series of air pressure waves vibrates a membrane in our ear,

which sends signals to our brain.

* stereo : It sends an electrical signal to each speaker, which then vibrates to produce sound.

signal : analog representation of the sound wave

cf) stereo : plays two different sounds to give a sense of distance.

digitize the audio signal

1) sampling : chop the signal into fine pieces to turn it into a digital signal

2) quantization :

Source : https://www.youtube.com/watch?v=bjdEG33zBc4&list=LLxiF8FYKHYCVvapPiqfYvSg&index=3&t=69s
