Chapter 6: SVD and Dimensionality Reduction


Estimated time: 25 minutes
Difficulty: Intermediate


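Learning Objectives

Understand the principles of singular value decomposition (SVD)
Apply low-rank approximation to data compression
Use the SVD in recommender systems and image compression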

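What Is the SVD?

A = UΣV^T

• U: left singular vectors (m×m orthogonal matrix)
• Σ: diagonal matrix of singular values (m×n), with σ₁ ≥ σ₂ ≥ … ≥ 0 on the diagonal
• V^T: transpose of the right singular vectors (n×n orthogonal matrix)

Key point: every real m×n matrix has an SVD, including matrices that are not square.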
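
The shapes listed above can be checked numerically. The following is a minimal sketch using NumPy's `np.linalg.svd` on a small random non-square matrix; the 5×3 size and the variable names are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))  # non-square example: m = 5, n = 3

# Full SVD: U is m×m, Vt is n×n, and s holds min(m, n) singular values.
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Place the singular values on the diagonal of an m×n matrix Sigma.
Sigma = np.zeros(A.shape)
np.fill_diagonal(Sigma, s)

A_rebuilt = U @ Sigma @ Vt

print("shapes:", U.shape, Sigma.shape, Vt.shape)
print("A == U Sigma V^T:", np.allclose(A, A_rebuilt))
print("U orthogonal:", np.allclose(U.T @ U, np.eye(5)))
print("V orthogonal:", np.allclose(Vt @ Vt.T, np.eye(3)))
```

Passing `full_matrices=False` instead returns the reduced form, where U is m×min(m, n); that form is usually enough for low-rank approximation.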

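Meaning of the Singular Values

Geometric interpretation

Singular value = how much the transformation stretches space along the corresponding singular direction
Large singular values = directions that carry most of the matrix's structure
Small singular values = fine detail, and often noise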
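
To make the scaling interpretation concrete, the sketch below checks that A maps each right singular vector v_i to σ_i·u_i, and that the largest and smallest stretch of a unit vector match the largest and smallest singular values. The 2×2 matrix is an arbitrary example chosen for illustration.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
U, s, Vt = np.linalg.svd(A)

# Each right singular vector v_i is mapped to sigma_i * u_i.
for i in range(len(s)):
    v_i = Vt[i, :]
    u_i = U[:, i]
    assert np.allclose(A @ v_i, s[i] * u_i)

# Sample the unit circle and measure how much A stretches each direction.
theta = np.linspace(0.0, 2.0 * np.pi, 2000)
circle = np.vstack([np.cos(theta), np.sin(theta)])  # shape (2, 2000)
stretch = np.linalg.norm(A @ circle, axis=0)

print("singular values:", s)
print("max and min stretch on the unit circle:", stretch.max(), stretch.min())
```

The sampled maximum and minimum agree with σ₁ and σ₂ up to the resolution of the sampling: the unit circle is mapped to an ellipse whose semi-axes have lengths σ₁ and σ₂.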

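Information compression

Keep only the top k singular values:

A_k = U_k Σ_k V_k^T

Compression ratio (values stored relative to the original matrix): k(m+n+1) / (mn). Storage is saved only when this ratio is below 1.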
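
As a worked example of the ratio k(m+n+1) / (mn), the sketch below evaluates it for a 512×512 matrix at several ranks, then checks on a random matrix that the Frobenius-norm error of the rank-k approximation equals the square root of the sum of the squared discarded singular values. The helper name `storage_ratio` and the matrix sizes are assumptions made for illustration.

```python
import numpy as np

def storage_ratio(m, n, k):
    """Values stored by a rank-k factorization relative to the full m×n matrix."""
    return k * (m + n + 1) / (m * n)

m, n = 512, 512
for k in (10, 50, 255, 256):
    print(f"k={k:3d}  ratio={storage_ratio(m, n, k):.3f}")

# Truncation error on a random 40×30 matrix.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 30))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 5
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
err = np.linalg.norm(A - A_k, "fro")
tail = np.sqrt(np.sum(s[k:] ** 2))
print("error equals discarded energy:", np.isclose(err, tail))
```

For a 512×512 matrix the ratio stays below 1 up to k = 255 and exceeds 1 at k = 256, so the factorized form only saves space for ranks well below half the matrix dimension.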

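SVD Applications

Image compression

Low-rank image compression, noise removal, feature extraction (the JPEG format itself is based on the discrete cosine transform, not the SVD)

NLP

LSA (latent semantic analysis), topic modeling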
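
A toy illustration of LSA: build a term-document count matrix, keep the top two singular directions, and compare documents by cosine similarity in the reduced space. The five-document corpus and the helper `cosine` are invented for this sketch; real pipelines typically add TF-IDF weighting and use a sparse truncated SVD.

```python
import numpy as np

terms = ["cat", "dog", "pet", "stock", "market", "trade"]
docs = [
    "cat dog pet",
    "dog pet",
    "cat pet",
    "stock market",
    "stock market trade",
]

# Term-document count matrix: rows are terms, columns are documents.
X = np.array([[doc.split().count(t) for doc in docs] for t in terms], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
doc_embed = (np.diag(s[:k]) @ Vt[:k, :]).T  # one k-dimensional row per document

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

raw_sim = cosine(X[:, 1], X[:, 2])              # "dog pet" vs "cat pet", raw counts
lsa_sim = cosine(doc_embed[1], doc_embed[2])    # same pair in the 2-D latent space
cross_sim = cosine(doc_embed[0], doc_embed[3])  # pet document vs finance document

print(f"raw={raw_sim:.2f}  lsa={lsa_sim:.2f}  cross-topic={cross_sim:.2f}")
```

In raw counts the two pet documents share only one word, so their cosine similarity is 0.5. In the two-dimensional latent space they fall on the same topic direction and their similarity is close to 1, while documents from different topics stay close to 0.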

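Truncated SVD (Dimensionality Reduction)

All singular values → keep only the top k

Energy retention: choose k so that the top k singular values capture at least 90% of the total energy, i.e. (σ₁² + … + σ_k²) / (σ₁² + … + σ_r²) ≥ 0.9, where r is the number of singular values.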
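
The energy criterion can be worked through on a small set of singular values. The sketch below uses made-up values 3, 2, 1, 0.5 and a hypothetical helper `rank_for_energy`; the 90% threshold is a common rule of thumb rather than a fixed requirement.

```python
import numpy as np

def rank_for_energy(s, threshold=0.9):
    """Smallest k whose top-k singular values reach the given energy fraction."""
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    # searchsorted finds the first index where energy >= threshold.
    k = int(np.searchsorted(energy, threshold)) + 1
    return min(k, len(s))

s = np.array([3.0, 2.0, 1.0, 0.5])
energy = np.cumsum(s ** 2) / np.sum(s ** 2)

print("cumulative energy:", np.round(energy, 4))
print("k for 90%:", rank_for_energy(s, 0.90))
print("k for 95%:", rank_for_energy(s, 0.95))
```

Here the squared values are 9, 4, 1, 0.25 (total 14.25), so the first two components hold 13/14.25 ≈ 91% of the energy and k = 2 meets the 90% target, while 95% needs k = 3.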

SVD Practice Code

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Singular value decomposition
def perform_svd(A):
    """Compute the reduced (economy-size) SVD of matrix A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U, s, Vt

# Low-rank approximation
def low_rank_approx(A, k):
    """Approximate A using only its top k singular values."""
    U, s, Vt = perform_svd(A)

    # Keep only the first k components
    U_k = U[:, :k]
    s_k = s[:k]
    Vt_k = Vt[:k, :]

    # Reconstruct the rank-k matrix
    A_k = U_k @ np.diag(s_k) @ Vt_k

    # Compression ratio: values stored for the factors / values in A
    original_size = A.shape[0] * A.shape[1]
    compressed_size = k * (A.shape[0] + A.shape[1] + 1)
    compression_ratio = compressed_size / original_size

    return A_k, compression_ratio

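# Image compression example
def compress_image(image_path, k):
    """Compress a grayscale image with a rank-k SVD approximation."""
    # Load the image as grayscale and convert to float for the SVD
    img = Image.open(image_path).convert('L')
    img_array = np.asarray(img, dtype=np.float64)

    # Rank-k approximation, clipped back to the valid 8-bit intensity range
    compressed, ratio = low_rank_approx(img_array, k)
    compressed = np.clip(compressed, 0, 255)

    # Show both images side by side on the same gray scale
    fig, axes = plt.subplots(1, 2, figsize=(12, 6))

    axes[0].imshow(img_array, cmap='gray', vmin=0, vmax=255)
    axes[0].set_title(f'Original ({img_array.shape[0]}×{img_array.shape[1]})')
    axes[0].axis('off')

    axes[1].imshow(compressed, cmap='gray', vmin=0, vmax=255)
    axes[1].set_title(f'Compressed (k={k}, ratio={ratio:.1%})')
    axes[1].axis('off')

    plt.tight_layout()
    plt.show()

    return compressed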

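# Recommender system example
def recommendation_svd(ratings_matrix, k=10):
    """SVD-based rating prediction.

    Assumes a dense matrix in which every entry is an actual rating;
    missing ratings stored as 0 would distort the row means.
    """
    # Mean-center each row
    mean_ratings = np.mean(ratings_matrix, axis=1, keepdims=True)
    centered = ratings_matrix - mean_ratings

    # SVD of the centered matrix
    U, s, Vt = perform_svd(centered)

    # Reduce to k dimensions
    U_k = U[:, :k]
    s_k = s[:k]
    Vt_k = Vt[:k, :]

    # Build the prediction matrix and add the row means back
    predictions = U_k @ np.diag(s_k) @ Vt_k + mean_ratings

    return predictions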

# Singular value spectrum analysis
def analyze_singular_values(A):
    """Plot the singular values of A and their cumulative energy."""
    _, s, _ = perform_svd(A)

    # Cumulative energy: fraction of the total sum of squared singular values
    energy = np.cumsum(s**2) / np.sum(s**2)

    plt.figure(figsize=(12, 4))

    plt.subplot(1, 2, 1)
    plt.semilogy(s, 'o-')
    plt.xlabel('Index')
    plt.ylabel('Singular Value')
    plt.title('Singular Value Spectrum')
    plt.grid(True)

    plt.subplot(1, 2, 2)
    plt.plot(energy, 'o-')
    plt.axhline(y=0.9, color='r', linestyle='--', label='90% energy')
    plt.xlabel('Number of Components')
    plt.ylabel('Cumulative Energy')
    plt.title('Energy Distribution')
    plt.legend()
    plt.grid(True)

    plt.tight_layout()
    plt.show()

    # Number of components needed to retain 90% of the energy
    k_90 = np.argmax(energy >= 0.9) + 1
    print(f"Components needed for 90% energy: {k_90}/{len(s)}")

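# Run the example. An i.i.d. Gaussian matrix has no low-rank structure,
# so its singular values decay slowly.
A = np.random.randn(100, 50)
analyze_singular_values(A)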
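
Rating matrices in practice are usually sparse. The following is a minimal sketch of one way to adapt the mean-centering step, assuming that 0 marks an unrated item; that convention, the function name `predict_ratings`, and the 4×4 toy matrix are all illustrative assumptions rather than part of the code above.

```python
import numpy as np

def predict_ratings(ratings, k=2):
    """Rank-k SVD prediction; assumes 0 marks an unrated item."""
    rated = ratings > 0
    # Per-row mean over rated entries only (guard against all-zero rows).
    counts = np.maximum(rated.sum(axis=1, keepdims=True), 1)
    row_mean = ratings.sum(axis=1, keepdims=True) / counts
    # Center the rated entries; unrated entries stay at 0 (the row mean).
    centered = np.where(rated, ratings - row_mean, 0.0)

    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    k = min(k, len(s))
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :] + row_mean

ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

pred = predict_ratings(ratings, k=2)
print(np.round(pred, 2))
```

Treating unrated entries as "equal to the row mean" is a simple imputation choice; methods that fit the factors only on observed ratings generally behave better on very sparse data.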
