System Development Manual 3

Getting Started with Machine Learning Using TensorFlow and a CNN

Posted Nov 7, 2024 Updated Dec 22, 2024

By Jiwon Song

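MediaPipe Hand Detection System

How Landmark Extraction Works

MediaPipe detects 21 3D hand keypoints (landmarks) from a single frame.
It uses a two-stage pipeline:
1. The BlazePalm model locates the hand
2. A landmark model extracts high-precision 3D landmarks from the detected region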

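CNN Model

Why a 1D CNN

Well suited to time-series and other sequential data; here the 21 landmarks are treated as a sequence of length 21 with 3 channels (x, y, z)
Extracts features from neighboring landmarks while keeping the landmark order, and with it the hand's spatial layout, intact
Effective for fixed-length input (21 landmarks per hand)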

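Model Architecture Design

Conv1D(64, 3) -> MaxPool -> Conv1D(128, 3) -> MaxPool -> Conv1D(64, 3) -> Flatten -> Dense(128) -> Dropout -> Dense(num_classes)

Three Conv1D layers extract features hierarchically
MaxPooling compresses the features and reduces computation
Dropout guards against overfitting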
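
To make the layer stack concrete, the sketch below traces how the sequence length shrinks from the 21 input landmarks. It assumes Keras defaults (valid padding, stride 1 for the convolutions) and a pool size of 2, as in the build_model code later in this post; the helper names are illustrative only.

```python
def conv1d_out_len(length, kernel_size, stride=1):
    """Output length of a Conv1D layer with 'valid' padding (the Keras default)."""
    return (length - kernel_size) // stride + 1


def maxpool1d_out_len(length, pool_size):
    """Output length of MaxPooling1D with 'valid' padding and stride == pool_size."""
    return (length - pool_size) // pool_size + 1


def trace_sequence_lengths(input_len=21):
    """Return (layer name, output length) for each conv/pool layer in the stack."""
    layers = [
        ("Conv1D(64, 3)", "conv", 3),
        ("MaxPool(2)", "pool", 2),
        ("Conv1D(128, 3)", "conv", 3),
        ("MaxPool(2)", "pool", 2),
        ("Conv1D(64, 3)", "conv", 3),
    ]
    steps = []
    length = input_len
    for name, kind, size in layers:
        if kind == "conv":
            length = conv1d_out_len(length, size)
        else:
            length = maxpool1d_out_len(length, size)
        steps.append((name, length))
    return steps


for name, length in trace_sequence_lengths():
    print(f"{name:15s} -> length {length}")
```

Under these assumptions the lengths run 21, 19, 9, 7, 3, 1, so the last convolution leaves a single position with 64 channels and Flatten yields 64 features. It also shows that the stack has no room for another valid-padded convolution on a 21-landmark input.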

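Image Processing Pipeline

Korean Path Support

Reading the file in binary mode works around the Korean (non-ASCII) path problem, where OpenCV's own file loading can fail on such paths
Converting the bytes to a NumPy array lets OpenCV decode the image from memory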
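
As a small illustration of the idea, the sketch below loads a file's raw bytes through NumPy so that the path is handled by Python rather than by OpenCV. The function name and the sample file name are made up for this example, and the payload is arbitrary bytes rather than a real image.

```python
import os
import tempfile

import numpy as np


def read_file_as_uint8(path):
    """Load a file's raw bytes as a 1-D uint8 array; Python resolves the path."""
    return np.fromfile(path, dtype=np.uint8)


# Round-trip check with a Korean file name in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "손_샘플.bin")  # hypothetical file name
    payload = bytes(range(256))
    with open(sample_path, "wb") as f:
        f.write(payload)
    buffer = read_file_as_uint8(sample_path)

print(buffer.shape, buffer.dtype)
```

For a real image file, a buffer like this can be passed to cv2.imdecode(buffer, cv2.IMREAD_COLOR), which is the decoding step used by read_korean_image in section 3.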

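Data Preprocessing

BGR-to-RGB color conversion matches the input format MediaPipe expects
Landmark normalization keeps training stable (MediaPipe already returns x and y normalized to [0, 1] by image width and height)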
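
The post does not show which normalization is applied, so the following is only one plausible scheme, sketched under that assumption: translate the landmarks so the wrist (MediaPipe landmark index 0) sits at the origin, then scale so the farthest landmark is at distance 1. The function name normalize_landmarks is hypothetical.

```python
import numpy as np

WRIST = 0  # MediaPipe hand landmark index of the wrist


def normalize_landmarks(coords, eps=1e-8):
    """Center a (21, 3) landmark array on the wrist and scale it to unit size.

    Assumed scheme, not taken from the post: it removes the hand's position
    and apparent size so only the pose remains.
    """
    coords = np.asarray(coords, dtype=np.float32)
    centered = coords - coords[WRIST]
    scale = np.linalg.norm(centered, axis=1).max()
    return centered / max(scale, eps)


# The same pose, drawn smaller and shifted, should normalize to the same values.
rng = np.random.default_rng(0)
hand = rng.random((21, 3)).astype(np.float32)
moved = hand * 0.5 + 0.2
same = np.allclose(normalize_landmarks(hand), normalize_landmarks(moved), atol=1e-5)
print("position/scale invariant:", same)
```

A scheme like this makes the features independent of where the hand appears in the frame and how close it is to the camera, which is one common reason to normalize before training.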

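Training and Prediction System

Training Process

One-hot encoding converts the labels
A held-out validation set is used to evaluate model performance
Batch processing improves training efficiency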
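
The three steps above can be sketched with plain NumPy on synthetic data. In practice the to_categorical and train_test_split functions imported later in this post cover the first two steps; the helper names, the 20% validation ratio, and the batch size below are assumptions for illustration.

```python
import numpy as np


def one_hot(labels, num_classes):
    """Integer class ids -> one-hot rows, similar to what to_categorical returns."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded


def split_train_val(X, y, val_ratio=0.2, seed=42):
    """Shuffle once, then hold out a val_ratio share of the samples for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_val = max(1, int(round(len(X) * val_ratio)))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]


def iter_batches(X, y, batch_size=32):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(X), batch_size):
        yield X[start:start + batch_size], y[start:start + batch_size]


# Synthetic stand-in data: 50 samples of 21 landmarks x 3 coordinates, 5 classes.
rng = np.random.default_rng(0)
X = rng.random((50, 21, 3)).astype(np.float32)
y = one_hot(rng.integers(0, 5, size=50), num_classes=5)

X_train, X_val, y_train, y_val = split_train_val(X, y)
print(X_train.shape, X_val.shape, y_train.shape, y_val.shape)
print([len(xb) for xb, _ in iter_batches(X_train, y_train, batch_size=16)])
```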

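Prediction System

Real-time landmark extraction
Prediction results are filtered by confidence
The label mapping is stored as JSON

This structure is designed to deliver both the accuracy and the efficiency that real-time sign language recognition requires.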
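
Storing the label mapping as JSON could look like the sketch below. It assumes the mapping goes from label text to class index, which is the shape predict_sign iterates over in section 6; the file name, helper names, and sample labels are hypothetical.

```python
import json
import os
import tempfile


def save_label_map(label_map, path):
    """Write a {label: class_index} dict as UTF-8 JSON, keeping Korean text readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(label_map, f, ensure_ascii=False, indent=2)


def load_label_map(path):
    """Read the {label: class_index} dict back from disk."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


label_map = {"안녕하세요": 0, "감사합니다": 1, "사랑합니다": 2}  # hypothetical labels

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "label_map.json")  # hypothetical file name
    save_label_map(label_map, path)
    restored = load_label_map(path)

print(restored == label_map)
```

Passing ensure_ascii=False keeps the Korean labels human-readable in the file instead of writing them as \u escape sequences.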

1. Required Library Setup

Import the core libraries needed for sign language recognition.

  
import os
import cv2
import numpy as np
import mediapipe as mp
import matplotlib.pyplot as plt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Conv1D, MaxPooling1D, Flatten
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split

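mediapipe: provides high-performance hand landmark detection
tensorflow: needed to build and train the CNN model
opencv: used for image decoding and color conversion
numpy: the basis for array operations and data handling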

2. MediaPipe Setup

Initialize MediaPipe for hand detection.

  
mp_hands = mp.solutions.hands
hands = mp_hands.Hands(
    static_image_mode=True,
    max_num_hands=1,
    min_detection_confidence=0.5
)

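static_image_mode=True: treats each input as an independent still image and runs detection on every one, instead of tracking a hand across video frames
max_num_hands=1: detects a single hand only, which speeds up processing
min_detection_confidence=0.5: a balance point between false detections and missed hands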

3. Image Processing System

  
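def read_korean_image(image_path):
    # Read the raw bytes in Python so OpenCV never has to open the Korean path itself.
    with open(image_path, "rb") as f:
        data = f.read()
    nparr = np.frombuffer(data, dtype=np.uint8)
    img = cv2.imdecode(nparr, cv2.IMREAD_COLOR)  # decoded as BGR
    if img is None:
        raise ValueError(f"Could not decode image: {image_path}")
    return cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

Binary-mode read: works around the Korean path problem
np.frombuffer: views the bytes as a uint8 array without an extra copy, ready for cv2.imdecode
BGR to RGB: meets MediaPipe's input requirement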

4. Landmark Extraction System

  
def extract_landmarks(image_path):
    image_rgb = read_korean_image(image_path)
    results = hands.process(image_rgb)
    
    if results.multi_hand_landmarks:
        landmarks = results.multi_hand_landmarks[0]
        coords = [[lm.x, lm.y, lm.z] for lm in landmarks.landmark]
        return np.array(coords)
    return None

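RGB input: read_korean_image already returns RGB, which is what hands.process expects
3D coordinate extraction: x, y, z for each of the 21 landmarks, returned as a (21, 3) array that captures the hand's spatial features
Returning None: safe handling when no hand is detected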
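
Because extract_landmarks can return None, whatever assembles the training set has to skip those images. The post does not show that step, so the sketch below is an assumption about how it might look; build_dataset and the stand-in extractor are hypothetical names, and the extractor is faked so the example runs without images or MediaPipe.

```python
import numpy as np


def build_dataset(samples, extract_fn):
    """Turn (image_path, class_id) pairs into arrays, skipping undetected hands.

    extract_fn is expected to behave like extract_landmarks: return a (21, 3)
    array, or None when no hand was found.
    """
    features, labels, skipped = [], [], []
    for image_path, class_id in samples:
        coords = extract_fn(image_path)
        if coords is None:
            skipped.append(image_path)  # keep track of what was dropped
            continue
        features.append(coords)
        labels.append(class_id)
    X = np.asarray(features, dtype=np.float32).reshape(-1, 21, 3)
    y = np.asarray(labels, dtype=np.int64)
    return X, y, skipped


# Stand-in extractor: pretends detection fails for paths containing "blurry".
def fake_extract(image_path):
    return None if "blurry" in image_path else np.zeros((21, 3))


samples = [("a/001.jpg", 0), ("a/blurry_002.jpg", 0), ("b/001.jpg", 1)]
X, y, skipped = build_dataset(samples, fake_extract)
print(X.shape, y.tolist(), skipped)
```

Returning the skipped paths alongside the arrays makes it easy to check how many images per class were lost to failed detections.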

5. CNN Model Architecture

  
def build_model(num_classes):
    model = Sequential([
        Conv1D(64, 3, activation="relu", input_shape=(21, 3)),
        MaxPooling1D(2),
        Conv1D(128, 3, activation="relu"),
        MaxPooling1D(2),
        Conv1D(64, 3, activation="relu"),
        Flatten(),
        Dense(128, activation="relu"),
        Dropout(0.5),
        Dense(num_classes, activation="softmax"),
    ])
    return model

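Conv1D: slides over the sequence of 21 landmarks to extract local features
Channel widths 64 -> 128 -> 64: widen to capture richer features, then narrow again before the dense layers
Dropout(0.5): guards against overfitting
Softmax output: suited to multi-class classification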
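
build_model returns an uncompiled model, and the post does not show the compile and fit calls. A minimal sketch of that step follows, assuming one-hot labels, the Adam optimizer, and placeholder values for epochs and batch size; none of these settings come from the post, and compile_and_fit is a hypothetical helper name.

```python
def compile_and_fit(model, X_train, y_train, X_val, y_val, epochs=50, batch_size=32):
    """Compile a Keras model for one-hot multi-class labels and train it.

    The optimizer, epochs, and batch_size are assumed defaults, not values
    taken from the post.
    """
    model.compile(
        optimizer="adam",
        loss="categorical_crossentropy",  # pairs with one-hot labels and a softmax output
        metrics=["accuracy"],
    )
    # fit returns a History object whose .history dict holds per-epoch metrics.
    return model.fit(
        X_train,
        y_train,
        validation_data=(X_val, y_val),
        epochs=epochs,
        batch_size=batch_size,
    )
```

It could be called as history = compile_and_fit(build_model(num_classes), X_train, y_train, X_val, y_val), after which history.history holds the training and validation curves.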

6. Prediction System

  
def predict_sign(model, image_path, label_map):
    landmarks = extract_landmarks(image_path)
    if landmarks is not None:
        landmarks = np.expand_dims(landmarks, axis=0)
        prediction = model.predict(landmarks)
        predicted_class = np.argmax(prediction[0])
        
        for label, idx in label_map.items():
            if idx == predicted_class:
                return label, prediction[0][predicted_class]
    return None, None

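Adding a batch dimension: (21, 3) becomes (1, 21, 3) to match the model's input shape
argmax: selects the class with the highest probability
Returning the confidence: lets the caller judge how reliable the prediction is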
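
predict_sign returns the confidence but leaves the filtering to the caller. One way to apply a threshold is sketched below; the 0.8 cutoff, the function name decode_prediction, and the sample labels are assumptions. The sketch also inverts label_map once instead of looping over it for every prediction.

```python
import numpy as np


def decode_prediction(probabilities, label_map, min_confidence=0.8):
    """Map a softmax vector to (label, confidence).

    Returns (None, confidence) when the top probability is below
    min_confidence, so low-confidence predictions can be discarded.
    """
    index_to_label = {idx: label for label, idx in label_map.items()}
    probabilities = np.asarray(probabilities, dtype=np.float32)
    best = int(np.argmax(probabilities))
    confidence = float(probabilities[best])
    if confidence < min_confidence:
        return None, confidence
    return index_to_label.get(best), confidence


label_map = {"hello": 0, "thanks": 1, "love": 2}  # hypothetical labels
print(decode_prediction([0.05, 0.90, 0.05], label_map))  # confident prediction
print(decode_prediction([0.40, 0.35, 0.25], label_map))  # below the threshold
```

A suitable threshold depends on the data and on how costly a wrong label is compared with no label, so it is best tuned on the validation set.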

This post is licensed under CC BY 4.0 by the author.