GoP Computation Pipeline Analysis and Implementation Plan (Day 275)

Source

Field Notes/ReturnZero/Daily Notes/Day 275. 2022-04-01.md

Summary

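This note records an analysis of the code flow through the core stages of the Kaldi-based GoP (Goodness of Pronunciation) computation pipeline: compute-gop, align-compiled-mapped, ali-to-phones, and related tools. In particular, it traces the data flow from alignment generation through phone-level feature extraction, and derives the function steps (nos. 5, 8, 9, 11, and 12) needed for an actual implementation. Progress was delayed because the author was unwell (COVID-19), but the implementation plan was made concrete on the basis of the code-logic analysis.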

Key Points

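The core inputs to GoP computation are the model, the alignment, and the prob-matrix; compute-gop uses these to compute LPP (log phone posterior) and LPR (log posterior ratio).
In the alignment generation step (align-compiled-mapped), decoding is done with FasterDecoder, after which GetLinearSymbolSequence extracts the phone sequence.
When the --per-frame option of ali-to-phones is enabled, it outputs one phone ID per frame, and this output becomes the alignment input to compute-gop.
For implementation, the plan is to exclude the feature pipeline and implement only the core function steps separately: nnet-compute, fsts generation, ali-fsts generation, ali-phones generation, and gop computation.
The lexicon input is used to initialize TrainingGraphCompiler, but in without-lexicon mode it is passed as NULL.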
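
To make the alignment-to-feature step concrete, the following is a minimal NumPy sketch of how a per-frame phone alignment and a frame-level posterior matrix could be turned into LPP, LPR, and a GOP score. It is not Kaldi's compute-gop code. It assumes the posteriors are already log probabilities at phone level (one column per phone ID), and it uses one commonly cited set of definitions: LPP as the mean log posterior over a phone's frames, LPR as the difference from the canonical phone's LPP, and GOP as the canonical LPP minus the maximum LPP. The function names `segments_from_per_frame` and `phone_level_features` are hypothetical.

```python
import numpy as np


def segments_from_per_frame(phones):
    """Group per-frame phone IDs into (phone, start, end) runs; end is exclusive."""
    segments = []
    start = 0
    for t in range(1, len(phones) + 1):
        # Close the current run at the end of the sequence or when the phone changes.
        if t == len(phones) or phones[t] != phones[start]:
            segments.append((phones[start], start, t))
            start = t
    return segments


def phone_level_features(log_post, per_frame_phones):
    """Compute (phone, gop, lpp, lpr) for each aligned phone segment.

    log_post: array of shape (T, P) holding frame-level log phone posteriors
              (assumption: already summed to phone level, column = phone ID).
    per_frame_phones: length-T sequence of canonical phone IDs, one per frame.
    """
    log_post = np.asarray(log_post, dtype=float)
    if log_post.shape[0] != len(per_frame_phones):
        raise ValueError("alignment length must match the number of frames")

    feats = []
    for phone, start, end in segments_from_per_frame(per_frame_phones):
        lpp = log_post[start:end].mean(axis=0)  # mean log posterior per phone
        lpr = lpp - lpp[phone]                  # ratio against the canonical phone
        gop = lpp[phone] - lpp.max()            # 0 when the canonical phone is the best
        feats.append((phone, gop, lpp, lpr))
    return feats


# Toy example: 3 frames, 3 phones, frames 0-1 aligned to phone 0, frame 2 to phone 1.
toy_log_post = np.log(np.array([[0.7, 0.2, 0.1],
                                [0.6, 0.3, 0.1],
                                [0.1, 0.8, 0.1]]))
toy_feats = phone_level_features(toy_log_post, [0, 0, 1])
```

One limitation of segmenting on phone-ID changes alone is that two consecutive instances of the same phone merge into a single segment; separating them would need additional boundary information from the alignment.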
