Continuous Birdsong Recognition Using Gaussian Mixture Modeling of Image Shape Features
Source
Evernote/Papers/Continuous Birdsong Recognition Using Gaussian Mixture Modeling of Image Shape Features.md
Summary
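This paper proposes a new method for identifying bird species from their songs: instead of traditional acoustic features, it treats the spectrogram as a grayscale image and extracts image shape features from it. It uses the MPEG-7 ART (Angular Radial Transform) descriptor to capture frequency and temporal variation in the spectrogram, and introduces a sector expansion algorithm for this purpose. In 28-species classification experiments with Gaussian mixture models (GMM), the proposed ART descriptor achieved higher accuracy (86.30% on 3-second segments, 94.62% on 5-second segments) than existing features such as LPCC, MFCC, and TDMFCC.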
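To make the ART descriptor concrete, here is a minimal pure-Python sketch of the standard MPEG-7 definition, where each coefficient projects the image onto a basis function V_nm(ρ, θ) = (1/2π)·exp(jmθ)·R_n(ρ), with R_0 = 1 and R_n(ρ) = 2cos(πnρ) for n > 0. It assumes the spectrogram patch is already a small square grayscale image mapped onto the unit disk. The paper's sector expansion step is not reproduced here, and the function name `art_descriptor` and the toy image are illustrative only. Whether the paper uses the MPEG-7 default of 3 radial and 12 angular orders is also an assumption.

```python
import cmath
import math


def art_descriptor(image, n_radial=3, n_angular=12):
    """Return ART magnitudes |F_nm| / |F_00| for a square grayscale image.

    Assumption: the image is sampled on a square grid whose inscribed
    circle is treated as the unit disk; pixels outside it are ignored.
    """
    size = len(image)
    center = (size - 1) / 2.0
    radius = size / 2.0
    coeffs = [[0j] * n_angular for _ in range(n_radial)]

    for y, row in enumerate(image):
        for x, value in enumerate(row):
            dx = (x - center) / radius
            dy = (y - center) / radius
            rho = math.hypot(dx, dy)
            if rho > 1.0 or value == 0:
                continue  # outside the unit disk, or no contribution
            theta = math.atan2(dy, dx)
            for n in range(n_radial):
                radial = 1.0 if n == 0 else 2.0 * math.cos(math.pi * n * rho)
                for m in range(n_angular):
                    # Complex conjugate of the basis function V_nm.
                    basis_conj = cmath.exp(-1j * m * theta) * radial / (2.0 * math.pi)
                    coeffs[n][m] += basis_conj * value

    f00 = abs(coeffs[0][0])
    if f00 == 0:
        return [0.0] * (n_radial * n_angular - 1)
    # Magnitudes are rotation invariant; dividing by |F_00| normalizes scale.
    return [
        abs(coeffs[n][m]) / f00
        for n in range(n_radial)
        for m in range(n_angular)
        if (n, m) != (0, 0)
    ]


# Toy 8x8 "spectrogram" patch (made-up values, not data from the paper).
toy_image = [[(x * 3 + y * 5) % 7 for x in range(8)] for y in range(8)]
features = art_descriptor(toy_image)
print(len(features))
```

With the default orders this yields 35 normalized magnitudes per image. The quantization that the MPEG-7 standard applies to the coefficients is omitted.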
Key Points
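- Uses image shape features of the spectrogram, rather than acoustic models, for birdsong recognition
- Describes frequency and temporal variation efficiently with the MPEG-7 ART descriptor
- Proposes a sector expansion algorithm that transforms the spectrogram to fit the ART basis functions
- Demonstrates superiority over existing features (LPCC, MFCC, etc.) in GMM-based classification of 28 species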
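The GMM-based classification step can be sketched as follows: one mixture model per species, with a segment assigned to the species whose model gives the highest total log-likelihood over the segment's feature vectors. This is only a sketch under stated assumptions. Diagonal covariances, summing per-frame log-likelihoods, the species names, and all parameter values are hypothetical and not taken from the paper. Training the mixtures (e.g. with EM) is left out.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance at point x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - mu) ** 2 / v)
        for xi, mu, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of feature vectors under one GMM.

    gmm is a list of (weight, mean, variance) components.
    """
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gaussian_diag(x, mean, var) for w, mean, var in gmm]
        peak = max(terms)
        # log-sum-exp keeps the mixture sum numerically stable.
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total


def classify(frames, models):
    """Return the species whose GMM scores the segment highest."""
    return max(models, key=lambda species: gmm_log_likelihood(frames, models[species]))


# Hypothetical two-species example with 2-D features (made-up numbers).
models = {
    "species_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "species_b": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
segment = [[0.2, -0.1], [0.9, 1.2], [0.4, 0.6]]
print(classify(segment, models))
```

Under this decision rule a longer segment contributes more feature vectors to the sum, which is one plausible reason for the higher accuracy reported on 5-second segments than on 3-second segments.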
Related
- Structured Streaming Skeleton (SSS): A New Feature Extraction Method for Online Human Gesture Recognition
- Moment-Based Spectral Analysis of Large-Scale Networks Using Local Structural Information
- Point Representation for Local Optimization: Towards Multi-Dimensional Gray Codes
- Anomaly Extraction in Backbone Networks Using Association Rules
- Fast, Accurate Detection of 100,000 Object Classes on a Single Machine (Technical Supplement)
- Supporting Flexible, Efficient, and User-Interpretable Retrieval of Similar Time Series
- Fast Near-Duplicate Image Detection Using Uniform Randomized Trees
- Efficient Closed-Form Solution to Generalized Boundary Detection
- Weakly Supervised Learning of Object Segmentations from Web-Scale Video
- Smooth Nonnegative Matrix Factorization for Unsupervised Audiovisual Document Structuring
- Efficient Estimation of Word Representations in Vector Space
- An Unsupervised Feature Selection Framework for Social Media Data
- Language-Independent Discriminative Parsing of Temporal Expressions
- Accurate and Compact Large Vocabulary Speech Recognition on Mobile Devices
- A Hamming Embedding Kernel with Informative Bag-of-Visual Words for Video Semantic Indexing
- Dynamic Time Warping for Music Conducting Gestures Evaluation
- Similarity-based Clustering by Left-Stochastic Matrix Factorization
- Neighborhood Preserving Codes for Assigning Point Labels: Applications to Stochastic Search
- Enlisting the Ghost: Modeling Empty Categories for Machine Translation
- Active Learning through Adaptive Heterogeneous Ensembling (AHE)
- Semantic content-based recommendation of software services using context
- Automatic Annotation of Web Database Search Results
- Social Event Classification via Boosted Multimodal Supervised Latent Dirichlet Allocation
- Co-Evolution of Multi-Typed Objects in Dynamic Star Networks
- Information-Theoretic Outlier Detection for Large-Scale Categorical Data
- λ-Diverse Nearest Neighbors Browsing for Multidimensional Data
- Improved Domain Adaptation for Statistical Machine Translation
- Feature Ensemble Plus Sample Selection: Domain Adaptation for Sentiment Classification
- Coherent image selection using a fast approximation to the generalized traveling salesman problem
- Speech and Natural Language: Where Are We Now And Where Are We Headed
- Efficient Inference and Structured Learning for Semantic Role Labeling