Research

Research highlights

A short overview of recent results and ongoing directions, with brief context for each area.

Current emphasis

Building hierarchical audio representations and evaluation protocols for general audio understanding.

The audio QA case study provides one concrete example, while the broader proposal extends toward cross-task transfer, efficiency, and robustness under realistic compute budgets.

Research theme

Audio question answering

I use audio question answering (AQA) as a concrete testbed for checking whether systems answer from the acoustic evidence itself or lean on semantic and textual shortcuts.

Listening or Reading? An Empirical Study of Modality Importance Analysis Across AQA Question Types

ECHOTWIN-QA: A Dual-Tower BEATs-BERT System for DCASE 2025 Task 5 Audio Question Answering

Research theme

Modality importance analysis

A central theme in my work is measuring how much different input modalities contribute across question types, so that model behavior can be interpreted more rigorously.

Question-type-specific ablation studies

Acoustic versus semantic dependency analysis
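
One way to make a question-type-specific ablation concrete is to compare per-type accuracy with and without each modality. The sketch below is illustrative only: the record format, the function names, and the toy data are assumptions, not the pipeline used in the papers above. It assumes every run covers the same set of question types.

```python
from collections import defaultdict


def accuracy_by_type(records):
    """Return accuracy per question type.

    records: iterable of (question_type, is_correct) pairs (hypothetical format).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for qtype, is_correct in records:
        totals[qtype] += 1
        hits[qtype] += int(is_correct)
    return {qtype: hits[qtype] / totals[qtype] for qtype in totals}


def modality_importance(full, no_audio, no_text):
    """Accuracy drop per question type when each modality is ablated.

    A large "audio" drop suggests reliance on acoustic evidence; a large
    "text" drop suggests reliance on the question and answer text.
    """
    acc_full = accuracy_by_type(full)
    acc_no_audio = accuracy_by_type(no_audio)
    acc_no_text = accuracy_by_type(no_text)
    return {
        qtype: {
            "audio": acc_full[qtype] - acc_no_audio[qtype],
            "text": acc_full[qtype] - acc_no_text[qtype],
        }
        for qtype in acc_full
    }


# Toy records, invented for illustration only.
full = [("counting", True)] * 3 + [("counting", False)]
no_audio = [("counting", True)] + [("counting", False)] * 3
no_text = [("counting", True)] * 3 + [("counting", False)]
drops = modality_importance(full, no_audio, no_text)
```

In this toy case the "counting" questions lose accuracy only when audio is removed, which is the pattern one would hope to see for a perceptually grounded question type.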

Research theme

Hierarchical audio intelligence

My research proposal asks whether explicit hierarchical representations can improve generalization and robustness across heterogeneous audio tasks compared with flat representations at similar compute budgets.

Acoustic -> units -> events -> scenes -> semantics

Unified representation for general audio understanding

Research theme

Unified evaluation across tasks

A core direction in the proposal is to build a compact evaluation suite spanning recognition, audio-text grounding, and reasoning-style audio tasks, with analysis tied to specific abstraction levels.

Cross-task transfer matrix

Recognition, grounding, and reasoning tasks
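
A cross-task transfer matrix can be read as "gain on a target task after first training on a source task". The following is a minimal sketch under that assumption; the definition of gain (transfer score minus a no-transfer baseline), the task names, and all numbers are hypothetical placeholders rather than results.

```python
def transfer_matrix(tasks, transfer_scores, baseline_scores):
    """Build a matrix of transfer gains.

    transfer_scores[(source, target)]: score on target after training on source.
    baseline_scores[target]: score on target without any transfer.
    Rows index the source task, columns the target task.
    """
    rows = []
    for source in tasks:
        row = []
        for target in tasks:
            gain = transfer_scores[(source, target)] - baseline_scores[target]
            row.append(gain)
        rows.append(row)
    return rows


# Placeholder task names and scores, invented for illustration only.
tasks = ["recognition", "grounding", "reasoning"]
baseline_scores = {"recognition": 0.5, "grounding": 0.5, "reasoning": 0.25}
transfer_scores = {(s, t): baseline_scores[t] for s in tasks for t in tasks}
transfer_scores[("recognition", "grounding")] = 0.75
matrix = transfer_matrix(tasks, transfer_scores, baseline_scores)
```

Positive entries indicate that the source task helps the target, negative entries indicate interference, and an asymmetric matrix hints at which abstraction levels feed which.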

Research theme

Efficiency and robustness

I am interested in compute-aware scaling, efficient adaptation, and robustness under compression, noise, domain shift, and long-audio settings.

Compute-capability scaling curves

Robustness under compression and domain shift
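
As one example of a robustness probe, additive noise can be mixed into a clip at a controlled signal-to-noise ratio before evaluation. The sketch below covers only the noise condition (not compression or domain shift) and works on plain Python lists of samples; the function names are illustrative assumptions.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(signal, noise, snr_db):
    """Scale noise so that signal-to-noise ratio equals snr_db, then add it."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise must have non-zero power")
    # Target noise power is signal power divided by the linear SNR.
    target_noise_power = mean_power(signal) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(signal, noise)]


def measured_snr_db(signal, mixed):
    """SNR in dB of a mixture, given the clean signal it was built from."""
    residual = [m - s for m, s in zip(mixed, signal)]
    return 10 * math.log10(mean_power(signal) / mean_power(residual))


# Toy waveform and noise, invented for illustration only.
signal = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
mixed = mix_at_snr(signal, noise, snr_db=10.0)
```

Sweeping `snr_db` and re-running the evaluation gives an accuracy-versus-SNR curve per question type or task.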

Study design and analysis

Recent work uses audio question answering as a testbed for studying modality weighting, question-type-specific behavior, and hierarchical representations.

Controlled fusion coefficients for audio/text balance
Question-type-stratified accuracy and statistical tests
Diagnostics for shortcutting versus perceptual grounding
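
To illustrate the first two items, here is a minimal sketch of a fusion coefficient and a paired significance test. It assumes late fusion as a convex combination of per-choice scores and uses an exact McNemar test to compare two settings on the same questions; both choices, and every name below, are assumptions for illustration and not necessarily the design used in the studies.

```python
from math import comb


def fuse(audio_scores, text_scores, alpha):
    """Convex combination of per-choice scores.

    alpha = 1.0 uses audio only, alpha = 0.0 uses text only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * a + (1.0 - alpha) * t
            for a, t in zip(audio_scores, text_scores)]


def predict(audio_scores, text_scores, alpha):
    """Index of the answer choice with the highest fused score."""
    fused = fuse(audio_scores, text_scores, alpha)
    return max(range(len(fused)), key=fused.__getitem__)


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value for paired correctness flags.

    Only discordant pairs (one setting right, the other wrong) carry
    information; under the null they split like a fair coin.
    """
    a_only = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    b_only = sum(1 for x, y in zip(correct_a, correct_b) if y and not x)
    n = a_only + b_only
    if n == 0:
        return 1.0
    k = min(a_only, b_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy scores for a two-choice question, invented for illustration only.
audio_scores = [2.0, 0.0]   # the audio tower prefers choice 0
text_scores = [0.0, 1.0]    # the text tower prefers choice 1
choices = {alpha: predict(audio_scores, text_scores, alpha)
           for alpha in (0.0, 0.25, 0.5, 1.0)}
```

Running the test separately within each question type keeps a large effect in one type from being diluted by the others; with many types, a multiple-comparison correction would also be needed.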

Benchmarks and tools

The projects build on publicly available benchmarks and toolchains so that ideas can transfer to future work.

DCASE 2025 Task 5 multi-domain AQA benchmark
BEATs-based audio encoders and BERT-style text towers
PyTorch training pipelines with ablations and evaluation