Research highlights
A short overview of recent results and ongoing directions, with brief context for each area.
Building hierarchical audio representations and evaluation protocols for general audio understanding.
The audio QA case study provides one concrete example, while the broader proposal extends toward cross-task transfer, efficiency, and robustness under realistic compute budgets.
Audio question answering
I use audio question answering (AQA) as a concrete testbed for understanding whether systems answer from actual acoustic evidence or rely too heavily on semantic and textual shortcuts.
Modality importance analysis
A central theme in my work is measuring how much different input modalities contribute across question types, so that model behavior can be interpreted more rigorously.
Hierarchical audio intelligence
My research proposal asks whether explicit hierarchical representations can improve generalization and robustness across heterogeneous audio tasks compared with flat representations at similar compute budgets.
Unified evaluation across tasks
A core direction in the proposal is to build a compact evaluation suite spanning recognition, audio-text grounding, and reasoning-style audio tasks, with analysis tied to specific abstraction levels.
Efficiency and robustness
I am interested in compute-aware scaling, efficient adaptation, and robustness under compression, noise, domain shift, and long-audio settings.
Study design and analysis
Recent work uses audio question answering as a testbed for studying modality weighting, question-type-specific behavior, and hierarchical representations.
- Controlled fusion coefficients for audio/text balance
- Question-type-stratified accuracy and statistical tests
- Diagnostics for shortcutting versus perceptual grounding
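As a rough illustration of the first two items, the sketch below sweeps a single fusion coefficient over audio and text scores and reports accuracy per question type. It assumes late fusion of per-answer logits by a convex combination. The names `fuse_logits`, `stratified_accuracy`, `alpha`, and the toy examples are hypothetical and chosen for illustration only; they are not taken from the actual projects, and plain Python stands in for the PyTorch pipeline to keep the example short.

```python
from collections import defaultdict


def fuse_logits(audio_logits, text_logits, alpha):
    """Convex combination of two score vectors (an assumed fusion rule).

    alpha = 1.0 uses audio scores only; alpha = 0.0 uses text scores only.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * a + (1.0 - alpha) * t
            for a, t in zip(audio_logits, text_logits)]


def stratified_accuracy(examples, alpha):
    """Return {question_type: accuracy} for one fusion coefficient."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        fused = fuse_logits(ex["audio_logits"], ex["text_logits"], alpha)
        # Predicted answer is the index with the highest fused score.
        pred = max(range(len(fused)), key=fused.__getitem__)
        total[ex["qtype"]] += 1
        correct[ex["qtype"]] += int(pred == ex["label"])
    return {qtype: correct[qtype] / total[qtype] for qtype in total}


# Toy, hand-written examples (not real benchmark data).
EXAMPLES = [
    {"qtype": "perceptual", "audio_logits": [2.0, 0.0],
     "text_logits": [0.0, 1.0], "label": 0},
    {"qtype": "perceptual", "audio_logits": [0.0, 3.0],
     "text_logits": [1.0, 0.0], "label": 1},
    {"qtype": "semantic", "audio_logits": [1.0, 0.0],
     "text_logits": [0.0, 2.0], "label": 1},
    {"qtype": "semantic", "audio_logits": [0.0, 0.5],
     "text_logits": [0.0, 2.0], "label": 1},
]

if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, stratified_accuracy(EXAMPLES, alpha))
```

In this toy setup, a question type whose accuracy stays high at `alpha = 0.0` can be answered from text alone, which is the kind of pattern a shortcutting diagnostic would flag, while a type that only succeeds as `alpha` grows depends on the audio branch.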
Benchmarks and tools
The projects build on publicly available benchmarks and toolchains so that the methods carry over to future work.
- DCASE 2025 Task 5 multi-domain AQA benchmark
- BEATs-based audio encoders and BERT-style text towers
- PyTorch training pipelines with ablations and evaluation