BSc Information and Computing Science

Zeyu Yin (Joey)

BSc Information and Computing Science student working on audio question answering and hierarchical audio intelligence

I am an undergraduate researcher at Xi'an Jiaotong-Liverpool University. My recent work studies modality importance in audio question answering. My current research proposal focuses on hierarchical audio intelligence, working toward unified representations for general audio understanding across recognition, grounding, and reasoning tasks.

Academic profile

Research on audio question answering and hierarchical audio intelligence.

Current emphasis: modality importance analysis in audio question answering and a research proposal on unified representations for general audio understanding.

Institution

Xi'an Jiaotong-Liverpool University

Location

Suzhou, China

Research proposal direction

Hierarchical audio intelligence for general audio understanding, with emphasis on abstraction levels, unified evaluation across tasks, efficiency, and robustness.

Selected research

A few project pages from recent work

These entries summarize recent projects. Selected pages will be expanded as more material becomes available.

Case study
Research project

Listening or Reading? An Empirical Study of Modality Importance Analysis Across AQA Question Types

A case study on how audio question answering systems rely on acoustic evidence versus textual or contextual shortcuts across different question types.

Audio Question Answering
Modality Weighting
Acoustic Reasoning
DCASE 2025
Interactive Research Preview

How do different question types depend on audio?

Compare 6 AQA question types and inspect whether the model is truly listening.

DCASE 2025 Task 5 · EchoTwin-QA · BEATs + BERT · 6 question types
Sound Counting · Mostly audio-grounded · Best 35.7% at lambda=0.9
Lambda sweep

Aggregated accuracy from the experiment, averaged across the available seeds for this question type.

Text-only (lambda=0.0): 30.4%
Audio-only (lambda=1.0): 25.9%
Readout
Range: 25.9% - 35.7%
Balanced: 30.4%
Audio-only: 25.9%
Key settings
Text-only
lambda=0.0
Accuracy: 30.4%

No material change. Remains strong without audio.

Balanced
lambda=0.5
Accuracy: 30.4%

No material change. Useful as a reference point in the sweep.

Audio-heavy
lambda=0.9
Accuracy: 35.7%

+5.4 pts vs text-only. Improves when audio contributes more.

Accuracy improves toward audio-heavy settings and peaks near lambda=0.9.
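
The sweep above can be illustrated with a minimal sketch. This page does not state how lambda enters the model, so the code assumes a simple linear interpolation of per-answer scores from an audio tower and a text tower, with lambda=0.0 as text-only and lambda=1.0 as audio-only. The function names (`fuse_scores`, `predict`, `sweep_accuracy`) and the toy scores are hypothetical and are not taken from the EchoTwin-QA code or its results.

```python
from typing import Dict, List, Sequence, Tuple

# One example: (audio-tower scores, text-tower scores, index of the gold answer).
Example = Tuple[Sequence[float], Sequence[float], int]


def fuse_scores(audio: Sequence[float], text: Sequence[float], lam: float) -> List[float]:
    """Assumed fusion rule: lam * audio + (1 - lam) * text for each answer option."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(audio) != len(text):
        raise ValueError("audio and text must score the same answer options")
    return [lam * a + (1.0 - lam) * t for a, t in zip(audio, text)]


def predict(audio: Sequence[float], text: Sequence[float], lam: float) -> int:
    """Return the index of the answer option with the highest fused score."""
    fused = fuse_scores(audio, text, lam)
    return max(range(len(fused)), key=fused.__getitem__)


def sweep_accuracy(examples: Sequence[Example], lambdas: Sequence[float]) -> Dict[float, float]:
    """Accuracy in percent for each lambda value in the sweep."""
    results: Dict[float, float] = {}
    for lam in lambdas:
        correct = sum(predict(a, t, lam) == gold for a, t, gold in examples)
        results[lam] = 100.0 * correct / len(examples)
    return results


# Made-up scores for illustration only; these are NOT the reported results.
toy_examples: List[Example] = [
    ([0.9, 0.1, 0.0], [0.2, 0.7, 0.1], 0),  # audio tower right, text tower wrong
    ([0.1, 0.8, 0.1], [0.1, 0.6, 0.3], 1),  # both towers right
    ([0.3, 0.3, 0.4], [0.6, 0.3, 0.1], 0),  # text tower right, audio tower wrong
    ([0.5, 0.4, 0.1], [0.1, 0.2, 0.7], 0),  # audio tower right, text tower wrong
]

toy_sweep = sweep_accuracy(toy_examples, [0.0, 0.5, 0.9, 1.0])
for lam, acc in toy_sweep.items():
    print(f"lambda={lam:.1f}: {acc:.1f}%")
```

Under this assumed rule, a question type whose accuracy rises as lambda grows is drawing on acoustic evidence, while a flat curve suggests the text tower alone already carries the answer.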
Challenge system
Research project

EchoTwin-QA: A Dual-Tower BEATs-BERT System for DCASE 2025 Task 5

An end-to-end audio question answering system built for the DCASE 2025 Challenge, with training, evaluation, and ablation studies conducted from scratch.

DCASE 2025
BEATs-BERT
End-to-end AQA
SURF project
Research project

Expressive Timing Modelling in Performed Classical Piano Music

A summer undergraduate research project exploring expressive timing in performed classical piano music through computational modeling.

Music information retrieval
Audio modeling
Research
Publications

Selected publications and reports

A concise view of recent work, including venue, year, contribution summary, and links.

Workshop paper
2025

Listening or Reading? An Empirical Study of Modality Importance Analysis Across AQA Question Types

DCASE 2025 Workshop

Zeyu Yin, Yiqiang Cai, Pingsong Deng, Xinyang Lyu, Shengchen Li

Designed the study, implemented modality-importance experiments, analyzed results across question types, and wrote the paper.

Technical report
2025

EchoTwin-QA: A Dual-Tower BEATs-BERT System for DCASE 2025 Task 5 Audio Question Answering

DCASE 2025 Challenge (Task 5)

Zeyu Yin, Ziyang Zhou, Yiqiang Cai, Shengchen Li, Xi Shao

Built the end-to-end AQA system from scratch, ran training and evaluation pipelines, conducted ablations, and wrote the technical report.

Technical report
2025

ADAPTF-SEPNET: AudioSet-Driven Adaptive Pre-training of TF-SEPNet for Multi-device Acoustic Scene Classification

DCASE 2025 Challenge

Ziyang Zhou, Zeyu Yin, Yiqiang Cai, Shengchen Li, Xi Shao

Contributed to model development and experimental evaluation, and supported results analysis and manuscript preparation.

Research interests

Current research themes

The work is organized around a few connected questions in audio question answering, hierarchical representations, evaluation, and robustness.

Hierarchical audio intelligence
Large audio models
Audio question answering
Modality importance analysis
Unified evaluation across tasks
Efficiency and robustness
Research framing

Clear questions, unified evaluation, and careful diagnostics.

The site brings together published work, ongoing research themes, and a case-study page that can later host figures, ablations, and interactive analysis.

Timeline

Academic trajectory

A preview of recent roles and milestones that situate current research interests in a broader academic path.

2025

DCASE 2025 participant and researcher

Developed an end-to-end AQA system and analyzed modality importance across question types

2025

Workshop and challenge papers

Worked on study design, experimentation, ablations, and writing for audio question answering projects

2024

SURF undergraduate researcher

Worked on expressive timing modelling in performed classical piano music at XJTLU

2023

Academic Excellence Award recipient

Received the University Academic Excellence Award with full scholarship support