I am a Research Fellow at the National University of Singapore (NUS). Prior to that, I received my Ph.D. and Master's degrees from NUS in 2023 and 2019, respectively, supervised by Prof. Li Haizhou, and my Bachelor's degree from Soochow University in 2018.
My research interests include (audio-only or audio-visual) speech processing (enhancement, extraction and separation) and speaker processing (recognition, diarization, active speaker detection and anti-spoofing). I also work on self-supervised learning. I have published more than 20 papers in top international AI conferences and journals such as TASLP, TMM, ACM MM, ICASSP and INTERSPEECH.
📜 Research Area
Research Area | Tasks |
---|---|
Speech processing | (Audio-visual) speech enhancement, extraction and separation |
Speaker processing | (Audio-visual) speaker recognition, verification, diarization and anti-spoofing |
Multi-modal speech processing | Active speaker detection, cross-modal speaker recognition |
Algorithm | Self-supervised learning, fundamental model |
🏫 Education
- 2019.08 - 2023.08, Ph.D. in Speech Processing and Computer Vision, National University of Singapore (NUS), Singapore.
- 2018.08 - 2019.06, M.Sc. in Electronic and Computer Engineering, National University of Singapore (NUS), Singapore.
- 2014.09 - 2018.06, B.Eng. in Electronic Engineering, Soochow University, Suzhou, China.
Working Experience
- 2023.08 - Now, Research Fellow, National University of Singapore (NUS), Singapore.
📝 Publication
2024
- Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization, Ruijie Tao, Shi Zhan, Yidi Jiang, Duc-Tuan Truong, Eng-Siong Chng, Massimo Alioto and Haizhou Li, ACM Multimedia, 2024.
- Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training, Ruijie Tao, Xinyuan Qian, Rohan Kumar Das, Xiaoxue Gao, Jiadong Wang, Haizhou Li, TMM, 2024.
- Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention, Ruijie Tao, Xinyuan Qian, Yidi Jiang, Junjie Li, Jiadong Wang, Haizhou Li, Under Review, 2024.
- Voice Conversion Augmentation for Speaker Recognition on Defective Datasets, Ruijie Tao, Zhan Shi, Yidi Jiang, Tianchi Liu, Haizhou Li, Under Review, 2024.
- How Do Neural Spoofing Countermeasures Detect Partially Spoofed Audio?, Tianchi Liu, Lin Zhang, Rohan Kumar Das, Yi Ma, Ruijie Tao, Haizhou Li, INTERSPEECH, Oral, 2024.
- Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection, Duc-Tuan Truong, Ruijie Tao, Tuan Nguyen, Hieu-Thi Luong, Kong Aik Lee, Eng Siong Chng, INTERSPEECH, 2024.
- Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification, Duc-Tuan Truong, Ruijie Tao, Jia Qi Yip, Kong Aik Lee, Eng Siong Chng, ICASSP, 2024.
- Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech, Junjie Li, Ruijie Tao, Zexu Pan, Meng Ge, Shuai Wang, Haizhou Li, ICASSP, 2024.
- Prompt-driven Target Speech Diarization, Yidi Jiang, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian and Haizhou Li, ICASSP, Oral, 2024.
- USED: Universal Speaker Extraction and Diarization, Junyi Ao, Mehmet Sinan Yıldırım, Meng Ge, Shuai Wang, Ruijie Tao, Yanmin Qian, Liqun Deng, Longshuai Xiao, Haizhou Li, Under Review, 2024.
- A Benchmark for Multi-speaker Anonymization, Xiaoxiao Miao, Ruijie Tao, Chang Zeng, Xin Wang, Under Review, 2024.
- Target Speech Diarization with Multimodal Prompts, Yidi Jiang, Ruijie Tao, Zhengyang Chen, Yanmin Qian, Haizhou Li, Under Review, 2024.
2023
- Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs, Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki and Haizhou Li, TASLP, 2023.
- Speaker recognition with two-step multi-modal deep cleansing, Ruijie Tao, Kong Aik Lee, Shi Zhan and Haizhou Li, ICASSP, 2023.
- Deep Cross-modal Retrieval between Space Image and Acoustic Speech, Xinyuan Qian, Wei Xue, Qiquan Zhang, Ruijie Tao and Haizhou Li, TMM, 2023.
- Bi-directional Image-Speech Retrieval Through Geometric Consistency, Xinyuan Qian, Wei Xue, Qiquan Zhang, Ruijie Tao, Yiming Wang, Kainan Chen, Haizhou Li, ICCV Workshop, 2023.
- Target Active Speaker Detection with Audio-visual Cues, Yidi Jiang, Ruijie Tao, Zexu Pan and Haizhou Li, INTERSPEECH, 2023.
2022
- Self-supervised Speaker Recognition with Loss-gated Learning, Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki and Haizhou Li, ICASSP, 2022.
- Selective Hearing through Lip-reading, Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li, TASLP, 2022.
- Ego4D: Around the World in 3,000 Hours of Egocentric Video, Kristen Grauman, Andrew Westbury, Eugene Byrne, …, Ruijie Tao, …, et al, CVPR, Oral, Best Paper Finalist, 2022.
2021
- Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection, Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li, ACM Multimedia, Oral, 2021.
- NUS-HLT Report for ActivityNet Challenge 2021 AVA (Speaker), Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li, CVPR Workshop Report, 2021.
- Muse: Multi-modal target speaker extraction with visual cues, Zexu Pan, Ruijie Tao, Chenglin Xu, and Haizhou Li, ICASSP, 2021.
- HLT-NUS Submission for 2020 NIST Conversational Telephone Speech SRE, Rohan Kumar Das, Ruijie Tao and Haizhou Li, arXiv, 2021.
- I4U System Description for NIST SRE’20 CTS Challenge, Kong Aik Lee, Tomi Kinnunen, Daniele Colibro, …, Ruijie Tao, …, et al, arXiv, 2021.
2020
- Audio-visual Speaker Recognition with a Cross-modal Discriminative Network, Ruijie Tao, Rohan Kumar Das and Haizhou Li, INTERSPEECH, 2020.
- HLT-NUS Submission for 2019 NIST Multimedia Speaker Recognition Evaluation, Rohan Kumar Das, Ruijie Tao, Jichen Yang, Wei Rao, Cheng Yu and Haizhou Li, APSIPA, 2020.
💻 Open Source Code
- Speaker Recognition Framework
- Active Speaker Detection Framework
- Self-supervised Speaker Recognition Framework
- Audio-visual Speaker Recognition Framework
- Cross-modal Speaker Recognition Framework
- Ego4d Benchmark
👔 Internship and Visiting Experience
- 2022.02 - 2022.08, Visiting Student, The Chinese University of Hong Kong, Shenzhen (CUHKSZ), Shenzhen, China.
- 2015.07 - 2015.08, Visiting Student, University of Cambridge, Cambridge, UK.
🎖 Others
Award
- The 1st place winner in the FAME Challenge, ACM Multimedia, 2024
- Egocentric Vision (EgoVis) 2022/2023 Distinguished Paper Award, 2024
- IEEE SLP Student Travel Grant, ICASSP Best Paper Nominee (Corresponding author), 2024
- Nanyang Speech Technology Forum, Best Student Paper Award, 2023
- PREMIA, Best Student Paper Award, 2022
- CVPR Best Paper Nominee, 2022
- The 2nd place winner in NIST Speaker Recognition Evaluation (SRE), 2021
- The 3rd place winner in the ActivityNet Challenge (Speaker), CVPR Workshop, 2021
- NUS Research Scholarship, 2019
Reviewer
- Computer Vision and Pattern Recognition Conference (CVPR)
- IEEE Transactions on Audio, Speech, and Language Processing (TASLP)
- IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
- IEEE Spoken Language Technology Workshop (SLT)
- Annual Conference of the International Speech Communication Association (INTERSPEECH)
- IEEE Signal Processing Letters (SPL)
- Digital Signal Processing (DSP)
- Computer Speech & Language (CSL)
- IEEE Open Journal of Signal Processing (OJSP)
- International Symposium on Chinese Spoken Language Processing (ISCSLP)
Teaching
- EE3801 Data Engineering Principles, NUS undergraduate course
- EE5132 Wireless and Sensor Networks, NUS graduate course