I am a Research Fellow at the National University of Singapore (NUS), supervised by Prof. Haizhou Li. Prior to that, I received my Ph.D. and Master's degrees from NUS in 2023 and 2019, respectively, and my Bachelor's degree from Soochow University in 2018.
My research interest is audio-visual speech processing, including (audio-only or audio-visual) speaker recognition, speaker diarization, speech extraction, active speaker detection, and self-supervised learning. I have published more than 10 papers at top international AI conferences and in journals such as TASLP, ACM MM, ICASSP, and INTERSPEECH.
📜 Research Areas
Speech Processing: Speaker recognition, speaker diarization, target speaker extraction, anti-spoofing, speech separation, voice conversion, text-to-speech
Computer Vision: Face recognition, face detection, lip reading
Multi-modal Processing: Audio-visual active speaker detection, audio-visual speaker recognition, audio-visual target speaker extraction, talking face generation
Algorithms: Self-supervised speech processing
🏫 Education
- 2019.08 - 2023.08, Ph.D. in Speech Processing and Computer Vision, National University of Singapore (NUS), Singapore.
- 2018.08 - 2019.06, M.Sc. in Electrical and Computer Engineering, National University of Singapore (NUS), Singapore.
- 2014.09 - 2018.06, B.Eng. in Electronic Engineering, Soochow University, Suzhou, China.
💼 Work Experience
- 2023.08 - Now, Research Fellow, National University of Singapore (NUS), Singapore.
📝 Publications
2024
- Emphasized Non-Target Speaker Knowledge in Knowledge Distillation for Automatic Speaker Verification, Duc-Tuan Truong, Ruijie Tao, Jia Qi Yip, Kong Aik Lee, Eng Siong Chng, ICASSP, 2024.
- Audio-Visual Active Speaker Extraction for Sparsely Overlapped Multi-talker Speech, Junjie Li, Ruijie Tao, Zexu Pan, Meng Ge, Shuai Wang, Haizhou Li, ICASSP, 2024.
- Prompt-driven Target Speech Diarization, Yidi Jiang, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian and Haizhou Li, ICASSP, 2024.
- USED: Universal Speaker Extraction and Diarization, Junyi Ao, Mehmet Sinan Yıldırım, Meng Ge, Shuai Wang, Ruijie Tao, Yanmin Qian, Liqun Deng, Longshuai Xiao, Haizhou Li, arXiv, 2024.
2023
- Deep Cross-modal Retrieval between Space Image and Acoustic Speech, Xinyuan Qian, Wei Xue, Qiquan Zhang, Ruijie Tao and Haizhou Li, TMM, 2023.
- Bi-directional Image-Speech Retrieval Through Geometric Consistency, Xinyuan Qian, Wei Xue, Qiquan Zhang, Ruijie Tao, Yiming Wang, Kainan Chen, Haizhou Li, ICCV Workshop, 2023.
- Target Active Speaker Detection with Audio-visual Cues, Yidi Jiang, Ruijie Tao, Zexu Pan and Haizhou Li, INTERSPEECH, 2023.
- Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs, Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki and Haizhou Li, TASLP, 2023.
- Speaker recognition with two-step multi-modal deep cleansing, Ruijie Tao, Kong Aik Lee, Shi Zhan and Haizhou Li, ICASSP, 2023.
2022
- Self-supervised Speaker Recognition with Loss-gated Learning, Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki and Haizhou Li, ICASSP, 2022.
- Selective Hearing through Lip-reading, Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li, TASLP, 2022.
- Ego4D: Around the World in 3,000 Hours of Egocentric Video, Kristen Grauman, Andrew Westbury, Eugene Byrne, …, Ruijie Tao, …, et al, CVPR, Oral, Best Paper Nominee, 2022.
2021
- Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection, Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li, ACM Multimedia, Oral, 2021.
- NUS-HLT Report for ActivityNet Challenge 2021 AVA (Speaker), Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li, CVPR Workshop Report, 2021.
- Muse: Multi-modal target speaker extraction with visual cues, Zexu Pan, Ruijie Tao, Chenglin Xu, and Haizhou Li, ICASSP, 2021.
- HLT-NUS Submission for 2020 NIST Conversational Telephone Speech SRE, Rohan Kumar Das, Ruijie Tao and Haizhou Li, arXiv, 2021.
- I4U System Description for NIST SRE’20 CTS Challenge, Kong Aik Lee, Tomi Kinnunen, Daniele Colibro, …, Ruijie Tao, …, et al, arXiv, 2021.
2020
- Audio-visual Speaker Recognition with a Cross-modal Discriminative Network, Ruijie Tao, Rohan Kumar Das and Haizhou Li, INTERSPEECH, 2020.
- HLT-NUS Submission for 2019 NIST Multimedia Speaker Recognition Evaluation, Rohan Kumar Das, Ruijie Tao, Jichen Yang, Wei Rao, Cheng Yu and Haizhou Li, APSIPA, 2020.
💻 Open Source Code
- Speaker Recognition Framework
- Active Speaker Detection Framework
- Self-supervised Speaker Recognition Framework
- Audio-visual Speaker Recognition Framework
- Ego4d Benchmark
👔 Internship and Visiting Experience
- 2022.02 - 2022.08, Visiting Student, Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen), Shenzhen, China.
- 2015.07 - 2015.08, Visiting Student, University of Cambridge, Cambridge, UK.
🎖 Others
Award
- Nanyang Speech Technology Forum, Best Student Paper Award, 2023
- PREMIA, Best Student Paper Award, 2022
- The 2nd place winner in NIST Speaker Recognition Evaluation (SRE), 2021
- The 3rd place winner in the ActivityNet Challenge (Speaker), CVPR Workshop, 2021
- NUS Research Scholarship, 2019
Reviewer
- Computer Vision and Pattern Recognition Conference (CVPR)
- Transactions on Audio, Speech, and Language Processing (TASLP)
- The International Conference on Acoustics, Speech, & Signal Processing (ICASSP)
- Signal Processing Letters (SPL)
- Digital Signal Processing (DSP)
- Computer Speech & Language (CSL)
Teaching
- EE3801 Data Engineering Principles, NUS undergraduate course
- EE5132 Wireless and Sensor Networks, NUS graduate course