Shusuke Takahashi
All published works
Action
Title
Year
Authors
+
SAVGBench: Benchmarking Spatially Aligned Audio-Video Generation
2024
Kazuki Shimada
Christian Simon
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
+
SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond
2024
Marco Comunità
Zhi Zhong
Akira Takahashi
Shiqi Yang
Mengjie Zhao
Koichi Saito
Yukara Ikemiya
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
+
MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training
2024
Kengo Uchida
Takashi Shibuya
Yuhta Takida
Naoki Murata
Shusuke Takahashi
Yuki Mitsufuji
+
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
2024
Shiqi Yang
Zhi Zhong
Mengjie Zhao
Shusuke Takahashi
Masato Ishii
Takashi Shibuya
Yuki Mitsufuji
+
Zero- and Few-Shot Sound Event Localization and Detection
2024
Kazuki Shimada
Kengo Uchida
Yuichiro Koyama
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
Tatsuya Kawahara
+
Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders
2024
Hao Shi
Kazuki Shimada
Masato Hirano
Takashi Shibuya
Yuichiro Koyama
Zhi Zhong
Shusuke Takahashi
Tatsuya Kawahara
Yuki Mitsufuji
+
Extending Audio Masked Autoencoders toward Audio Restoration
2023
Zhi Zhong
Hao Shi
Masato Hirano
Kazuki Shimada
Kazuya Tateishi
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
+
Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement
2023
Ryosuke Sawata
Naoki Murata
Yuhta Takida
Toshimitsu Uesaka
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
+
An Attention-Based Approach to Hierarchical Multi-Label Music Instrument Classification
2023
Zhi Zhong
Masato Hirano
Kazuki Shimada
Kazuya Tateishi
Shusuke Takahashi
Yuki Mitsufuji
+
DiffRoll: Diffusion-Based Generative Music Transcription with Unsupervised Pretraining Capability
2023
Kin Wai Cheuk
Ryosuke Sawata
Toshimitsu Uesaka
Naoki Murata
Naoya Takahashi
Shusuke Takahashi
Dorien Herremans
Yuki Mitsufuji
+
An Attention-based Approach to Hierarchical Multi-label Music Instrument Classification
2023
Zhi Zhong
Masato Hirano
Kazuki Shimada
Kazuya Tateishi
Shusuke Takahashi
Yuki Mitsufuji
+
Diffusion-based Signal Refiner for Speech Separation
2023
Masato Hirano
Kazuki Shimada
Yuichiro Koyama
Shusuke Takahashi
Yuki Mitsufuji
+
Extending Audio Masked Autoencoders Toward Audio Restoration
2023
Zhi Zhong
Hao Shi
Masato Hirano
Kazuki Shimada
Kazuya Tateishi
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
+
The Whole Is Greater than the Sum of Its Parts: Improving DNN-based Music Source Separation
2023
Ryosuke Sawata
Naoya Takahashi
Stefan Uhlich
Shusuke Takahashi
Yuki Mitsufuji
+
Diffusion-Based Speech Enhancement with Joint Generative and Predictive Decoders
2023
Hao Shi
Kazuki Shimada
Masato Hirano
Takashi Shibuya
Yuichiro Koyama
Zhi Zhong
Shusuke Takahashi
Tatsuya Kawahara
Yuki Mitsufuji
+
STARSS23: An Audio-Visual Dataset of Spatial Recordings of Real Scenes with Spatiotemporal Annotations of Sound Events
2023
Kazuki Shimada
Archontis Politis
Parthasaarathy Sudarsanam
Daniel Krause
Kengo Uchida
Sharath Adavanne
Aapo Hakala
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
+
The Sound Demixing Challenge 2023 – Cinematic Demixing Track
2023
Stefan Uhlich
Giorgio Fabbro
Masato Hirano
Shusuke Takahashi
Gordon Wichern
Jonathan Le Roux
Dipam Chakraborty
Sharada P. Mohanty
Kai Li
Yi Luo
+
Zero- and Few-shot Sound Event Localization and Detection
2023
Kazuki Shimada
Kengo Uchida
Yuichiro Koyama
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
Tatsuya Kawahara
+
Improving Character Error Rate is Not Equal to Having Clean Speech: Speech Enhancement for ASR Systems with Black-Box Acoustic Models
2022
Ryosuke Sawata
Yosuke Kashiwagi
Shusuke Takahashi
+
Spatial Mixup: Directional Loudness Modification as Data Augmentation for Sound Event Localization and Detection
2022
Ricardo Falcón Pérez
Kazuki Shimada
Yuichiro Koyama
Shusuke Takahashi
Yuki Mitsufuji
+
Music Source Separation With Deep Equilibrium Models
2022
Yuichiro Koyama
Naoki Murata
Stefan Uhlich
Giorgio Fabbro
Shusuke Takahashi
Yuki Mitsufuji
+
Multi-ACCDOA: Localizing And Detecting Overlapping Sounds From The Same Class With Auxiliary Duplicating Permutation Invariant Training
2022
Kazuki Shimada
Yuichiro Koyama
Shusuke Takahashi
Naoya Takahashi
Emiru Tsunoo
Yuki Mitsufuji
+
Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection
2022
Yuichiro Koyama
Kazuhide Shigemi
Masafumi Takahashi
Kazuki Shimada
Naoya Takahashi
Emiru Tsunoo
Shusuke Takahashi
Yuki Mitsufuji
+
Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training
2022
Kazuki Shimada
Yuichiro Koyama
Shusuke Takahashi
Naoya Takahashi
Emiru Tsunoo
Yuki Mitsufuji
+
SQ-VAE: Variational Bayes on Discrete Representation with Self-annealed Stochastic Quantization
2022
Yuhta Takida
Takashi Shibuya
WeiHsiang Liao
Chieh-Hsin Lai
Junki Ohmura
Toshimitsu Uesaka
Naoki Murata
Shusuke Takahashi
Toshiyuki Kumakura
Yuki Mitsufuji
+
STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events
2022
Archontis Politis
Kazuki Shimada
Parthasaarathy Sudarsanam
Sharath Adavanne
Daniel Krause
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
Tuomas Virtanen
+
DiffRoll: Diffusion-based Generative Music Transcription with Unsupervised Pretraining Capability
2022
Kin Wai Cheuk
Ryosuke Sawata
Toshimitsu Uesaka
Naoki Murata
Naoya Takahashi
Shusuke Takahashi
Dorien Herremans
Yuki Mitsufuji
+
Diffiner: A Versatile Diffusion-based Generative Refiner for Speech Enhancement
2022
Ryosuke Sawata
Naoki Murata
Yuhta Takida
Toshimitsu Uesaka
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
+
Manifold-Aware Deep Clustering: Maximizing Angles Between Embedding Vectors Based on Regular Simplex
2021
Keitaro Tanaka
Ryosuke Sawata
Shusuke Takahashi
+
Manifold-Aware Deep Clustering: Maximizing Angles between Embedding Vectors Based on Regular Simplex
2021
Keitaro Tanaka
Ryosuke Sawata
Shusuke Takahashi
+
All For One And One For All: Improving Music Separation By Bridging Networks
2021
Ryosuke Sawata
Stefan Uhlich
Shusuke Takahashi
Yuki Mitsufuji
+
ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection
2021
Kazuki Shimada
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
+
Manifold-Aware Deep Clustering: Maximizing Angles between Embedding Vectors Based on Regular Simplex
2021
Keitaro Tanaka
Ryosuke Sawata
Shusuke Takahashi
+
Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection
2021
Kazuki Shimada
Naoya Takahashi
Yuichiro Koyama
Shusuke Takahashi
Emiru Tsunoo
Masafumi Takahashi
Yuki Mitsufuji
+
Preventing Oversmoothing in VAE via Generalized Variance Parameterization
2021
Yuhta Takida
Wei-Hsiang Liao
Chieh-Hsin Lai
Toshimitsu Uesaka
Shusuke Takahashi
Yuki Mitsufuji
+
Improving Character Error Rate Is Not Equal to Having Clean Speech: Speech Enhancement for ASR Systems with Black-box Acoustic Models
2021
Ryosuke Sawata
Yosuke Kashiwagi
Shusuke Takahashi
+
Multi-ACCDOA: Localizing and Detecting Overlapping Sounds from the Same Class with Auxiliary Duplicating Permutation Invariant Training
2021
Kazuki Shimada
Yuichiro Koyama
Shusuke Takahashi
Naoya Takahashi
Emiru Tsunoo
Yuki Mitsufuji
+
Music Source Separation with Deep Equilibrium Models
2021
Yuichiro Koyama
Naoki Murata
Stefan Uhlich
Giorgio Fabbro
Shusuke Takahashi
Yuki Mitsufuji
+
Spatial Data Augmentation with Simulated Room Impulse Responses for Sound Event Localization and Detection
2021
Yuichiro Koyama
Kazuhide Shigemi
Masafumi Takahashi
Kazuki Shimada
Naoya Takahashi
Emiru Tsunoo
Shusuke Takahashi
Yuki Mitsufuji
+
Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3Net
2020
Kazuki Shimada
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
+
ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection
2020
Kazuki Shimada
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
+
All for One and One for All: Improving Music Separation by Bridging Networks
2020
Ryosuke Sawata
Stefan Uhlich
Shusuke Takahashi
Yuki Mitsufuji
Common Coauthors
Coauthor
Papers Together
Yuki Mitsufuji
37
Kazuki Shimada
22
Yuichiro Koyama
18
Naoya Takahashi
14
Takashi Shibuya
13
Ryosuke Sawata
12
Naoki Murata
8
Masato Hirano
8
Zhi Zhong
8
Stefan Uhlich
6
Toshimitsu Uesaka
6
Emiru Tsunoo
6
Yuhta Takida
5
Kazuya Tateishi
4
Kengo Uchida
4
Tatsuya Kawahara
4
Keitaro Tanaka
3
Giorgio Fabbro
3
Parthasaarathy Sudarsanam
2
Dorien Herremans
2
Kin Wai Cheuk
2
Daniel Krause
2
Masafumi Takahashi
2
Mengjie Zhao
2
Hao Shi
2
Chieh-Hsin Lai
2
Archontis Politis
2
Tuomas Virtanen
2
Yosuke Kashiwagi
2
Sharath Adavanne
2
Kazuhide Shigemi
2
Hao Shi
1
Christian Simon
1
Alexander Stempkovskiy
1
Sharada P. Mohanty
1
Dipam Chakraborty
1
Ricardo Falcón Pérez
1
Tatiana Habruseva
1
Rongzhi Gu
1
Marco Comunità
1
Akira Takahashi
1
Junki Ohmura
1
Masafumi Takahashi
1
WeiHsiang Liao
1
Aapo Hakala
1
Jonathan Le Roux
1
Roman Solovyev
1
Shiqi Yang
1
Toshiyuki Kumakura
1
Gordon Wichern
1
Commonly Cited References
Action
Title
Year
Authors
# of times referenced
+
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
2019
Daniel Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
Ekin D. Cubuk
Quoc V. Le
10
+
AENet: Learning Deep Audio Features for Video Analysis
2017
Naoya Takahashi
Michael Gygli
Luc Van Gool
9
+
First Order Ambisonics Domain Spatial Augmentation for DNN-based Direction of Arrival Estimation
2019
Luca Mazzon
Yuma Koizumi
Masahiro Yasuda
Noboru Harada
8
+
Sound Event Localization and Detection of Overlapping Sources Using Convolutional Recurrent Neural Networks
2018
Sharath Adavanne
Archontis Politis
Joonas Nikunen
Tuomas Virtanen
7
+
Polyphonic Sound Event Detection and Localization using a Two-Stage Strategy
2019
Yin Cao
Qiuqiang Kong
Turab Iqbal
Fengyan An
Wenwu Wang
Mark D. Plumbley
7
+
A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection.
2020
Archontis Politis
Sharath Adavanne
Tuomas Virtanen
6
+
Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation
2019
Yi Luo
Nima Mesgarani
6
+
An Improved Event-Independent Network for Polyphonic Sound Event Localization and Detection
2021
Yin Cao
Turab Iqbal
Qiuqiang Kong
Fengyan An
Wenwu Wang
Mark D. Plumbley
5
+
Overview and Evaluation of Sound Event Localization and Detection in DCASE 2019
2020
Archontis Politis
Annamaria Mesaros
Sharath Adavanne
Toni Heittola
Tuomas Virtanen
5
+
A Sequence Matching Network for Polyphonic Sound Event Localization and Detection
2020
Thi Ngoc Tho Nguyen
Douglas L. Jones
Woon-Seng Gan
5
+
ACCDOA: Activity-Coupled Cartesian Direction of Arrival Representation for Sound Event Localization and Detection
2021
Kazuki Shimada
Yuichiro Koyama
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
5
+
SDR – Half-baked or Well Done?
2019
Jonathan Le Roux
Scott Wisdom
Hakan Erdoğan
John R. Hershey
5
+
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
2017
Priya Goyal
Piotr Dollár
Ross Girshick
Pieter Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
4
+
Phase-aware Speech Enhancement with Deep Complex U-Net
2019
Hyeong-Seok Choi
Jang-Hyun Kim
Jaesung Huh
Adrian Kim
Jung-Woo Ha
Kyogu Lee
4
+
A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection
2020
Archontis Politis
Sharath Adavanne
Tuomas Virtanen
4
+
Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks
2017
Morten Kolbæk
Dong Yu
Zheng-Hua Tan
Jesper Jensen
4
+
A Four-Stage Data Augmentation Approach to ResNet-Conformer Based Acoustic Modeling for Sound Event Localization and Detection
2021
Qing Wang
Jun Du
Huaxin Wu
Jia Pan
Feng Ma
Chin-Hui Lee
4
+
Speech Enhancement and Dereverberation With Diffusion-Based Generative Models
2023
Julius Richter
Simon Welker
Jean-Marie Lemercier
Bunlong Lay
Timo Gerkmann
4
+
Universal Speech Enhancement with Score-based Diffusion
2022
Joan Serrà
Santiago Pascual
Jordi Pons
Recep Oğuz Araz
Davide Scaini
4
+
A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection
2021
Archontis Politis
Sharath Adavanne
Daniel Krause
Antoine Deleforge
Prerak Srivastava
Tuomas Virtanen
4
+
Ensemble of ACCDOA- and EINV2-based Systems with D3Nets and Impulse Response Simulation for Sound Event Localization and Detection
2021
Kazuki Shimada
Naoya Takahashi
Yuichiro Koyama
Shusuke Takahashi
Emiru Tsunoo
Masafumi Takahashi
Yuki Mitsufuji
4
+
A Dataset of Dynamic Reverberant Sound Scenes with Directional Interferers for Sound Event Localization and Detection.
2021
Archontis Politis
Sharath Adavanne
Daniel Krause
Antoine Deleforge
Prerak Srivastava
Tuomas Virtanen
4
+
SEGAN: Speech Enhancement Generative Adversarial Network
2017
Santiago Pascual
Antonio Bonafonte
Joan Serrà
4
+
Densely connected multidilated convolutional networks for dense prediction tasks
2021
Naoya Takahashi
Yuki Mitsufuji
4
+
FullSubNet: A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement
2021
Xiang Hao
Xiangdong Su
Radu Horaud
Xiaofei Li
3
+
Real-time Denoising and Dereverberation with Tiny Recurrent U-Net
2021
Hyeong-Seok Choi
Sungjin Park
Jie Hwan Lee
Hoon Heo
Dongsuk Jeon
Kyogu Lee
3
+
D3Net: Densely connected multidilated DenseNet for music source separation
2020
Naoya Takahashi
Yuki Mitsufuji
3
+
Denoising Diffusion Implicit Models
2020
Jiaming Song
Chenlin Meng
Stefano Ermon
3
+
Deep Convolutional Neural Networks and Data Augmentation for Acoustic Event Detection
2016
Naoya Takahashi
Michael Gygli
Beat Pfister
Luc Van Gool
3
+
DCASE 2021 Task 3: Spectrotemporally-aligned Features for Polyphonic Sound Event Localization and Detection
2021
Thi Ngoc Tho Nguyen
Karn N. Watcharasupat
Ngoc Khanh Nguyen
Douglas L. Jones
Woon-Seng Gan
3
+
Denoising Diffusion Probabilistic Models
2020
Jonathan Ho
Ajay N. Jain
Pieter Abbeel
3
+
MetricGAN: Generative Adversarial Networks based Black-box Metric Scores Optimization for Speech Enhancement
2019
Szu-Wei Fu
Chien-Feng Liao
Yu Tsao
Shou-De Lin
3
+
Asteroid: The PyTorch-Based Audio Source Separation Toolkit for Researchers
2020
Manuel Pariente
Samuele Cornell
Joris Cosentino
Sunit Sivasankaran
Efthymios Tzinis
Jens Heitkaemper
Michel Olvera
Fabian-Robert Stöter
Mathieu Hu
Juan M. Martín-Doñas
3
+
Emergency Vehicles Audio Detection and Localization in Autonomous Driving
2021
Hongyi Sun
Xinyi Liu
Kecheng Xu
Jinghao Miao
Qi Luo
3
+
Dual-Path RNN: Efficient Long Sequence Modeling for Time-Domain Single-Channel Speech Separation
2020
Yi Luo
Zhuo Chen
Takuya Yoshioka
3
+
Conditional Diffusion Probabilistic Model for Speech Enhancement
2022
Yen-Ju Lu
Zhong-Qiu Wang
Shinji Watanabe
Alexander Richard
Cheng Yu
Yu Tsao
3
+
Denoising Diffusion Restoration Models
2022
Bahjat Kawar
Michael Elad
Stefano Ermon
Jiaming Song
3
+
Deep attractor network for single-microphone speaker separation
2017
Zhuo Chen
Yi Luo
Nima Mesgarani
2
+
Restoring Degraded Speech via a Modified Diffusion Model
2021
Jianwei Zhang
Suren Jayasuriya
Visar Berisha
2
+
DNN-Based Source Enhancement to Increase Objective Sound Quality Assessment Score
2018
Yuma Koizumi
Kenta Niwa
Yusuke Hioka
Kazunori Kobayashi
Yoichi Haneda
2
+
Deep clustering and conventional networks for music separation: Stronger together
2017
Yi Luo
Zhuo Chen
John R. Hershey
Jonathan Le Roux
Nima Mesgarani
2
+
Improved Techniques for Training Score-Based Generative Models
2020
Yang Song
Stefano Ermon
2
+
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
2020
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
2
+
Permutation invariant training of deep models for speaker-independent multi-talker speech separation
2017
Dong Yu
Morten Kolbæk
Zheng-Hua Tan
Jesper Jensen
2
+
Pyroomacoustics: A Python Package for Audio Room Simulation and Array Processing Algorithms
2018
Robin Scheibler
Eric Bezzam
Ivan Dokmanić
2
+
Event-Independent Network for Polyphonic Sound Event Localization and Detection
2020
Yin Cao
Turab Iqbal
Qiuqiang Kong
Yue Zhong
Wenwu Wang
Mark D. Plumbley
2
+
Diffusion Models Beat GANs on Image Synthesis
2021
Prafulla Dhariwal
Alex Nichol
2
+
Speaker-Independent Speech Separation With Deep Attractor Network
2018
Yi Luo
Zhuo Chen
Nima Mesgarani
2
+
Single-Channel Multi-Speaker Separation Using Deep Clustering
2016
Yusuf Ziya Işık
Jonathan Le Roux
Zhuo Chen
Shinji Watanabe
John R. Hershey
2
+
Sound Event Localization and Detection Using Activity-Coupled Cartesian DOA Vector and RD3Net
2020
Kazuki Shimada
Naoya Takahashi
Shusuke Takahashi
Yuki Mitsufuji
2