ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2019-02-06

Speech Augmentation Using Wavenet in Speech Recognition

Jisung Wang, Sangki Kim and Yeha Lee


Data augmentation is crucial to improving the performance of deep neural networks because it helps the model avoid overfitting and generalize better. In automatic speech recognition, previous work has proposed several approaches that augment data by speed perturbation or spectral transformation. Because data augmented in this manner has acoustic representations similar to those of the original data, it offers limited benefit for the generalization of the acoustic model. To avoid generating data with limited diversity, we propose a voice conversion approach using a generative model (WaveNet), which generates a new utterance by transforming an utterance to a given target voice. Our method synthesizes speech with diverse pitch patterns by minimizing the use of acoustic features. On the Wall Street Journal dataset, we verify that our method leads to better generalization than other data augmentation techniques such as speed perturbation and WORLD-based voice conversion. In addition, when combined with speed perturbation, the two methods complement each other and further improve the performance of the acoustic model.
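
For readers unfamiliar with the speed perturbation baseline mentioned in this abstract, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes a mono waveform stored as a NumPy array and uses plain linear interpolation; the function names and the factors 0.9, 1.0 and 1.1 are illustrative assumptions, not values taken from the paper.

```python
import numpy as np


def speed_perturb(waveform, factor):
    """Resample `waveform` so it plays `factor` times faster at the same sample rate.

    Illustrative sketch only: linear interpolation, no anti-aliasing filter.
    Resampling this way changes both duration and pitch.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    n_out = int(len(waveform) / factor)
    # Position in the original signal that each output sample reads from.
    src_positions = np.arange(n_out) * factor
    # np.interp clamps positions past the last sample to the final value.
    return np.interp(src_positions, np.arange(len(waveform)), waveform)


def augment_with_speed(waveform, factors=(0.9, 1.0, 1.1)):
    """Return one perturbed copy per factor (the factors are assumed values)."""
    return [speed_perturb(waveform, f) for f in factors]
```

Because every copy is a time-scaled version of the same recording, the copies stay acoustically close to the original, which is the limitation the abstract points to.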
NeurIPS 2018 ML4H Workshop | 2018-12-08

Integrating Reinforcement Learning to Self Training for Pulmonary Nodule Segmentation in Chest X-rays


Machine learning applications in medical imaging are frequently limited by the lack of high-quality labeled data. In this paper, we explore the self training method, a form of semi-supervised learning, to address the labeling burden. By integrating reinforcement learning, we were able to extend self training to complex segmentation networks without any further human annotation. The proposed approach, reinforced self training (ReST), fine-tunes a semantic segmentation network by introducing a policy network that learns to generate pseudolabels. We incorporate an expert demonstration network, based on inverse reinforcement learning, to enhance the clinical validity and convergence of the policy network. The model was tested on a pulmonary nodule segmentation task in chest X-rays and, by exploiting unlabeled data, matched the performance of a standard U-Net while using only 50% of the labeled data. When the same amount of labeled data was used, a moderate to significant improvement in cross-validation accuracy was achieved, depending on the absolute number of labels used.
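
As background for this abstract, the sketch below shows plain confidence-threshold self training on a toy classification problem. It is not ReST: there is no policy network, no inverse reinforcement learning and no segmentation model. The nearest-centroid classifier, the function names, the threshold of 0.9 and the number of rounds are all assumptions made for illustration.

```python
import numpy as np


def fit_centroids(x, y, n_classes):
    """Toy classifier: one mean vector per class (each class must have a sample)."""
    return np.stack([x[y == c].mean(axis=0) for c in range(n_classes)])


def predict_proba(centroids, x):
    """Softmax over negative squared distances to each class centroid."""
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -dist
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)


def self_train(x_lab, y_lab, x_unlab, n_classes=2, threshold=0.9, rounds=5):
    """Repeatedly add confidently pseudolabeled samples to the training set."""
    x_train, y_train = x_lab.copy(), y_lab.copy()
    remaining = x_unlab.copy()
    for _ in range(rounds):
        if len(remaining) == 0:
            break
        centroids = fit_centroids(x_train, y_train, n_classes)
        proba = predict_proba(centroids, remaining)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        # Treat confident predictions as labels (pseudolabels) and retrain.
        x_train = np.concatenate([x_train, remaining[confident]])
        y_train = np.concatenate([y_train, proba[confident].argmax(axis=1)])
        remaining = remaining[~confident]
    return fit_centroids(x_train, y_train, n_classes), len(x_train)
```

In this plain scheme the pseudolabels come from a fixed confidence rule; the abstract describes replacing that rule with a learned policy network.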
RSNA 2018 | 2018-11-27

Deep Learning-Based Computer-Aided Detection System for Multiclass Multiple Lesions on Chest Radiographs: Observers’ Performance Study


To evaluate the added value of a deep learning-based computer-aided detection (CAD) system for multiclass, multiple lesions when radiologists read chest radiographs.
RSNA 2018 | 2018-11-27

Deep Learning-Based Automatic Chest PA Screening System for Various Devices and Hospitals


To ensure generalization performance across hospitals, we developed a deep learning-based automatic chest PA screening system that detects five classes of findings and performs well on various devices. Its performance was evaluated on various devices using FROC and FOM.
RSNA 2018 | 2018-11-25

CNN-based Image Super-Resolution for CT Slice Thickness Reduction using Paired CT Scans for Improving Robustness of Computer-aided Nodule Detection System


To evaluate the effectiveness of a slice thickness reduction technique in computed tomography (CT) scans using a convolutional neural network (CNN)-based super-resolution (SR) network for improving the sensitivity of lung nodule detection in thick-section CT scans.

