1 Introduction

Image reconstruction is an inverse problem: mapping information from the sensor domain to the image domain. A good reconstruction algorithm is a key component for obtaining high quality images from sensor data. Traditionally in MR imaging, the k-space information is used in a compressed sensing framework to address the image reconstruction problem [9]. Recently, there has been interest in learning the mapping from the frequency domain to the image domain using deep learning techniques [10]. The majority of such methods aim to accelerate image acquisition without compromising image quality. In this work, we address a different problem: we aim to correct motion artefacts in an end-to-end setup, using motion artefact corrupted k-space data as input and producing high quality images as output.

Fig. 1. Examples of a good quality CINE CMR image (a), its corresponding k-space (b), a corrupted image (c), where red arrows indicate the artefacts, and the corresponding corrupted k-space (d). The k-space corruption process is able to simulate realistic motion-related artefacts. (Color figure online)

High diagnostic accuracy of image analysis pipelines requires high quality medical images. Misleading conclusions can be drawn when the original data are of low quality, in particular for cardiac magnetic resonance (CMR) imaging. CMR images can contain a range of image artefacts [2], and improving the quality of images acquired by MR scanners is a challenging problem. Traditionally, low quality images are excluded from further analysis. However, excluding images not only diminishes the research value of the cohort but also raises the issue of how to robustly and efficiently identify images for exclusion.

The UK Biobank is a large-scale study with all data accessible to researchers worldwide, and will eventually consist of CMR images from 100,000 subjects [12]. To maximize the research value of this and other similar datasets, automatic artefact correction tools are essential. One specific challenge in CMR is motion-related artefacts such as mis-triggering, arrhythmia and breathing artefacts due to incomplete breath-holds. These can result in temporal and/or spatial blurring of the images, which makes subsequent processing difficult [2]. Examples of a good quality image and a synthetic motion artefact corrupted image are shown in Fig. 1a and c for a short-axis view CINE CMR scan. The corresponding k-space data are shown in Fig. 1b and d. In this work, our goal is to recover the good quality image (Fig. 1a) from the corrupted k-space data (Fig. 1d) directly using deep learning.

Our approach is based on automatically correcting artefacts during the reconstruction process. We use a deep neural network to correct artefacts and evaluate our method on a synthetic dataset derived from 2000 2D+time CMR acquisitions from the UK Biobank. We also evaluate it on real artefact cases to showcase how the method performs in practice. There are two major contributions of this work. First, we address the problem of motion artefact correction directly from k-space, leveraging the rich information available there, and validate the approach on a large-scale CMR dataset. Second, we introduce an adversarial component into the Automap framework [18] to increase the realism and quality of the reconstructed images.

2 Background

Deep learning techniques have been utilized for inverse problems with considerable success [10]. This success has motivated the medical image analysis community to use deep learning on multiple image reconstruction problems such as CT [5] and MR [17]. The main motivation has been to accelerate the image acquisition using under-sampling.

In the literature, there have been four strategies to approach the problem of estimating high quality images from corrupted (or under-sampled) k-space [4]. One choice is to correct the k-space before applying the inverse Fourier transform (IFT). Han et al. [4] proposed the use of convolutional networks for k-space correction coupled with weighting layers on k-space. A more common approach is to apply the IFT to the k-space and learn a mapping between the corrupted images and good quality images. Kwon et al. [7] proposed using multi-layer perceptrons to find this mapping. This group of approaches are essentially denoising techniques, which do not directly utilize the information in the frequency domain. To remedy this broken link, an alternative strategy is to iterate between k-space and the image domain using a cascaded network [14, 15]. This group of methods uses networks in the image domain to improve the image and feeds the improved image information back to k-space with a data consistency term. More recently, Zhu et al. [18] proposed an end-to-end image reconstruction approach (Automap) for MR and evaluated it on under-sampled k-space data.

In the context of CMR artefact correction, early works focused on changes in acquisition schemes [13] and analytical methods for motion artefact reduction [6]. For automatic correction of CMR images, Lotjonen et al. [8] used short-axis and long-axis images to optimize the locations of the slices, using mutual information as a similarity measure. However, these methods cannot address the mis-triggering problem and focus only on the in-plane motion of the heart.

3 Methods

The proposed framework uses a deep neural network for motion artefact correction on k-space data within a generative adversarial network setup. Our aim is to train a generator that reconstructs good quality images from motion artefact corrupted k-space data.

3.1 Network Architecture

The algorithm consists of a generator and a discriminator, as illustrated in Fig. 2. Our generator network follows a similar architecture to [18], which was originally developed for image reconstruction using domain specific information. In our case we additionally use a discriminator to increase the robustness and realism of the reconstructed images. The input to the network is a complex n-by-n k-space matrix, whose real and imaginary parts are concatenated into a \((2 \times n \times n)\)-by-1 vector. We then use two fully connected layers: FC1 with \(2 \times n \times n \) neurons and FC2 with \( n \times n \) neurons. The output from FC2 is reshaped and passed through two convolutional layers with 64 filters of size \(5 \times 5 \). After that, a deconvolutional layer with 64 filters of size \(7 \times 7 \) is applied and finally a \( 1 \times 1 \) convolutional layer aggregates the result into an image.
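A minimal Keras sketch of this generator architecture is given below. The image size n, the activation functions and the layer names are assumptions made for illustration and are not specified in the text.

```python
# Hedged sketch of the generator described above; n, the activations and the
# exact layer configuration are assumptions.
from tensorflow.keras import layers, models

def build_generator(n=128):
    # Real and imaginary k-space parts concatenated into a (2*n*n)-vector
    inp = layers.Input(shape=(2 * n * n,))
    x = layers.Dense(2 * n * n, activation='tanh')(inp)     # FC1
    x = layers.Dense(n * n, activation='tanh')(x)            # FC2
    x = layers.Reshape((n, n, 1))(x)                         # back onto the image grid
    x = layers.Conv2D(64, 5, padding='same', activation='relu')(x)
    x = layers.Conv2D(64, 5, padding='same', activation='relu')(x)
    x = layers.Conv2DTranspose(64, 7, padding='same', activation='relu')(x)
    out = layers.Conv2D(1, 1, padding='same')(x)             # 1x1 aggregation into an image
    return models.Model(inp, out, name='generator')
```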

Fig. 2. Generative adversarial Automap architecture for motion artefact correction.

The discriminator takes a generated image or a real image as input and uses two convolutional layers followed by a final dense layer for classification. The output of the discriminator is a decision as to whether the generated image looks real or fake. Using the outputs of the generator (artefact corrected images) and the real images from the dataset, the discriminator is trained to distinguish between artefact corrected images and high quality images. The loss function for the full model combines a mean squared error loss between the predicted and real images with a Wasserstein loss [3], computed as the mean difference between the critic scores for real and generated images. The weights of the discriminator are frozen while the full model is trained; the discriminator itself is trained separately using only the Wasserstein loss, which has been shown to be effective for inverse problems [1].
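The sketch below shows one way the discriminator and the combined objective could be wired in Keras. The filter counts, strides, loss weighting and activation choices are assumptions; the learning rate matches the value reported in Sect. 3.2.

```python
# Hedged sketch of the discriminator and the combined (MSE + Wasserstein) objective.
import tensorflow.keras.backend as K
from tensorflow.keras import layers, models, optimizers

def build_discriminator(n=128):
    # Two convolutional layers and a final dense layer, as described above;
    # the filter counts and strides are assumptions.
    img = layers.Input(shape=(n, n, 1))
    x = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(img)
    x = layers.Conv2D(64, 3, strides=2, padding='same', activation='relu')(x)
    x = layers.Flatten()(x)
    out = layers.Dense(1)(x)                    # linear output for the Wasserstein critic
    return models.Model(img, out, name='discriminator')

def wasserstein_loss(y_true, y_pred):
    # Mean critic score, signed by the real/fake label (+1 / -1)
    return K.mean(y_true * y_pred)

def build_combined(generator, discriminator, n=128):
    discriminator.trainable = False             # frozen while the generator is updated
    k_in = layers.Input(shape=(2 * n * n,))
    fake_img = generator(k_in)
    validity = discriminator(fake_img)
    combined = models.Model(k_in, [fake_img, validity])
    combined.compile(optimizer=optimizers.RMSprop(learning_rate=2e-5),
                     loss=['mse', wasserstein_loss],
                     loss_weights=[1.0, 0.01])  # weighting is an assumption
    return combined
```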

3.2 Implementation Details

The parameters of the convolutional and fully-connected layers were initialized randomly from a zero-mean Gaussian distribution and trained until no substantial progress was observed in the training loss. In this study, we use the RMSprop optimizer to minimize mean squared error. One important aspect during training is the activity regularizer, which is used after the deconvolutional layer. In our implementation, we first trained without this regularizer, finding that including it early in training led to the loss being trapped in poor local minima. Once training converged without the regularizer, it was then added, which led to the generation of sharper looking images.
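For illustration, the regularizer could be attached to the deconvolutional layer as sketched below once the unregularized model has converged; the L1 form and its weight are assumptions rather than values reported here.

```python
# Deconvolutional layer with an L1 activity regularizer; in practice the model is
# rebuilt with this layer after the initial unregularized stage and the trained
# weights are reloaded before training continues. The penalty weight is an assumption.
from tensorflow.keras import layers, regularizers

deconv_regularized = layers.Conv2DTranspose(
    64, 7, padding='same', activation='relu',
    activity_regularizer=regularizers.l1(1e-4))
```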

First, we trained our network with data from the ImageNET dataset to learn a variety of frequencies from k-space, as described in [18], without the regularization term. Then, the network was trained for 50 epochs with the regularization term. Finally, we introduced the cardiac MR data and trained our network for an additional 150 epochs. The training was stopped early if no significant improvement was observed; an improvement was considered significant if the relative increase in performance was at least 0.5% over 20 epochs. To help the model generalize, we applied data augmentation by rotating images in increments of 90\(^\circ \). We also found that the success of our implementation was highly sensitive to the choice of learning rate, which we set to 0.00002.
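A minimal sketch of this rotation augmentation is shown below; it assumes that the k-space input is regenerated from the rotated image, which is not stated explicitly above.

```python
# Rotate an image by a random multiple of 90 degrees and regenerate the paired
# frequency-domain input; the use of fftshift here is an assumption.
import numpy as np

def augment_pair(image):
    k = np.random.randint(4)                          # 0, 90, 180 or 270 degrees
    rotated = np.rot90(image, k)
    kspace = np.fft.fftshift(np.fft.fft2(rotated))    # paired k-space input
    return kspace, rotated
```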

During training, a batch size of 20 2D k-space datasets was used. We implemented the network using the Keras framework with a TensorFlow backend; training took around 3 days on an NVIDIA Quadro P6000 GPU. Once the network was trained, correction of a single image sequence took less than 1 s.

4 Experimental Results

We evaluated our algorithm on a subset of the UK Biobank dataset consisting of 2000 good quality CINE MR acquisitions. 50 temporal frames from each subject at mid-ventricular level were used to generate synthetic motion artefacts. We used 75000 2D images for training and 25000 images for testing. The data were chosen to be free of other types of image quality issues such as missing axial slices and were visually verified by an expert cardiologist. The details of the acquisition protocol of the UK Biobank dataset can be found in [12].

4.1 K-space Corruption for Synthetic Data

We generated k-space corrupted data in order to simulate motion artefacts, following a Cartesian sampling strategy for k-space corruption to generate synthetic but realistic artefacts [11]. We first transformed each 2D short-axis sequence to the Fourier domain and replaced 1 in 3 Cartesian sampling lines with the corresponding lines from other cardiac phases, using a random frame offset, to mimic motion artefacts. In this way the original good quality images from the training set were used to generate corresponding artefact-corrupted CMR images. This is a realistic approach, as the motion artefacts caused by mis-triggering often arise from a similar misplacement of k-space lines.
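A hedged sketch of this corruption process is given below; the maximum frame offset and the exact line-replacement pattern are assumptions consistent with the description above.

```python
# Corrupt frame t of a (T, ny, nx) 2D+time sequence by replacing 1 in 3 Cartesian
# k-space lines with lines from other cardiac phases at a random offset.
import numpy as np

def corrupt_frame(cine, t, max_offset=5):
    T, ny, nx = cine.shape
    k_all = np.fft.fftshift(np.fft.fft2(cine), axes=(-2, -1))   # k-space of every frame
    kspace = k_all[t].copy()
    for line in range(0, ny, 3):                                 # every third phase-encode line
        offset = np.random.randint(1, max_offset + 1)            # random frame offset (assumed range)
        kspace[line, :] = k_all[(t + offset) % T, line, :]
    corrupted = np.abs(np.fft.ifft2(np.fft.ifftshift(kspace)))
    return kspace, corrupted
```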

Fig. 3. Synthetic dataset results. Corrupted k-space (a), inverse Fourier transform (b), proposed method (c) and original good quality image (d). The proposed method is able to correct the motion artefacts, but loses some structure.

4.2 Quantitative Results on Synthetic Dataset

We compared our algorithm with a reconstruction using the IFT and also with two variants of the proposed deep learning framework: one without the adversarial component and one with the adversarial component but trained only using ImageNET data. The results are reported in Table 1. We report root mean square error (RMSE) and peak signal-to-noise ratio (PSNR) results for motion artefact correction, defined as follows:

$$\text {RMSE}= \sqrt{ \dfrac{1}{N_{x}N_{y}} \sum _{x=1}^{N_{x}} \sum _{y=1}^{N_{y}} \left( r(x,y)-p(x,y) \right)^2 } $$
$$ \text {PSNR}= 20 \log _{10} \left( \dfrac{ \sum \limits _{x=1}^{N_{x}} \sum \limits _{y=1}^{N_{y}} r(x,y)^2 }{\sqrt{ \sum \limits _{x=1}^{N_{x}} \sum \limits _{y=1}^{N_{y}} \left( r(x,y)-p(x,y) \right)^2 }} \right) $$

where \(N_{x}\) and \(N_{y}\) denote the number of pixels in the x and y directions and r and p represent reference and predicted images.
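For reference, a direct NumPy implementation of these two definitions, assuming r and p are equally sized 2D arrays:

```python
import numpy as np

def rmse(r, p):
    # Root mean squared error between reference r and prediction p
    return np.sqrt(np.mean((r - p) ** 2))

def psnr(r, p):
    # PSNR exactly as defined above: total reference energy over the root of the
    # summed squared error, on a 20*log10 scale
    return 20 * np.log10(np.sum(r ** 2) / np.sqrt(np.sum((r - p) ** 2)))
```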

Alongside these two measures, we also computed structural similarity index (SSIM) [16] results. SSIM has been shown to be sensitive to structural information and texture. The SSIM between two images is defined as follows for any image regions x and y:

$$\text {SSIM}(x,y)= \dfrac{(2 \mu _{x}\mu _{y}+c_{1}) (2 \sigma _{xy}+c_{2})}{ (\mu _{x}^{2}+\mu _{y}^{2}+c_{1}) (\sigma _{x}^{2}+\sigma _{y}^{2}+c_{2})} $$

where \(\mu _{x}\) and \(\mu _{y}\) are the average intensities of regions x and y, \(\sigma _{x}^{2}\) and \(\sigma _{y}^{2}\) are their variances, \(\sigma _{xy}\) is the covariance of regions x and y, and \(c_{1}\) and \(c_{2}\) are constants that stabilize the denominator.
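In practice, SSIM can be computed with an existing implementation such as scikit-image's structural_similarity; the data_range argument below is an assumption about how the images are scaled.

```python
from skimage.metrics import structural_similarity

def ssim_score(r, p):
    # SSIM between reference r and prediction p over the full image
    return structural_similarity(r, p, data_range=r.max() - r.min())
```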

Table 1 shows that the cardiac trained adversarial Automap technique corrects motion artefacts with higher accuracy than the other techniques. The ImageNET trained Automap reconstruction improved image quality over the IFT reconstruction, particularly in terms of lower RMSE and higher PSNR. However, some structure was lost in comparison with the IFT approach, as can be seen from the lower SSIM scores of the proposed method. Training with cardiac images recovered some of the SSIM, but it remained lower than that of the IFT approach. One possible explanation is that the network in the proposed method is trained to minimise the MSE, which commonly produces smoothed-out or blurred-looking images. An example of such a case can be seen in Fig. 3, where the proposed method corrects the artefact but loses some structural information. The adversarial training improves the performance of the Automap model in all three metrics, especially in terms of SSIM.

Table 1. Mean RMSE, PSNR, and SSIM results of motion artefact correction from k-space data.

4.3 Qualitative Results on a Real Motion Artefact Case

To illustrate the performance of our technique on artefact correction, we applied it to a dataset from the UK Biobank containing mis-triggering artefacts. The visual results in Fig. 4 show improved image quality compared to the IFT reconstructed image.

Fig. 4. Example of a mis-triggering artefact from the UK Biobank dataset. K-space data (a), motion corrupted image (b) and proposed method (c). The proposed method is able to correct the motion artefacts.

5 Discussion and Conclusion

In this paper, we have proposed an end-to-end image artefact correction pipeline for 2D CINE CMR and evaluated it on the large-scale UK Biobank dataset. We have shown the value and the shortcomings of deep learning based reconstruction for motion artefact correction. We have demonstrated that the generic Automap framework, extended with an adversarial setup, can aid in correcting motion artefacts and outperforms the inverse Fourier transform. To the best of the authors' knowledge, this is the first paper to address the motion artefact correction problem in MR directly from k-space data. The general applicability of the Automap framework is limited by its high memory requirement, which is caused by the fully connected layers at the start of the network.

In future work, we plan to investigate more appropriate loss functions to attempt to recover the lost structural information in the reconstructed images. Moreover, we will investigate the robustness of our technique on our own clinical data, which we expect to contain more motion corruption compared to UK Biobank data.