Core Concepts
A novel method, MaSkel, is proposed to directly generate high-quality human X-ray images from human masking images, without the need for harmful radiation exposure.
Abstract
The paper introduces MaSkel, an approach that generates human X-ray images directly from human masking images. According to the authors, it is the first work to propose a model for predicting whole-body X-rays from masking observations.
The key highlights of the work are:
The authors built two synthetic human X-ray datasets, at 64×64 and 256×256 resolution, each comprising 10,000 images, to address the scarcity of training data.
A two-stage training strategy is used. In the first stage, an encoder is trained with the Masked Autoencoder (MAE) technique to learn a high-quality latent representation of X-ray images. In the second stage, masking images are mapped into a similar latent space, and a Vector Quantized Variational Autoencoder (VQ-VAE) generates the final X-ray images.
Qualitative and quantitative evaluations, including a user study with medical professionals, demonstrate the effectiveness of the MaSkel model in generating anatomically accurate and realistic human X-ray images from masking inputs.
While the model performs well on clean masking images, its output quality declines noticeably on more irregular, clothed human surfaces. Improving the model's generalization to such inputs is therefore a direction for future research.
Overall, MaSkel offers a new way to model human anatomy by generating pseudo-X-ray images without harmful radiation exposure, with potential applications in medical diagnostics, digital animation, and ergonomic design.
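The two-stage strategy rests on two standard building blocks: MAE-style random patch masking (stage one) and VQ-VAE nearest-codebook quantization (stage two). The following is a minimal plain-Python sketch of those generic operations only; it is not MaSkel's architecture or training code, which this summary does not describe. The function names (`patchify`, `random_patch_mask`, `quantize`), the toy 8×8 image, the 75% mask ratio (the default used in the original MAE work, not confirmed for MaSkel), and the tiny codebook are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple

Patch = List[float]


def patchify(image: Sequence[Sequence[float]], patch: int) -> List[Patch]:
    """Split an H x W image (list of rows) into non-overlapping patch x patch tiles, each flattened."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    tiles: List[Patch] = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tiles.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
    return tiles


def random_patch_mask(
    num_patches: int, mask_ratio: float, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """MAE-style masking: return (visible, masked) patch indices.

    In MAE only the visible patches are fed to the encoder; the decoder
    later reconstructs the masked ones.
    """
    num_masked = int(round(num_patches * mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    return sorted(order[num_masked:]), sorted(order[:num_masked])


def quantize(z: Sequence[float], codebook: Sequence[Sequence[float]]) -> Tuple[int, List[float]]:
    """VQ-VAE quantization: replace latent vector z with its nearest codebook entry (squared L2)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    idx = min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))
    return idx, list(codebook[idx])


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[float(r * 8 + c) for c in range(8)] for r in range(8)]  # hypothetical toy 8x8 image
    tiles = patchify(img, 4)  # 4 patches of 16 pixels each
    visible, masked = random_patch_mask(len(tiles), 0.75, rng)
    print(len(visible), len(masked))  # 1 3
    codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, -1.0]]  # hypothetical 3-entry codebook
    print(quantize([0.9, 1.2], codebook))  # (1, [1.0, 1.0])
```

A real model applies these steps to learned patch embeddings rather than raw pixels, and VQ-VAE training additionally needs a straight-through gradient estimator plus codebook and commitment losses, all of which this sketch omits.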
Stats
The PSNR of the reconstructed X-ray images in the first-stage training is close to 34 dB.
The SSIM of the reconstructed X-ray images in the first-stage training is close to 1.0.
The LPIPS of the reconstructed X-ray images in the first-stage training is close to 0.0.
The PSNR of the generated X-ray images in the second-stage training is about 23.5 dB.
The SSIM of the generated X-ray images in the second-stage training is 0.92.
The LPIPS of the generated X-ray images in the second-stage training is close to 0.0.
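To put the two PSNR figures in perspective: PSNR = 10·log10(MAX² / MSE), so every 20 dB corresponds to a tenfold change in RMS pixel error. The following minimal sketch computes PSNR for flattened images and inverts the formula to recover the RMS error implied by a given PSNR. It assumes intensities normalized to [0, 1] (MAX = 1), which the summary does not state, and the helper names (`mse`, `psnr`, `rmse_for_psnr`) are hypothetical.

```python
import math
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized, flattened images."""
    assert len(a) == len(b) and len(a) > 0
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def psnr(a: Sequence[float], b: Sequence[float], max_val: float = 1.0) -> float:
    """PSNR in dB: 10 * log10(MAX^2 / MSE). Identical images give +inf."""
    err = mse(a, b)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / err)


def rmse_for_psnr(db: float, max_val: float = 1.0) -> float:
    """Invert the PSNR formula: RMSE = MAX * 10^(-PSNR / 20)."""
    return max_val * 10.0 ** (-db / 20.0)


if __name__ == "__main__":
    # Stage-1 reconstruction (~34 dB) vs stage-2 generation (~23.5 dB), assuming intensities in [0, 1].
    for label, db in [("stage 1", 34.0), ("stage 2", 23.5)]:
        print(label, round(rmse_for_psnr(db), 4))
```

Under that assumption, about 34 dB corresponds to an RMS error of roughly 0.020 and about 23.5 dB to roughly 0.067, an error about 3.3 times larger (about 11 times in MSE terms). For 8-bit images (`max_val=255`) the same figures are roughly 5.1 and 17.0 grey levels.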
Quotes
"As we know, this is the first work to propose a model to generate human X-ray images from human masking images."
"We have built two synthetic human X-rays datasets in 64 × 64 and 256 × 256 resolution respectively, each comprises 10,000 images."
"We design a two-stage training strategy for the generation of customized X-ray images from masking images."
"Through the application of similarity metrics and a user study conducted with medical professionals, we demonstrate the efficacy of our method in generating well-structured and anatomic human X-rays."