SoundLoCD: An efficient conditional discrete contrastive latent diffusion model for text-to-sound generation