Multi-view diffusion models have recently shown strong potential for novel view synthesis. However, most existing methods depend on accurate camera poses, which require either costly annotation or expensive estimation. We address this challenge with a pose-free framework, MV-FusionRecon, which integrates priors from a 3D reconstruction model into a diffusion model. Our approach first predicts camera poses and renders coarse target views with the reconstruction model, then injects the reconstruction model's geometric features into the diffusion backbone. During inference, we leverage the explicit 3D Gaussian Splatting representation produced by the reconstruction model to select the most informative reference views from a large candidate set, overcoming the diffusion model's context-length limitation. Experiments show that our method improves view synthesis quality under pose-free conditions, bridging the gap between 3D reconstruction and diffusion-based generation.
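
The following is a minimal sketch, not the authors' implementation, of one plausible form of the inference-time reference-view selection described above: given the centers of an explicit 3D Gaussian Splatting model and predicted camera poses, rank a large candidate set of reference views by how much of the target view's visible geometry each candidate covers, then keep only the top-k that fit the diffusion model's context window. All function and parameter names (`frustum_mask`, `select_reference_views`, `k`, etc.) are illustrative assumptions, and the coverage score is a stand-in for whatever informativeness criterion the method actually uses.

```python
import numpy as np

def frustum_mask(points, K, w2c, width, height):
    """Boolean mask of 3D points that project inside a camera's image plane."""
    pts_h = np.concatenate([points, np.ones((len(points), 1))], axis=1)
    cam = (w2c @ pts_h.T).T[:, :3]          # world -> camera coordinates
    in_front = cam[:, 2] > 1e-6             # keep points in front of the camera
    uv = (K @ cam.T).T
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)  # perspective divide
    in_image = (uv[:, 0] >= 0) & (uv[:, 0] < width) & \
               (uv[:, 1] >= 0) & (uv[:, 1] < height)
    return in_front & in_image

def select_reference_views(gaussian_centers, target_w2c, candidate_w2cs,
                           K, width, height, k=4):
    """Pick the k candidates whose frusta best cover the target's visible Gaussians.

    gaussian_centers: (N, 3) centers from the 3DGS reconstruction (hypothetical input)
    target_w2c, candidate_w2cs: 4x4 world-to-camera matrices (predicted poses)
    """
    target_vis = frustum_mask(gaussian_centers, K, target_w2c, width, height)
    scores = []
    for w2c in candidate_w2cs:
        cand_vis = frustum_mask(gaussian_centers, K, w2c, width, height)
        # Score = fraction of target-visible Gaussians also seen by this candidate.
        overlap = np.logical_and(target_vis, cand_vis).sum()
        scores.append(overlap / max(int(target_vis.sum()), 1))
    # Indices of the top-k highest-scoring candidate views.
    return np.argsort(scores)[::-1][:k]
```

Under these assumptions, scoring candidates by geometric overlap with the target frustum lets the selection scale to arbitrarily many candidate views at a cost linear in the candidate count, since only the fixed top-k views are ever passed to the diffusion model as conditioning context.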