Retrieval-Augmented Generation (RAG) effectively mitigates the hallucination and knowledge cut-off issues of Large Language Models by retrieving external knowledge. However, existing research overlooks the evaluation of text chunking, a critical link between the retrieval and generation phases of a RAG system. To address this gap, this study constructs an evaluation framework based on a full factorial design to investigate the impact of three chunking strategies (Character-level, Sentence-aware, and Structure-aware) on the retrieval efficiency, generation quality, and computational cost of RAG systems across varying retrieval depths ($k=3, 5, 8$). The experiments are conducted on the HotpotQA multi-hop reasoning dataset using the DeepSeek-R1:1.5b model for end-to-end inference. The results reveal a significant performance inversion phenomenon: while Character-level chunking achieves the highest Recall and Evidence Hit Rate in the retrieval phase, it yields the worst F1 Score and Exact Match Rate in the generation phase. In contrast, Structure-aware chunking, despite a disadvantage in retrieval ranking, achieves the highest F1 Score and Exact Match Rate in the generation phase by preserving complete paragraph logic. Furthermore, qualitative analysis indicates that both Character-level and Sentence-aware chunking, lacking macro-context, tend to induce hallucinations, whereas Structure-aware chunking effectively supports correct rejection. This study confirms that the Semantic Integrity of the context is more critical than mere Physical Coverage in RAG systems. By providing logical units with high signal-to-noise ratios, Structure-aware chunking enhances the reliability of complex reasoning. These findings provide theoretical evidence and engineering guidance for optimizing RAG systems.
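For concreteness, the three chunking strategies can be sketched roughly as follows. This is a minimal illustration only: the chunk sizes, overlap, and the sentence/paragraph heuristics are assumptions for demonstration, not the study's exact implementation.

```python
import re

def character_chunks(text, size=200, overlap=20):
    """Character-level: fixed-size windows that ignore linguistic boundaries
    (illustrative size/overlap values, not the study's configuration)."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def sentence_chunks(text, max_chars=200):
    """Sentence-aware: pack whole sentences into chunks of up to max_chars,
    so no chunk cuts a sentence in half."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

def structure_chunks(text):
    """Structure-aware: split on blank lines so each chunk is a complete
    paragraph, preserving the document's logical units."""
    return [p.strip() for p in re.split(r'\n\s*\n', text) if p.strip()]
```

The trade-off the abstract describes follows from these definitions: fixed-size windows maximize coverage of any evidence span but routinely truncate sentences and paragraphs, while paragraph-level chunks keep each retrieved unit logically self-contained at the cost of coarser retrieval granularity.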