Q94 — AWS DEA-C01 Ch.1

A company uses Amazon S3 as a data lake. The company sets up a data warehouse by using a multi-node Amazon Redshift cluster. The company organizes the data files in the data lake based on the data source of each data file. The company loads all the data files into one table in the Redshift cluster by using a separate COPY command for each data file location. This approach takes a long time to load all the data files into the table. The company must increase the speed of the data ingestion. The company does not want to increase the cost of the process. Which solution will meet these requirements?

Correct Answer: D. Create a manifest file that contains the data file locations. Use a COPY command to load the data into Amazon Redshift.

Explanation

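Creating a manifest file that lists the data file locations is an effective solution here. With a manifest, a single COPY command can load multiple data files at once instead of running a separate COPY command for each data file location, which speeds up data ingestion. Option A uses a provisioned Amazon EMR cluster, which could increase cost. Option B loads the data into Amazon Aurora first and then into Amazon Redshift through an AWS Glue job, which is complex and could increase cost. Option C uses an AWS Glue job to copy the files before loading them, which is also not optimal and could increase cost. Overall, option D is the best answer that meets the requirements.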
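
As a rough illustration of option D, the sketch below builds a manifest and the single COPY statement that points at it. The bucket, object keys, table name, and IAM role ARN are hypothetical placeholders, and the helpers `build_manifest` and `build_copy_sql` are illustrative names, not part of any AWS SDK. The code only produces the JSON and SQL text. Uploading the manifest to S3 and running the statement against the cluster are left out, as are data format options such as CSV or GZIP.

```python
import json


def build_manifest(file_urls, mandatory=True):
    """Return a COPY manifest as a dict, with one entry per S3 object URL."""
    return {"entries": [{"url": url, "mandatory": mandatory} for url in file_urls]}


def build_copy_sql(table, manifest_url, iam_role_arn):
    """Return one COPY statement that loads every file listed in the manifest."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "MANIFEST;"
    )


# Hypothetical data files, organized by data source as in the question.
file_urls = [
    "s3://example-data-lake/source-a/part-0000.csv",
    "s3://example-data-lake/source-b/part-0000.csv",
    "s3://example-data-lake/source-c/part-0000.csv",
]

manifest = build_manifest(file_urls)
manifest_json = json.dumps(manifest, indent=2)

# Hypothetical table name, manifest location, and IAM role.
copy_sql = build_copy_sql(
    table="sales",
    manifest_url="s3://example-data-lake/manifests/sales.manifest",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
)

print(manifest_json)
print(copy_sql)
```

Each manifest entry names one S3 object by its full URL, so files that sit under different source prefixes can be loaded by the same COPY command. Setting `mandatory` to true makes the load fail if a listed file is missing.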