Q43 — AWS DEA-C01 Ch.1
Question 43 of 100
A company is planning to migrate on-premises Apache Hadoop clusters to Amazon EMR. The company also needs to migrate a data catalog into a persistent storage solution. The company currently stores the data catalog in an on-premises Apache Hive metastore on the Hadoop clusters. The company requires a serverless solution to migrate the data catalog. Which solution will meet these requirements MOST cost-effectively?
- A. Use AWS Database Migration Service (AWS DMS) to migrate the Hive metastore into Amazon S3. Configure AWS Glue Data Catalog to scan Amazon S3 to produce the data catalog.
- B. Configure a Hive metastore in Amazon EMR. Migrate the existing on-premises Hive metastore into Amazon EMR. Use AWS Glue Data Catalog to store the company's data catalog as an external data catalog. ✓
- C. Configure an external Hive metastore in Amazon EMR. Migrate the existing on-premises Hive metastore into Amazon EMR. Use Amazon Aurora MySQL to store the company's data catalog.
- D. Configure a new Hive metastore in Amazon EMR. Migrate the existing on-premises Hive metastore into Amazon EMR. Use the new metastore as the company's data catalog.
Correct Answer: B. Configure a Hive metastore in Amazon EMR. Migrate the existing on-premises Hive metastore into Amazon EMR. Use AWS Glue Data Catalog to store the company's data catalog as an external data catalog.
Explanation
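Option B is the most cost-effective choice. First, configuring a Hive metastore in Amazon EMR allows the existing on-premises Hive metastore to be migrated directly, so the move is seamless. Second, using AWS Glue Data Catalog as the external data catalog satisfies the serverless requirement: it is a managed, persistent catalog with no database servers to provision, which reduces operational cost and complexity. By contrast, option A, which uses AWS DMS to migrate the metastore into Amazon S3 and then configures Glue to scan S3, is likely to add complexity and cost. Option C stores the data catalog in Amazon Aurora MySQL, which adds the expense of running a database. Option D only configures a new metastore inside Amazon EMR and does not take advantage of an external data catalog, so the catalog stays tied to the cluster. Taken together, option B is the best solution.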
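To make option B more concrete, the sketch below builds the EMR configuration that points Hive (and optionally Spark SQL) at the AWS Glue Data Catalog instead of a cluster-local metastore. The `hive-site` and `spark-hive-site` classifications and the factory class name follow the Amazon EMR documentation; the helper name `glue_catalog_configurations` and its `include_spark` parameter are hypothetical and used only for illustration. It does not perform the metastore migration itself.

```python
import json

# Client factory class that the EMR documentation gives for using the
# AWS Glue Data Catalog as the Hive metastore.
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def glue_catalog_configurations(include_spark=True):
    """Return an EMR `Configurations` list that uses Glue as the metastore.

    Hypothetical helper for illustration; it only builds a data structure
    and makes no AWS calls.
    """
    properties = {"hive.metastore.client.factory.class": GLUE_FACTORY}
    configs = [{"Classification": "hive-site", "Properties": dict(properties)}]
    if include_spark:
        # Spark SQL reads the same setting from its own classification.
        configs.append(
            {"Classification": "spark-hive-site", "Properties": dict(properties)}
        )
    return configs


if __name__ == "__main__":
    # Print the JSON form, which matches the shape EMR expects for
    # cluster configurations.
    print(json.dumps(glue_catalog_configurations(), indent=2))
```

The returned list has the shape accepted by the `Configurations` parameter of the boto3 EMR `run_job_flow` call, so a cluster launched with it would read table metadata from Glue rather than from a database on the cluster.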