Q29 — AWS SAA-C03 Ch.3


Q159. A company has an application that places hundreds of .csv files into an Amazon S3 bucket every hour. The files are 1 GB in size. Each time a file is uploaded, the company needs to convert the file to Apache Parquet format and place the output file into an S3 bucket. Which solution will meet these requirements with the LEAST operational overhead?

Correct Answer: D. Create an AWS Glue extract, transform, and load (ETL) job to convert the .csv files to Parquet format and place the output files into an S3 bucket. Create an AWS Lambda function for each S3 PUT event to invoke the ETL job.
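A minimal sketch of the Lambda side of option D, in Python with boto3. It assumes a Glue job named `csv-to-parquet` (hypothetical) that reads its input location from a job argument called `--input_path` (also hypothetical). The handler only calls the Glue `StartJobRun` API and leaves the conversion itself to Glue. The event-parsing step is kept in a separate function so it can be checked without AWS credentials.

```python
import json
from urllib.parse import unquote_plus

# Hypothetical Glue job name; replace with the name of your own ETL job.
GLUE_JOB_NAME = "csv-to-parquet"


def build_job_runs(event, job_name=GLUE_JOB_NAME):
    """Turn an S3 event notification into keyword arguments for StartJobRun.

    Object keys in S3 event notifications are URL-encoded (spaces arrive
    as '+'), so they are decoded before being passed to the job.
    """
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])
        if not key.lower().endswith(".csv"):
            continue  # ignore uploads that are not .csv files
        runs.append({
            "JobName": job_name,
            # Hypothetical argument name; the Glue script reads it as "input_path".
            "Arguments": {"--input_path": f"s3://{bucket}/{key}"},
        })
    return runs


def lambda_handler(event, context):
    # Imported here so build_job_runs can be exercised without boto3 installed.
    import boto3

    glue = boto3.client("glue")
    run_ids = []
    for kwargs in build_job_runs(event):
        # The function only starts the job; Glue performs the conversion.
        response = glue.start_job_run(**kwargs)
        run_ids.append(response["JobRunId"])
    return {"statusCode": 200, "body": json.dumps({"JobRunIds": run_ids})}
```

One design point worth checking: a Glue job allows only one concurrent run by default, so with hundreds of uploads per hour the job's maximum concurrency setting would likely need to be raised (or the files batched) to avoid `ConcurrentRunsExceededException` errors.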

Explanation

To convert each uploaded .csv file to Apache Parquet format and place the output in an S3 bucket with the least operational overhead, a solutions architect should let a serverless AWS Glue ETL job perform the conversion and use each S3 PUT event to invoke a lightweight AWS Lambda function that starts the job for the new file. Therefore, option D is the correct answer.

Option A relies on AWS Lambda to handle each S3 PUT event and convert the file itself. With hundreds of 1 GB files arriving every hour, that means writing and maintaining custom conversion code and keeping every invocation within Lambda's memory, ephemeral storage, and 15-minute timeout limits, which adds operational overhead and cost.

Option B uses an Apache Spark job, which requires provisioning and managing the infrastructure that runs Spark (for example, an Amazon EMR cluster). AWS Glue already runs Spark as a managed, serverless service, so operating a separate Spark environment adds overhead without any benefit here.

Option C combines AWS Glue with Amazon Athena, which adds extra components (catalog tables and queries) that a straightforward file format conversion does not need.

In option D, the Lambda function only starts the ETL job; it does not process the data. The Glue job performs the heavy conversion of each 1 GB file and writes the Parquet output to the S3 bucket, and because Glue is serverless there are no servers or clusters to manage. Glue allocates workers to each job run (and, with auto scaling enabled, can adjust the number of workers to the workload), so this approach keeps operational overhead low while remaining scalable and cost-effective.
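The conversion step inside the Glue job could look roughly like the following PySpark script. This is a sketch, not a reference implementation: it assumes the job receives the S3 URI of the uploaded file in a job argument named `--input_path` and writes to a hypothetical output prefix `s3://parquet-output-bucket/converted/`. The `awsglue` and `pyspark` imports resolve only inside a Glue job environment, so they sit inside `main()`; the output-path helper is plain Python.

```python
import sys
from urllib.parse import urlparse

# Hypothetical destination for the Parquet output.
OUTPUT_PREFIX = "s3://parquet-output-bucket/converted/"


def parquet_output_path(input_path, output_prefix=OUTPUT_PREFIX):
    """Map s3://bucket/dir/name.csv to <output_prefix>dir/name/ (one Parquet folder per file)."""
    key = urlparse(input_path).path.lstrip("/")
    if key.lower().endswith(".csv"):
        key = key[: -len(".csv")]
    return output_prefix.rstrip("/") + "/" + key + "/"


def main():
    # These modules are available only inside an AWS Glue (Spark) job.
    from awsglue.context import GlueContext
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["input_path"])
    spark = GlueContext(SparkContext.getOrCreate()).spark_session

    # Read the uploaded .csv file, assuming it has a header row.
    df = spark.read.csv(args["input_path"], header=True, inferSchema=True)

    # Spark writes Parquet as a directory of part files.
    df.write.mode("overwrite").parquet(parquet_output_path(args["input_path"]))


# Glue passes job arguments on sys.argv; the guard keeps the script inert without them.
if __name__ == "__main__" and "--input_path" in sys.argv:
    main()
```

`inferSchema=True` makes Spark read the data an extra time to infer column types; for 1 GB files, supplying an explicit schema avoids that pass. Writing each input file to its own folder with `mode("overwrite")` keeps a retried run from producing duplicate output.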