I’m trying to sift through information on big data technologies. I have data stored in S3 that I want to analyze using EMR. However, when I try to research the pros and cons of Presto, Hive, Spark, or any other technology, I end up drowning in company-sponsored benchmark reports or articles written by people with clear biases.

So, my ask: Am I better off just experimenting with each tool, or can you suggest any resources that offer opinions with substance, and not just buzzwords?

