How does Okera impact performance?
Okera has been designed with performance in mind. Its I/O and data paths are tuned for efficiency, its clusters have been proven to scale to more than 800 nodes, and the overall deployment supports federation (running multiple clusters of hundreds of nodes each).
In many workloads that use the most popular Hadoop analytic frameworks (Spark, MapReduce), performance often improves by 10-20% because of the efficient data path. One way to think about this is that the MapReduce/Java application offloads the initial processing (I/O, file parsing, decompression, and filtering) to Okera, and we've invested in making that processing as fast as possible.