Overhauling Apache Kylin for the cloud


Recently, the Apache Kylin community announced the general availability of Kylin 4, a major update. Kylin 4 continues the project's mission to provide a unified, high-performance, cloud-friendly, open source OLAP (online analytical processing) platform, and it reworks the Kylin architecture to make it easy to deploy and scale in the cloud. The release features three major platform updates along with many smaller improvements.

First, Kylin 4 replaces its previous HBase storage engine with Apache Parquet, decoupling compute from storage so that each can scale independently. Second, Kylin 4 unifies the compute engine and removes previous dependencies on the Hadoop ecosystem. This makes resource allocation much more flexible, resulting in a significant reduction in total cloud resource usage and associated costs. Third, by introducing a brand new, fully distributed query engine, Kylin 4 substantially shortens cube build times and query latency compared to previous releases.

In this article, we will dive into the details of these new innovations and the new capabilities they enable.

What is Apache Kylin?

Apache Kylin is an open source distributed analysis engine that provides a SQL query interface on top of Hadoop and Spark, along with OLAP capabilities for extremely large data sets. It was initially developed at eBay and later contributed to the Apache Software Foundation. Kylin can query massive relational tables with sub-second response times.

Kylin’s core idea is the precomputation of result sets: it calculates the results of possible queries in advance, according to the specified dimensions and measures. In effect, Kylin trades storage space for query time, which speeds up OLAP queries that follow fixed, predictable patterns.
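To make the space-for-time trade-off concrete, here is a minimal, single-machine sketch in Python. It is not Kylin's implementation: the fact table, the dimension names (region, product, year), the sales measure, and the build_cube and query helpers are all hypothetical. The sketch precomputes SUM(sales) for every subset of dimensions (one "cuboid" per subset, so 2^n cuboids for n dimensions) and then answers an equality-filtered aggregate query with a single dictionary lookup instead of a scan. Real Kylin cubes are built by a distributed engine, and production cube designs usually prune some of these combinations rather than materializing all of them.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy fact table: three dimensions and one additive measure ("sales").
FACTS = [
    {"region": "EU", "product": "A", "year": 2020, "sales": 10},
    {"region": "EU", "product": "B", "year": 2020, "sales": 5},
    {"region": "US", "product": "A", "year": 2021, "sales": 7},
    {"region": "US", "product": "A", "year": 2020, "sales": 3},
]
DIMENSIONS = ("region", "product", "year")


def build_cube(facts, dimensions):
    """Precompute SUM(sales) for every subset of dimensions (every cuboid)."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            totals = defaultdict(int)
            for row in facts:
                # Group key is the row's values for this cuboid's dimensions.
                key = tuple(row[d] for d in dims)
                totals[key] += row["sales"]
            cube[dims] = dict(totals)
    return cube


def query(cube, dimensions, **filters):
    """Answer SUM(sales) with equality filters by one lookup, with no scan."""
    # Order the filtered dimensions as in DIMENSIONS so they match a cuboid key.
    dims = tuple(d for d in dimensions if d in filters)
    key = tuple(filters[d] for d in dims)
    return cube[dims].get(key, 0)


cube = build_cube(FACTS, DIMENSIONS)
print(len(cube))                                        # 8 cuboids (2**3)
print(query(cube, DIMENSIONS, region="US"))             # 10
print(query(cube, DIMENSIONS, product="A", year=2020))  # 13
print(query(cube, DIMENSIONS))                          # 25 (grand total)
```

The build step touches every row once per cuboid, so its cost and storage grow with the number of dimension combinations, while each query becomes a constant-time lookup. That is the trade the paragraph above describes, and it pays off only when queries stay within the precomputed dimensions and measures.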

Apache Kylin lets you query billions of rows at sub-second latency in three steps:

Copyright © 2021 IDG Communications, Inc.


