Apache Lens: Unifying Analytics Interface

Apache Lens provides a Unified Analytics Interface. Lens aims to break down data analytics silos by providing a single view of data across multiple tiered data stores and an optimal execution environment for each analytical query. It seamlessly integrates Hadoop with traditional data warehouses so that they appear as one.

At a high level, the project provides these features:

  • Simple metadata layer which provides an abstract view over tiered data stores.
  • Single shared schema server based on the Hive Metastore. This schema is shared by data pipelines (HCatalog) and analytics applications.
  • OLAP Cube QL, a high-level SQL-like language to query and describe data sets organized in data cubes.
  • A JDBC driver and Java client libraries to issue queries, and a CLI for ad hoc queries.
  • Lens application server: a REST server which allows users to query data, make schema changes, schedule queries, and enforce quota limits on queries.
  • Driver-based architecture which allows plugging in reporting systems like Hive, columnar data warehouses, Redshift, etc.
  • Cost-based engine selection, which allows optimal use of resources by selecting the best execution engine for a given query based on the query cost.
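
The driver-based architecture and cost-based engine selection described above can be illustrated with a minimal Java sketch. It is not the Lens implementation: the `QueryDriver` interface, the `driver` helper, `selectCheapest`, the driver names, and the cost figures are all hypothetical and exist only to show the idea of asking each pluggable engine whether it can run a query and then picking the one with the lowest estimated cost.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;
import java.util.function.ToDoubleFunction;

public class EngineSelectionSketch {

    /** Hypothetical driver abstraction; NOT the actual Lens driver interface. */
    interface QueryDriver {
        String name();
        boolean canExecute(String query);
        double estimateCost(String query);
    }

    /** Builds a toy driver from two lambdas (illustration only). */
    static QueryDriver driver(String name,
                              Predicate<String> supports,
                              ToDoubleFunction<String> cost) {
        return new QueryDriver() {
            public String name() { return name; }
            public boolean canExecute(String query) { return supports.test(query); }
            public double estimateCost(String query) { return cost.applyAsDouble(query); }
        };
    }

    /**
     * Keeps only the drivers that can run the query, then returns the one
     * with the lowest estimated cost. Empty if no driver can run it.
     */
    static Optional<QueryDriver> selectCheapest(List<QueryDriver> drivers, String query) {
        return drivers.stream()
                .filter(d -> d.canExecute(query))
                .min(Comparator.comparingDouble((QueryDriver d) -> d.estimateCost(query)));
    }

    public static void main(String[] args) {
        // Made-up engines and costs: a general-purpose engine that accepts
        // everything, and a faster store that lacks the raw_events data.
        List<QueryDriver> drivers = List.of(
                driver("hive", q -> true, q -> 100.0),
                driver("columnar-dw", q -> !q.contains("raw_events"), q -> 10.0));

        // Placeholder query strings, not real Cube QL.
        for (String query : List.of("query over sales", "query over raw_events")) {
            String chosen = selectCheapest(drivers, query)
                    .map(QueryDriver::name)
                    .orElse("no driver available");
            System.out.println(query + " -> " + chosen);
        }
    }
}
```

In this sketch a query that both engines accept goes to the cheaper one, while a query only one engine can serve falls back to that engine regardless of cost. A real cost model would presumably weigh factors such as data volume and engine load, which this example does not attempt.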