Recently, the Apache Software Foundation announced the graduation of Falcon to a Top-Level Project. This is a milestone for the project and a testament to the need for a robust big data processing and management solution for the Apache Hadoop ecosystem. The project, which is now widely used across industries such as mobile applications, healthcare, software solutions and technology, had its humble beginnings at InMobi.
This is how it all started. In late 2011, the data team at InMobi was building data and processing pipelines to process mobile ad data for analytics. With the mobile advertising space changing at breakneck speed, the team realized that they would have to build new pipelines and change existing ones rapidly. They also realized that managing and operating all these pipelines could be simplified if the common requirements of a data processing pipeline were abstracted out and handled by a single platform. This idea was the genesis of Falcon: a platform that simplifies data motion, orchestration of data pipelines and lifecycle management in the Hadoop ecosystem.
After reaping the benefits of the platform for a few months, the team said: “Wow! Falcon is such a life saver. We should tell our fellow InMobians about this.” More and more of the data processing at InMobi started happening on Falcon.
After a few more months, the team said: “Wait a minute. Why should InMobi alone benefit from this platform? Let’s tell the whole technical community about this.” So, in early 2013, Falcon became Apache Falcon (incubating). Around the same time, recognizing the relevance of Apache Falcon, Hortonworks started collaborating with InMobi to take the initiative forward. Apache Falcon has shipped with the Hortonworks Data Platform since version 2.1.
Once it was in the open, Apache Falcon received the love and, more importantly, the contributions of the community. Today, with 400+ pipelines and 2000+ data feeds, it forms the backbone of InMobi’s mobile data analytics pipelines. It has a wide range of capabilities that include:
- Higher-level abstractions for resources, data and processing pipelines, which make applications easier to develop and manage.
- Handling of complex data processing logic and orchestration of the processing applications.
- Handling of data movement, data discovery and data lifecycle management.
- Separation of concerns: applications focus on processing logic and offload management to Falcon.
- Polyglot programming: rather than reinventing the wheel, Falcon builds on the widely used Hadoop platform and its products.
Gaining more acceptance by the day and bolstered by the Apache Software Foundation’s meritocratic process and principles, Apache Falcon graduated to a top-level project in December 2014.
Being promoted to a top-level project is just the beginning of a new journey for Apache Falcon. It is going to get bigger and better. Some of the key features planned are:
- Declarative and UI-based ETL (Extract, Transform, Load) capabilities to help users build big data processing pipelines quickly.
- Stream processing support, intended to provide seamless processing across batch and stream workloads.
In addition, the following improvements are planned:
- High availability.
- Better management and operability of Falcon and the data processing pipelines deployed on Falcon.
- Easier installation, development and debugging capabilities.
You can be part of the journey too. Visit the Falcon webpage to learn more about how you can contribute.