Building Cloud Native Applications
The cloud transformation journey for many organizations generally begins with using the cloud for early prototyping and for executing IT projects that need agility and quick turnaround. This also includes less mission-critical applications, which can be hosted either on an internal private cloud infrastructure or on the public cloud. The next level is the migration of mission-critical production applications to the cloud, as part of the enterprise IT roadmap to reduce cost and gain efficiency in running the business. This is generally done after an evaluation of the enterprise application portfolio, to determine whether each application can be moved to the cloud or needs to be retired.
A large majority of the enterprise applications deemed suitable for cloud migration end up being migrated with a focus on virtualizing the environment in a pay-as-you-go model. This approach is generally driven by the legacy technologies involved and the need to complete the migration in a short period of time. It is called "Lift and Shift" because the application itself is not touched and is migrated as-is. Lift and Shift may meet short-term goals; however, since these applications were not built for the cloud, they are likely to experience performance and latency issues and will not return value for the money spent to keep them running. These enterprise IT applications need to exploit cloud services and offerings to truly bring agility to the business. In fact, each migrated enterprise application should be looked at as a service on the cloud (similar to how cloud vendors offer services) for your organization. This can only be achieved if the applications are built for the cloud, i.e. are cloud native. An application migrated to the cloud can be gradually transformed into a cloud native application, so that business continuity requirements as well as the longer-term IT transformation goals of cost, scale and innovation are met.
Cloud Native Applications
Twelve-Factor App Methodology
The starting point is to look at the characteristics specified in the Twelve-Factor App methodology. These are design patterns distilled from experience building application services on the Heroku platform-as-a-service (PaaS) cloud, and they are now the de-facto standard for cloud native applications. The key characteristics are:
- The application must be built for the cloud or be cloud native.
- Scalability and Resilience support should not require significant re-architecture or external service support
- The application should have clear contract definitions with the underlying environment (operating system and services) to ensure a level of cloud agnostic behaviour
- The application should be agile and ready for DevOps with automated deployment across the development and production environments
Distributed System underpinning
While building cloud native applications, the underpinning environment is a distributed system. As a result, Eric Brewer's CAP theorem plays an important role in how we architect the application. According to the theorem, a distributed system cannot simultaneously achieve Consistency (the same data is seen on all nodes at the same time), Availability (a guaranteed response to every request) and Partition Tolerance (partial system failure or message loss does not affect system functionality). Most distributed applications and databases are classified based on this theorem, tending towards either availability and partition tolerance or consistency and partition tolerance. The key requirements of the application and its services need to be rated against the need for consistency, availability and partition tolerance, selecting the two of the three that are most critical. These govern the type of design patterns that need to be applied to satisfy the requirement.
Cloud Native Application Design Patterns
A key aspect of a cloud native application is resilience to failures, as there are multiple moving parts in a distributed cloud environment that could affect the application. This leads to related aspects as well, including the ability of the application to continue functioning with parts of the system down, and a self-healing mechanism to recover once the related service is back up. To achieve this, it is recommended to break your existing monolithic application into microservices. The recent trend of moving to microservices is still evolving; advocates like Martin Fowler propose a Monolith First architecture, which means starting off by building the application as a monolith and then gradually breaking the system into microservices as the complexity of the monolith increases.
This fits in well with the cloud application migration strategy as well. You can break the cloud migration into an iterative exercise, starting with a monolith migration and then moving parts of the system out as microservices. The SoundCloud journey presented by Phil Calçado in his series of blogs on Dealing with the Monolith is a perfect example in this regard: they gradually transitioned from a monolith to the microservices architecture pattern. Some of the key architecture patterns for cloud native applications are listed below.
Microservices: the ability to create independent, decoupled, stateless RESTful services around a specific business capability, interconnected via a reliable asynchronous fabric, as described by Martin Fowler.
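As a concrete illustration of the idea, the following is a minimal sketch of a stateless RESTful service around a single business capability, using only Python's standard library. The "inventory" capability, its route, and its hard-coded data are hypothetical and purely for illustration; a real service would consult its own backing store.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical data for the illustrative "inventory" capability.
ITEMS = {"sku-1": {"name": "widget", "stock": 42}}

def get_item(sku):
    """Pure lookup logic, kept separate from the HTTP plumbing for testability."""
    return ITEMS.get(sku)

class InventoryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A path like /items/sku-1 looks up the SKU after the prefix.
        if self.path.startswith("/items/"):
            item = get_item(self.path[len("/items/"):])
            if item is not None:
                body = json.dumps(item).encode()
                self.send_response(200)
                self.send_header("Content-Type", "application/json")
                self.end_headers()
                self.wfile.write(body)
                return
        self.send_response(404)
        self.end_headers()

# To run the service standalone (blocks the current thread):
# HTTPServer(("", 8080), InventoryHandler).serve_forever()
```

Because the handler holds no session state, any number of identical instances can be started behind a router, which is what makes the independent scaling described above possible.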
Container based scaling
With microservices, the ideal hosting mechanism for these services is lightweight containers. The industry recognizes this, and all major cloud vendors, including Azure, AWS and Google, provide container management and orchestration services.
API Gateway
As you start decomposing your application into microservices, one of the key aspects that comes into play is being able to manage all these services and ensure a proper level of security and monitoring. As the number of services increases, the API Gateway pattern comes into play: all client applications access all services via a single entry point, with the API Gateway taking care of routing requests to the backend services. This also helps support backend services that use different protocols, such as SOAP or REST, for communication.
The API Gateway can be complemented with a load balancer or router for higher stability. The API Gateway may also implement security itself, or integrate with an identity provider, to offer OAuth- or SAML-based authenticated access to the services.
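The routing core of the pattern can be sketched very simply: a single entry point maps public route prefixes to backend service addresses. The service names and addresses below are hypothetical; real gateways add authentication, throttling and monitoring around this lookup.

```python
# Hypothetical route table: public prefix -> backend service base URL.
ROUTES = {
    "/orders":    "http://orders-service:8081",
    "/inventory": "http://inventory-service:8082",
}

def route(path):
    """Return the backend URL for a public path, or None if no route matches."""
    for prefix, backend in ROUTES.items():
        if path.startswith(prefix):
            # Forward the remainder of the path to the chosen backend.
            return backend + path[len(prefix):]
    return None
```

Clients only ever see the gateway's address; the backends can move, scale or change protocols behind it without clients noticing.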
Service Discovery
All microservices are meant to be decoupled in nature and are scaled independently, potentially in an auto-scale mode. For clients of a particular service to be able to access it, they need the ability to dynamically discover the service's availability and location details. All service calls are routed via a service discovery agent that is aware of all the registered services running in the distributed environment.
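The registry at the heart of such an agent can be sketched as follows. This is a toy in-memory version, assuming instances re-register periodically as a heartbeat; production systems use a dedicated store such as Consul, etcd, ZooKeeper or Eureka. The TTL and the `now` parameter (used to make the sketch testable) are illustrative.

```python
import time

class ServiceRegistry:
    """Toy service registry: instances register with a heartbeat timestamp
    and lookups return only instances seen within the TTL window."""

    def __init__(self, ttl_seconds=30):
        self.ttl = ttl_seconds
        self._instances = {}  # (service, address) -> last heartbeat time

    def register(self, service, address, now=None):
        # Instances call this periodically as a heartbeat.
        self._instances[(service, address)] = now if now is not None else time.time()

    def lookup(self, service, now=None):
        # Return addresses of live instances, silently dropping stale ones.
        now = now if now is not None else time.time()
        return [addr for (svc, addr), seen in self._instances.items()
                if svc == service and now - seen <= self.ttl]
```

A client (or the gateway) calls `lookup` before each request, so freshly auto-scaled instances become reachable and failed ones drop out automatically.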
Circuit Breaker
This is a resilience and self-healing pattern that ensures a failure is self-contained. In a distributed cloud environment, an application is dependent on multiple resources and services, any of which can fail at any point in time. The application should be resilient enough to detect the failure and handle it gracefully without cascading it to other parts of the system. The self-healing ability means it should also be able to recover once the failure has been rectified.
The circuit breaker has three states:
- Closed: Indicating the service can be called as either it has succeeded previously or is within the threshold failure limit
- Open: Indicating there is a failure beyond the threshold limit or the suspected failure has been confirmed from the Half Open state after a time-out period
- Half-Open: Indicating that there is a suspected failure which can be retried once after a time-out
The circuit breaker acts as a proxy for a service that may fail. It avoids calling the service once the failure threshold is reached and returns an exception instead. It also has a self-healing mechanism: it monitors when the service is back up and can then resume normal operations. This is achieved with a complementary Retry pattern, whereby once a service has failed, calls to it can be transparently re-attempted.
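The three states above can be sketched directly in code. This is a minimal, single-threaded illustration; the threshold and timeout values are illustrative defaults, and the injectable `clock` exists only to make the sketch testable. Libraries such as Hystrix or resilience4j implement the production-grade version.

```python
import time

class CircuitBreaker:
    """Sketch of the three-state circuit breaker described above."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.time):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"  # allow one trial call after the timeout
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call, or crossing the threshold, (re)opens the circuit.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self):
        # Any success closes the circuit and resets the failure count.
        self.state = "closed"
        self.failures = 0
```

Note that while open, the breaker fails fast without touching the backend at all, which is exactly what prevents the failure from cascading.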
Bulkhead
This pattern draws its reference from the shipping and aircraft industries, where partitions are created in a ship to ensure that if one section is breached it can be sealed off without affecting the rest of the ship. This again relates to the resilience of the cloud application, where we can have client-specific partitioning of services. For example, the web and mobile clients can hit different sets of services, ensuring they are insulated from partition-specific failures.
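The partitioning idea can be sketched with one bounded pool of concurrent calls per client class, so that exhaustion in one partition (say, mobile) cannot starve another (web). The partition names and pool sizes below are illustrative assumptions.

```python
from threading import BoundedSemaphore

class Bulkhead:
    """Sketch of client-specific partitioning: each client class gets its
    own bounded pool of concurrent calls."""

    def __init__(self, limits):
        # limits: partition name -> max concurrent calls for that partition.
        self._pools = {name: BoundedSemaphore(n) for name, n in limits.items()}

    def try_call(self, partition, func):
        pool = self._pools[partition]
        if not pool.acquire(blocking=False):
            return None  # this partition is saturated; others are unaffected
        try:
            return func()
        finally:
            pool.release()
```

Rejecting the call (here by returning `None`) when a single partition's pool is full is the bulkhead sealing itself off: the other partitions keep their full capacity.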
Throttling
As multiple applications and clients (which could be tenants, other client services, or application instances) start accessing the service APIs (or backend resources), some of them may exceed the capacity that can be handled, causing service requests to start failing. The throttling design pattern ensures that access to a particular service API is controlled or moderated so that the SLA is met.
From the resilience point of view, this pattern is generally combined with the Auto-scaling pattern, as described in the book Cloud Design Patterns by Microsoft. Auto-scaling configures the services to automatically provision additional instances as and when the defined capacity limit is reached. However, there will be a short time window before the auto-scaling kicks in. This window can be used to throttle service access so that the service continues to be available, and thereafter the throttling can be relaxed. Auto-scaling also works in reverse, reducing the number of instances when demand is low to reduce the cost of the cloud services consumed.
Eventual Consistency
Most of the design patterns related to the cloud favour partition tolerance and availability over immediate consistency. This means that while the applications and services may not guarantee consistency of data at all times, the last writes will eventually be reflected across the system. This allows the cloud application and its data to scale more efficiently. As an example, an e-retail site may allow orders to go through with the inventory updates applied eventually. A real-life example is a bank's handling of cheque deposits: the cheque entries are made in the system immediately, but the backend transfer of funds is initiated separately at a later point in time.
One of the key aspects here is how to handle failures, as the microservices are decoupled and stateless. In most cases these services are idempotent and may not require a rollback; however, a compensating service may need to be designed to undo the steps performed during the eventually consistent set of operations. Applying this to the examples seen earlier: the e-retail application allowed orders to be placed without immediate inventory checks, and when inventory runs low those orders can be handled by a cancel-order compensating process. In the bank scenario, where there is a problem in the cheque processing, bounced-cheque handling can be considered the compensating operation.
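The compensating-service idea can be sketched as a simple saga-style runner: each step is paired with an undo action, and on failure the completed steps are compensated in reverse order. The step names mirror the e-retail example above and are hypothetical.

```python
class Saga:
    """Sketch of compensating transactions: each step carries an undo
    action; on failure, completed steps are compensated in reverse."""

    def __init__(self):
        self.steps = []  # list of (action, compensation) pairs

    def add_step(self, action, compensation):
        self.steps.append((action, compensation))

    def run(self):
        done = []
        try:
            for action, compensation in self.steps:
                action()
                done.append(compensation)
        except Exception:
            # Undo only the steps that actually completed, newest first.
            for compensation in reversed(done):
                compensation()
            return False
        return True
```

Note that the compensations are new forward operations (cancel the order, bounce the cheque), not rollbacks: each service has long since committed its own local state.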
Use case based Database selection
The choice of database and storage options is a critical factor in creating an optimal cloud application or service that can handle the scale. Cloud native applications may use multiple database options to cater to specific use cases. Here are some guidelines:
| Database Type | Examples | Data Model / Use Case | Benefit |
|---|---|---|---|
| Key-Value NoSQL | Azure Table Storage, Redis, MemcacheDB | Row-based search using a partition key; potentially unrelated data | CPU- and memory-intensive computation |
| Column-Based NoSQL | HBase, Cassandra, Redshift | Two-dimensional array with each row having multiple key-value pairs (columns); unstructured but related data | Good for aggregation queries involving large data sets |
| Document-Based NoSQL | Azure DocumentDB, MongoDB, CouchDB | Key/value pairs where data is stored as a collection of nested, serialized documents in client-friendly formats such as JSON | Fast access to nested and complex entities |
| Graph-Based NoSQL | Neo4j, FlockDB, GraphDB, InfiniteGraph | Data stored in a graph structure with nodes (entities), edges (relationships) and properties (data on the entity) | Mathematical processing; handling complex relationship processing |
| RDBMS | SQL Server, Oracle, MySQL | Transactional data requiring a high level of consistency and data integrity | OLAP/OLTP |
Vendor offerings for Cloud Native Applications
Vendors are gradually adding new sets of services catering to the growing need to build cloud native applications. Listed below are some of the notable ones, in no particular order of preference; this is by no means a complete list of what is available today.
Microsoft has recently announced that it will soon offer Docker containers for the Windows ecosystem, with:
- The Hyper-V Container deployment, with a high level of isolation powered by Hyper-V virtualization.
- The Nano Server, a small-footprint Windows Server built for the cloud. It is a headless version of Windows Server designed to run cloud native applications and containers.
Microsoft provides API management with its Azure API Management service. This is a result of its 2013 acquisition of Apiphany (a leading provider of an API management delivery platform), which forms the base of the service offering. The service caters to API management, throttling and health monitoring, among other features.
The AWS EC2 Container Service provides Docker-compatible containers that can be managed and run across a cluster of EC2 instances. It also provides the container ecosystem services of cluster management, load balancing and API support.
Taking microservices and container technology to the next level, the AWS Lambda service provides the ability to host stateless execution logic/code as Lambda functions, which can be initiated and scaled in milliseconds in response to specific event sources. These Lambda functions, which are primarily backend services, can in turn invoke other RESTful services if required. Based on a Node.js-powered platform, AWS Lambda frees the developer from virtualization and container management. With the introduction of this scalable, event-driven compute service, with micro-applications that can be provisioned in milliseconds, it was also named a Disruptive Cloud Technology of Q4 2014. For example, a file-upload event triggering a related processing operation can be deployed using this service.
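The file-upload example translates to a handler roughly like the following (shown in Python for consistency with the other sketches here, though Lambda's original runtime was Node.js). The event shape follows the S3 notification format; the bucket, key and the processing step are hypothetical placeholders.

```python
def lambda_handler(event, context):
    """Sketch of a Lambda function reacting to an S3 file-upload event.
    The platform invokes this entry point; no server code is needed."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    key = record["object"]["key"]
    # A real function would fetch the object here and run the processing.
    return {"processed": "s3://{}/{}".format(bucket, key)}
```

The developer deploys only this function; provisioning, scaling and teardown of the underlying compute are entirely the platform's concern.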
Amazon provides API management through its recently launched Amazon API Gateway service, which can be used as a single point of entry to APIs exposed by cloud applications and AWS Lambda endpoints, apart from AWS services like S3, EC2 and RDS.
VMware has launched a lightweight, container-ready Linux distribution called Photon. It will support Docker, CoreOS Rocket and Pivotal Garden container formats.
Cloud Native Identity and Access Management
VMware Lightwave is an enterprise-grade Identity and Access Management solution that includes first-of-its-kind container security. It adds security beyond the container isolation levels, catering specifically to cloud native application development.
One of the early innovators in native cloud applications and services, Google runs more than two billion containers per week, with production workloads for services such as Gmail, Maps and Search running inside containers. It recently launched the beta version of its Google Container Engine, which provides container life cycle, cluster management and operations services powered by Kubernetes, its open-sourced orchestration system for Docker.
The IBM Bluemix platform recently announced support for Docker-based containers, with a wide variety of container management and deployment services in a DevOps-friendly ecosystem.
The IBM Cloud platform has been strengthened by the addition of Bluemix API Management, further enhancing the building of cloud native, API-driven solutions on the cloud.
Platforms and Frameworks
Spring Cloud is a toolbox for building cloud native applications on the Spring Java platform. It brings together a set of design patterns and use cases that are often encountered in building such systems. It can be combined with the high productivity of Spring Boot to provide out-of-the-box, production-ready features including health checks, externalized configuration and many others, in line with the Twelve-Factor App methodology. The all-Java Big Data framework Spring XD has recently been redesigned to be cloud native and relaunched as Spring Cloud Data Flow.
Microsoft ASP.NET vNext is a lean framework for building web and cloud applications. An ASP.NET vNext application can use the cloud-optimized subset of the .NET Framework (CoreCLR), which is around 11 MB in size compared to 200 MB for the full framework. It is also designed to work in the cloud environment, with default integration with Microsoft Azure through simplified configuration.
The cloud computing landscape is moving away from pure Lift and Shift application migrations and towards building cloud native applications that can fully utilize the potential of the cloud and meet scalability requirements. This comes with its own set of challenges, which can be overcome by carefully making the right design decisions and selecting the right platform and tools. While cloud vendors, platforms and frameworks are gearing up their services to enable the next generation of cloud applications, there needs to be early investment in architecting solutions appropriately to take advantage of them. Most cloud applications will not be perfect in their initial release cycles, so budget for an evolutionary architecture.