AZURE ARCHITECTURE REFERENCE TEMPLATE FOR WEB APPLICATIONS

Subramanian Veerappan, Senior Enterprise Architect D & A Practice, HCL Technologies

May 2020

Contents

Objective
Scope
Azure Architecture Considerations
Content Flow
Architecture Styles
  N-Tier Architecture
  Microservices
  Event Driven Architecture
  Web-Queue-Worker
Cloud Design Pattern Considerations
Key Design Principle Considerations
Architecture components for a Basic Web Application
  App Service plan
  SQL Database
  Region
  Scalability considerations
  Scaling the App Service app
  Recommendations for scaling a web app
    Scaling SQL Database
  Availability considerations
    Backups
  Manageability considerations
    Deployment
    Configuration
  Security considerations
    SQL Database auditing
    Deployment slots
    Logging
    SSL
    Authentication
Architecture components for a Web application with High Availability
  Architecture
  Recommendations
    Regional pairing
    Resource groups
    Front Door configuration
    SQL Database
    Cosmos DB
    Azure Front Door
    SQL Database
    Storage
  Cost considerations
    Azure Front Door
    Azure Cosmos DB
Components for a Complex Web Application with High Availability
  Architecture
    Regional pairing
    Traffic Manager configuration
    Configure SQL Server Always On Availability Groups
    Virtual machine scale sets
    SQL server
    Load balancers
    Traffic Manager pricing
Architecture components for a Serverless Web application
  Architecture
    Function App plans
    Function App boundaries
    Function bindings
  Scalability considerations
  Disaster recovery considerations
  Security considerations
    Authentication
    Authorization
    CORS
    Enforce HTTPS
    Lock down the function app
    Protect application secrets
  DevOps considerations
    Front-end deployment
    Back-end deployment
    API versioning
    Azure Functions
    Azure Cosmos DB
    Content Delivery Network
Multi-tier web application with HA and DR on Azure
  Architecture
    Components
    Alternatives
  Other considerations
    Scalability
    Security
  Pricing
Microservices Architecture on AKS
  Architecture
    Components
  Design considerations
    Microservices
    API gateway
    Data storage
  Service object
  Ingress
    TLS/SSL encryption
  Namespaces
  Autoscaling
    Pod autoscaling
    Cluster autoscaling
  Health probes
  Resource constraints
  Role based access control (RBAC)
  Secrets management and application credentials
  Pod and container security
  Deployment (CI/CD) considerations
    Container best practices
    Helm charts
    Helm Revisions
  Azure DevOps Pipeline for Microservices on Kubernetes
    Azure Kubernetes Service (AKS)
    Azure Load balancer
    Azure DevOps Services
    Azure Monitor
Exclusions
Conclusion
References
Acknowledgement


Objective

The objective of this document is to provide guidance on the various Azure architecture styles and on the considerations for the application models within those styles.

Scope

This document covers Azure web application architecture, including high availability and N-tier architecture.

Azure Architecture Considerations

<<Mention key drivers / non-functional requirements (NFRs) for proposing the given architecture>>

• Scalability

• Reliability

• Availability

• Resiliency

• Performance

• Security

• Data Quality

• Usability

Content Flow

To learn about the key design principles to consider => Key design principles

To learn about the architectural considerations for:

• Simple web application => Simple Web App

• Web application with high availability => Web App with HA

• Complex web application with HA => Complex Web App with HA

• Multi-tier web app with HA and DR => Multi-tier Web App with HA and DR

• Microservices architecture on AKS => Microservices on AKS

• Serverless web application => Serverless Web App


Architecture Styles

<<Mention which of the architecture styles below will be followed>>

<<The following challenges need to be considered before proposing a specific architecture style:

• Complexity

• Messaging and eventual consistency

• Inter-service communication

• Manageability

>>

N-Tier Architecture

<<Note: This architecture style is the best fit for applications that use a mix of IaaS and managed services, and for applications that already use a layered architecture>>

Microservices

<<Note: Propose this architecture style for applications that have complex domains and that demand a mature development and DevOps process>>

Event Driven Architecture

<<Note: Best fit for applications that ingest and process large volumes of data with very low latency. This type of architecture is also useful when different subsystems need to perform different types of processing on the same data>>

Web-Queue-Worker

<<Note: Propose this architecture style for applications with a relatively simple domain and some resource-intensive tasks>>
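The Web-Queue-Worker style can be illustrated with a minimal in-process sketch. It is not Azure code: a standard-library queue stands in for a managed message queue, threads stand in for worker instances, and the function name `run_web_queue_worker` and the sum-of-squares "heavy task" are assumptions made for illustration only.

```python
import queue
import threading


def run_web_queue_worker(jobs, worker_count=2):
    """Simulate a web front end handing resource-intensive jobs to background workers."""
    work_queue = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            job = work_queue.get()
            if job is None:  # sentinel: no more work for this worker
                work_queue.task_done()
                break
            job_id, payload = job
            # Stand-in for a resource-intensive task (hypothetical workload).
            outcome = sum(i * i for i in range(payload))
            with results_lock:
                results[job_id] = outcome
            work_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()

    # The "web" tier only enqueues work and returns immediately.
    for job in jobs:
        work_queue.put(job)
    for _ in threads:
        work_queue.put(None)

    work_queue.join()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    print(run_web_queue_worker([("a", 10), ("b", 100)]))
```

The point of the style is visible in the split: the front end stays responsive because it only enqueues, while the number of workers can be changed independently of the front end.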


Cloud Design Pattern Considerations

| Pattern | What is this pattern for? | Reference |
| --- | --- | --- |
| Ambassador | Create helper services that send network requests on behalf of a consumer service or application. | |
| Anti-Corruption Layer | Implement a façade or adapter layer between a modern application and a legacy system. | https://docs.microsoft.com/en-us/azure/architecture/patterns/anti-corruption-layer |
| Asynchronous Request-Reply | Decouple backend processing from a frontend host, where backend processing needs to be asynchronous, but the frontend still needs a clear response. | https://docs.microsoft.com/en-us/azure/architecture/patterns/async-request-reply |
| Backends for Frontends | Create separate backend services to be consumed by specific frontend applications or interfaces. | https://docs.microsoft.com/en-us/azure/architecture/patterns/backends-for-frontends |
| Bulkhead | Isolate elements of an application into pools so that if one fails, the others will continue to function. | https://docs.microsoft.com/en-us/azure/architecture/patterns/bulkhead |
| Cache-Aside | Load data on demand into a cache from a data store. | https://docs.microsoft.com/en-us/azure/architecture/patterns/cache-aside |
| Choreography | Let each service decide when and how a business operation is processed, instead of depending on a central orchestrator. | https://docs.microsoft.com/en-us/azure/architecture/patterns/choreography |
| Circuit Breaker | Handle faults that might take a variable amount of time to fix when connecting to a remote service or resource. | https://docs.microsoft.com/en-us/azure/architecture/patterns/circuit-breaker |
| Claim Check | Split a large message into a claim check and a payload to avoid overwhelming a message bus. | https://docs.microsoft.com/en-us/azure/architecture/patterns/claim-check |
| Compensating Transaction | Undo the work performed by a series of steps, which together define an eventually consistent operation. | https://docs.microsoft.com/en-us/azure/architecture/patterns/compensating-transaction |
| Competing Consumers | Enable multiple concurrent consumers to process messages received on the same messaging channel. | https://docs.microsoft.com/en-us/azure/architecture/patterns/competing-consumers |
| Compute Resource Consolidation | Consolidate multiple tasks or operations into a single computational unit. | https://docs.microsoft.com/en-us/azure/architecture/patterns/compute-resource-consolidation |
| CQRS | Segregate operations that read data from operations that update data by using separate interfaces. | https://docs.microsoft.com/en-us/azure/architecture/patterns/cqrs |
| Event Sourcing | Use an append-only store to record the full series of events that describe actions taken on data in a domain. | https://docs.microsoft.com/en-us/azure/architecture/patterns/event-sourcing |
| External Configuration Store | Move configuration information out of the application deployment package to a centralized location. | https://docs.microsoft.com/en-us/azure/architecture/patterns/external-configuration-store |
| Federated Identity | Delegate authentication to an external identity provider. | https://docs.microsoft.com/en-us/azure/architecture/patterns/federated-identity |
| Gatekeeper | Protect applications and services by using a dedicated host instance that acts as a broker between clients and the application or service, validates and sanitizes requests, and passes requests and data between them. | https://docs.microsoft.com/en-us/azure/architecture/patterns/gatekeeper |
| Gateway Aggregation | Use a gateway to aggregate multiple individual requests into a single request. | https://docs.microsoft.com/en-us/azure/architecture/patterns/gateway-aggregation |
| Gateway Offloading | Offload shared or specialized service functionality to a gateway proxy. | https://docs.microsoft.com/en-us/azure/architecture/patterns/gateway-offloading |
| Gateway Routing | Route requests to multiple services using a single endpoint. | https://docs.microsoft.com/en-us/azure/architecture/patterns/gateway-routing |
| Geodes | Deploy backend services into a set of geographical nodes, each of which can service any client request in any region. | https://docs.microsoft.com/en-us/azure/architecture/patterns/geodes |
| Health Endpoint Monitoring | Implement functional checks in an application that external tools can access through exposed endpoints at regular intervals. | https://docs.microsoft.com/en-us/azure/architecture/patterns/health-endpoint-monitoring |
| Index Table | Create indexes over the fields in data stores that are frequently referenced by queries. | https://docs.microsoft.com/en-us/azure/architecture/patterns/index-table |
| Leader Election | Coordinate the actions performed by a collection of collaborating task instances in a distributed application by electing one instance as the leader that assumes responsibility for managing the other instances. | https://docs.microsoft.com/en-us/azure/architecture/patterns/leader-election |
| Materialized View | Generate prepopulated views over the data in one or more data stores when the data isn't ideally formatted for required query operations. | https://docs.microsoft.com/en-us/azure/architecture/patterns/materialized-view |
| Pipes and Filters | Break down a task that performs complex processing into a series of separate elements that can be reused. | https://docs.microsoft.com/en-us/azure/architecture/patterns/pipes-and-filters |
| Priority Queue | Prioritize requests sent to services so that requests with a higher priority are received and processed more quickly than those with a lower priority. | https://docs.microsoft.com/en-us/azure/architecture/patterns/priority-queue |
| Publisher/Subscriber | Enable an application to announce events to multiple interested consumers asynchronously, without coupling the senders to the receivers. | https://docs.microsoft.com/en-us/azure/architecture/patterns/publisher-subscriber |
| Queue-Based Load Leveling | Use a queue that acts as a buffer between a task and a service that it invokes in order to smooth intermittent heavy loads. | https://docs.microsoft.com/en-us/azure/architecture/patterns/queue-based-load-leveling |
| Retry | Enable an application to handle anticipated, temporary failures when it tries to connect to a service or network resource by transparently retrying an operation that's previously failed. | https://docs.microsoft.com/en-us/azure/architecture/patterns/retry |
| Scheduler Agent Supervisor | Coordinate a set of actions across a distributed set of services and other remote resources. | https://docs.microsoft.com/en-us/azure/architecture/patterns/scheduler-agent-supervisor |
| Sequential Convoy | Process a set of related messages in a defined order, without blocking processing of other groups of messages. | https://docs.microsoft.com/en-us/azure/architecture/patterns/sequential-convoy |
| Sharding | Divide a data store into a set of horizontal partitions or shards. | https://docs.microsoft.com/en-us/azure/architecture/patterns/sharding |
| Sidecar | Deploy components of an application into a separate process or container to provide isolation and encapsulation. | https://docs.microsoft.com/en-us/azure/architecture/patterns/sidecar |
| Static Content Hosting | Deploy static content to a cloud-based storage service that can deliver them directly to the client. | https://docs.microsoft.com/en-us/azure/architecture/patterns/static-content-hosting |
| Strangler | Incrementally migrate a legacy system by gradually replacing specific pieces of functionality with new applications and services. | https://docs.microsoft.com/en-us/azure/architecture/patterns/strangler |
| Throttling | Control the consumption of resources used by an instance of an application, an individual tenant, or an entire service. | https://docs.microsoft.com/en-us/azure/architecture/patterns/throttling |
| Valet Key | Use a token or key that provides clients with restricted direct access to a specific resource or service. | https://docs.microsoft.com/en-us/azure/architecture/patterns/valet-key |
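As an illustration of one entry from the pattern list above, the Retry pattern might be sketched as follows. This is a minimal sketch, not an Azure SDK feature: the `TransientError` class, the `retry` helper, and the exponential back-off schedule are assumptions chosen for the example.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def retry(operation, max_attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation(); on TransientError wait base_delay * 2**attempt and try again."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * (2 ** attempt))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky():
        # Fails twice, then succeeds, to imitate a temporary outage.
        state["calls"] += 1
        if state["calls"] < 3:
            raise TransientError("service busy")
        return "ok"

    print(retry(flaky), "after", state["calls"], "calls")
```

Only errors classified as transient are retried; any other exception propagates immediately, which avoids repeating an operation that cannot succeed.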

Key Design Principle Considerations

Design for self-healing: In a distributed system, failures happen. Design the application so that it can heal itself when failures occur.

Make all things redundant: Build redundancy into the application to avoid single points of failure.

Minimize coordination: Minimize coordination between application services to achieve scalability.

Design to scale out: Design the application so that it can scale horizontally, adding or removing instances as demand requires.

Partition around limits: Use partitioning to work around database, network, and compute limits.

Design for operations: Design the application so that the operations team has the tools they need.

Use managed services: Wherever possible, use platform as a service (PaaS) rather than infrastructure as a service (IaaS).

Use the best data store for the job: Pick the storage technology that is the best fit for the application data and how it will be used.

Design for evolution: All successful applications change over time. An evolutionary design is key for continuous innovation.

Build for the needs of business: Every design decision must be justified by a business requirement.
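One common way to approach the "design for self-healing" principle is the Circuit Breaker pattern. The sketch below is a simplified illustration under stated assumptions: the class name, the failure threshold, and the reset timeout are hypothetical, and a production breaker would usually also distinguish error types and handle concurrency.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is not attempted."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening (assumed value)
        self.reset_timeout = reset_timeout          # seconds to stay open (assumed value)
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open state, let one trial call through.
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, callers fail fast instead of piling requests onto a dependency that is already struggling, which gives that dependency time to recover.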

Architecture components for a Basic Web Application

<<The following components should be considered when designing a basic web application on Azure:

• Resource group

• App Service app

• App Service plan

• Deployment slots

• Azure DNS

• Azure SQL Database

• Logical server

• Azure Storage

• Azure AD

>>

<<Considerations for each Azure component>>


App Service plan

<<Considerations: Propose the Free and Shared tiers for testing purposes only, because their shared resources cannot scale out. Propose the Basic, Standard, and Premium tiers for production workloads, because the app runs on dedicated virtual machine instances with allocated resources that can scale out.>>

<<Word of caution: App Service plans are billed on a per-second basis. You are charged for the instances in the App Service plan even if the app is stopped. Make sure to delete plans that are not in use (for example, test deployments).>>
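The word of caution above can be made concrete with a small calculation. The hourly rate used here is a hypothetical placeholder, not an Azure price; the sketch only shows that the charge depends on how long the plan's instances are provisioned, not on whether the app is running.

```python
def plan_cost(hourly_rate_per_instance, instance_count, seconds_provisioned):
    """Cost of an App Service plan billed per second, regardless of app state."""
    per_second_rate = hourly_rate_per_instance / 3600
    return per_second_rate * instance_count * seconds_provisioned


if __name__ == "__main__":
    # Hypothetical rate of 0.10 currency units per instance-hour.
    thirty_days = 30 * 24 * 3600
    forgotten_test_plan = plan_cost(0.10, instance_count=2, seconds_provisioned=thirty_days)
    print(f"Idle two-instance test plan left for 30 days: {forgotten_test_plan:.2f}")
```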

SQL Database

<<Considerations: A logical server groups databases, which makes administrative tasks simpler. Each database within the group is deployed with a specific service tier (see https://docs.microsoft.com/en-us/azure/sql-database/sql-database-service-tiers).>>

<<Word of caution: Within each group, the databases cannot share resources. There are no compute costs for the server itself, but a tier must be specified for each database. Because each database has dedicated resources, the cost may be higher, but the performance will be better.>>

Region

<<Consideration: Provision the App Service plan and the SQL Database in the same region to minimize network latency. Generally, choose the region closest to your users.>>

Scalability considerations

A major benefit of Azure App Service is the ability to scale your application based on load. Here are some considerations to keep in mind when planning to scale your application.

Scaling the App Service app

<<Consideration: There are two ways to scale an App Service app: scale up and scale out. Decide which scaling option the application requires; refer to the Recommendations section below.>>

• Scale up, which means changing the instance size. The instance size determines the memory, number of cores, and storage on each VM instance.

• Scale out, which means adding instances to handle increased load. Each pricing tier has a maximum number of instances.

There are two options for scaling out:

• Manual scaling, by changing the instance count manually.

• Autoscaling, to automatically add or remove instances based on a schedule and/or performance metrics.

Recommendations for scaling a web app

• Avoid scaling up/down - It may trigger an application restart. Instead, select a tier and size that meet your performance requirements under typical load and then scale out the instances to handle changes in traffic volume.

• Enable autoscaling:
o For a predictable, regular workload, create profiles to schedule the instance counts ahead of time.
o If the workload is not predictable, use rule-based autoscaling (e.g. based on CPU usage) to react to changes in load as they occur.
o You can combine both approaches.

• Autoscale based on load test data - Identify potential bottlenecks from load tests, and base the autoscale rules on that data.

• Use a shorter cool-down period to add instances - Set a shorter cool-down period for adding instances, and a longer cool-down period for removing instances. For example, set 5 minutes to add an instance, but 60 minutes to remove an instance.
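The asymmetric cool-down recommendation can be illustrated with a small simulation. This is only a sketch of the idea, not Azure's autoscale engine: the `Autoscaler` class, its CPU thresholds, and its instance limits are hypothetical values chosen for illustration. Only the 5-minute and 60-minute cool-downs come from the example above.

```python
from dataclasses import dataclass


@dataclass
class Autoscaler:
    """Toy rule-based autoscaler with separate cool-downs for adding and removing."""
    instances: int = 2
    min_instances: int = 2           # hypothetical limits
    max_instances: int = 10
    scale_out_cpu: float = 70.0      # hypothetical: add an instance above this CPU %
    scale_in_cpu: float = 30.0       # hypothetical: remove an instance below this CPU %
    add_cooldown_min: float = 5      # short wait before adding an instance
    remove_cooldown_min: float = 60  # long wait before removing an instance
    last_action_min: float = float("-inf")

    def evaluate(self, now_min: float, avg_cpu: float) -> int:
        """Apply the rules at time `now_min` (minutes) and return the instance count."""
        since_last = now_min - self.last_action_min
        if (avg_cpu > self.scale_out_cpu
                and self.instances < self.max_instances
                and since_last >= self.add_cooldown_min):
            self.instances += 1
            self.last_action_min = now_min
        elif (avg_cpu < self.scale_in_cpu
                and self.instances > self.min_instances
                and since_last >= self.remove_cooldown_min):
            self.instances -= 1
            self.last_action_min = now_min
        return self.instances
```

With these settings, the scaler reacts to a load spike within minutes but holds on to the extra capacity for an hour, which avoids removing instances during a brief dip in traffic.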

Scaling SQL Database

<< Consider scaling up individual databases with no application downtime when there is a need for a higher service tier or performance level for SQL Database.>>

Availability considerations

<<Consider HA, DR, scale-out, and scale-up options, because a standalone App Service app or SQL Server database does not guarantee 100% availability. Microsoft guarantees each Azure service only an availability between 99.95% and 99.99%. >>

Backups

<<Considerations:

• Use point-in-time restore to recover from human error by returning the database to an earlier point in time.

• Use geo-restore to recover from a service outage by restoring a database from a geo-redundant backup.

App Service provides a backup and restore feature for your application files. However, be aware that the backed-up files include app settings in plain text and these may include secrets, such as connection strings.


Avoid using the App Service backup feature to back up the SQL databases because it exports the database to a SQL BACPAC file, consuming DTUs. Instead, use SQL Database point-in-time restore as mentioned above.>>

Manageability considerations

<<Consider creating separate resource groups for production, development, and test environments. This will make it easier to manage deployments, delete test deployments, and assign access rights.>>

When assigning resources to resource groups, consider the following:

• Lifecycle - In general, put resources with the same lifecycle into the same resource group.

• Access - You can use role-based access control (RBAC) to apply access policies to the resources in a group.

• Billing - You can view the rolled-up costs for the resource group.

Deployment

<< Considerations:

1. Provisioning the Azure resources - We recommend using Azure Resource Manager templates.

2. Deploying the application (code, binaries, and content files) - We recommend using an Azure DevOps CI/CD pipeline, with source code managed in Git on Azure Repos.

We recommend having a minimum of three slots for an App Service app: production, staging, and last-known-good. The production slot represents the live production site. The staging slot is where you deploy updates before swapping them into production (many applications have a significant warmup and cold-start time). The last-known-good slot holds the previous production deployment (which, after a swap, is in staging). If you discover a problem, you can revert by swapping production with the last-known-good deployment.

Do NOT use slots on your production deployment for testing, because all apps within the same App Service plan share the same VM instances and testing will degrade production performance. >>

Configuration

Store configuration settings as app settings. Define the app settings in your Resource Manager templates, or using PowerShell. At runtime, app settings are available to the application as environment variables.

Never check passwords, access keys, or connection strings into source control. Instead, pass these as parameters to a deployment script that stores these values as app settings.


When you swap a deployment slot, the app settings are swapped by default. If you need different settings for production and staging, you can create app settings that stick to a slot and don't get swapped.
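The effect of slot-sticky settings on a swap can be sketched as follows. This is a simplified model, not the App Service implementation: `swap_slots` and the setting names in the usage note are hypothetical, and real swaps involve more than app settings.

```python
def swap_slots(production: dict, staging: dict, sticky: set) -> tuple:
    """Return (new_production, new_staging) app settings after a slot swap.

    Settings named in `sticky` stay with their slot; every other setting
    travels with the app that is being swapped.
    """
    # Production receives staging's swappable settings but keeps its own sticky ones.
    new_production = {k: v for k, v in staging.items() if k not in sticky}
    new_production.update({k: v for k, v in production.items() if k in sticky})

    # Staging receives production's swappable settings but keeps its own sticky ones.
    new_staging = {k: v for k, v in production.items() if k not in sticky}
    new_staging.update({k: v for k, v in staging.items() if k in sticky})

    return new_production, new_staging
```

For example, if `DB_NAME` is marked sticky and `FEATURE_FLAG` is not, a swap moves the staging value of `FEATURE_FLAG` into production while production keeps pointing at its own database.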

Security considerations

SQL Database auditing

Auditing can help you maintain regulatory compliance and get insight into discrepancies and irregularities that could indicate business concerns or suspected security violations.

Deployment slots

Each deployment slot has a public IP address. Secure the nonproduction slots using Azure Active Directory login so that only members of your development and DevOps teams can reach those endpoints.

Logging

Logs should never record users' passwords or other information that might be used to commit identity fraud. Scrub those details from the data before storing it.
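One way to scrub such details is a filter that runs before any log handler stores the record. The sketch below uses Python's standard `logging` module. The key names in `_SECRET_PATTERN` are an assumption made for illustration; a real application would need a pattern list that matches its own data, and pattern matching alone is not a complete defense.

```python
import logging
import re

# Hypothetical pattern: key=value pairs whose key suggests a secret.
_SECRET_PATTERN = re.compile(
    r"\b(password|pwd|secret|token)\s*=\s*[^\s&;,]+", re.IGNORECASE
)


def scrub(message: str) -> str:
    """Replace the value of any secret-looking key=value pair with a placeholder."""
    return _SECRET_PATTERN.sub(lambda m: f"{m.group(1)}=[REDACTED]", message)


class ScrubbingFilter(logging.Filter):
    """Logging filter that scrubs each record before a handler stores it."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so that secrets passed as arguments are caught too.
        record.msg = scrub(record.getMessage())
        record.args = ()
        return True  # keep the (now scrubbed) record
```

Attach the filter to a handler with `handler.addFilter(ScrubbingFilter())` so that every record passing through that handler is scrubbed.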

SSL

An App Service app includes an SSL endpoint on a subdomain of azurewebsites.net at no additional cost. The SSL endpoint includes a wildcard certificate for the *.azurewebsites.net domain. If you use a custom domain name, you must provide a certificate that matches the custom domain. The simplest approach is to buy a certificate directly through the Azure portal. You can also import certificates from other certificate authorities.

As a security best practice, your app should enforce HTTPS by redirecting HTTP requests. You can implement this inside your application or use a URL rewrite rule.

Authentication

We recommend authenticating through an identity provider (IDP), such as Azure AD, Facebook, Google, or Twitter. Use OAuth 2 or OpenID Connect (OIDC) for the authentication flow. Azure AD provides functionality to manage users and groups, create application roles, integrate your on-premises identities, and consume backend services such as Office 365 and Skype for Business.

Avoid having the application manage user logins and credentials directly, as it creates a potential attack surface. At a minimum, you would need to have email confirmation, password recovery, and multi-factor authentication; validate password strength; and store password hashes securely. The large identity providers handle all of those things for you, and are constantly monitoring and improving their security practices.

Consider using App Service authentication to implement the OAuth/OIDC authentication flow. The benefits of App Service authentication include:

• Easy to configure.

• No code is required for simple authentication scenarios.

• Supports delegated authorization using OAuth access tokens to consume resources on behalf of the user.

• Provides a built-in token cache.

Some limitations of App Service authentication:

• Limited customization options.

• Delegated authorization is restricted to one backend resource per login session.

• If you use more than one IDP, there is no built-in mechanism for home realm discovery.

• For multi-tenant scenarios, the application must implement the logic to validate the token issuer.


Architecture components for a Web application with High Availability

Architecture

A multi-region architecture can provide higher availability than deploying to a single region. Consider a sample case of exposing a web application through two regions: one primary and one standby.

• Primary and secondary regions. This architecture uses two regions to achieve higher availability. The application is deployed to each region. During normal operations, network traffic is routed to the primary region. If the primary region becomes unavailable, traffic is routed to the secondary region.

• Front Door. Front Door routes incoming requests to the primary region. If the application running in that region becomes unavailable, Front Door fails over to the secondary region.

• Geo-replication of SQL Database and/or Cosmos DB.

If a regional outage affects the primary region, you can use Front Door to fail over to the secondary region. This architecture can also help if an individual subsystem of the application fails.

There are several general approaches to achieving high availability across regions:

• Active/passive with hot standby. Traffic goes to one region, while the other waits on hot standby. Hot standby means the VMs in the secondary region are allocated and running at all times.

• Active/passive with cold standby. Traffic goes to one region, while the other waits on cold standby. Cold standby means the VMs in the secondary region are not allocated until needed for failover. This approach costs less to run, but will generally take longer to come online during a failure.

• Active/active. Both regions are active, and requests are load balanced between them. If one region becomes unavailable, it is taken out of rotation.

Recommendations

Please use the given recommendations as a reference, as requirements for an application / customer might differ.

Regional pairing

Each Azure region is paired with another region within the same geography. In general, choose regions from the same regional pair (for example, East US 2 and Central US). Benefits of doing so include:

• If there is a broad outage, recovery of at least one region out of every pair is prioritized.

• Planned Azure system updates are rolled out to paired regions sequentially to minimize possible downtime.

• In most cases, regional pairs reside within the same geography to meet data residency requirements.

However, make sure that both regions support all of the Azure services needed for your application.

Resource groups

Consider placing the primary region, secondary region, and Front Door into separate resource groups. This lets you manage the resources deployed to each region as a single collection.

Front Door configuration

Routing - Front Door supports several routing mechanisms. With the priority routing setting, Front Door sends all requests to the primary region unless the endpoint for that region becomes unreachable. At that point, it automatically fails over to the secondary region. Assign the backends in the backend pool different priority values: 1 for the active region and 2 or higher for the standby or passive region.
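The selection logic behind priority routing can be sketched in a few lines. This is a conceptual model only, not Front Door's implementation; `pick_backend` and the dictionary keys `name`, `priority`, and `healthy` are hypothetical names used for illustration.

```python
from typing import Optional


def pick_backend(backends: list) -> Optional[str]:
    """Return the healthy backend with the lowest priority value.

    Each backend is a dict with 'name', 'priority' (1 = most preferred),
    and 'healthy' keys. Returns None when no backend is healthy.
    """
    healthy = [b for b in backends if b["healthy"]]
    if not healthy:
        return None
    return min(healthy, key=lambda b: b["priority"])["name"]
```

While the priority 1 backend is healthy it receives all traffic; the priority 2 backend is selected only when the first one is marked unhealthy.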

Health probe - Front Door uses an HTTP/HTTPS probe to monitor the availability of each backend. The probe gives Front Door a pass/fail test for failing over to the secondary region. It works by sending a request to a specified URL path. If it gets a non-200 response within a timeout period, the probe fails. You can configure the health probe frequency, the number of samples required for evaluation, and the number of successful samples required for the backend to be marked as healthy. If Front Door marks the backend as degraded, it fails over to the other backend. For details, refer to the Health Probes article in the Microsoft documentation.

As a best practice, create a health probe path in your application backend that reports the overall health of the application. This health probe should check critical dependencies such as the App Service apps, storage queue, and SQL Database. Otherwise, the probe might report a healthy backend when critical parts of the application are actually failing. On the other hand, don't use the health probe to check lower priority services. For example, if an email service goes down the application can switch to a second provider or just send emails later.
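A health probe path of this kind might aggregate its checks as sketched below. The function name `health_status` and the check names in the usage note are hypothetical; each check is assumed to be a callable that returns True when its dependency is reachable. Only critical dependencies are passed in, so a failing low-priority service (such as email) does not trigger failover.

```python
from typing import Callable, Dict, Tuple


def health_status(
    critical_checks: Dict[str, Callable[[], bool]]
) -> Tuple[int, Dict[str, bool]]:
    """Run every critical dependency check and return (HTTP status, details).

    The status is 200 only if all critical checks pass; otherwise it is 503,
    so the probe sees a non-200 response and the backend is marked unhealthy.
    """
    results = {}
    for name, check in critical_checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            # A check that raises is treated as a failed dependency.
            results[name] = False
    status = 200 if all(results.values()) else 503
    return status, results
```

A web framework handler for the probe path would call, for example, `health_status({"sql": check_sql, "queue": check_queue})` and return the resulting status code, where `check_sql` and `check_queue` are application-specific functions.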

SQL Database

• Use Active Geo-Replication to create a readable secondary replica in a different region.

• You can have up to four readable secondary replicas.

• Fail over to a secondary database if your primary database fails or needs to be taken offline.

• Active Geo-Replication can be configured for any database in any elastic database pool.

Cosmos DB

You can do geo-replication across regions with multi-master (multi-write) for Cosmos DB. Alternatively, you can designate one region as the writable region and the others as read-only replicas. If there is a regional outage, you can fail over by selecting another region to be the write region. The client SDK automatically sends write requests to the current write region, so you don't need to update the client configuration after a failover.


Consider these points when designing for high availability across regions.

Azure Front Door

Azure Front Door automatically fails over if the primary region becomes unavailable. When Front Door fails over, there is a period of time (usually about 20-60 seconds) when clients cannot reach the application. The duration is affected by the following factors:

• Frequency of health probes. The more frequent the health probes are sent, the faster Front Door can detect downtime or the backend coming back healthy.

• Sample size configuration. This configuration controls how many samples are required for the health probe to detect that the primary backend has become unreachable. If this value is too low, you could get false positives from intermittent issues.
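The trade-off in the sample size setting can be seen in a simplified sliding-window model. This is an illustration under stated assumptions, not Front Door's actual evaluation algorithm: the `ProbeWindow` class is hypothetical, and it assumes the backend starts out healthy and is healthy whenever enough of the most recent samples succeeded.

```python
from collections import deque


class ProbeWindow:
    """Tracks the most recent probe results for one backend."""

    def __init__(self, sample_size: int = 4, successful_samples_required: int = 2):
        self.required = successful_samples_required
        # Assumption: the backend starts healthy, so pre-fill with successes.
        self.samples = deque([True] * sample_size, maxlen=sample_size)

    def record(self, success: bool) -> bool:
        """Record one probe result and return whether the backend is healthy."""
        self.samples.append(success)  # the oldest sample drops out automatically
        return sum(self.samples) >= self.required
```

With a window of 4 samples and 2 required successes, a single failed probe does not change the verdict. With a window of 1, one intermittent failure is enough to mark the backend unhealthy, which is the false-positive risk described above.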

Front Door is a possible failure point in the system. If the service fails, clients cannot access your application during the downtime.


We strongly recommend reviewing the Front Door service level agreement (SLA) to determine whether using Front Door alone meets your business requirements for high availability. If not, consider adding another traffic management solution as a fallback. If the Front Door service fails, change your canonical name (CNAME) records in DNS to point to the other traffic management service. This step must be performed manually, and your application will be unavailable until the DNS changes are propagated.

SQL Database

Consider Database sizing and availability factors, based on appropriate recovery point objective (RPO) and estimated recovery time (ERT) for SQL Database.

Storage

For Azure Storage, use read-access geo-redundant storage (RA-GRS). With RA-GRS storage, the data is replicated to a secondary region. You have read-only access to the data in the secondary region through a separate endpoint. If there is a regional outage or disaster, the Azure Storage team might decide to perform a geo-failover to the secondary region. There is no customer action required for this failover.

RA-GRS storage provides durable storage, but it's important to understand what can happen during an outage:

• If a storage outage occurs, there will be a period of time when you don't have write-access to the data. You can still read from the secondary endpoint during the outage.

• If a regional outage or disaster affects the primary location and the data there cannot be recovered, the Azure Storage team may decide to perform a geo-failover to the secondary region.

• Data replication to the secondary region is performed asynchronously. Therefore, if a geo-failover is performed, some data loss is possible if the data can't be recovered from the primary region.

• Transient failures, such as a network outage, will not trigger a storage failover. Design your application to be resilient to transient failures. Mitigation options include:
o Read from the secondary region.
o Temporarily switch to another storage account for new write operations (for example, to queue messages).
o Copy data from the secondary region to another storage account.
o Provide reduced functionality until the system fails back.

For Queue storage, create a backup queue in the secondary region. During failover, the app can use the backup queue until the primary region becomes available again. That way, the application can still process new requests.
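The backup-queue approach can be sketched with in-memory stand-ins. Everything here is hypothetical and for illustration only: `InMemoryQueue` replaces a real storage queue client, and `QueueUnavailable` stands for whatever error that client raises when the queue cannot accept writes.

```python
class QueueUnavailable(Exception):
    """Raised by a queue client when the queue cannot accept writes."""


class InMemoryQueue:
    """Stand-in for a storage queue client; `available` simulates an outage."""

    def __init__(self, name: str):
        self.name = name
        self.available = True
        self.messages = []

    def send(self, message: str) -> None:
        if not self.available:
            raise QueueUnavailable(self.name)
        self.messages.append(message)


def send_with_fallback(primary: InMemoryQueue, backup: InMemoryQueue, message: str) -> str:
    """Send to the primary queue, falling back to the backup queue on failure.

    Returns the name of the queue that accepted the message.
    """
    try:
        primary.send(message)
        return primary.name
    except QueueUnavailable:
        backup.send(message)
        return backup.name
```

A worker in the secondary region would then drain the backup queue, and once the primary region is available again the application resumes writing to the primary queue.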


If the primary database fails, perform a manual failover to the secondary database. The secondary database remains read-only until you fail over.

Cost considerations

Use the pricing calculator to estimate costs.

Azure Front Door

Azure Front Door billing has three components: outbound data transfers, inbound data transfers, and routing rules. For more information, see Azure Front Door Pricing. The pricing chart does not include the cost of accessing data from the backend services and transferring it to Front Door. Those costs are billed based on data transfer charges, described in Bandwidth Pricing Details.

Azure Cosmos DB

There are two factors that determine Azure Cosmos DB pricing:

• The provisioned throughput or Request Units per second (RU/s).

Cosmos DB allocates the resources required to guarantee the RU/s that you specify. You are billed hourly for the maximum provisioned throughput per hour. Because of the resources dedicated to your container or database, you are charged for the specified throughput even if you don't run any workload.

• Consumed storage. You are billed a flat rate for the total amount of storage (GBs) consumed for data and the indexes for a given hour.
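The two pricing factors combine into a simple estimate, sketched below. The function `cosmos_monthly_cost` and its parameters are hypothetical, and the rates are inputs rather than real prices; take actual rates from the pricing calculator. The sketch assumes throughput is priced per 100 RU/s per hour and storage per GB per month.

```python
def cosmos_monthly_cost(
    hourly_max_rus: list,
    storage_gb: float,
    price_per_100_rus_hour: float,
    price_per_gb_month: float,
) -> float:
    """Estimate a monthly Cosmos DB bill from throughput and storage.

    `hourly_max_rus` holds the maximum provisioned RU/s for each hour of the
    month; every hour is billed at that maximum even if no workload ran.
    """
    throughput = sum(rus / 100 * price_per_100_rus_hour for rus in hourly_max_rus)
    storage = storage_gb * price_per_gb_month
    return throughput + storage
```

For instance, 400 RU/s provisioned for all 720 hours of a month with 10 GB stored, at placeholder rates of 0.01 per 100 RU/s per hour and 0.25 per GB per month, gives 28.8 for throughput plus 2.5 for storage.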

https://azure.microsoft.com/pricing/calculator

https://azure.microsoft.com/pricing/details/frontdoor

https://azure.microsoft.com/pricing/details/bandwidth


Components for a Complex Web Application with High Availability

Architecture

We have considered an N-tier application with SQL Server for this ‘complex application with HA’ scenario.

• Primary and secondary regions. Two regions are used to achieve higher availability. One will act as the primary region. The other region is for failover.

• Azure Traffic Manager. Traffic Manager routes incoming requests to one of the regions. During normal operations, it routes requests to the primary region. If that region becomes unavailable, Traffic Manager fails over to the secondary region.

• Resource groups. Create separate resource groups for the primary region, the secondary region, and for Traffic Manager. This gives you the flexibility to manage each region as a single collection of resources. For example, you could redeploy one region, without taking down the other one. Link the resource groups, so that you can run a query to list all the resources for the application.

• Virtual networks. Create a separate virtual network for each region. Make sure the address spaces do not overlap.


• SQL Server Always On Availability Group. If you are using SQL Server, we recommend SQL Always On Availability Groups for high availability. Create a single availability group that includes the SQL Server instances in both regions.

Also consider using Azure SQL Database, which provides a relational database as a cloud service, in preference to other database products on Azure. With SQL Database, you don't need to configure an availability group or manage failover.

• Virtual network peering. Peer the two virtual networks to allow data replication from the primary region to the secondary region.

Recommendations

A multi-region architecture can provide higher availability than deploying to a single region. If a regional outage affects the primary region, you can use Traffic Manager to fail over to the secondary region. This architecture can also help if an individual subsystem of the application fails.

There are several general approaches to achieving high availability across regions:

• Active/passive with hot standby. Traffic goes to one region, while the other waits on hot standby. Hot standby means the VMs in the secondary region are allocated and running at all times.

• Active/passive with cold standby. Traffic goes to one region, while the other waits on cold standby. Cold standby means the VMs in the secondary region are not allocated until needed for failover. This approach costs less to run, but will generally take longer to come online during a failure.

• Active/active. Both regions are active, and requests are load balanced between them. If one region becomes unavailable, it is taken out of rotation.

This reference architecture focuses on active/passive with hot standby, using Traffic Manager for failover. Note that you could deploy a small number of VMs for hot standby and then scale out as needed.

Regional pairing

Each Azure region is paired with another region within the same geography. In general, choose regions from the same regional pair (for example, East US 2 and Central US). Benefits of doing so include:

• If there is a broad outage, recovery of at least one region out of every pair is prioritized.

• Planned Azure system updates are rolled out to paired regions sequentially, to minimize possible downtime.

• Pairs reside within the same geography, to meet data residency requirements.

However, make sure that both regions support all of the Azure services needed for your application.


Traffic Manager configuration

Consider the following points when configuring Traffic Manager:

• Routing. Traffic Manager supports several routing algorithms. With priority routing setting, Traffic Manager sends all requests to the primary region, unless the primary region becomes unreachable. At that point, it automatically fails over to the secondary region.

• Health probe. Traffic Manager uses an HTTP (or HTTPS) probe to monitor the availability of each region. The probe checks for an HTTP 200 response for a specified URL path. As a best practice, create an endpoint that reports the overall health of the application, and use this endpoint for the health probe. Otherwise, the probe might report a healthy endpoint when critical parts of the application are actually failing.

When Traffic Manager fails over there is a period of time when clients cannot reach the application. The duration is affected by the following factors:

• The health probe must detect that the primary region has become unreachable.

• DNS servers must update the cached DNS records for the IP address, which depends on the DNS time-to-live (TTL). The default TTL is 300 seconds (5 minutes), but you can configure this value when you create the Traffic Manager profile.
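The two factors above suggest a rough upper-bound estimate for the failover window. The sketch below is an approximation under stated assumptions, not a documented formula: it assumes the endpoint is marked unhealthy after `tolerated_failures + 1` consecutive failed probes, that the last probe waits for its full timeout, and that clients then wait up to one full DNS TTL. The function and parameter names are hypothetical; only the 300-second default TTL comes from the text above.

```python
def estimated_failover_seconds(
    probe_interval_s: float,
    tolerated_failures: int,
    probe_timeout_s: float,
    dns_ttl_s: float = 300,
) -> float:
    """Rough worst-case time until clients reach the secondary region."""
    # Time for the health probe to decide the primary is unreachable.
    detection = probe_interval_s * (tolerated_failures + 1) + probe_timeout_s
    # Time for cached DNS answers pointing at the primary to expire.
    return detection + dns_ttl_s
```

With illustrative inputs of a 30-second interval, 3 tolerated failures, and a 10-second timeout, the estimate is 430 seconds at the default TTL and 190 seconds with a 60-second TTL, which shows how strongly the TTL dominates the outage window.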

If Traffic Manager fails over, we recommend performing a manual failback rather than implementing an automatic failback. Otherwise, you can create a situation where the application flips back and forth between regions. Verify that all application subsystems are healthy before failing back.

Note that Traffic Manager automatically fails back by default. To prevent this, manually lower the priority of the primary region after a failover event. For example, suppose the primary region is priority 1 and the secondary is priority 2. After a failover, set the primary region to priority 3, to prevent automatic failback. When you are ready to switch back, update the priority to 1.

Another approach is to temporarily disable the endpoint until you are ready to fail back.

Depending on the cause of a failover, you might need to redeploy the resources within a region. Before failing back, perform an operational readiness test. The test should verify things like:

• VMs are configured correctly. (All required software is installed, IIS is running, and so on.)

• Application subsystems are healthy.

• Functional testing. (For example, the database tier is reachable from the web tier.)

Configure SQL Server Always On Availability Groups

Prior to Windows Server 2016, SQL Server Always On Availability Groups required a domain controller, and all nodes in the availability group had to be in the same Active Directory (AD) domain.


To configure the availability group:

• At a minimum, place two domain controllers in each region.

• Give each domain controller a static IP address.

• Peer the two virtual networks to enable communication between them.

• For each virtual network, add the IP addresses of the domain controllers (from both regions) to the DNS server list. You can do this with the Azure CLI.

• Create a Windows Server Failover Clustering (WSFC) cluster that includes the SQL Server instances in both regions.

• Create a SQL Server Always On Availability Group that includes the SQL Server instances in both the primary and secondary regions. Follow these key steps:
o Put the primary replica in the primary region.
o Put one or more secondary replicas in the primary region. Configure these to use synchronous commit with automatic failover.
o Put one or more secondary replicas in the secondary region. Configure these to use asynchronous commit, for performance reasons. (Otherwise, all T-SQL transactions have to wait on a round trip over the network to the secondary region.)

Note

Asynchronous commit replicas do not support automatic failover.
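The replica placement rules above can be expressed as a small layout check. This is an illustrative sketch that only validates a plan written down as plain dictionaries; it does not talk to SQL Server. The function `validate_ag_layout`, the dictionary keys, and the region names in the usage note are all hypothetical.

```python
def validate_ag_layout(replicas: list, primary_region: str) -> list:
    """Return a list of problems with a planned availability group layout.

    Each replica is a dict with 'name', 'region', 'role' ('primary' or
    'secondary'), 'commit' ('synchronous' or 'asynchronous'), and 'failover'
    ('automatic' or 'manual'). An empty list means the plan follows the guidance.
    """
    problems = []
    primaries = [r for r in replicas if r["role"] == "primary"]
    if len(primaries) != 1 or primaries[0]["region"] != primary_region:
        problems.append("expected exactly one primary replica, in the primary region")
    for r in replicas:
        if r["role"] != "secondary":
            continue
        in_primary_region = r["region"] == primary_region
        if in_primary_region and (r["commit"], r["failover"]) != ("synchronous", "automatic"):
            problems.append(f"{r['name']}: use synchronous commit with automatic failover")
        elif not in_primary_region and r["commit"] != "asynchronous":
            problems.append(f"{r['name']}: use asynchronous commit across regions")
        elif r["commit"] == "asynchronous" and r["failover"] == "automatic":
            problems.append(f"{r['name']}: asynchronous commit does not support automatic failover")
    return problems
```

A plan with a primary and a synchronous secondary in one region plus an asynchronous, manual-failover secondary in the other region passes the check with no problems reported.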


With a complex N-tier app, you may not need to replicate the entire application in the secondary region. Instead, you might just replicate a critical subsystem that is needed to support business continuity.

Traffic Manager is a possible failure point in the system. If the Traffic Manager service fails, clients cannot access your application during the downtime. Review the Traffic Manager SLA, and determine whether using Traffic Manager alone meets your business requirements for high availability. If not, consider adding another traffic management solution as a fallback. If the Azure Traffic Manager service fails, change your CNAME records in DNS to point to the other traffic management service. (This step must be performed manually, and your application will be unavailable until the DNS changes are propagated.)

For the SQL Server cluster, there are two failover scenarios to consider:

• All of the SQL Server database replicas in the primary region fail. For example, this could happen during a regional outage. In that case, you must manually fail over the availability group, even though Traffic Manager automatically fails over on the front end.

Word of caution: With forced failover, there is a risk of data loss. Once the primary region is back online, take a snapshot of the database and use tablediff to find the differences.

• Traffic Manager fails over to the secondary region, but the primary SQL Server database replica is still available. For example, the front-end tier might fail, without affecting the SQL Server VMs. In that case, Internet traffic is routed to the secondary region, and that region can still connect to the primary replica. However, there will be increased latency, because the SQL Server connections are going across regions. In this situation, you should perform a manual failover as follows:

1. Temporarily switch a SQL Server database replica in the secondary region to synchronous commit. This ensures there won't be data loss during the failover.

2. Fail over to that replica.

3. When you fail back to the primary region, restore the asynchronous commit setting.


When you update your deployment, update one region at a time to reduce the chance of a global failure from an incorrect configuration or an error in the application.

Test the resiliency of the system to failures. Here are some common failure scenarios to test:

• Shut down VM instances.

• Pressure resources such as CPU and memory.

• Disconnect/delay network.

• Crash processes.

• Expire certificates.

• Simulate hardware faults.

• Shut down the DNS service on the domain controllers.

Measure the recovery times and verify they meet your business requirements. Test combinations of failure modes, as well.

Cost considerations

Use the Azure Pricing Calculator to estimate costs. Here are some other considerations.

Virtual machine scale sets

Virtual machine scale sets are available on all Windows VM sizes. You are only charged for the Azure VMs you deploy and any additional underlying infrastructure resources consumed such as storage and networking. There are no incremental charges for the Virtual machine scale sets service.

For single VMs pricing options, see Windows VMs pricing.

https://msdn.microsoft.com/library/ms162843.aspx


https://azure.microsoft.com/pricing/details/virtual-machines/windows


SQL server

If you choose Azure SQL Database as a service (DBaaS), you can save on cost because you don't need to configure an Always On Availability Group and domain controller machines. There are several deployment options, from single databases and elastic pools up to managed instances.

For SQL server VMs pricing options, see SQL VMs pricing.

Load balancers

You are charged only for the number of configured load-balancing and outbound rules. Inbound NAT rules are free. There is no hourly charge for the Standard Load Balancer when no rules are configured.

Traffic Manager pricing

Traffic Manager billing is based on the number of DNS queries received, with a discount for services receiving more than 1 billion monthly queries. You are also charged for each monitored endpoint.

https://azure.microsoft.com/pricing/details/sql-database/managed


Architecture components for a Serverless Web application

The term serverless has two distinct but related meanings:

• Backend as a service (BaaS). Back-end cloud services, such as databases and storage, provide APIs that enable client applications to connect directly to these services.

• Functions as a service (FaaS). In this model, a "function" is a piece of code that is deployed to the cloud and runs inside a hosting environment that completely abstracts the servers that run the code.

Both definitions have in common the idea that developers and DevOps personnel don't need to deploy, configure, or manage servers.

This reference architecture focuses on FaaS using Azure Functions, although serving web content from Azure Blob Storage could be an example of BaaS. Some important characteristics of FaaS are:

1. Compute resources are allocated dynamically as needed by the platform.

2. Consumption-based pricing: You are charged only for the compute resources used to execute your code.

3. The compute resources scale on demand based on traffic, without the developer needing to do any configuration.

Functions are executed when an external trigger occurs, such as an HTTP request or a message arriving on a queue. This makes an event-driven architecture style natural for serverless architectures. To coordinate work between components in the architecture, consider using message brokers or pub/sub patterns.


Architecture

Typical architecture for a serverless application consists of the following components:

Blob Storage. Static web content, such as HTML, CSS, and JavaScript files, are stored in Azure Blob Storage and served to clients by using static website hosting. All dynamic interaction happens through JavaScript code making calls to the back-end APIs. There is no server-side code to render the web page. Static website hosting supports index documents and custom 404 error pages.

CDN. Use Azure Content Delivery Network (CDN) to cache content for lower latency and faster delivery of content, as well as providing an HTTPS endpoint.

Function Apps. Azure Functions is a serverless compute option. It uses an event-driven model, where a piece of code (a "function") is invoked by a trigger. In this architecture, the function is invoked when a client makes an HTTP request. The request is always routed through an API gateway, described below.

API Management. API Management provides an API gateway that sits in front of the HTTP function. You can use API Management to publish and manage APIs used by client applications. Using a gateway helps to decouple the front-end application from the back-end APIs. For example, API Management can rewrite URLs, transform requests before they reach the back end, set request or response headers, and so forth.

API Management can also be used to implement cross-cutting concerns such as:

• Enforcing usage quotas and rate limits

• Validating OAuth tokens for authentication

• Enabling cross-origin requests (CORS)

• Caching responses

• Monitoring and logging requests

If you don't need all of the functionality provided by API Management, another option is to use Functions Proxies. This feature of Azure Functions lets you define a single API surface for multiple function apps, by creating routes to back-end functions. Function proxies can also perform limited transformations on the HTTP request and response. However, they don't provide the same rich policy-based capabilities of API Management.

Cosmos DB. Cosmos DB is a multi-model database service. For this scenario, the function application fetches documents from Cosmos DB in response to HTTP GET requests from the client.

Azure Active Directory (Azure AD). Users sign into the web application by using their Azure AD credentials. Azure AD returns an access token for the API, which the web application uses to authenticate API requests.


Azure Monitor. Monitor collects performance metrics about the Azure services deployed in the solution. By visualizing these in a dashboard, you can get visibility into the health of the solution. It also collects application logs.

Azure Pipelines. Pipelines is a continuous integration (CI) and continuous delivery (CD) service that builds, tests, and deploys the application.

Recommendations

Function App plans

Azure Functions supports two hosting models. With the consumption plan, compute power is automatically allocated when your code is running. With the App Service plan, a set of VMs are allocated for your code. The App Service plan defines the number of VMs and the VM size.

Note that the App Service plan is not strictly serverless, according to the definition given above. The programming model is the same, however — the same function code can run in both a consumption plan and an App Service plan.

Here are some factors to consider when choosing which type of plan to use:

• Cold start. With the consumption plan, a function that hasn't been invoked recently will incur some additional latency the next time it runs. This additional latency is due to allocating and preparing the runtime environment. It is usually on the order of seconds but depends on several factors, including the number of dependencies that need to be loaded. Cold start is usually more of a concern for interactive workloads (HTTP triggers) than asynchronous message-driven workloads (queue or event hubs triggers), because the additional latency is directly observed by users.

• Timeout period. In the consumption plan, a function execution times out after a configurable period of time (to a maximum of 10 minutes).

• Virtual network isolation. Using an App Service plan allows functions to run inside of an App Service Environment, which is a dedicated and isolated hosting environment.

• Pricing model. The consumption plan is billed by the number of executions and resource consumption (memory × execution time). The App Service plan is billed hourly based on VM instance SKU. Often, the consumption plan can be cheaper than an App Service plan, because you pay only for the compute resources that you use. This is especially true if your traffic experiences peaks and troughs. However, if an application experiences constant high-volume throughput, an App Service plan may cost less than the consumption plan.

• Scaling. A big advantage of the consumption model is that it scales dynamically as needed, based on the incoming traffic. While this scaling occurs quickly, there is still a ramp-up period. For some workloads, you might want to deliberately overprovision the VMs, so that you can handle bursts of traffic with zero ramp-up time. In that case, consider an App Service plan.
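The pricing trade-off described in the list above can be explored with a rough estimate. The following is a minimal sketch, not an official pricing formula: the function names are hypothetical, the rates are caller-supplied placeholders rather than current Azure prices, and free grants and minimum billing increments are ignored. Take real rates from the Azure pricing calculator.

```python
def consumption_monthly_cost(executions, avg_duration_s, memory_gb,
                             price_per_million_exec, price_per_gb_second):
    """Estimate one month of consumption-plan charges.

    Billing model from the text: number of executions plus resource
    consumption (memory x execution time, expressed in GB-seconds).
    Assumption: free grants and billing minimums are ignored.
    """
    gb_seconds = executions * avg_duration_s * memory_gb
    execution_charge = executions / 1_000_000 * price_per_million_exec
    resource_charge = gb_seconds * price_per_gb_second
    return execution_charge + resource_charge


def cheaper_plan(consumption_cost, app_service_hourly_rate, instances, hours=730):
    """Compare against an App Service plan billed hourly per VM instance.

    Assumption: 730 hours approximates one month.
    """
    app_service_cost = app_service_hourly_rate * instances * hours
    return "consumption" if consumption_cost < app_service_cost else "app service"
```

Running the comparison for both a spiky, low-volume workload and a constant high-volume workload shows where the break-even point falls for a given set of rates.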


Function App boundaries

A function app hosts the execution of one or more functions. You can use a function app to group several functions together as a logical unit. Within a function app, the functions share the same application settings, hosting plan, and deployment lifecycle. Each function app has its own hostname.

Use function apps to group functions that share the same lifecycle and settings. Functions that don't share the same lifecycle should be hosted in different function apps.

Consider taking a microservices approach, where each function app represents one microservice, possibly consisting of several related functions. In a microservices architecture, services should have loose coupling and high functional cohesion. Loosely coupled means you can change one service without requiring other services to be updated at the same time. Cohesive means a service has a single, well-defined purpose.

Function bindings

Use Functions bindings when possible. Bindings provide a declarative way to connect your code to data and integrate with other Azure services. An input binding populates an input parameter from an external data source. An output binding sends the function's return value to a data sink, such as a queue or database.

By using bindings, you don't need to write code that talks directly to the service, which makes the function code simpler and also abstracts the details of the data source or sink. In some cases, however, you may need more complex logic than the binding provides. In that case, use the Azure client SDKs directly.

Scalability considerations

Functions. For the consumption plan, the HTTP trigger scales based on the traffic. There is a limit to the number of concurrent function instances, but each instance can process more than one request at a time. For an App Service plan, the HTTP trigger scales according to the number of VM instances, which can be a fixed value or can autoscale based on a set of autoscaling rules.

Cosmos DB. Throughput capacity for Cosmos DB is measured in Request Units (RU). A throughput of 1 RU corresponds to the throughput needed to GET a 1 KB document. In order to scale a Cosmos DB container past 10,000 RU, you must specify a partition key when you create the container and include the partition key in every document that you create.
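As a back-of-the-envelope aid, the RU baseline above can be turned into a capacity estimate. This is a sketch under stated assumptions: the function names are hypothetical, and scaling the read charge linearly with document size is a simplification. Measure the actual request charge of your own queries before provisioning.

```python
import math


def required_rus(reads_per_second, doc_size_kb=1.0, ru_per_kb_read=1.0):
    """Rough RU/s needed to serve point reads.

    Baseline from the text: reading a 1 KB document costs about 1 RU.
    Assumption: the charge grows linearly with document size.
    """
    ru_per_read = max(1.0, doc_size_kb * ru_per_kb_read)
    return math.ceil(reads_per_second * ru_per_read)


def needs_partition_key(rus, threshold=10_000):
    """True when the estimate exceeds the 10,000 RU limit cited above."""
    return rus > threshold
```

For example, `required_rus(3000, doc_size_kb=4)` yields an estimate above the 10,000 RU threshold, which signals that the container needs a partition key from the start.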

API Management. API Management can scale out and supports rule-based autoscaling. The scaling process takes at least 20 minutes. If your traffic is bursty, you should provision for the maximum burst traffic that you expect. However, autoscaling is useful for handling hourly or daily variations in traffic.


Disaster recovery considerations

The deployment shown here resides in a single Azure region. For a more resilient approach to disaster recovery, take advantage of the geo-distribution features in the various services:

• API Management supports multi-region deployment, which can be used to distribute a single API Management instance across any number of Azure regions.

• Use Traffic Manager to route HTTP requests to the primary region. If the Function App running in that region becomes unavailable, Traffic Manager can fail over to a secondary region.

• Cosmos DB supports multiple master regions, which enables writes to any region that you add to your Cosmos DB account. If you don't enable multi-master, you can still fail over the primary write region. The Cosmos DB client SDKs and the Azure Function bindings automatically handle the failover, so you don't need to update any application configuration settings.

Security considerations

Authentication

The GetStatus API in the reference implementation uses Azure AD to authenticate requests. Azure AD supports the OpenID Connect protocol, which is an authentication protocol built on top of the OAuth 2 protocol.

In this architecture, the client application is a single-page application (SPA) that runs in the browser. This type of client application cannot keep a client secret or an authorization code hidden, so the implicit grant flow is appropriate. Here's the overall flow:

1. The user clicks the "Sign in" link in the web application.

2. The browser is redirected to the Azure AD sign-in page.

3. The user signs in.

4. Azure AD redirects back to the client application, including an access token in the URL fragment.

5. When the web application calls the API, it includes the access token in the Authorization header. The application ID is sent as the audience ('aud') claim in the access token.

6. The back-end API validates the access token.
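The last two steps of the flow can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the sketch deliberately skips signature verification, which a real back end (or the validate-jwt policy in API Management) must perform before trusting any claim.

```python
import base64
import json
import time


def decode_jwt_payload(token):
    """Return the claims in a JWT payload WITHOUT verifying the signature."""
    payload_b64 = token.split(".")[1]
    # JWTs use unpadded base64url; restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def check_claims(claims, expected_audience, now=None):
    """Check the audience ('aud') and expiry ('exp') claims."""
    now = time.time() if now is None else now
    return claims.get("aud") == expected_audience and claims.get("exp", 0) > now


def authorization_header(token):
    """Build the header the client sends with each API call (step 5)."""
    return {"Authorization": f"Bearer {token}"}
```

The audience check is what ties a token to one registered application: a token issued for a different application ID is rejected even if it is otherwise valid.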

To configure authentication:

• Register an application in your Azure AD tenant. This generates an application ID, which the client includes with the login URL.

• Enable Azure AD authentication inside the Function App.

• Add the validate-jwt policy to API Management to pre-authorize the request by validating the access token.


It's recommended to create separate app registrations in Azure AD for the client application and the back-end API. Grant the client application permission to call the API. This approach gives you the flexibility to define multiple APIs and clients and control the permissions for each.

Within an API, use scopes to give applications fine-grained control over what permissions they request from a user. For example, an API might have Read and Write scopes, and a particular client app might ask the user to authorize Read permissions only.

Authorization

In many applications, the back-end API must check whether a user has permission to perform a given action. It's recommended to use claims-based authorization, where information about the user is conveyed by the identity provider (in this case, Azure AD) and used to make authorization decisions. For example, when you register an application in Azure AD, you can define a set of application roles. When a user signs into the application, Azure AD includes a roles claim for each role that the user has been granted, including roles that are inherited through group membership.

The ID token that Azure AD returns to the client contains some of the user's claims. Within the function app, these claims are available in the X-MS-CLIENT-PRINCIPAL header of the request. However, it's simpler to read this information from binding data. For other claims, use Microsoft Graph to query Azure AD.
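For cases where the header is read directly, the following sketch shows one way to extract role claims from it. It assumes the header value is Base64-encoded JSON containing a `claims` list of `{"typ", "val"}` objects and an optional `role_typ` field; the function names are hypothetical, and the payload shape should be confirmed against your own environment.

```python
import base64
import json


def parse_client_principal(header_value):
    """Decode the X-MS-CLIENT-PRINCIPAL header (Base64-encoded JSON)."""
    return json.loads(base64.b64decode(header_value))


def roles_from_principal(principal):
    """Collect the values of role claims.

    Assumption: role claims use either the short type 'roles' or the
    type named in the payload's 'role_typ' field.
    """
    role_types = {"roles", principal.get("role_typ")}
    return [
        claim["val"]
        for claim in principal.get("claims", [])
        if claim.get("typ") in role_types
    ]
```

The back-end API can then compare the returned roles against the role required for the requested action.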

CORS

In a typical architecture, the web application and the API do not share the same origin. That means when the application calls the API, it is a cross-origin request. Browser security prevents a web page from making AJAX requests to another domain. This restriction is called the same-origin policy and prevents a malicious site from reading sensitive data from another site. To enable a cross-origin request, add a Cross-Origin Resource Sharing (CORS) policy to the API Management gateway.
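The decision that a gateway CORS policy makes can be sketched in a few lines. This is an illustration of the logic only, not API Management policy syntax; the function name and origins are hypothetical.

```python
def cors_headers(request_origin, allowed_origins, allow_credentials=False):
    """Return the CORS response headers for a request, or {} to deny it."""
    if request_origin not in allowed_origins:
        # No CORS headers: the browser blocks the cross-origin response.
        return {}
    headers = {
        "Access-Control-Allow-Origin": request_origin,
        # Tell caches that the response varies by requesting origin.
        "Vary": "Origin",
    }
    if allow_credentials:
        headers["Access-Control-Allow-Credentials"] = "true"
    return headers
```

Listing explicit origins, rather than allowing any origin, keeps the same-origin protection in place for every site that has not been deliberately trusted.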


Note

Be very careful about setting allow-credentials to true, because it means a website can send the user's credentials to your API on the user's behalf, without the user being aware. You must trust the allowed origin.

Enforce HTTPS

For maximum security, require HTTPS throughout the request pipeline:

• CDN. Azure CDN supports HTTPS on the *.azureedge.net subdomain by default.

• Static website hosting. Enable the "Secure transfer required" option on the Storage account. When this option is enabled, the storage account only allows requests from secure HTTPS connections.

• API Management. Configure the APIs to use HTTPS protocol only. You can configure this in the Azure portal or through a Resource Manager template.

• Azure Functions. Enable the "HTTPS Only" setting.

Lock down the function app

All calls to the function should go through the API gateway. You can achieve this as follows:

• Configure the function app to require a function key. The API Management gateway will include the function key when it calls the function app. This prevents clients from calling the function directly, bypassing the gateway.

• The API Management gateway has a static IP address. Restrict the Azure Function to allow only calls from that static IP address.

Protect application secrets

Don't store application secrets, such as database credentials, in your code or configuration files. Instead, use App settings, which are stored encrypted in Azure.

Alternatively, you can store application secrets in Key Vault. This allows you to centralize the storage of secrets, control their distribution, and monitor how and when secrets are being accessed. However, note that Functions triggers and bindings load their configuration settings from app settings. There is no built-in way to configure the triggers and bindings to use Key Vault secrets.


DevOps considerations

Front-end deployment

The front end of this reference architecture is a single page application, with JavaScript accessing the serverless back-end APIs, and static content providing a fast user experience. The following are some important considerations for such an application:

• Deploy the application uniformly to users over a wide geographical area with a global-ready CDN, with the static content hosted on the cloud. This avoids the need for a dedicated web server.

• Use a fast and reliable CI/CD service such as Azure Pipelines, to automatically build and deploy every source change. The source must reside in an online version control system.

• Compress your website files to reduce the bandwidth consumption on the CDN and improve performance. Azure CDN allows compression on the fly on the edge servers. Alternatively, the deploy pipeline in this reference architecture compresses the files before deploying them to the Blob storage. This reduces the storage requirement, and gives you more freedom to choose the compression tools, regardless of any CDN limitations.

• The CDN should be able to purge its cache to ensure all users are served the freshest content. A cache purge is required if the build and deploy processes are not atomic, for example, if they replace old files with newly built ones in the same origin folder.

• A different cache strategy such as versioning using directories, may not require a purge by the CDN. The build pipeline in this front-end application creates a new directory for each newly built version. This version is uploaded as an atomic unit to the Blob storage. The Azure CDN points to this new version only after a completed deployment.

• Increase the cache TTL by caching resource files for a longer duration, spanning months. To make sure the cached files are updated when they do change, fingerprint the filenames when they are rebuilt. This front-end application fingerprints all files except for public-facing files such as index.html. Since the index.html is updated frequently, it reflects the changed filenames causing a cache refresh.
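The fingerprinting step in the last point can be sketched as follows. This is a minimal illustration, not the reference implementation's build script: the function names, the SHA-256 choice, and the eight-character digest length are all assumptions.

```python
import hashlib
from pathlib import PurePosixPath


def fingerprint_name(filename, content, digest_length=8):
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_length]
    path = PurePosixPath(filename)
    return str(path.with_name(f"{path.stem}.{digest}{path.suffix}"))


def build_manifest(files, skip=("index.html",)):
    """Map original names to fingerprinted names.

    Entry points listed in 'skip' keep their public name, so they can
    reference the new fingerprinted files after each build.
    """
    return {
        name: name if name in skip else fingerprint_name(name, content)
        for name, content in files.items()
    }
```

Because the name changes whenever the content changes, fingerprinted files can safely be cached for months, while index.html stays short-lived and points browsers at the new names.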

Back-end deployment

To deploy the function app, we recommend using package files ("Run from package"). Using this approach, you upload a zip file to a Blob Storage container and the Functions runtime mounts the zip file as a read-only file system. This is an atomic operation, which reduces the chance that a failed deployment will leave the application in an inconsistent state. It can also improve cold start times, especially for Node.js apps, because all of the files are swapped at once.

API versioning

An API is a contract between a service and clients. In this architecture, the API contract is defined at the API Management layer. API Management supports two distinct but complementary versioning concepts:

• Versions allow API consumers to choose an API version based on their needs, such as v1 versus v2.


• Revisions allow API administrators to make non-breaking changes in an API and deploy those changes, along with a change log to inform API consumers about the changes.

If you make a breaking change in an API, publish a new version in API Management. Deploy the new version side-by-side with the original version, in a separate Function App. This lets you migrate existing clients to the new API without breaking client applications. Eventually, you can deprecate the previous version. API Management supports several versioning schemes: URL path, HTTP header, or query string.
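The three versioning schemes differ only in where the client states the version. The sketch below illustrates that difference; it is not how API Management is configured. The `Api-Version` header name, the `api-version` query parameter, and the `vN` path convention are hypothetical, and a real gateway is normally set up with a single scheme.

```python
from urllib.parse import parse_qs, urlsplit


def resolve_api_version(url, headers, default="v1"):
    """Pick the requested API version from path, header, or query string."""
    parts = urlsplit(url)
    # URL path scheme: look for a segment such as "v2".
    for segment in parts.path.split("/"):
        if segment.startswith("v") and segment[1:].isdigit():
            return segment
    # HTTP header scheme (hypothetical header name).
    if "Api-Version" in headers:
        return headers["Api-Version"]
    # Query string scheme (hypothetical parameter name).
    query = parse_qs(parts.query)
    if "api-version" in query:
        return query["api-version"][0]
    return default
```

Whichever scheme is chosen, the resolved version selects the Function App that serves the request, which is what allows old and new versions to run side by side.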

For updates that are not breaking API changes, deploy the new version to a staging slot in the same Function App. Verify the deployment succeeded and then swap the staged version with the production version. Publish a revision in API Management.

Cost considerations

Use the Azure pricing calculator to estimate costs. Consider these points to optimize cost of this architecture.

Azure Functions

Azure Functions supports two hosting models.

• Consumption plan. Compute power is automatically allocated when your code is running.

• App Service plan. A set of VMs is allocated for your code. This plan defines the number of VMs and the VM size.

In this architecture, a function is invoked when a client makes an HTTP request. Because a constant high-volume throughput is not expected in this use case, consumption plan is recommended because you pay only for the compute resources you use.

Azure Cosmos DB

Azure Cosmos DB bills for provisioned throughput and consumed storage by the hour. Provisioned throughput is expressed in Request Units per second (RU/s), which can be used for typical database operations, such as inserts and reads. The price is based on the capacity in RU/s that you reserve.

Storage is billed for each GB used for your stored data and index.



In this architecture, the function application fetches documents from Cosmos DB in response to HTTP GET requests from the client. Cosmos DB is cost effective in this case because read operations consume significantly fewer RU/s than write operations.

Content Delivery Network

Billing rate may differ depending on the billing region based on the location of the source server delivering the content to the end user. The physical location of the client is not the billing region. Any HTTP or HTTPS request that hits the CDN is a billable event, which includes all response types: success, failure, or other. Different responses may generate different traffic amounts. To lower costs, consider increasing the cache TTL by caching resource files for a longer duration and setting the longest TTL possible on your content.


Multi-tier web application with HA and DR on Azure

The section below covers considerations for a multi-tier (minimum three-tier) web application with both HA and DR requirements.

Architecture


<<The scenario above demonstrates a multitier application that uses ASP.NET and Microsoft SQL Server. Change this diagram as required.>>

<<Considerations: We need to consider the following:

VM deployment within availability sets and replicating to target regions

Distribution strategy for VMs for each tier across regions to take care of DR and HA

Database considerations – For SQL server whether to go for Always on

Traffic Manager approach to redirect traffic

Need for public and internal load balancers

Azure Site recovery failover setup to take care of primary region failures>>

Some key points:

Deploying VMs - In Azure regions that support availability zones, deploy virtual machines (VMs) in a source region across availability zones and replicate the VMs to the target region used for disaster recovery. In Azure regions that don't support availability zones, we need to deploy VMs within an availability set and replicate the VMs to the target region.

• Distribute the VMs in each tier across two availability zones in regions that support zones. In other regions, deploy the VMs in each tier within one availability set.

• The database tier can be configured to use Always On availability groups. With this SQL Server configuration, one primary database within a cluster is configured with up to eight secondary databases. If an issue occurs with the primary database, the cluster fails over to one of the secondary databases, allowing the application to remain available.

• For disaster recovery scenarios, configure SQL Always On asynchronous native replication to the target region used for disaster recovery. We can also configure Azure Site Recovery replication to the target region if the data change rate is within supported limits of Azure Site Recovery.

• Users can access the front-end ASP.NET web tier via the traffic manager endpoint.

Components

• Availability sets - ensure that the VMs you deploy on Azure are distributed across multiple isolated hardware nodes in a cluster. With this setup, if a hardware or software failure occurs within Azure, only a subset of your VMs are affected and your entire solution remains available and operational.

• Availability zones - protect your applications and data from datacenter failures. Availability zones are separate physical locations within an Azure region.

• Azure Site Recovery - allows you to replicate VMs to another Azure region for business continuity and disaster recovery needs. You can conduct periodic disaster recovery drills to ensure you meet the compliance needs. The VM will be replicated with the specified settings to the selected region so that you can recover your applications in the event of outages in the source region.


• Azure Traffic Manager - is a DNS-based traffic load balancer that distributes traffic optimally to services across global Azure regions while providing high availability and responsiveness.

• Azure Load Balancer - distributes inbound traffic according to defined rules and health probes. A load balancer provides low latency and high throughput, scaling up to millions of flows for all TCP and UDP applications. A public load balancer is used in this scenario to distribute incoming client traffic to the web tier. An internal load balancer is used in this scenario to distribute traffic from the business tier to the back-end SQL Server cluster.

Alternatives

• Windows can be replaced by other operating systems because nothing in the infrastructure is dependent on the operating system.

• SQL Server for Linux can replace the back-end data store.

• The database can be replaced by any standard database application available.

Other considerations

Scalability

You can add or remove VMs in each tier based on your scaling requirements. Because this scenario uses load balancers, you can add more VMs to a tier without affecting application uptime.

Security

All virtual network traffic into the front-end application tier should be protected by network security groups.

Use rules to limit the flow of traffic so that only the front-end application tier VM instances can access the back-end database tier.

It is better to block outbound internet traffic from the business tier and the database tier. To reduce the attack footprint, do not open any direct remote management ports.

Pricing

Configuring disaster recovery for Azure VMs using Azure Site Recovery will incur the following charges on an ongoing basis.

• Azure Site Recovery licensing cost per VM.

• Network egress costs to replicate data changes from the source VM disks to another Azure region. Azure Site Recovery uses built-in compression to reduce the data transfer requirements by approximately 50%.

• Storage costs on the recovery site. This is typically the same as the source region storage plus any additional storage needed to maintain the recovery points as snapshots for recovery.


Microservices Architecture on AKS

This reference architecture shows a microservices application deployed to Azure Kubernetes Service (AKS). It describes a basic AKS configuration that can be the starting point for most deployments.

Architecture

Components

<<Typically, for a microservices architecture on Azure, we would need to consider the components below:

• AKS

• Kubernetes Cluster

• Virtual Network

• Ingress

• Azure Load Balancer

• External Data Stores

• Azure Active Directory

• Azure Container Registry

• Azure Pipelines

• Helm


• Azure Monitor

>>

AKS is responsible for deploying the Kubernetes cluster and for managing the Kubernetes masters. We will only manage the agent nodes.

Virtual network. By default, we can make use of the virtual network created by AKS to deploy the agent nodes. For more advanced scenarios, we can create the virtual network first, which lets you control things like how the subnets are configured, on-premises connectivity, and IP addressing.

Ingress. An Ingress exposes HTTP(S) routes to services inside the cluster.

Azure Load Balancer. We would need the load balancer to route internet traffic to the ingress. An Azure Load Balancer is created when the NGINX ingress controller is deployed.

External data stores. External data stores, such as Azure SQL Database or Cosmos DB, are used to persist and read the state of each microservice where required.

Azure Active Directory. AKS uses an Azure Active Directory (Azure AD) identity to create and manage other Azure resources such as Azure load balancers. Azure AD is also recommended for user authentication in client applications.

Azure Container Registry. Use Container Registry to store private Docker images, which are deployed to the cluster. AKS can authenticate with Container Registry using its Azure AD identity. Note that AKS does not require Azure Container Registry. You can use other container registries, such as Docker Hub.

Azure Pipelines. Pipelines is part of Azure DevOps Services and runs automated builds, tests, and deployments. You can also use third-party CI/CD solutions such as Jenkins.

Helm. Helm is a package manager for Kubernetes: a way to bundle Kubernetes objects into a single unit that you can publish, deploy, version, and update.

Azure Monitor. Azure Monitor collects and stores metrics and logs, including platform metrics for the Azure services in the solution and application telemetry. Use this data to monitor the application, set up alerts and dashboards, and perform root cause analysis of failures. Azure Monitor integrates with AKS to collect metrics from controllers, nodes, and containers, as well as container logs and master node logs.


Design considerations

This reference architecture is focused on microservices architectures, although many of the recommended practices will apply to other workloads running on AKS.

Microservices

Microservices typically communicate through well-defined APIs and are discoverable through some form of service discovery. The service should always be reachable even when the pods move around.

The Kubernetes Service object is a natural way to model microservices in Kubernetes.

API gateway

API gateways are a general microservices design pattern. An API gateway sits between external clients and the microservices. It acts as a reverse proxy, routing requests from clients to microservices. It may also perform various cross-cutting tasks such as authentication, SSL termination, and rate limiting.

In Kubernetes, the functionality of an API gateway is mostly handled by the Ingress resource and the Ingress controller.

Data storage

Avoid storing persistent data in local cluster storage, because that ties the data to the node. Instead,

• Use an external service such as Azure SQL Database or Cosmos DB, or

• Mount a persistent volume using Azure Disks or Azure Files. Use Azure Files if the same volume needs to be shared by multiple pods.

Service object

The Kubernetes Service object provides a set of capabilities that match the microservices requirements for service discoverability:

• IP address - The Service object provides a static internal IP address for a group of pods (ReplicaSet). As pods are created or moved around, the service is always reachable at this internal IP address.

• Load balancing - Traffic sent to the service's IP address is load balanced to the pods.


• Service discovery - Services are assigned internal DNS entries by the Kubernetes DNS service. That means the API gateway can call a backend service using the DNS name. The same mechanism can be used for service-to-service communication. The DNS entries are organized by namespace, so if your namespaces correspond to bounded contexts, then the DNS name for a service will map naturally to the application domain.

The actual mapping to endpoint IP addresses and ports is done by kube-proxy, the Kubernetes network proxy.
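The service discovery point can be made concrete with a small helper that builds the in-cluster DNS name of a service. This is a sketch: the function names are hypothetical, and `cluster.local` is the common default cluster domain, which may differ in your cluster.

```python
def service_dns_name(service, namespace="default", cluster_domain="cluster.local"):
    """Build the in-cluster DNS name for a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def service_url(service, namespace, port, path="/"):
    """Build a plain-HTTP URL for a service-to-service call."""
    return f"http://{service_dns_name(service, namespace)}:{port}{path}"
```

With namespaces named after bounded contexts, a hypothetical `orders` service in a `shipping` namespace resolves to `orders.shipping.svc.cluster.local`, so the DNS name mirrors the application domain.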

Ingress

In Kubernetes, the Ingress controller might implement the API gateway pattern. In that case, Ingress and Ingress controller work in conjunction to provide these features:

• Route client requests to the right backend services. This provides a single endpoint for clients, and helps to decouple clients from services.

• Aggregate multiple requests into a single request.

• Offload functionality from the backend services, such as SSL termination, authentication, IP whitelisting, or client rate limiting (throttling).

Ingress abstracts the configuration settings for a proxy server. You also need an Ingress controller, which provides the underlying implementation of the Ingress. There are Ingress controllers for Nginx, HAProxy, Traefik, and Azure Application Gateway, among others.

The Ingress resource can be fulfilled by different technologies. To do so, the chosen technology must be deployed as the Ingress controller inside the cluster, where it operates as the edge router or reverse proxy. A reverse proxy server is a potential bottleneck or single point of failure, so always deploy at least two replicas for high availability.

Often, configuring the proxy server requires complex files, which can be hard to tune if you aren't an expert. So, the Ingress controller provides a nice abstraction. The Ingress controller also has access to the Kubernetes API, so it can make intelligent decisions about routing and load balancing. For example, the Nginx ingress controller bypasses the kube-proxy network proxy.

On the other hand, if you need complete control over the settings, you may want to bypass this abstraction and configure the proxy server manually.

For AKS, you can also use Azure Application Gateway, using the Application Gateway Ingress Controller. This option requires CNI networking to be enabled when you configure the AKS cluster, because Application Gateway is deployed into a subnet of the AKS virtual network. Azure Application Gateway can perform layer-7 routing and SSL termination. It also has built-in support for web application firewall (WAF).


TLS/SSL encryption

In common implementations, the Ingress controller is used for SSL termination. So, as part of deploying the Ingress controller, you need to create a TLS certificate. Only use self-signed certificates for dev/test purposes.

For production workloads, get signed certificates from trusted certificate authorities (CA).

You may also need to rotate your certificates as per the organization's policies.

Namespaces

Use namespaces to organize services within the cluster. Every object in a Kubernetes cluster belongs to a namespace. By default, when you create a new object, it goes into the default namespace.

It is a good practice to create namespaces that are more descriptive to help organize the resources in the cluster.

Namespaces help prevent naming collisions with possibly hundreds of microservices. In addition, namespaces allow you to:

• Apply resource constraints to a namespace, so that the total set of pods assigned to that namespace cannot exceed the resource quota of the namespace.

• Apply policies at the namespace level, including RBAC and security policies.

For a microservices architecture, consider:

• Organizing the microservices into bounded contexts, and creating a namespace for each bounded context. Alternatively, create a namespace for each development team.

• Placing utility services into their own separate namespace.

Autoscaling

Kubernetes supports scale-out at two levels:

• Scale the number of pods allocated to a deployment.

• Scale the nodes in the cluster, to increase the total compute resources available to the cluster.

Although you can scale out pods and nodes manually, we recommend using autoscaling, to minimize the chance that services will become resource starved under high load. An autoscaling strategy must take both pods and nodes into account. If you just scale out the pods, eventually you will reach the resource limits of the nodes.


Pod autoscaling

The Horizontal Pod Autoscaler (HPA) scales pods based on observed CPU, memory, or custom metrics. To configure horizontal pod scaling, you specify a target metric (for example, 70% of CPU), and the minimum and maximum number of replicas. You should load test your services to derive these numbers.
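The core of the scaling decision is a simple ratio between the observed and the target metric. The sketch below illustrates that calculation with a hypothetical function name; it omits the tolerance band and stabilization behavior that the real controller applies, so treat it as an aid for reasoning about load-test results rather than a reimplementation.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Compute ceil(current_replicas * current_metric / target_metric),
    clamped to the configured minimum and maximum replica counts."""
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))
```

With 4 replicas averaging 90% CPU against a 70% target, the ratio calls for more than 5 replicas, so the result rounds up to 6. A much larger spike is capped at the configured maximum.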

A side-effect of autoscaling is that pods may be created or evicted more frequently, as scale-out and scale-in events happen. To mitigate the effects of this:

• Use readiness probes to let Kubernetes know when a new pod is ready to accept traffic.

• Use pod disruption budgets to limit how many pods can be evicted from a service at a time.

Cluster autoscaling

The cluster autoscaler scales the number of nodes. If pods can't be scheduled because of resource constraints, the cluster autoscaler will provision more nodes.

Whereas HPA looks at actual resources consumed or other metrics from running pods, the cluster autoscaler is provisioning nodes for pods that aren't scheduled yet. Therefore, it looks at the requested resources, as specified in the Kubernetes pod spec for a deployment. Use load testing to fine-tune these values.

You can't change the VM size after you create the cluster, so you should do some initial capacity planning to choose an appropriate VM size for the agent nodes when you create the cluster.

Health probes

Kubernetes defines two types of health probe that a pod can expose:

• Readiness probe: Tells Kubernetes whether the pod is ready to accept requests.

• Liveness probe: Tells Kubernetes whether a pod should be removed and a new instance started.

When thinking about probes, it's useful to recall how a service works in Kubernetes. A service has a label selector that matches a set of (zero or more) pods. Kubernetes load balances traffic to the pods that match the selector. Only pods that started successfully and are healthy receive traffic. If a container crashes, Kubernetes kills the pod and schedules a replacement.

Sometimes, a pod may not be ready to receive traffic, even though the pod started successfully. For example, there may be initialization tasks, where the application running in the container loads things into memory or reads configuration data. To indicate that a pod is healthy but not ready to receive traffic, define a readiness probe.

Liveness probes handle the case where a pod is still running, but is unhealthy and should be recycled. For example, suppose that a container is serving HTTP requests but hangs for some reason. The container doesn't crash, but it has stopped serving any requests. If you define an HTTP liveness probe, the probe will stop responding and that informs Kubernetes to restart the pod.
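The sketch below builds a container definition with both probe types as a Python dictionary, which can be serialized to JSON and used in a pod spec. The field names (readinessProbe, livenessProbe, httpGet, initialDelaySeconds, periodSeconds, failureThreshold) are standard Kubernetes fields. The endpoint paths, timing values, and the helper name are assumptions for illustration and should be tuned for each service.

```python
import json


def container_with_probes(name, image, port):
    """Container spec with HTTP readiness and liveness probes."""
    return {
        "name": name,
        "image": image,
        "ports": [{"containerPort": port}],
        # Readiness: gate traffic until the application reports ready.
        "readinessProbe": {
            "httpGet": {"path": "/ready", "port": port},  # hypothetical path
            "periodSeconds": 5,
        },
        # Liveness: restart the container if it stops responding.
        "livenessProbe": {
            "httpGet": {"path": "/healthz", "port": port},  # hypothetical path
            "initialDelaySeconds": 30,  # give slow startups time to finish
            "periodSeconds": 10,
            "failureThreshold": 3,
        },
    }


if __name__ == "__main__":
    spec = container_with_probes(
        "delivery", "myregistry.azurecr.io/delivery:1.0.0", 8080)
    print(json.dumps(spec, indent=2))
```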

Here are some considerations when designing probes:

• If your code has a long startup time, there is a danger that a liveness probe will report failure before the startup completes. To prevent this, use the initialDelaySeconds setting, which delays the probe from starting.

• A liveness probe doesn't help unless restarting the pod is likely to restore it to a healthy state. You can use a liveness probe to mitigate against memory leaks or unexpected deadlocks, but there's no point in restarting a pod that's going to immediately fail again.

• Sometimes readiness probes are used to check dependent services. For example, if a pod has a dependency on a database, the probe might check the database connection. However, this approach can create unexpected problems: if the dependency is briefly unavailable, every pod fails its readiness probe at once, which can cause cascading failures upstream. A better approach is to implement retry handling within your service, so that your service can recover correctly from transient failures.
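The retry handling suggested in the last consideration can be sketched as follows. This is a minimal example with exponential backoff, not a complete resilience strategy: the function name, the default attempt count, and the choice of which exceptions count as transient are assumptions, and production code would typically add jitter and a circuit breaker.

```python
import time


def call_with_retry(operation, attempts=4, base_delay=0.5,
                    transient=(ConnectionError, TimeoutError),
                    sleep=time.sleep):
    """Call operation(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except transient:
            # Give up and re-raise once the final attempt has failed.
            if attempt == attempts - 1:
                raise
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
```

The sleep parameter exists only so that the delay can be replaced in tests. A database call would be wrapped as call_with_retry(lambda: run_query(...)), where run_query is whatever data-access function the service already has.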

Resource constraints

Resource contention can affect the availability of a service. Define resource constraints for containers, so that a single container cannot overwhelm the cluster resources (memory and CPU). For non-container resources, such as threads or network connections, consider using the Bulkhead Pattern to isolate resources.

Use resource quotas to limit the total resources allowed for a namespace. That way, the front end can't starve the backend services for resources or vice-versa.
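As an illustration, the sketch below shows container-level requests and limits, a namespace-level ResourceQuota, and a small check of summed CPU requests against the quota. The manifest fields (resources.requests, resources.limits, and the requests.cpu style keys under spec.hard) are standard Kubernetes fields. The namespace name, the quantities, and the helper functions are assumptions chosen for the example.

```python
def parse_cpu(quantity):
    """Convert a Kubernetes CPU quantity ("500m" or "2") to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


# Per-container constraints: requests guide scheduling, limits cap usage.
container_resources = {
    "requests": {"cpu": "500m", "memory": "256Mi"},
    "limits": {"cpu": "1", "memory": "512Mi"},
}

# Namespace-wide cap on the total resources that pods may request.
namespace_quota = {
    "apiVersion": "v1",
    "kind": "ResourceQuota",
    "metadata": {"name": "backend-quota", "namespace": "backend"},
    "spec": {"hard": {
        "requests.cpu": "4", "requests.memory": "8Gi",
        "limits.cpu": "8", "limits.memory": "16Gi",
    }},
}


def within_cpu_quota(cpu_requests, quota):
    """True if the summed CPU requests stay within the namespace quota."""
    used = sum(parse_cpu(q) for q in cpu_requests)
    return used <= parse_cpu(quota["spec"]["hard"]["requests.cpu"])
```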

Role-based access control (RBAC)

Kubernetes and Azure both have mechanisms for role-based access control (RBAC):

• Azure RBAC controls access to resources in Azure, including the ability to create new Azure resources. Permissions can be assigned to users, groups, or service principals. (A service principal is a security identity used by applications.)

• Kubernetes RBAC controls permissions to the Kubernetes API. For example, creating pods and listing pods are actions that can be authorized (or denied) to a user through RBAC. To assign Kubernetes permissions to users, you create roles and role bindings:



o A Role is a set of permissions that apply within a namespace. Permissions are defined as verbs (get, update, create, delete) on resources (pods, deployments, etc.).

o A RoleBinding assigns users or groups to a Role.

o There is also a ClusterRole object, which is like a Role but applies to the entire cluster, across all namespaces. To assign users or groups to a ClusterRole, create a ClusterRoleBinding.

AKS integrates these two RBAC mechanisms. When you create an AKS cluster, you can configure it to use Azure AD for user authentication.

Once this is configured, a user who wants to access the Kubernetes API (for example, through kubectl) must sign in using their Azure AD credentials.

By default, an Azure AD user has no access to the cluster. To grant access, the cluster administrator creates RoleBindings that refer to Azure AD users or groups. If a user doesn't have permissions for a particular operation, it will fail.
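The following sketch shows what a namespace-scoped Role and a RoleBinding to an Azure AD group might look like, expressed as Python dictionaries that can be serialized and applied with kubectl. The API group and field names are the standard Kubernetes RBAC ones. The helper functions, the namespace and role names, and the way role_allows evaluates a rule are simplifications for illustration (the real authorizer also handles wildcards, resource names, and API groups).

```python
RBAC_API_GROUP = "rbac.authorization.k8s.io"


def make_role(namespace, name, resources, verbs):
    """Role granting `verbs` on core-API `resources` within one namespace."""
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "Role",
        "metadata": {"namespace": namespace, "name": name},
        # The empty string denotes the core API group (pods, services, ...).
        "rules": [{"apiGroups": [""], "resources": resources, "verbs": verbs}],
    }


def make_role_binding(role, group_object_id):
    """RoleBinding that grants `role` to an Azure AD group (by object ID)."""
    meta = role["metadata"]
    return {
        "apiVersion": f"{RBAC_API_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"namespace": meta["namespace"],
                     "name": meta["name"] + "-binding"},
        "subjects": [{"kind": "Group", "name": group_object_id,
                      "apiGroup": RBAC_API_GROUP}],
        "roleRef": {"kind": "Role", "name": meta["name"],
                    "apiGroup": RBAC_API_GROUP},
    }


def role_allows(role, verb, resource):
    """Simplified check: does any rule list both the verb and the resource?"""
    return any(verb in rule["verbs"] and resource in rule["resources"]
               for rule in role["rules"])
```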

An AKS cluster actually has two types of credentials for calling the Kubernetes API server: cluster user and cluster admin. The cluster admin credentials grant full access to the cluster. The Azure CLI command az aks get-credentials --admin downloads the cluster admin credentials and saves them into your kubeconfig file. The cluster administrator can use this kubeconfig to create roles and role bindings.

Because the cluster admin credentials are so powerful, use Azure RBAC to restrict access to them:

• The "Azure Kubernetes Service Cluster Admin Role" has permission to download the cluster admin credentials. Only cluster administrators should be assigned to this role.

• The "Azure Kubernetes Service Cluster User Role" has permission to download the cluster user credentials. Non-admin users can be assigned to this role. This role does not give any particular permissions on Kubernetes resources inside the cluster — it just allows a user to connect to the API server.

When you define your RBAC policies (both Kubernetes and Azure), think about the roles in your organization:

• Who can create or delete an AKS cluster and download the admin credentials?

• Who can administer a cluster?

• Who can create or update resources within a namespace?

It's a good practice to scope Kubernetes RBAC permissions by namespace, using Roles and RoleBindings, rather than ClusterRoles and ClusterRoleBindings.

Finally, there is the question of what permissions the AKS cluster has to create and manage Azure resources, such as load balancers, networking, or storage. To authenticate itself with Azure APIs, the cluster uses an Azure AD service principal. If you don't specify a service principal when you create the cluster, one is created automatically. However, it's a good security practice to create the service principal first and assign the minimal RBAC permissions to it.

Secrets management and application credentials

Applications and services often need credentials that allow them to connect to external services such as Azure Storage or SQL Database. The challenge is to keep these credentials safe and not leak them.

For Azure resources, one option is to use managed identities. The idea of a managed identity is that an application or service has an identity stored in Azure AD, and uses this identity to authenticate with an Azure service. The application or service has a Service Principal created for it in Azure AD, and authenticates using OAuth 2.0 tokens. The executing process calls a localhost address to get the token. That way, you don't need to store any passwords or connection strings. You can use managed identities in AKS by assigning identities to individual pods, using the aad-pod-identity project.
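To show what the token request looks like, the sketch below builds (but does not send) a request to the Azure Instance Metadata Service identity endpoint, which is how code on an Azure VM commonly obtains a managed identity token. The endpoint address, the api-version value, and the Metadata header are assumptions based on that service rather than details taken from this document, and the function name is hypothetical. In practice an Azure SDK credential class would normally handle this exchange.

```python
import urllib.parse
import urllib.request

# Assumed token endpoint of the Azure Instance Metadata Service.
IMDS_TOKEN_URL = "http://169.254.169.254/metadata/identity/oauth2/token"


def build_token_request(resource, api_version="2018-02-01"):
    """Build an HTTP request for an OAuth 2.0 token for `resource`."""
    query = urllib.parse.urlencode(
        {"api-version": api_version, "resource": resource})
    # The Metadata header is required by the metadata service.
    return urllib.request.Request(
        f"{IMDS_TOKEN_URL}?{query}", headers={"Metadata": "true"})


if __name__ == "__main__":
    request = build_token_request("https://vault.azure.net")
    print(request.full_url)
```

No password or connection string appears anywhere in the request, which is the point of the approach: the platform identifies the caller.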

Currently, not all Azure services support authentication using managed identities.

Even with managed identities, you'll probably need to store some credentials or other application secrets, whether for Azure services that don't support managed identities, third-party services, API keys, and so on. Here are some options for storing secrets securely:

• Azure Key Vault. In AKS, you can mount one or more secrets from Key Vault as a volume. The volume reads the secrets from Key Vault. The pod can then read the secrets just like a regular volume.

The pod authenticates itself by using either a pod identity (described above) or by using an Azure AD Service Principal along with a client secret. Using pod identities is recommended because the client secret isn't needed in that case.

• HashiCorp Vault. Kubernetes applications can authenticate with HashiCorp Vault using Azure AD managed identities. You can deploy Vault itself to Kubernetes. Consider running it in a separate dedicated cluster from your application cluster.

• Kubernetes secrets. Another option is simply to use Kubernetes secrets. This option is the easiest to configure but has some challenges. Secrets are stored in etcd, which is a distributed key-value store. AKS encrypts etcd at rest. Microsoft manages the encryption keys.
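For the Key Vault volume option above, application code reads a secret the same way it reads any file. The sketch below assumes a hypothetical mount path of /mnt/secrets-store with one file per secret. The actual path and file names depend on how the volume is configured in the pod spec.

```python
from pathlib import Path


def read_secret(name, mount_dir="/mnt/secrets-store"):
    """Read one secret from a volume where each secret is a separate file."""
    path = Path(mount_dir) / name
    # Drop only a trailing newline; other whitespace may be significant.
    return path.read_text(encoding="utf-8").rstrip("\n")
```

Reading the file on each use, rather than caching the value at startup, lets the application pick up a rotated secret if the volume contents are refreshed.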

Using a system like HashiCorp Vault or Azure Key Vault provides several advantages, such as:

• Centralized control of secrets.

• Ensuring that all secrets are encrypted at rest.

• Centralized key management.

• Access control of secrets.

• Auditing.

Pod and container security

This list is certainly not exhaustive, but here are some recommended practices for securing your pods and containers:

Don't run containers in privileged mode. Privileged mode gives a container access to all devices on the host.

When possible, avoid running processes as root inside containers. Containers do not provide complete isolation from a security standpoint, so it's better to run a container process as a non-privileged user.

Store images in a trusted private registry, such as Azure Container Registry or Docker Trusted Registry. Use a validating admission webhook in Kubernetes to ensure that pods can only pull images from the trusted registry.

Scan images and running containers for known vulnerabilities, using a scanning solution such as Twistlock or Aqua, both of which are available through the Azure Marketplace.

Automate image patching using ACR Tasks, a feature of Azure Container Registry. A container image is built up from layers. The base layers include the OS image and application framework images, such as ASP.NET Core or Node.js. The base images are typically created upstream from the application developers, and are maintained by other project maintainers. When these images are patched upstream, it's important to update, test, and redeploy your own images, so that you don't leave any known security vulnerabilities. ACR Tasks can help to automate this process.
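Several of these practices can be checked mechanically before a pod is admitted. The sketch below inspects a pod spec for privileged containers, a missing runAsNonRoot setting, and images from outside a trusted registry. The securityContext fields are standard Kubernetes fields. The function name and the registry name used in the usage note are hypothetical, and a real validating admission webhook would wrap checks like these in an HTTPS service that answers AdmissionReview requests.

```python
def security_findings(pod_spec, trusted_registry):
    """Return a list of human-readable problems found in a pod spec."""
    findings = []
    pod_ctx = pod_spec.get("securityContext", {})
    for container in pod_spec.get("containers", []):
        name = container["name"]
        ctx = container.get("securityContext", {})
        if ctx.get("privileged", False):
            findings.append(f"{name}: runs in privileged mode")
        # A container-level setting overrides the pod-level one.
        non_root = ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot", False))
        if not non_root:
            findings.append(f"{name}: runAsNonRoot is not set")
        if not container["image"].startswith(trusted_registry + "/"):
            findings.append(f"{name}: image is not from {trusted_registry}")
    return findings
```

Calling security_findings(spec, "myregistry.azurecr.io") returns an empty list for a compliant pod, so the result can be used directly as an allow or deny decision.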

Deployment (CI/CD) considerations

Here are some goals of a robust CI/CD process for a microservices architecture:

• Each team can build and deploy the services that it owns independently, without affecting or disrupting other teams.

• Before a new version of a service is deployed to production, it gets deployed to dev/test/QA environments for validation. Quality gates are enforced at each stage.

• A new version of a service can be deployed side by side with the previous version.

• Sufficient access control policies are in place.

• For containerized workloads, you can trust the container images that are deployed to production.

Container best practices



Here are some other best practices to consider for containers:

• Define organization-wide conventions for container tags, versioning, and naming conventions for resources deployed to the cluster (pods, services, and so on). That can make it easier to diagnose deployment issues.

• During the development and test cycle, the CI/CD process will build many container images. Only some of those images are candidates for release, and then only some of those release candidates will get promoted to production. Have a clear versioning strategy, so that you know which images are currently deployed to production, and can roll back to a previous version if necessary.

• Always deploy specific container version tags, not latest.

• Use namespaces in Azure Container Registry to isolate images that are approved for production from images that are still being tested. Don't move an image into the production namespace until you're ready to deploy it into production. If you combine this practice with semantic versioning of container images, it can reduce the chance of accidentally deploying a version that wasn't approved for release.

• Follow the principle of least privilege by running containers as a nonprivileged user. In Kubernetes, you can create a pod security policy that prevents containers from running as root.
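The tagging rules above lend themselves to an automated check in the pipeline. The sketch below extracts the tag from an image reference and accepts only explicit semantic-version tags, rejecting latest and untagged images. The function names and the simplified version pattern are assumptions, and image references pinned by digest (using @sha256:) are not handled.

```python
import re

# Simplified semantic version: MAJOR.MINOR.PATCH with optional pre-release.
SEMVER = re.compile(r"^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$")


def image_tag(image):
    """Return the tag of an image reference, or None if it has no tag."""
    # Look only at the last path segment, so that a registry port
    # (for example host:5000/repo) is not mistaken for a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return None
    return last_segment.rsplit(":", 1)[1]


def is_deployable(image):
    """True only for images with an explicit semantic-version tag."""
    tag = image_tag(image)
    return tag is not None and tag != "latest" and bool(SEMVER.match(tag))
```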

Helm charts

Consider using Helm to manage building and deploying services. Here are some of the features of Helm that help with CI/CD:

• Often a single microservice is defined by multiple Kubernetes objects. Helm allows these objects to be packaged into a single Helm chart.

• A chart can be deployed with a single Helm command, rather than a series of kubectl commands.

• Charts are explicitly versioned, using semantic versioning. Use Helm to release a version, view releases, track updates and revisions, and roll back to a previous version.

• Helm charts use templates to avoid duplicating information, such as labels and selectors, across many files.

• Helm can manage dependencies between charts.

• Charts can be stored in a Helm repository, such as Azure Container Registry, and integrated into the build pipeline.

A single microservice may involve multiple Kubernetes configuration files. Updating a service can mean touching all of these files to update selectors, labels, and image tags. Helm treats these as a single package called a chart and allows you to easily update the YAML files by using variables. Helm uses a template language (based on Go templates) to let you write parameterized YAML configuration files.
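The idea behind parameterized configuration can be shown with an analogy in Python. The sketch below uses the standard library's string.Template, not Helm's Go template syntax, to render a Deployment manifest from a small set of values. The template content and the value names (name, registry, tag) are illustrative assumptions.

```python
from string import Template

# One template; the service name and image tag appear in several places
# but are supplied only once, through the values dictionary.
DEPLOYMENT_TEMPLATE = Template("""\
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $name
  labels:
    app: $name
spec:
  selector:
    matchLabels:
      app: $name
  template:
    metadata:
      labels:
        app: $name
    spec:
      containers:
      - name: $name
        image: $registry/$name:$tag
""")


def render(values):
    """Fill in the template; raises KeyError if a value is missing."""
    return DEPLOYMENT_TEMPLATE.substitute(values)


if __name__ == "__main__":
    print(render({"name": "delivery",
                  "registry": "myregistry.azurecr.io",
                  "tag": "1.4.2"}))
```

Changing the tag value is then a one-line update, instead of an edit to every file that mentions the image.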

Although your CI/CD pipeline could install a chart directly to Kubernetes, we recommend creating a chart archive (.tgz file) and pushing the chart to a Helm repository such as Azure Container Registry.



Consider deploying Helm to its own namespace and using role-based access control (RBAC) to restrict which namespaces it can deploy to.

Helm Revisions

Helm charts always have a version number, which must use semantic versioning. A chart can also have an appVersion. This field is optional, and doesn't have to be related to the chart version. Some teams might want to manage application versions separately from updates to the charts. But a simpler approach is to use one version number, so there's a 1:1 relation between chart version and application version. That way, you can store one chart per release and easily deploy the desired release.

Another good practice is to provide a change-cause annotation in the deployment template, which lets you view the change-cause field for each revision, using the kubectl rollout history command.

Use the --history-max flag when initializing Helm. This setting limits the number of revisions that Tiller saves in its history. Tiller stores revision history in configmaps. If you're releasing updates frequently, the configmaps can grow very large unless you limit the history size.

Azure DevOps Pipeline for Microservices on Kubernetes

In Azure Pipelines, pipelines are divided into build pipelines and release pipelines. The build pipeline runs the CI process and creates build artifacts.

For a microservices architecture on Kubernetes, these artifacts are the container images and Helm charts that define each microservice. The release pipeline runs the CD process that deploys a microservice into a cluster.

A typical build pipeline might consist of the following tasks:

1. Build the test runner container.

2. Run the tests, by invoking docker run against the test runner container.

3. Publish the test results.

4. Build the runtime container.

5. Push the container to Azure Container Registry (or other container registry).

6. Package the Helm chart.

7. Push the Helm package to Azure Container Registry (or other Helm repository).
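The container and chart steps above can be sketched as a list of commands that a build agent would run. The example below only assembles the commands and prints them by default. It assumes a multi-stage Dockerfile with a stage named testrunner, and the registry, service, and chart directory names are hypothetical. Publishing test results (step 3) and pushing the chart (step 7) are left out because they depend on the CI system and the repository tooling in use.

```python
import shlex
import subprocess


def build_commands(registry, service, version, chart_dir):
    """Commands for steps 1, 2, 4, 5, and 6 of the build pipeline."""
    test_image = f"{service}-test:{version}"
    runtime_image = f"{registry}/{service}:{version}"
    return [
        # 1. Build the test runner container (assumed Dockerfile stage).
        ["docker", "build", "--target", "testrunner", "-t", test_image, "."],
        # 2. Run the tests inside the test runner container.
        ["docker", "run", "--rm", test_image],
        # 4. Build the runtime container.
        ["docker", "build", "-t", runtime_image, "."],
        # 5. Push the runtime image to the container registry.
        ["docker", "push", runtime_image],
        # 6. Package the Helm chart into a .tgz archive.
        ["helm", "package", chart_dir],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, check=True)
```

Stopping at the first failure means that an image is never pushed if its tests did not pass.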

The following diagram shows a typical end-to-end CI/CD process for microservices on Kubernetes.



The output from the CI pipeline is a production-ready container image and an updated Helm chart for the microservice. At this point, the release pipeline can take over. It performs the following steps:

• Deploy to dev/QA/staging environments.

• Wait for an approver to approve or reject the deployment.

• Retag the container image for release.

• Push the release tag to the container registry.

• Upgrade the Helm chart in the production cluster.

Cost considerations

Use the Azure pricing calculator to estimate costs. Other considerations are described in the Cost section of the Azure Architecture Framework.

Here are some points to consider for some of the services used in this architecture.

Azure Kubernetes Service (AKS)

AKS itself carries no charge for deployment, management, or operation of the Kubernetes cluster. You pay only for the virtual machine instances, storage, and networking resources consumed by your Kubernetes cluster.

To estimate the cost of the required resources, see the Container Services calculator.


https://docs.microsoft.com/en-us/azure/architecture/framework/cost/overview

https://azure.microsoft.com/pricing/calculator/?service=kubernetes-service



Azure Load Balancer

You are charged only for the number of configured load-balancing and outbound rules. Inbound NAT rules are free. There is no hourly charge for the Standard Load Balancer when no rules are configured.

See Azure Load Balancer Pricing for more information.

Azure DevOps Services

This reference architecture uses only Azure Pipelines, which Azure offers as an individual service. You are allowed one free Microsoft-hosted job with 1,800 minutes per month for CI/CD and one self-hosted job with unlimited minutes per month. Additional jobs are charged. For more information, see Azure DevOps Services Pricing.

Azure Monitor

For Azure Monitor Log Analytics, you are charged for data ingestion and retention. For more information, see Azure Monitor Pricing.

Exclusions

We have purposefully avoided a deep dive into individual design and integration patterns and focused mainly on Azure web application and microservices architectural styles. We have also excluded details of other dependent .NET and Azure services (for example, Event Hubs and ingestion frameworks), which are beyond the scope of this document.

We have also not focused on pricing, because it is handled directly by the product provider (Microsoft) and a rough estimate is available through the Azure pricing calculator.

Conclusion

We suggest using this document as a reference and treating our recommendations as guidance.

Many more Azure services and application styles are available on the Azure platform. Separate documents would be required to cover specific architectures for AI, data, machine learning, and similar workloads.

References

We have extracted and consolidated data points from various articles in the Microsoft Azure Architecture Center and documented them as a single document within the defined scope.

Microsoft Azure Architecture Center URL: https://docs.microsoft.com/en-us/azure/architecture/

Acknowledgement

All diagrams in this document are taken from Microsoft sites or knowledge bases.

https://azure.microsoft.com/pricing/details/load-balancer

https://azure.microsoft.com/pricing/details/devops/azure-devops-services

https://azure.microsoft.com/pricing/details/monitor

