BlakYaks

Managing large-scale cloud infrastructure platforms with code 

Are you running large-scale cloud infrastructure platforms? Manage them entirely with code!

Cloud engineering teams invest significant time, energy and money automating the deployment of their cloud platforms using modern Infrastructure as Code (IaC) tools and methods. However, the value of this upfront investment is often eroded in scenarios where traditional operations teams make subsequent changes using the Azure admin UIs or with PowerShell scripts. 

If your cloud platforms have any genuine scale to them, you should ensure your teams manage these platforms with code throughout their entire life cycle. Whatever those platforms are, you must deploy and maintain them with code. This includes but is not limited to:

  • Azure landing zones (tenants, subscriptions, resource groups). 

  • AKS container platforms. 

  • Azure Red Hat OpenShift platforms. 

  • Azure-hosted data platforms (e.g. Azure data warehouse, Databricks). 

  • Azure ETL data solutions. 

  • IaaS / VM platforms. 

  • Integrated PaaS solutions. 

Be rigorous in ensuring that your modern cloud platforms are managed entirely with code and resist all temptation to use the cloud platform provider’s user interface and admin tools. The moment you begin to make changes using those tools you immediately start to invalidate and erode the value of the deployment code. 

Integrating IaC code libraries into mature DevSecOps processes and CI/CD pipelines

Also make sure these IaC code libraries are integrated into mature DevSecOps processes and CI/CD pipelines. Do these things well and the business benefits become crystal clear. In my 30 years working with, for and alongside CIOs, a common set of strategic themes and major challenges has always surfaced when managing large-scale technology infrastructure.

These challenges typically prevail across the full spectrum of hardware, software and services: 

  • Tight management and optimisation of the infrastructure cost base. 

  • Controlling and managing platform risks (e.g., security and compliance). 

  • Ensuring that platforms remain stable as they scale, and that capacity is understood and well-managed.  

  • Providing the business with new features, new capabilities that provide competitive edge (e.g., more scale, faster to market, better online experiences, simplified operations). 

  • Responding to growing business demand for new services whilst managing all of the above. 

There are clear benefits in making the switch to IaC-managed platforms, and that switch should be a key part of any senior platform team's strategy.

Key business benefits in managing cloud platforms entirely with code 

Key Business Benefit 1: Platform Stability and Risk Management improvements through:

  • Reduced level of human error due to improved process and policy rigour.

  • Policy-controlled platform and code changes increase stability.

  • Improved change governance reduces risk and increases stability. 

  • Improved security posture as updates can happen quickly and iteratively to address existing and emerging threats rapidly. 

  • All changes are planned, logged and managed with code, which improves deployment and rollback and creates an audit trail for better change and fault diagnosis.

  • Consistent, simple and reliable rollback to previous “known good” platform states. 

  • Quality state management process provides rich control of platforms and configuration states. 

  • Clear path from development release to production release using trusted, tested platform code base.

Key Business Benefit 2: Speed to market with new services/propositions: 

  • Deploy entire platforms quickly and easily in any target cloud venue provided by your chosen hyper-scaler/s (e.g. any supported region).

  • Ability to deploy changes, enhancements and features quickly and safely with easy rollback. 

  • Provision new services to application development and product teams rapidly (e.g. to dev and sandbox environments). 

  • Able to respond rapidly to increased/changing business demands with rapid provisioning of services from a ready-made catalogue deployed with code. 

Key Business Benefit 3: Cost reduction and optimisation 

  • The process of turning services on and off when not in use is greatly simplified and removes unnecessary costs. 

  • The process of right-sizing services in accordance with changing requirements is simplified, leading to cost savings.

  • Low-cost disaster recovery and business continuity processes as platforms and services can be instantiated as and when required (and not sitting active when not in use). 
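To make the first cost point concrete, here is a minimal sketch of a schedule rule for a non-production environment, assuming it only needs to run on weekdays during working hours. The function name, the 07:00 to 19:00 window and the weekday rule are illustrative assumptions, not part of any cloud provider's API; in practice a pipeline would evaluate a rule like this and then apply the matching IaC configuration.

```python
from datetime import datetime, time

# Assumed working-hours window for a dev/sandbox environment (illustrative only).
START = time(7, 0)
STOP = time(19, 0)


def should_be_running(now: datetime, start: time = START, stop: time = STOP) -> bool:
    """Return True if the environment should be up: weekdays, within the window."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start <= now.time() < stop


def weekly_running_hours(start: time = START, stop: time = STOP, days: int = 5) -> int:
    """Hours per week the environment is up under the schedule (whole hours assumed)."""
    return (stop.hour - start.hour) * days


hours_on = weekly_running_hours()
hours_saved_share = 1 - hours_on / (24 * 7)
print(f"Running {hours_on} of {24 * 7} hours per week "
      f"({hours_saved_share:.0%} fewer running hours than always-on)")
```

With these assumed values the environment runs 60 of 168 hours a week. The actual saving on a bill depends on how each service is priced, so treat the percentage as a reduction in running hours rather than in cost.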

Conclusion: Treat IaC in a binary way 

It’s important to recognise that once you have committed to managing your cloud platforms with IaC, your team must be “all in”. When you have an IaC code base that caters for your own variety of common deployment scenarios, every team member who could make changes must do so with code.

Just one person making changes in an admin / UI tool can erode the value of the code base that was created previously. Any change (no matter how small) that has to be made using a UI should be managed, treated as an exception, justified then replaced with code later.  

It’s entirely conceivable that in an emergency, someone may have an expedient fix to a real problem but not yet have the IaC skills to implement the change (e.g. an operator on the evening shift who needs to correct an errant configuration to return service to normal), and so makes a change outside of the code base. In this scenario, the change via the UI may be justified, but it should be logged and managed. Once the emergency has been resolved, that same change should be re-implemented via code.
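One way to catch changes made outside the code base is to compare the state declared in code with the state observed on the platform. The sketch below is a minimal illustration of that idea using plain dictionaries; the resource names, settings and the `detect_drift` helper are hypothetical, and a real implementation would read the declared state from your IaC tooling and the actual state from the cloud provider.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drift:
    resource: str
    setting: str
    declared: object  # value in the code base (None if not declared)
    actual: object    # value observed on the platform (None if absent)


def detect_drift(declared: dict, actual: dict) -> list[Drift]:
    """Compare declared and observed state, setting by setting."""
    drifts = []
    for resource in sorted(set(declared) | set(actual)):
        want = declared.get(resource, {})
        have = actual.get(resource, {})
        for setting in sorted(set(want) | set(have)):
            if want.get(setting) != have.get(setting):
                drifts.append(
                    Drift(resource, setting, want.get(setting), have.get(setting))
                )
    return drifts


# Hypothetical example: an operator scaled a cluster by hand during an incident.
declared_state = {
    "aks-prod": {"node_count": 3, "sku": "Standard_D4s_v5"},
    "rg-data": {"location": "uksouth"},
}
actual_state = {
    "aks-prod": {"node_count": 5, "sku": "Standard_D4s_v5"},
    "rg-data": {"location": "uksouth"},
}

for d in detect_drift(declared_state, actual_state):
    print(f"{d.resource}.{d.setting}: code says {d.declared!r}, platform has {d.actual!r}")
```

Each reported difference is a candidate exception: either the manual change is re-implemented in code, or the platform is brought back in line with the declared state.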

So commit to IaC.

Train your staff.

Work with expert partners to run the first mile.

And most of all, if you want to drive huge business benefits in the long run, do not tolerate any change that is not made via code once you have started down this strategic path.


ABOUT BLAKYAKS

BlakYaks specialise in delivering Enterprise Microsoft Azure platforms and solutions that are entirely deployed and managed with code throughout their full life cycle. While we focus largely on Azure platform solutions, we would advocate a similar strategy and set of principles for customers and partners that focus on other cloud hyper-scalers such as AWS and GCP.

If you would like to discuss how BlakYaks can support your organisation's Azure transformation journey, please get in touch or book an introductory meeting.