At Microsoft, we rely on the Microsoft Azure cloud and related services to manage 98% of our IT infrastructure, which has resulted in significant benefits across the organization.
At Microsoft, we are proudly a cloud-first organization: Today, 98% of our IT infrastructure—which serves more than 200,000 employees and incorporates over 750,000 managed devices—runs on the Microsoft Azure cloud.
The company’s massive transition from traditional datacenters to a cloud-native infrastructure on Azure has fundamentally reshaped our IT operations. By adopting a cloud-first, DevOps-driven model, we’ve realized significant gains in agility, scalability, reliability, operational efficiency, and cost savings.
“We’ve created a customer-focused, self-serve management environment centered around Azure DevOps and modern engineering principles,” says Pete Apple, a technical program manager and cloud architect in Microsoft Digital, the company’s IT organization. “It has really transformed how we do IT at Microsoft.”
What it means to move from the datacenter to the cloud
Historically, our IT environment was anchored in centralized, on-premises datacenters. The initial phase of our cloud transition involved a lift-and-shift approach, migrating workloads to Azure’s infrastructure as a service (IaaS) offerings. Over time, the company evolved toward a more decentralized platform as a service (PaaS) DevOps model.
“In the last six or seven years we’ve seen a lot more focus on PaaS and serverless offerings,” says Faisal Nasir, a principal architect in Microsoft Digital. “The evolution is also marked by extensibility—the ability to create enterprise-grade applications in the cloud—and how we can design well-architected end-to-end services.”
Because we’ve moved nearly all our systems to the cloud, we have a very high level of visibility into our network operations, according to Nasir. We can now leverage Azure’s native observability platforms, extending them to enable end-to-end monitoring, debugging, and data collection on service usage and performance. This capability supports high-quality operations and continuous improvement of cloud services.
“Observability means having complete oversight in terms of monitoring, assessments, compliance, and actionability,” Nasir says. “It’s about being able to see across all aspects of our systems and our environments, and even from a customer lens.”
Decentralizing our IT services with Azure
As Microsoft was becoming a cloud-first organization, the nature of the cloud and how we use it changed. As Microsoft Azure matured and more of our infrastructure and services moved to the cloud, we began to move away from IT-owned applications and services.
The strength of Azure’s self-service and management features means that individual business groups can handle many of the duties that Microsoft Digital formerly offered as an IT service provider, which enables each group to build agile solutions that match its specific needs.
“Our goal with our modern cloud infrastructure continues to be a solution that transforms IT tasks into self-service, native cloud solutions for monitoring, management, backup, and security across our entire environment,” Apple says. “This way, our business groups and service lines have reliable, standardized management tools, and we can still maintain control over and visibility into security and compliance for our entire organization.”
The benefits to our businesses of this decentralized model of IT services include:
Empowered, flexible DevOps teams
A native cloud experience: subscription owners can use features as soon as they’re available
Freedom to choose from marketplace solutions
Minimal subscription limit issues
Greater control over groups and permissions
Better insights into Microsoft Azure provisioning and subscriptions
Business group ownership of billing and capacity management
“With the PaaS model, and SaaS (software as a service), it’s more DIY,” Apple says. “Our service teams don’t have to worry about the operating system. They just go to a website, fill in their info, add their data, and away they go. That’s a big advantage in terms of flexibility.”
Leveraging the power of Azure Monitor
Microsoft Azure Monitor is a comprehensive monitoring solution for collecting, analyzing, and responding to monitoring data from cloud and on-premises environments. Across Microsoft, we use Azure Monitor to ensure the highest level of reliability for our services and applications.
Specifically, we rely on Azure Monitor to:
Create visibility. All business units have instant access to fundamental metrics, alerts, and notifications across core Azure services. Azure Monitor also covers production and non-production environments and provides native monitoring support across Microsoft Azure DevOps.
Provide insight. Business groups and service lines can view rich analytics and diagnostics across applications and their compute, storage, and network resources, including anomaly detection and proactive alerting.
Enable optimization. Monitoring results help our business groups and service lines understand how users are engaging with their applications, identify sticking points, develop cohorts, and optimize the business impact of their solutions.
Deliver extensibility. Azure Monitor is designed for extensibility to enable support for custom event ingestion and broader analytics scenarios.
Because we’ve moved to a decentralized IT model, much of the monitoring work has moved to the service team level as well.
“The idea of centralized monitoring is gone,” says Cory Delamarter, a principal software engineering manager in Microsoft Digital. “The new approach is that service teams monitor their own applications, and they know best how to do that.”
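To make the idea of team-owned monitoring concrete, here is a minimal Python sketch in which each service team sets its own alert threshold for the same metric. It does not call Azure Monitor; the `AlertRule` type, the team names, the thresholds, and the sample values are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class AlertRule:
    """A threshold a service team sets for one of its own metrics (hypothetical)."""
    team: str
    metric: str
    threshold: float
    window: int  # number of most recent samples to average


def evaluate(rule: AlertRule, samples: list[float]) -> bool:
    """Return True when the average of the latest `window` samples exceeds the threshold."""
    recent = samples[-rule.window:]
    return bool(recent) and mean(recent) > rule.threshold


# Each team owns its rules; there is no shared, central threshold.
rules = [
    AlertRule(team="payments", metric="latency_ms", threshold=250.0, window=3),
    AlertRule(team="search", metric="latency_ms", threshold=800.0, window=3),
]

latency_samples = [200.0, 300.0, 400.0]  # illustrative data, not real telemetry

for rule in rules:
    state = "ALERT" if evaluate(rule, latency_samples) else "ok"
    print(f"{rule.team}: {state}")
```

The same samples raise an alert for one team and not the other, which is the point of the decentralized model: the team that knows the application decides what counts as unhealthy.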
Patching and updating, simplified
Moving our operations to the cloud also means a simpler and more automated approach to patching and updating. The shift to PaaS and serverless networking has allowed us to manage infrastructure patching centrally, which is much more scalable and efficient. The extensibility of our cloud platforms reduces integration complexity and accelerates deployment.
“It depends on the model you’re using,” Nasir says. “With the PaaS and serverless networks, the service teams don’t need to worry about patching. With hybrid infrastructure systems, being in the cloud helps with automation of patching and updating. There’s a lot of reusable automation layers that help us build end-to-end patching processes in a faster and more reliable manner.”
Apple stresses the flexibility that this offers across a large organization when it comes to allowing teams to choose how they do their patching and updating.
“In the datacenter days, we ran our own centralized patching service, and we picked the patching windows for the entire company,” Apple says. “By moving to more automated self-service, we provide the tools and the teams can pick their own patching windows. That also allowed us to have better conversations, asking the teams if they want to keep doing the patching or if they want to move up the stack and hand it off to us. So, we continue to empower the service teams to do more and give them that flexibility.”
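The self-service model Apple describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tooling Microsoft Digital uses: the `PATCH_WINDOWS` registry, the team names, and the `in_patch_window` helper are hypothetical, and windows are assumed not to cross midnight.

```python
from datetime import datetime, time

# Hypothetical self-service registry: each team picks its own weekly window.
# Weekdays follow datetime.weekday(): Monday is 0, Sunday is 6.
PATCH_WINDOWS = {
    "payments": {"weekday": 6, "start": time(2, 0), "end": time(4, 0)},
    "search": {"weekday": 2, "start": time(22, 0), "end": time(23, 30)},
}


def in_patch_window(team: str, moment: datetime) -> bool:
    """Return True if `moment` falls inside the window the team chose."""
    window = PATCH_WINDOWS[team]
    return (
        moment.weekday() == window["weekday"]
        and window["start"] <= moment.time() < window["end"]
    )


# Sunday, January 7, 2024 at 03:00 falls inside the payments window.
print(in_patch_window("payments", datetime(2024, 1, 7, 3, 0)))
```

An automation layer could call a check like this before it applies updates, so the central team supplies the mechanism while each service team supplies the schedule.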
Securing our infrastructure in a cloud-first environment
Security is an absolute priority for Microsoft, and it has been a foundational element of our cloud strategy from the start.
Being a cloud-first company has made it easier to be a security-first organization as well.
“The cloud enables us to embed security by design into everything we build,” Nasir says. “At enterprise scale, adopting Zero Trust and strong governance becomes seamless, with controls engineered in from the start, not retrofitted later. That same foundation also prepares us for an AI-first future, where resilience, compliance, and automation are built into every system.”
Cloud-native security features combined with integrated observability allow for better compliance and risk management. Delamarter agrees that the cloud has had huge benefits when it comes to enhancing network security.
“Our code lives in repositories now, and so there’s a tremendous amount of security governance that we’ve shifted upstream, which is huge,” Delamarter says. “There are studies that show that the earlier you can find defects and address them, the less expensive they are to deal with. We’re able to catch security issues much earlier than before.”
We use Azure Policy, which helps enforce organizational standards and assess compliance at scale using dashboards and other monitoring tools.
“Azure Policy was a key part of our security approach, because it essentially offers guardrails—a set of rules that says, ‘Here’s the defaults you must use,’” Apple says. “You have to use a strong password, for example, and it has to be tied to an Azure Active Directory ID. We can dictate really strong standards for everything and mandate that all our service teams follow these rules.”
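The guardrail idea, a set of required defaults that every resource is checked against, can be illustrated with a short Python sketch. This is a simplified model and not the Azure Policy engine or its definition format; the `REQUIRED_SETTINGS` values, the setting names, and the `violations` helper are assumptions made for the example.

```python
# Hypothetical organization-wide defaults that every resource must match.
REQUIRED_SETTINGS = {
    "https_only": True,
    "min_tls_version": "1.2",
}


def violations(resource: dict) -> list[str]:
    """Return the names of required settings the resource is missing or has set incorrectly."""
    return [
        key
        for key, required in REQUIRED_SETTINGS.items()
        if resource.get(key) != required
    ]


compliant = {"name": "storage-a", "https_only": True, "min_tls_version": "1.2"}
non_compliant = {"name": "storage-b", "https_only": False}

for resource in (compliant, non_compliant):
    found = violations(resource)
    status = "compliant" if not found else "violates: " + ", ".join(found)
    print(f"{resource['name']}: {status}")
```

Treating a missing setting as a violation mirrors the guardrail framing in the quote: teams keep their freedom to build, but a resource that does not state the mandated default is flagged instead of silently allowed.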
AI-driven operations in the cloud
Just as it is reshaping the rest of the technology world, AI is transforming infrastructure management at Microsoft. Tasks that used to be manual and laborious are being automated in many areas of the company, including network operations.
“AI is creating a new interface of agents that allow users to interact with large ecosystems of applications, and there’s much easier and more scalable integration,” says Nasir. “There are less and less manual actions required, and we’re automating a lot of business processes. Microsoft 365 Copilot, Security Copilot, and other AI tools are giving us shared compute and extensibility to produce different agents. It basically gives us a huge scale of automation on top of the cloud.”
Apple notes that powerful AI tools can be combined with the incredible amount of data that the Microsoft IT infrastructure generates to gain insights that simply weren’t possible before.
“We can integrate AI with our infrastructure data lakes and use tools like Network Copilot to query the data using natural language,” Apple says. “I can ask questions like, ‘How many of our virtual machines need to be patched?’ and get an answer. It’s early, and we’re still experimenting, but the potential to interact with this data in a more automated fashion is exciting.”
Ultimately, Microsoft has become a cloud-first company, and that has allowed us to work toward an AI-first mentality in everything we do.
“Having a complete observability strategy across our infrastructure modernization helps us to make sure that whatever changes we’re making, we have a design-first approach and a cloud-first mindset,” Nasir says. “And now that focus is shifting towards an AI-first mindset as well.”
Key takeaways
Here are some of the benefits we’ve accrued by becoming a cloud-first IT organization at Microsoft:
Transformed operations: By moving from our legacy on-premises datacenters, through Azure’s infrastructure as a service (IaaS) offerings, and eventually to a platform as a service (PaaS) DevOps model, we’ve reaped great gains in reliability, efficiency, scalability, and cost savings.
A clear view: With 98% of our organization’s IT infrastructure running in the Azure cloud, we have a high level of observability into our systems, with complete oversight of network assessment, monitoring, compliance, patching and updating, and many other aspects of operations.
Empowered teams: Operating a cloud-first environment allows us to have a more decentralized approach to IT infrastructure. This means we can offer our business groups and service lines more self-service, cloud-native solutions for monitoring, management, patching, and backup while still maintaining control over and visibility into security and compliance for our entire organization.
Seamless updates: The shift to PaaS and serverless networking has enabled a more planned and automated approach to patching and updating our infrastructure, which produces greater efficiency, integration, and speed of deployment.
Dependable security: Our cloud environment has allowed us to implement security by design, including tighter control over code repositories and the use of standard security policies across the organization with Azure Policy.
Future-proof infrastructure: As we shift to an AI-first mindset across Microsoft, we’re using AI-driven tools to enhance and maintain our native cloud infrastructure and adopt new workflows that will continue to reap dividends for our employees and our organization.