How we track the market share of the major cloud platforms

Cloud market share is one of those topics where everyone has an opinion but almost nobody has real data. AWS, Azure, and Google Cloud all publish revenue numbers. Analysts publish estimates. But none of that tells you what’s actually happening at the company level: which cloud a specific company uses, when they started using it, and when they switched.

We wanted to fix that, so we analyzed infrastructure signals across the 5 million biggest companies in the world (according to their LinkedIn pages). Here’s what we found, and how we did it. For those curious, you can explore our live data for AWS, Azure, and Google Cloud.

Why existing methods are flawed

The most common ways people track cloud usage are job postings, tools like BuiltWith or Wappalyzer, and self-reported data from surveys or review sites like G2.

Job postings are noisy. Engineers routinely laundry-list every technology they’ve ever heard of in job descriptions. A posting that mentions AWS, Azure, and GCP doesn’t tell you which one they actually use. And job postings are a lagging indicator — by the time a company posts a role, the infrastructure decision was made months ago.

BuiltWith and Wappalyzer detect JavaScript tags and pixels on marketing sites. They’re excellent at telling you what analytics tools or ad platforms a company uses. They’re not designed to detect infrastructure. A company can run their entire backend on Hetzner while their marketing site loads a Google Tag Manager snippet, and BuiltWith would classify them as a Google Cloud customer.

Self-reported data has obvious problems. Companies don’t always know exactly what they use, they underreport for competitive reasons, and surveys capture a snapshot in time rather than ongoing changes.

We needed a different approach. One that looked at actual infrastructure, not proxies for it.

How we actually did it

The core insight is this: certain subdomains are almost always company-owned infrastructure. When a company sets up grafana.company.com, jenkins.company.com, or gitlab.company.com, they’re making a deliberate engineering decision about where to run their tooling. These subdomains are far less likely to be proxied through Cloudflare or fronted by a CDN than marketing sites are, so in most cases they point directly at the cloud provider running them.

We built a list of these infrastructure subdomains: grafana, jenkins, gitlab, argocd, vault, kibana, prometheus, sonarqube, netbox, rancher, mcp, n8n, litellm, and dozens more. For each company in our panel, we resolve the A records for these subdomains and match the IPs against the published IP ranges for 30+ cloud providers.
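As a rough illustration (not the production pipeline), the resolve-and-match step could look like the sketch below. The CIDR blocks are placeholder documentation ranges, and the names PROVIDER_RANGES, provider_for_ip, resolve_a_records, and providers_for_host are hypothetical. A real run would load each provider's published range list instead.

```python
import ipaddress
import socket

# Placeholder ranges for illustration only (RFC 5737 documentation blocks).
# These are NOT real provider ranges; load the published lists in practice.
PROVIDER_RANGES = {
    "aws": ["192.0.2.0/24"],
    "azure": ["198.51.100.0/24"],
    "gcp": ["203.0.113.0/24"],
}

# Parse every CIDR once so lookups do not re-parse strings.
_NETWORKS = [
    (provider, ipaddress.ip_network(cidr))
    for provider, cidrs in PROVIDER_RANGES.items()
    for cidr in cidrs
]


def provider_for_ip(ip):
    """Return the provider whose range contains ip, or None if no range matches."""
    addr = ipaddress.ip_address(ip)
    for provider, network in _NETWORKS:
        if addr in network:
            return provider
    return None


def resolve_a_records(hostname):
    """Return the set of IPv4 addresses for hostname (empty if it does not resolve)."""
    try:
        _, _, addresses = socket.gethostbyname_ex(hostname)
    except OSError:
        return set()
    return set(addresses)


def providers_for_host(hostname):
    """Resolve hostname and return the set of providers its IPs fall into."""
    matches = (provider_for_ip(ip) for ip in resolve_a_records(hostname))
    return {provider for provider in matches if provider is not None}
```

A linear scan over the networks is fine for a sketch. With tens of thousands of published ranges, a prefix tree or sorted interval lookup would be the more realistic choice.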

We supplement this with Cisco Umbrella’s top 1 million DNS logs — a daily ranking of the most queried domains on the internet. By scanning these logs for infrastructure-related subdomains, we discover company infrastructure endpoints we wouldn’t have found otherwise, with the added signal that high Umbrella ranking means real traffic volume.
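A minimal sketch of that scan, assuming the list is available as CSV text with one rank,domain pair per row. The label set is a small subset of the names above, and infra_hosts is a hypothetical helper rather than the actual implementation.

```python
import csv
import io

# A small subset of the infrastructure labels, for illustration.
INFRA_LABELS = {"grafana", "jenkins", "gitlab", "argocd", "vault", "kibana", "prometheus"}


def infra_hosts(csv_text, labels=INFRA_LABELS):
    """Yield (rank, hostname) for rows whose leftmost label is an infrastructure name."""
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2 or not row[0].strip().isdigit():
            continue  # skip malformed rows and any header line
        rank, host = int(row[0]), row[1].strip().lower()
        first_label = host.split(".", 1)[0]
        # Crude heuristic: require at least three labels so a bare "vault.com"
        # is not treated as a subdomain. It does not consult a public suffix
        # list, so names under suffixes like "co.uk" need extra handling.
        if first_label in labels and host.count(".") >= 2:
            yield rank, host
```

Keeping the rank alongside the hostname preserves the traffic signal, so higher-ranked endpoints can be prioritized for resolution.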

We also analyze HTTP response headers to catch cases where a CDN masks the origin IP. Headers like x-amz-cf-id confirm AWS even when the A record points to CloudFront. x-ms-* headers confirm Azure. via: 1.1 google or server: Google Frontend confirm GCP. This lets us see through a significant portion of the CDN blind spot.
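The header rules above can be sketched as a small classifier. This is an illustration under the assumption that headers arrive as a plain name-to-value mapping. The function name provider_from_headers is hypothetical, and a real classifier would likely weigh more signals than these three.

```python
def provider_from_headers(headers):
    """Guess a cloud provider from HTTP response headers, or return None."""
    # Header names are case-insensitive, so normalize names and values first.
    normalized = {name.lower(): str(value).lower() for name, value in headers.items()}

    if "x-amz-cf-id" in normalized:
        return "aws"
    if any(name.startswith("x-ms-") for name in normalized):
        return "azure"
    if "1.1 google" in normalized.get("via", "") or normalized.get("server") == "google frontend":
        return "gcp"
    return None
```

The check order only matters when a response carries signals for more than one provider. This sketch returns the first match rather than trying to resolve the conflict.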

Filtering out wildcard noise

One of the trickier problems we had to solve was wildcard subdomains. Many companies set up a *.company.com wildcard DNS record that catches any subdomain and routes it to the same IP — for staging environments, customer subdomains, or CI/CD preview deployments. The problem is that a wildcard makes every subdomain resolve, including ones that don’t actually point to real infrastructure.

If grafana.company.com resolves to the same IP as randomnonsense.company.com, that’s not a real Grafana instance. It’s a wildcard catch-all. Attributing that IP to a cloud provider would be noise, not signal.

To handle this, we probe a random nonsense subdomain for every domain we scan. If the A records for grafana.company.com overlap with the A records for our nonsense probe, we discard it. Zero overlap means someone explicitly set up that subdomain — a near-certain sign of a deliberate infrastructure decision. We apply the same check against the apex domain, so we don’t misattribute a subdomain that’s just pointing at the company’s marketing site.
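The overlap check described above is simple to express in code. The sketch below assumes the A records for the candidate, the nonsense probe, and the apex have already been resolved into collections of IP strings. Both function names are hypothetical.

```python
import secrets


def random_probe_host(domain):
    """Build a nonsense subdomain that should only resolve via a wildcard record."""
    return f"{secrets.token_hex(8)}.{domain}"


def is_explicit_subdomain(candidate_ips, probe_ips, apex_ips):
    """Return True only if the candidate shares no IPs with the probe or the apex."""
    candidate = set(candidate_ips)
    if not candidate:
        return False  # the subdomain did not resolve at all
    if candidate & set(probe_ips):
        return False  # same IPs as the nonsense probe: wildcard catch-all
    if candidate & set(apex_ips):
        return False  # same IPs as the apex: likely just the marketing site
    return True
```

For example, is_explicit_subdomain({"192.0.2.10"}, {"198.51.100.1"}, {"203.0.113.5"}) returns True, while any shared address with the probe or the apex makes it return False.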

How we detect migrations

The first time we see a subdomain, we record it as a baseline — no event. When we see the same subdomain resolve to a different cloud provider on a subsequent crawl, we record a migration event.
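In code, the baseline-then-compare rule might look like the following sketch. It assumes an in-memory dict that maps each hostname to the provider seen on the previous crawl. The function name and the event fields are hypothetical, and a real system would persist state between crawls and would need a policy for hosts that resolve to several providers at once.

```python
def record_observation(state, hostname, provider):
    """Store the latest provider for hostname; return a migration event or None."""
    previous = state.get(hostname)
    state[hostname] = provider
    if previous is None:
        return None  # first sighting: baseline only, no event
    if previous == provider:
        return None  # unchanged since the last crawl
    return {"hostname": hostname, "from": previous, "to": provider}


# Usage: three crawls of the same subdomain.
state = {}
first = record_observation(state, "grafana.example.com", "hetzner")   # baseline
second = record_observation(state, "grafana.example.com", "hetzner")  # no change
third = record_observation(state, "grafana.example.com", "aws")       # migration
```

The from and to fields keep the direction of the move, which is what separates a first AWS workload from a move off a hyperscaler.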

The direction of migration matters enormously. A company that has run on Hetzner for years and suddenly has grafana.company.com resolving to AWS has crossed a psychological and architectural threshold. The first AWS workload is always the hardest. Once one team does it, others follow. That’s a high-signal sales event the day it happens.

The reverse is equally interesting. A company moving from AWS to OVH or Hetzner is signaling active cost optimization pressure. They have strong opinions about infrastructure, they’re willing to do the work to move, and they’re a very different buyer profile than a company migrating toward AWS.

The market share of all the major cloud platforms (what we found)

Across all the infrastructure subdomains we tracked, here’s the breakdown by cloud provider:

[Chart: Cloud provider distribution across 2M companies, showing the number of infrastructure subdomains detected per provider. Source: Bloomberry.com]

AWS dominates with 132,085 detected infrastructure subdomains — 51.1% of everything we tracked. That’s more than Azure, Google Cloud, and the next five providers combined. That’s not a surprise. But the scale of the gap is striking.

What is surprising is how strong the “on-prem spirit” cluster is. Hetzner (5.7%), OVH (5.1%), and DigitalOcean (4.8%) together account for 15.6% of all infrastructure subdomains we detected. These aren’t companies that don’t know about AWS. These are engineering teams that have made a deliberate choice to avoid hyperscaler abstractions — for cost reasons, for control reasons, or both. They manage their own servers, run their own hypervisors, and treat cloud like a hosting provider rather than a platform.

Azure (12.2%) and Google Cloud (11.2%) are closer than most people expect. Azure’s strength comes from enterprise Microsoft shops. Google Cloud’s footprint is harder to measure because many GCP workloads sit behind Cloudflare, which masks the origin IP. Our header analysis catches a portion of this, but GCP is likely somewhat undercounted relative to AWS and Azure.

Honest limitations

No methodology is perfect and we want to be upfront about ours.

Path-based APIs are invisible to us. A company that runs their API at company.com/api rather than api.company.com doesn’t show up in our subdomain scan. This is common with Rails and Laravel apps, which default to path-based API routing. We’re working on supplementing this with HTTP path probing.

GCP is likely undercounted. Several large GCP-hosted platforms front everything with Cloudflare, which masks GCP IPs completely. Our header analysis catches some of this through the via: 1.1 google and server: Google Frontend headers, but not all of it.

We only see what’s publicly resolvable. Infrastructure on private networks, VPNs, or internal-only subdomains is invisible to us. Companies with strong network segmentation will be undercounted.

That said, we believe this is still the most accurate method available for tracking cloud infrastructure at the company level and at scale. It measures what’s actually running, not what people claim to run or what shows up on their marketing site.

You can dig into the full data for AWS, Azure, and Google Cloud.
