OpenAI, the company behind ChatGPT and other breakthrough AI models, is known for pushing technological boundaries. But one surprising part of OpenAI’s story is how much it leans on a tried-and-true technology: PostgreSQL. Postgres is the backbone of OpenAI’s most critical systems. In this blog, we’ll explore OpenAI’s PostgreSQL journey with Microsoft Azure—the challenges faced, the solutions implemented, and the impressive results achieved. More importantly, we’ll distill lessons you can use to scale your database.
The beginning: Initial architecture focused on simplicity
From early on, OpenAI used Azure Database for PostgreSQL, which spared the team from low-level database maintenance while providing important features like automated backups and high availability. The architecture was initially simple: one primary Postgres instance handled writes, with multiple read-only replicas to shoulder the heavy read traffic. This classic primary-replica setup worked well through OpenAI’s early growth.
For read-intensive workloads, this single-shard approach was a big win. Read scalability was excellent, thanks to dozens of replicas that the team could add as needed. Each replica is a live copy of the primary, so spreading out read queries among them allowed OpenAI to serve millions of users with low latency. Geographic distribution of replicas even enabled snappy read performance for users around the world. It’s a showcase of how cloud-managed Postgres can scale out reads efficiently.
However, as usage of ChatGPT and other services grew, the limits of this design were tested. Write requests became a growing bottleneck. All write operations had to funnel into the single primary database. As traffic surged, a few incidents occurred where database performance affected OpenAI’s services. These events were wake-up calls to implement new strategies to support read and write scale-out for their PostgreSQL workloads.
Scaling up with PostgreSQL on Azure as demand grows
At POSETTE 2025, OpenAI shared how their team scaled PostgreSQL to support ChatGPT and other mission-critical services. The Microsoft Azure Database for PostgreSQL team worked closely with OpenAI’s engineers to push the service to new limits. The result was a series of upgrades and best practices that transformed the database layer into a resilient component of OpenAI’s data platform.
Let’s break down the key strategies OpenAI used to scale and sharpen PostgreSQL, as shared in Bohan Zhang’s talk:
1. Offloading and smoothing write workloads
On a single database server, writes are often the hardest to scale. Because of PostgreSQL’s MVCC design, every update creates a new row version, which can lead to table bloat and vacuum pressure under heavy write loads. OpenAI encountered exactly this. Their solution was to minimize the burden on the primary by:
- Reducing unnecessary writes at the source
- Introducing controlled timing for certain operations
- Offloading write-heavy loads to other systems when possible
These optimizations paid off by keeping the primary database lean and efficient.
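One common way to reduce unnecessary writes at the source is to coalesce repeated updates to the same row in the application before they reach the primary. The sketch below is illustrative, not OpenAI’s actual implementation; the buffer class and key names are hypothetical.

```python
import time
from collections import OrderedDict

class CoalescingWriteBuffer:
    """Coalesce repeated updates to the same key so that only the
    latest value is written to the primary database on flush."""

    def __init__(self, flush_interval=1.0):
        self.flush_interval = flush_interval
        self._pending = OrderedDict()  # key -> latest pending value
        self._last_flush = time.monotonic()

    def write(self, key, value):
        # Overwrite any earlier pending value: N app-level updates
        # collapse into a single row write.
        self._pending[key] = value

    def should_flush(self):
        return time.monotonic() - self._last_flush >= self.flush_interval

    def flush(self, apply_fn):
        # apply_fn would issue the actual batched UPDATE/INSERT against
        # the primary; here it just receives the coalesced items.
        items = list(self._pending.items())
        self._pending.clear()
        self._last_flush = time.monotonic()
        apply_fn(items)
        return len(items)

buf = CoalescingWriteBuffer()
for i in range(100):
    buf.write("user:42:last_seen", i)  # 100 app-level updates
written = []
count = buf.flush(written.extend)  # one coalesced row write reaches the DB
```

Flushing on an interval also gives you the “controlled timing” lever: writes land in predictable batches instead of a constant trickle of single-row statements.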
2. Scaling reads with replicas and smart query routing
With write pressure under control, OpenAI focused on optimizing read-heavy workloads, which form the bulk of ChatGPT’s traffic. Key steps included:
- Maximizing read offloading to replicas
- Categorizing requests by priority, then assigning dedicated replica servers for the high-priority traffic
- Optimizing slow queries
- Connection pooling with PgBouncer
After all these efforts, the OpenAI team went from fighting fires to feeling in control.
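Priority-based routing can be as simple as keeping separate replica pools per traffic tier and round-robining within each. This is a minimal sketch under assumed names (the host names and priority labels are illustrative, not from the talk):

```python
import itertools

class ReplicaRouter:
    """Route reads to replica pools by request priority; writes always
    go to the single primary, as in a classic primary-replica setup."""

    def __init__(self, primary, high_priority_replicas, general_replicas):
        self.primary = primary
        self._pools = {
            "high": itertools.cycle(high_priority_replicas),
            "normal": itertools.cycle(general_replicas),
        }

    def route(self, is_write, priority="normal"):
        if is_write:
            return self.primary  # all writes funnel to the primary
        pool = self._pools.get(priority, self._pools["normal"])
        return next(pool)  # round-robin within the chosen pool

router = ReplicaRouter(
    primary="pg-primary",
    high_priority_replicas=["replica-hi-1", "replica-hi-2"],
    general_replicas=["replica-1", "replica-2", "replica-3"],
)
```

Isolating high-priority traffic on dedicated replicas means a flood of low-priority analytics reads cannot degrade latency for user-facing queries.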
3. Schema governance and safeguards
Scaling isn’t only about raw performance; it’s also about maintaining stability and uptime. OpenAI implemented processes to ensure that pushing the limits of PostgreSQL wouldn’t compromise reliability:
- Strict schema change rules
- Managing long transactions
- Introducing rate limits at the application, connection, and query levels
- High availability out of the box
All these measures contributed to a robust PostgreSQL setup with cloud-grade reliability.
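A typical safeguard for schema changes is to wrap every DDL statement with `lock_timeout` and `statement_timeout` so a migration that queues behind a long transaction fails fast instead of stalling production traffic. The helper below is a sketch of that pattern; the timeout values are illustrative defaults, not OpenAI’s settings.

```python
def guarded_ddl(statement, lock_timeout="2s", statement_timeout="30s"):
    """Prefix a schema change with session timeouts so it cannot block
    behind a long-running transaction and hold up other queries."""
    return "\n".join([
        f"SET lock_timeout = '{lock_timeout}';",
        f"SET statement_timeout = '{statement_timeout}';",
        statement,
    ])

sql = guarded_ddl("ALTER TABLE events ADD COLUMN source text;")
```

If the `ALTER TABLE` cannot acquire its lock within two seconds, it errors out and can be retried at a quieter moment, rather than parking every other query behind it.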
The result: PostgreSQL at scale
OpenAI’s journey with Azure Database for PostgreSQL has resulted in some meaningful outcomes for their business, illustrating just how far a startup can go with a well-architected relational database in the cloud:
- Peak throughput: PostgreSQL cluster handles millions of queries per second (combined reads and writes) across OpenAI’s services, showing massive throughput is possible on a single coordinated database cluster.
- Global read scale: OpenAI added dozens of read replicas—including cross-region replicas—to serve a worldwide user base with low latency, without overwhelming the primary or increasing lag.
- Reliability: In nine months, only one critical incident (Sev0) was attributed to PostgreSQL after the improvements—a marked gain in reliability compared to earlier periods.
- Ten times faster: Database response times improved from approximately 50 milliseconds to under five milliseconds for many queries after introducing connection pooling and optimizations, making interactions feel instantaneous.
OpenAI’s PostgreSQL setup is handling a workload that few companies have ever seen, and yet it’s doing so on a foundation of open-source technology and cloud services that any startup can use. This kind of scale was once thought to require exotic databases or enormous engineering teams, but OpenAI achieved it with a small team focused on systematic, pragmatic optimizations. In Bohan Zhang’s words, “After all the optimization we did, we are super happy with Postgres right now for our read-heavy workloads.”
Why Azure Database for PostgreSQL was key
By using Azure Database for PostgreSQL, OpenAI benefited from a service built for high-scale, mission-critical workloads. Azure Database for PostgreSQL provided several advantages that complemented OpenAI’s engineering work.
Ease of scaling and replication
Azure made it straightforward to add replicas on demand. Learning from OpenAI’s workload evolution, the Azure Database for PostgreSQL team developed the elastic clusters feature, now available in preview, which enabled the OpenAI team to scale horizontally through row-based and schema-based sharding. The Azure team also introduced the cascading read replicas capability, also in preview, which lets users create additional read replicas from an existing one. This helped them scale read workloads across regions without adding replication load on the primary.
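The core idea behind row-based sharding is to hash a distribution column and use the result to pick a shard, so rows for the same key always land on the same node. The sketch below shows that idea in miniature; it is a conceptual illustration, not the elastic clusters implementation, and the shard count is an assumed example.

```python
import hashlib

def shard_for(distribution_key, num_shards=8):
    """Map a distribution-column value to a shard via a stable hash.
    sha256 is used (rather than Python's built-in hash) so the mapping
    is deterministic across processes and restarts."""
    digest = hashlib.sha256(str(distribution_key).encode()).hexdigest()
    return int(digest, 16) % num_shards

# All rows for the same tenant route to the same shard,
# so per-tenant queries stay single-node.
shard = shard_for("tenant-1")
```

Schema-based sharding applies the same routing idea at a coarser grain: each tenant’s schema lives wholly on one node, and the router picks the node by schema name instead of by row.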
As Bohan Zhang, a member of OpenAI’s infrastructure team, highlighted, “At OpenAI, we utilize an unsharded architecture with one writer and multiple readers, demonstrating that PostgreSQL can scale gracefully under massive read loads.”
Additional Azure advantages included:
- High availability and management
- Co-innovation and support
- Security and compliance
Azure Database for PostgreSQL provided a reliable foundation on which OpenAI executed these optimizations. If you’re a startup, using a managed database means you get enterprise readiness out of the box, so you can devote your energy to product innovation and the specific tuning your use case needs.
Making Postgres work for you
OpenAI’s success with Azure Database for PostgreSQL is a story of resilience and innovation. It shines a light on what’s possible when a startup pairs a powerful cloud platform with smart engineering. This balance of old and new is often a winning formula—you innovate where it differentiates you, and you rely on well-established solutions for things like databases for their proven reliability. Here are some key takeaways for startup developers and technical decision makers looking to replicate this success:
- Start simple and optimize gradually
- Leverage cloud managed services
- Monitor, measure, and address bottlenecks
- Apply best practices from the Postgres community
If you’re feeling inspired to supercharge your own startup’s data layer, a great way to begin is by learning more about Azure Database for PostgreSQL and how to use it effectively.