Overview
Serverless computing, and more specifically serverless functions, offers several undeniable benefits. Among other things, it gives developers a new way of building and deploying applications on cloud providers without worrying about the underlying server infrastructure.
However, since serverless functions are a relatively recent technology paradigm, they come with their own set of challenges when coupled with traditional relational databases, such as PostgreSQL or MySQL.
In this article, we'll look at how relational databases handle connections to execute queries and review the potential issues that occur when using them in a serverless environment. Then, we will cover some of the solutions and tools that can help you overcome these problems. Finally, through a practical example, we'll see how to get started quickly with querying a relational database from serverless functions the right way.
Relational databases in a serverless environment
Querying a traditional relational database such as PostgreSQL (or MySQL) requires establishing a TCP connection beforehand between the client and the database server. Once a connection is established, the client can send queries to the database server it's connected to. Internally, PostgreSQL spawns a new server process for each connection to handle the requests of the corresponding connected client.
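To make this concrete, here's a minimal sketch that opens a connection and lists the backend processes PostgreSQL spawned for the current database. It assumes Node.js with the `pg` (node-postgres) driver and a `DATABASE_URL` environment variable, both of which are assumptions for illustration:

```ts
// A minimal sketch: open one connection and inspect the backend
// processes PostgreSQL spawns (one per connection).
import { Client } from "pg";

async function main() {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect(); // TCP connection; PostgreSQL forks a backend process for it

  // Each row of pg_stat_activity is one backend process serving one connection.
  const { rows } = await client.query(
    "SELECT pid, state, backend_start FROM pg_stat_activity WHERE datname = current_database()"
  );
  console.log(rows);

  await client.end(); // close the connection; the backend process exits
}

main().catch(console.error);
```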
Hence, due to this connection requirement, relational databases tend to work well with clients that set up long-lasting connections to the database upfront and reuse them across multiple subsequent requests.
However, in the case of serverless functions, connections to the database server are created on-demand each time the client application triggers a function.
Moreover, each serverless function establishes a new connection to the database server and closes it once the function has been executed, as opposed to a traditional server architecture that keeps the connections alive during the entire server lifetime.
In other words, a typical serverless function workflow that requires database access may look like this:
- First, the function establishes a connection to the database (after starting up)
- Once the connection is settled, the function submits the query to the database and waits for the response
- Then, it returns the data to the client application that invoked the function
- Finally, it closes the database connection, and the server stops the function instance
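Translated into code, a naive Next.js API route following this workflow might look like the sketch below. The `posts` table and the `DATABASE_URL` variable are assumptions for illustration:

```ts
// pages/api/posts.ts — a deliberately naive sketch: one database connection per invocation
import type { NextApiRequest, NextApiResponse } from "next";
import { Client } from "pg";

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  // 1. Establish a connection to the database (after the function starts up)
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  try {
    // 2. Submit the query and wait for the response
    const { rows } = await client.query(
      "SELECT * FROM posts ORDER BY created_at DESC LIMIT 10"
    );
    // 3. Return the data to the client application that invoked the function
    res.status(200).json(rows);
  } finally {
    // 4. Close the database connection before the instance is stopped
    await client.end();
  }
}
```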
Hence, because of this connection mechanism and despite all their benefits, serverless functions come with their own challenges when coupled with traditional relational databases.
Let’s quickly review the potential issues that arise when accessing a relational database from a serverless function.
Serverless functions quickly exhaust the database connections limit
Even if a serverless function can stay active for some time and serve a few additional requests, your FaaS (Function-as-a-Service) provider will inevitably terminate the function instance at some point. In other words, serverless functions are ephemeral and short-lived, and using them with relational databases leads to numerous, brief database connections. That's not suitable for traditional relational databases, which were designed to handle persistent connections.
On top of that, serverless functions can automatically scale up very quickly and almost indefinitely in response to high demand. Indeed, a serverless function can easily scale to thousands or even tens of thousands of concurrent requests, which is one of many benefits of using them in the first place.
But that auto-scaling benefit also causes the number of database connections to scale. Indeed, one function instance can handle only one request at a time and will use one database connection to process that request's queries.
As a result, the number of function instances, and therefore the number of database connections, scales linearly with the number of concurrent requests being processed. In other words, if you have 1,000 client requests at any given time, that's 1,000 connections to your database! Furthermore, any functions that are paused (zombies) keep their connections open by default and block them from being used by another function.
Traditional relational databases are not designed to accept so many connections. They enforce a fixed connection limit based on the database server's capacity, and once that limit is hit, additional connection attempts will fail.
As an example, here's the maximum number of connections allowed by Aurora PostgreSQL DB instances:
| Instance class | Largest possible value for max_connections |
| --- | --- |
| db.t3.medium | 420 |
| db.t3.large | 844 |
| db.t4g.medium | 405 |
| db.t4g.large | 844 |
| db.r4.large | 1600 |
| db.r4.xlarge | 3200 |
| db.r4.2xlarge | 5000 |
| … | … |
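You can check the limit of your own server with a single query. Here's a quick sketch, with the same `pg` driver and `DATABASE_URL` assumptions as before:

```ts
// Sketch: read the server's connection limit
import { Client } from "pg";

const client = new Client({ connectionString: process.env.DATABASE_URL });
await client.connect();
const { rows } = await client.query("SHOW max_connections");
console.log(`max_connections = ${rows[0].max_connections}`); // e.g. 100 on a default install
await client.end();
```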
Too many connections impact the database performance
One might think that an easy solution to the previous issue is to increase the maximum number of connections allowed by the database.
However, even if it seems like a good idea at first, doing so can significantly impact the database server's performance in handling queries, as each active connection consumes memory and CPU resources.
Indeed, your database server has a limited amount of resources. If all of those resources are in use, your server will be overloaded, responding to queries will take more time, and your throughput will drop due to the overhead of handling so many connections. In the worst-case scenario, it can result in the database being shut down due to an out-of-memory error.
In other words, having too many open connections can hurt your database performance. You can have only as many connections as your database server is capable of handling based on its underlying resources.
Opening and closing connections take time
Finally, besides the issues related to the database connection limit, the performance of your application can be impacted by the fact that database connections do not get reused.
Opening a new connection to the database and closing it for every client request is quite expensive in terms of latency, due to the TCP connection establishment (and termination) and the corresponding resource allocation for that connection.
Those operations take time and add considerable latency on top of the initial delay of starting up a new serverless function instance (cold start) for each request.
Only once the serverless function is ready and the database connection is established can the queries needed to process a request actually be executed.
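You can measure that setup cost yourself. Here's a quick sketch, again assuming the `pg` driver and a `DATABASE_URL` variable:

```ts
// Sketch: measure how long connection establishment takes
import { Client } from "pg";

const client = new Client({ connectionString: process.env.DATABASE_URL });
const start = performance.now();
await client.connect(); // TCP handshake + authentication + backend process allocation
console.log(`Connected in ${Math.round(performance.now() - start)} ms`);
await client.end();
```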
Potential solutions
Connection pooling
Fortunately, we can solve the problems described above by using connection pooling.
Indeed, with a connection pool, connections are opened beforehand and already available for use when our application makes requests. In other words, the pool keeps several persistent database connections open and uses these connections to handle database requests.
It helps reduce the number of processes a database has to handle at any given time, as it no longer needs to spawn a new process each time a request comes in.
It also mitigates the latency problem with establishing new TCP connections because we can open them in advance and reuse them across the serverless functions.
So now, with connection pooling, when a serverless function needs to make a database request, it uses a connection from the pool instead of establishing a direct connection to the database. Then, when the transaction or session is completed, the connection is returned to the pool. That connection can then be used by another function again.
What's more, when all connections in the pool are in use, the pool doesn't reject incoming requests. Instead, it queues them and waits until a connection is returned to the pool to process them.
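Here's what that checkout/return cycle looks like inside a single long-running Node.js process, using the `pg` driver's built-in `Pool`. It's only a sketch of the mechanics (the `posts` table is again an assumption), and, as we'll see next, a stateless serverless function can't hold such a pool on its own:

```ts
// Sketch: a connection pool inside one long-running process
import { Pool } from "pg";

const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 10, // keep at most 10 persistent connections open
});

export async function getLatestPosts() {
  // Borrows a connection from the pool; if all 10 are busy,
  // the query is queued until a connection is returned.
  const { rows } = await pool.query("SELECT * FROM posts LIMIT 10");
  return rows; // pool.query() returns the connection to the pool automatically
}
```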
But how do we implement connection pooling?
There are several ways we could do it, but in the context of serverless computing, the only way is to introduce a middleware server that will manage the pool of database connections. Indeed, as serverless functions are stateless by nature, they cannot keep any connections open and reuse them across requests. To manage the connections, we must use an intermediary between our serverless functions and our relational database.
pgBouncer for PostgreSQL
We can use existing tools like pgBouncer, an open-source connection pooler for PostgreSQL. We can install it on the database server itself or on a stand-alone server; in both cases, the serverless functions will connect to pgBouncer as if it were the PostgreSQL database.
PgBouncer sits between the serverless functions and the PostgreSQL server and acts as the intermediary between them. Hence, the serverless functions interact with pgBouncer instead of the PostgreSQL database. And pgBouncer handles the connection pooling necessary for scaling many simultaneous connections created by concurrent serverless functions.
Adding pgBouncer allows our serverless functions to reuse existing connections rather than creating new ones for every request. Hence, we lower the performance impact of opening new connections to the database, prevent incoming requests from failing, and improve query performance.
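In practice, switching to pgBouncer is mostly a matter of pointing your connection string at it instead of at PostgreSQL directly. For example, with hypothetical hostnames and pgBouncer listening on its default port 6432:

```
# Direct connection to PostgreSQL:
DATABASE_URL=postgresql://USER:PASSWORD@db-host:5432/mydb

# Same protocol, but through pgBouncer:
DATABASE_URL=postgresql://USER:PASSWORD@pgbouncer-host:6432/mydb
```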
Of course, you can install and configure pgBouncer yourself. But, as with database servers and serverless functions, you can also rely on your cloud provider to manage the connection pool for you. Companies like Supabase, DigitalOcean, and Heroku, to name just a few, include a built-in solution for managing connection pooling with pgBouncer.
Data proxy and API
Besides pgBouncer, you can use another external connection pooling solution such as Amazon RDS Proxy if you use Amazon RDS or Aurora for your databases. Or you can use the Prisma Data Proxy if you are already using the Prisma ORM to interact with your data.
Finally, as a third option, you can use a Data API on top of your relational database to execute SQL queries without managing the connections yourself. All you have to do inside your serverless functions is make API calls, and the Data API will manage connections to the database for you. Again, you can implement it yourself using open-source libraries like PostgREST to build a REST API for PostgreSQL. Or you can leverage built-in solutions from third-party providers such as Supabase or Hasura, which provide a REST or GraphQL API on top of your data out of the box.
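For instance, with PostgREST, fetching rows becomes a plain HTTP call. Here's a sketch where the endpoint URL and the `posts` table are hypothetical:

```ts
// Sketch: querying PostgreSQL through a PostgREST Data API
// (global fetch requires Node 18+)
const response = await fetch(
  "https://my-postgrest-host.example.com/posts?select=*&order=created_at.desc&limit=10"
);
const posts = await response.json(); // PostgREST translates the HTTP call into SQL
```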
At the end of the day, all those solutions are just a way to "reproduce" what we had before serverless functions: persistent connections established through a long-running, always-on middleware server.
Practical examples
Enough talking. Let's put this into practice and demonstrate the benefit of using connection pooling to solve the problems mentioned earlier.
For that, we'll start with a Next.js application and leverage the Next.js built-in API to query data using the Prisma ORM from a PostgreSQL database. Each Next.js API route is deployed as a separate serverless function, and it will make a direct connection to PostgreSQL to execute the SQL queries.
Then, we'll use and set up two different connection pooling solutions, pgBouncer and the Prisma Data Proxy, and briefly compare the performance of our application between those two solutions.
1. Direct connection: Next.js API + PostgreSQL
Let's start by using direct connections between our serverless functions and our PostgreSQL databases.
The database is hosted on DigitalOcean, our cloud provider, and runs PostgreSQL version 14 with a limit of 22 connections for running queries.
Regarding the application, it's a simple blog app with a few Next.js API routes to create new posts, like posts, and register visitor views. You can check out the source code on GitHub.
To demonstrate the database connection issues we described earlier, we'll test our Next.js API by putting a load on it and see how it performs. For that, we'll use k6, an open-source load testing tool.
We'll execute a scenario-based load test script that runs four consecutive HTTP requests against our API to fetch the latest blog posts, create a new post, view it, and finally like it. The script creates 30 concurrent sessions (virtual users), which is more than our database connection limit (22).
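Here's a simplified sketch of what such a k6 script could look like (k6 scripts are written in JavaScript; the URLs, payloads, and response shape are assumptions based on our API):

```javascript
// load-test.js — run with: k6 run load-test.js
import http from "k6/http";
import { check, sleep } from "k6";

export const options = { vus: 30, duration: "30s" }; // 30 virtual users > 22 connections

const BASE = "https://our-app.example.com/api";

export default function () {
  // 1. Fetch the latest blog posts
  check(http.get(`${BASE}/posts`), { "fetched posts": (r) => r.status === 200 });

  // 2. Create a new post
  const created = http.post(`${BASE}/posts`, JSON.stringify({ title: "Hello" }), {
    headers: { "Content-Type": "application/json" },
  });
  check(created, { "created post": (r) => r.status === 200 || r.status === 201 });

  // 3. View it, then 4. like it
  const id = created.json("id"); // assumes the API returns the new post's id
  http.post(`${BASE}/posts/${id}/views`);
  http.post(`${BASE}/posts/${id}/likes`);
  sleep(1);
}
```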
Here's the result of that test:
As we can clearly see, not all our requests were successful. A lot of them (~20-45%) failed due to the exhaustion of our database connection limit, as we expected.
2. Connection pooling: Next.js API + PostgreSQL + pgBouncer
Alright! Now, let’s use connection pooling by setting up pgBouncer in front of our PostgreSQL database. To do so, we’ll use the DigitalOcean built-in connection pooling solution as it uses pgBouncer and manages everything for us.
We set the pool mode to “Transaction” so that a database connection is assigned to a serverless function for the duration of a single transaction. After that, the connection is returned to the pool.
So, with that pool mode, two consecutive transactions from the same serverless function could be done over two different connections. And, if there are not enough available connections in the pool to handle the transactions received, the additional transactions are queued to run once a connection is available.
Once our connection pool is created and up and running, we can retrieve its connection string and use it within our application in place of the direct database connection URL. That way, all incoming requests from our serverless function will go through pgBouncer first before reaching the database.
Also, because we are using Prisma as our ORM, we need to add the `?pgbouncer=true` flag to the connection URL:

```
DATABASE_URL=postgresql://USER:PASSWORD@HOST:PORT/Pooler?pgbouncer=true
```
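For reference, this `DATABASE_URL` is consumed by the datasource block of our Prisma schema:

```prisma
// schema.prisma
datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}
```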
Finally, since the Prisma Migration Engine is designed to use a single connection to the database and does not support connection pooling with pgBouncer, we need to override the `DATABASE_URL` environment variable before running `prisma migrate`:
{ ..., "scripts": { "vercel-build": "prisma generate && npm run migrate:deploy && next build", "migrate:deploy": "DATABASE_URL=\"$MIGRATE_DATABASE_URL\" prisma migrate deploy" ... } }
And set the `MIGRATE_DATABASE_URL` variable with our direct database connection string:

```
MIGRATE_DATABASE_URL=postgresql://USER:PASSWORD@HOST:PORT/mydb
```
We can now run our load tests with 30 concurrent sessions again and see the result:
All our requests are successful this time, which means that we no longer hit the database connection limit.
We can even run the same load tests with more concurrent sessions with no problem, like in the following example that uses 100 VUs:
3. Connection pooling: Next.js API + PostgreSQL + Prisma Data Proxy
For our last test, as we are already using Prisma as our ORM, let's use Prisma's own connection pooling solution instead of DigitalOcean's pgBouncer and see if we can get the same or better results.
So, navigate to the Prisma Data Platform and create a new project. If you don't have an account yet, sign up with your GitHub account first.
Configure your project by giving it a name and selecting the corresponding GitHub repository and branch for your project.
Next, paste your direct database connection string (not the pgBouncer one), and select the Prisma Data Proxy region closest to your database.
Once you are ready, click “Create project”.
Finally, copy and paste your Prisma Data Proxy connection string into your project. We’ll now use this connection to query our database. Note that you should keep the direct database connection string for the migration command.
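For reference, the Data Proxy connection string uses the `prisma://` scheme rather than `postgresql://`. Our environment variables end up looking roughly like this (hostname and API key are placeholders):

```
DATABASE_URL=prisma://aws-us-east-1.prisma-data.com/?api_key=__API_KEY__
MIGRATE_DATABASE_URL=postgresql://USER:PASSWORD@HOST:PORT/mydb
```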
We are now ready to run our load tests with 30 concurrent sessions:
And now, with 100 concurrent sessions:
One important thing to note here is that not only do we avoid any database connection limit problem, but we also get better performance overall. Indeed, all the HTTP requests have a significantly better response time than when we were using pgBouncer.
This difference can be explained by the fact that the Prisma engine that manages the connections is now part of the Prisma Data Proxy, which reduces the bundle size of our application and leads to faster serverless function load times, lessening the impact of the cold start problem.
In conclusion, pgBouncer is a great connection pooler for PostgreSQL. Still, using the Prisma Data Proxy in our example not only solves the connection issues but also gives us better performance than just using pgBouncer.
In addition, one might prefer the Prisma Data Proxy over pgBouncer because not all cloud providers offer pgBouncer, in which case you would have to configure and operate the corresponding infrastructure yourself. Another reason is that pgBouncer only works with PostgreSQL, while the Prisma Data Proxy works with all databases that Prisma supports (PostgreSQL, MySQL, MongoDB, and more).
Wrapping up
Serverless functions offer many benefits, but it's important to remain aware of the particular challenges that you might have to address when using them with traditional relational databases.
Indeed, as we've seen, due to the ephemeral nature of serverless functions and the way relational databases handle connections, it is easy to exhaust your database's connection limit and hurt your application's performance.
Hence, it's crucial to include an intermediary infrastructure component to manage a pool of persistent database connections to overcome those issues. Many solutions already exist, so make sure to leverage them so you can focus on your core product and not manage servers.
Finally, there is something else we haven't discussed in this article, but that is worth mentioning when working with databases: serverless databases.
Indeed, they allow developers to interact with their data simply by making HTTP requests through a built-in API. And again, the cloud provider manages everything, so you don't have to worry about operating and scaling any infrastructure and thus the challenges that come with it.
So, if you want to go further into serverless computing, that's another solution you should also consider checking out. There are already a few companies offering this new kind of database, such as Fauna, PlanetScale, and Cockroach Labs.
I hope you enjoyed reading the article as much as I enjoyed writing it.