Dealing with Timeouts with the ChatGPT API

Rian Schmidt

November 08, 2023

Table of Contents:

The Problem with CirciBot

The Current State of Affairs with ChatGPT's API

Netlify and Timeouts

This Sounds Like a Job for Background Jobs!

Enter Redis

Putting It All Together

This Is Working in an Evolving Field

The Problem with CirciBot

I built my little experimental chatbot as a playground for trying out a couple of cool technologies-- one was LLMs and vector databases, and the other was LangChain, a framework for building AI applications.

Things were going along swimmingly, until I began hearing from users that it was crashing. (Worse, it was crashing with a blank-looking screen! But that's another story about upgrading Remix and how it affected ErrorBoundary).

There's nothing less fun in the world of debugging than an error that happens intermittently and leaves little trace of what caused it, but I quickly put two and two together and determined that my problem was with tImE ItSElf!!!

The Current State of Affairs with ChatGPT's API

One of the big problems with the move to these GenAI platforms is that you cede control of a core component of your application to a particular vendor, who can decide to throttle your usage, raise prices, or shut down their product on a whim.

Now, I'd say "Hah! Open source to the rescue!" except that the LLMs are largely a hardware-constrained operation. That is, unless you've got your own stack of GPUs sitting around to run these models, you're going to need to pay someone to do it for you. Also, the really good models are BIG, which only exacerbates the problem.

So, for now, we're kind of stuck with what's available. ChatGPT is obviously the leader in the field. But their API can be slowwwwwww. GPT-4 can take minutes to respond. That's obviously not great for a chatbot.

Netlify and Timeouts

I host my site on Netlify, which gives me all kinds of benefits in terms of ease of use. Easy-to-set-up CI, so I can just push code to GitHub and have it built and deployed. Good handling of caching on their CDN. Nothing I really need to manage.

But with that comes more loss of control. The functions run as Lambdas under the hood, and all of that is controlled by Netlify. So, if I submit something to a Remix ActionFunction and it takes longer than, say, 10 seconds, it'll time out. You can ask for an increase to 26 seconds (for some reason, that's the magic number), and that helps. But with ChatGPT, taking longer than 26 seconds isn't uncommon.

This Sounds Like a Job for Background Jobs!

Clearly we need to do this talking-to-ChatGPT stuff asynchronously-- i.e., submit the request and just get the results whenever it's done. Thankfully, Netlify also provides background functions, which can run for up to 15 minutes. Now, I figure if ChatGPT takes longer than a few seconds, whoever was talking to it has long since left, so 15 minutes ought to cover me.
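
To make that concrete, here's roughly what such a function looks like. The important part is the filename: Netlify treats any function whose name ends in -background as a background function, so the caller gets a 202 back immediately and the handler gets its 15 minutes. The ask-background name and the placeholder body below are mine, not the app's actual code.

```ts
// netlify/functions/ask-background.ts
// The "-background" suffix is what tells Netlify to run this as a
// background function: the caller gets a 202 right away, and this
// handler can take up to 15 minutes to finish.
import type { Handler } from "@netlify/functions";

export const handler: Handler = async (event) => {
  const { question } = JSON.parse(event.body ?? "{}");

  // ...call the ChatGPT API here and stash the result somewhere the
  // foreground code can pick it up later (more on that below)...
  console.log("working on:", question);

  // The return value is effectively ignored for background functions.
  return { statusCode: 200 };
};
```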

So, the task now is to take the request, pass it to an action, check if it's cached (more on that in a second), fire off a background job if not, and then poll for some results until my own timeout hits (ten seconds is plenty).
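
That polling loop on the client side ends up looking something like the sketch below. The /chat route, the { status, answer } response shape, and the particular delays are assumptions for illustration, not the app's actual code.

```ts
// A sketch of the client-side polling: keep re-asking the action for the
// answer, waiting a little longer each round, and give up after ten seconds.
export async function pollForAnswer(question: string): Promise<string | null> {
  const deadline = Date.now() + 10_000;
  let delay = 500;

  while (Date.now() < deadline) {
    const res = await fetch("/chat", {
      method: "POST",
      body: new URLSearchParams({ question }),
    });
    const data = (await res.json()) as { status: string; answer?: string };

    // The action returns the answer once the background job has finished.
    if (data.status === "done") return data.answer ?? null;

    // Still pending: back off a bit before asking again.
    await new Promise((resolve) => setTimeout(resolve, delay));
    delay = Math.min(delay * 1.5, 2_000);
  }

  return null; // timed out; show a friendly "try again" message instead of a blank screen
}
```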

Part of this process is not asking ChatGPT to answer the same questions over and over. If someone asks "What does Circinaut do?", I might as well stash the answer for a while. Also, if I fire off these background jobs, I need somewhere to track them.

Enter Redis

Redis is a fast key-value store. Great for caching stuff. While it can do a lot more (hashes, pub/sub, etc.), at its most basic level, you read and write key-value pairs, and it only takes a few milliseconds.

That means I can take a message, hash it into a key, and store the answer for however long I like. It also means that I can easily use it to track my API requests' progress.
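
A minimal sketch of that idea, assuming ioredis and Node's built-in crypto module. The key prefix, the TTLs, and the helper names are my choices for illustration; the "pending" marker is the progress-tracking trick described in the flow below.

```ts
// redis.server.ts (hypothetical) -- hash a message into a stable key and
// cache the answer with a TTL.
import { createHash } from "node:crypto";
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// Identical questions hash to identical keys.
const answerKey = (message: string) =>
  "answer:" + createHash("sha256").update(message.trim().toLowerCase()).digest("hex");

// Returns the cached answer, the string "pending", or null if we've never seen it.
export async function getCachedAnswer(message: string) {
  return redis.get(answerKey(message));
}

export async function markPending(message: string) {
  await redis.set(answerKey(message), "pending", "EX", 60 * 15); // give the job 15 minutes
}

export async function cacheAnswer(message: string, answer: string) {
  await redis.set(answerKey(message), answer, "EX", 60 * 60 * 24); // keep answers for a day
}
```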

The downside here is that it introduces another piece outside of my Remix app, but given my dependencies on external APIs anyway, this seems like a pretty low-risk addition. There are lots of hosted Redis (or memcache or whatever) caches out there. I happen to run Redis for my own purposes, so I'll just use that server. No additional work or cost.

Putting It All Together

The current solution works like this:

  • User asks a question.
  • The UI submits that to an ActionFunction that hashes it into a key and asks Redis if it has that key.
  • If Redis says 'yes', we return the value. Done!
  • If Redis says 'no', we submit a job to the background function to deal with it.
  • The background function then writes a 'pending' value to Redis on that key and starts its API work, writing the result to Redis when it completes.
  • Meanwhile, the UI starts polling the ActionFunction for the message.
  • If it comes back with the newly written message, it displays it. Done!
  • If it comes back 'pending', it backs off gradually until ten seconds have passed, at which point it times out more gracefully than the function just disappearing. (The action side of this flow is sketched right after this list.)
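
In code, the action side of that flow looks roughly like the sketch below. Here, getCachedAnswer is the hypothetical cache helper from earlier, ask-background matches the earlier background-function sketch, and the SITE_URL environment variable and the import path are assumptions of mine; this shows the shape of the thing, not the production code.

```ts
// app/routes/chat.tsx (action only) -- a rough sketch of the flow above.
import { json, type ActionFunction } from "@remix-run/node";
import { getCachedAnswer } from "~/redis.server";

export const action: ActionFunction = async ({ request }) => {
  const form = await request.formData();
  const question = String(form.get("question") ?? "");

  const cached = await getCachedAnswer(question);

  // Cache hit: a background job already answered this one at some point.
  if (cached && cached !== "pending") {
    return json({ status: "done", answer: cached });
  }

  // Nothing in Redis at all: kick off the background function. It will
  // mark the key as 'pending' and eventually write the real answer.
  if (!cached) {
    await fetch(`${process.env.SITE_URL}/.netlify/functions/ask-background`, {
      method: "POST",
      body: JSON.stringify({ question }),
    });
  }

  // Either way, the UI should keep polling this same action.
  return json({ status: "pending" });
};
```

The client then runs the polling loop sketched earlier, backing off until the ten-second cutoff.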

So far, so good. I think I'll try pub/sub next to get more immediate results when they come back, but for now, this is working pretty well.

This Is Working in an Evolving Field

One of the main challenges I've encountered in building GenAI apps over the past year or so is that the results are so unpredictable-- in format, time, and content. OpenAI's most recent release addresses some of those issues, but I suspect it's just the nature of the beast.

These things are huge language models; the more we tighten down their operation, the more AI becomes an API. Even so, we'll still need to know how to deal with them taking too long or saying the wrong thing.

I'm hopeful that, over time, running our own models will become more mainstream and affordable so that we're not all at the mercy of the Elon Musks and Sam Altmans (Altmen?) of the world. When that happens, we have a shot at building applications that we can tune for our purposes and rely on for critical functions.