How I Leveraged Open Source LLMs to Achieve Massive Savings on a Large Compute Project | by Ryan Shrott | Aug, 2023

Unlocking Cost-Efficiency in Large Compute Projects with Open Source LLMs and GPU Rentals.

Photo by Alexander Grey on Unsplash


In the world of large language models (LLMs), the cost of computation can be a significant barrier, especially for extensive projects. I recently embarked on a project that required running 4,000,000 prompts with an average input length of 1,000 tokens and an average output length of 200 tokens. That's nearly 5 billion tokens! The traditional approach of paying per token, as is common with models like GPT-3.5 and GPT-4, would have resulted in a hefty bill. However, I discovered that by leveraging open source LLMs, I could shift the pricing model to paying per hour of compute time, leading to substantial savings. This article will detail the approaches I took, and compare and contrast each of them. Please note that while I share my experience with pricing, prices are subject to change and may vary depending on your region and specific circumstances. The key takeaway here is the potential cost savings of leveraging open source LLMs and renting a GPU per hour, rather than the exact prices quoted. If you plan on using my recommended solutions in your project, I've left a couple of affiliate links at the end of this article.
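The token arithmetic behind the "nearly 5 billion" figure is straightforward; a quick sketch using the workload described above:

```python
# Workload described above: 4M prompts, ~1,000 input tokens and
# ~200 output tokens each, on average.
num_prompts = 4_000_000
avg_input_tokens = 1_000
avg_output_tokens = 200

total_tokens = num_prompts * (avg_input_tokens + avg_output_tokens)
print(f"{total_tokens:,} total tokens")  # 4,800,000,000 -- nearly 5 billion
```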


I conducted an initial test using GPT-3.5 and GPT-4 on a small subset of my prompt input data. Both models demonstrated commendable performance, but GPT-4 consistently outperformed GPT-3.5 in the majority of cases. To give you a sense of the cost, running all 4 million prompts through the OpenAI API would look something like this:

Total cost of running 4 million prompts with an input length of 1,000 tokens and an output length of 200 tokens
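As a sanity check on the table above, the GPT-3.5 Turbo total can be reproduced from per-token rates. The rates below are an assumption (roughly the mid-2023 list prices of $0.0015 per 1K input tokens and $0.002 per 1K output tokens); always check current pricing before budgeting:

```python
num_prompts = 4_000_000
input_tokens, output_tokens = 1_000, 200

# Assumed mid-2023 GPT-3.5 Turbo rates in USD per 1K tokens (subject to change).
price_in_per_1k, price_out_per_1k = 0.0015, 0.002

input_cost = num_prompts * input_tokens / 1_000 * price_in_per_1k     # $6,000
output_cost = num_prompts * output_tokens / 1_000 * price_out_per_1k  # $1,600
print(f"${input_cost + output_cost:,.0f}")  # $7,600
```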

While GPT-4 did offer some performance benefits, the cost was disproportionately high compared to the incremental performance it added to my outputs. Conversely, GPT-3.5 Turbo, although more affordable, fell short in terms of performance, making noticeable errors on 2–3% of my prompt inputs. Given these factors, I wasn't prepared to invest $7,600 in a project that was…
