Cloud Next, one of Google’s two major annual developer conferences (the other being I/O), has historically showcased mostly managed and otherwise closed source products and services, gated behind locked-down APIs. This year, though, Google unveiled a number of open source tools, chiefly aimed at supporting generative AI projects and infrastructure, perhaps in an effort to foster developer goodwill, advance its ecosystem ambitions, or both.
The first, MaxDiffusion, was actually quietly released in February. It’s a collection of reference implementations of various diffusion models, such as the image generator Stable Diffusion, that run on XLA devices. XLA, short for Accelerated Linear Algebra, is a technique that optimizes and speeds up certain kinds of AI workloads, including fine-tuning and serving.
Both Google’s own tensor processing units (TPUs) and recent Nvidia GPUs are XLA devices.
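To make the XLA idea a bit more concrete, here’s a minimal JAX sketch (MaxDiffusion’s reference implementations are written in JAX, which compiles through XLA). The toy denoising step and shapes below are purely illustrative, not actual MaxDiffusion code:

```python
# Minimal, illustrative look at XLA compilation via JAX. The toy
# denoising step below is a stand-in, not actual MaxDiffusion code.
import jax
import jax.numpy as jnp

def denoise_step(latents, noise_pred, step_size):
    # One Euler-style update of the kind a diffusion sampler runs
    # dozens of times per generated image.
    return latents - step_size * noise_pred

# jax.jit traces the function once and hands it to XLA, which fuses the
# multiply and subtract into a single kernel optimized for the target
# device (a TPU or a recent Nvidia GPU).
denoise_step = jax.jit(denoise_step)

latents = jnp.zeros((1, 64, 64, 4))    # a batch of latent images
noise_pred = jnp.ones((1, 64, 64, 4))  # the model's predicted noise
print(denoise_step(latents, noise_pred, 0.1).shape)  # (1, 64, 64, 4)
```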
Moving up the stack, there’s JetStream, a new engine for running generative AI models, specifically text-generating models (so not Stable Diffusion). Google claims that JetStream delivers up to 3x higher “performance per dollar” for models like Google’s own Gemma 7B and Meta’s Llama 2. For now, though, it only supports TPUs, with GPU compatibility reportedly coming in the future.
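Google’s announcement doesn’t detail JetStream’s API, so as a rough illustration of the workload it targets, here’s a toy greedy-decoding loop of the sort an inference engine spends its time optimizing. The “model” is a hypothetical stand-in, not JetStream code:

```python
# A toy greedy-decoding loop, shown only to illustrate the kind of
# autoregressive text-generation workload JetStream is built to serve.
# The "model" is a stand-in; Google's post doesn't detail JetStream's API.
import jax
import jax.numpy as jnp

def toy_logits(params, last_token):
    # Stand-in for a transformer forward pass: embed the last token and
    # project it back to vocabulary logits.
    return params["unembed"] @ params["embed"][last_token]

def greedy_decode(params, prompt, max_new_tokens):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = toy_logits(params, tokens[-1])
        tokens.append(int(jnp.argmax(logits)))  # always pick the top token
    return tokens

key = jax.random.PRNGKey(0)
vocab_size, dim = 100, 16
params = {
    "embed": jax.random.normal(key, (vocab_size, dim)),
    "unembed": jax.random.normal(key, (vocab_size, dim)),
}
print(greedy_decode(params, [1, 2, 3], max_new_tokens=5))
```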
“As customers bring their AI workloads to production, there’s an increasing demand for a cost-efficient inference stack that delivers high performance,” Mark Lohmeyer, Google Cloud’s general manager of compute and machine learning infrastructure, wrote in a blog post shared with TechCrunch. “JetStream meets this need… and has optimizations for widely used open models like Gemma and Llama 2.”
Now, a “3x” improvement is a bold claim, and it’s not clear how Google arrived at that figure. Using which generation of TPU? Compared to which baseline engine? And how is “performance” being defined here, anyway?
I’ve put all these questions to Google and will update this post if I hear back.
Second to last on the list of Google’s open source contributions is a set of new additions to MaxText, Google’s collection of text-generating AI models targeting TPUs and Nvidia GPUs in the cloud. MaxText now includes Gemma 7B, OpenAI’s GPT-3 (the predecessor to GPT-4), Llama 2 and models from AI startup Mistral, all of which Google says can be customized and fine-tuned to developers’ needs.
“We’ve worked closely with Nvidia to optimize performance on large GPU clusters, and we’ve heavily optimized [the models’] performance on TPUs,” Lohmeyer said. “By optimizing GPU and TPU utilization, these enhancements result in increased energy efficiency and cost optimization.”
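Google didn’t share fine-tuning code with the announcement, but since MaxText is a pure Python/JAX codebase, the generic sketch below gives a flavor of what a single tuning step looks like in that world. The toy linear model and loss are illustrative, not MaxText’s actual training loop:

```python
# Generic, illustrative JAX fine-tuning step: a toy stand-in for the kind
# of customization MaxText enables, not MaxText's actual training code.
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    # Toy squared-error loss for a linear "model".
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@jax.jit
def sgd_step(params, x, y, lr=1e-2):
    # Differentiate the loss w.r.t. the parameters, then apply one
    # gradient-descent update to every parameter tensor.
    grads = jax.grad(loss_fn)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

key = jax.random.PRNGKey(0)
params = {"w": jax.random.normal(key, (8, 1)), "b": jnp.zeros((1,))}
x = jax.random.normal(key, (32, 8))
y = jnp.ones((32, 1))
params = sgd_step(params, x, y)
print(loss_fn(params, x, y))  # loss after one update
```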
Lastly, Google collaborated with AI startup Hugging Face to create Optimum TPU, which provides tooling to bring certain AI workloads to TPUs. The goal, Google says, is to reduce the barrier to entry for getting generative AI models, particularly text-generating models, onto TPU hardware.
But Optimum TPU is rather bare-bones at the moment. The only model it works with is Gemma 7B. And Optimum TPU doesn’t yet support training generative models on TPUs, only running them.
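Hugging Face’s Optimum packages generally mirror the familiar transformers interface, so a first run plausibly looks something like the sketch below. Note that the `optimum.tpu` import path and class name are assumptions on my part, not confirmed by the announcement:

```python
# Hypothetical sketch of running Gemma 7B through Optimum TPU. The
# `optimum.tpu` import path and AutoModelForCausalLM class are assumptions
# based on how other Optimum packages mirror transformers; check the
# project's docs for the real interface.
from transformers import AutoTokenizer
from optimum.tpu import AutoModelForCausalLM  # assumed import path

tokenizer = AutoTokenizer.from_pretrained("google/gemma-7b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-7b")

inputs = tokenizer("Google Cloud Next this year", return_tensors="pt")
# Inference only for now: training on TPUs isn't yet supported here.
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```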
Google promises additional improvements in the future.