Fastino Secures $7 Million in Funding to Develop GPU-Free, Task-Oriented LLMs

An AI-generated image of a sleek, futuristic server room filled with glowing GPUs and CPUs. Image credit: Adobe Firefly

Fastino, a San Francisco-based startup specializing in foundation AI models, has emerged from stealth. The company wants to provide task-oriented language models that are “more accurate, faster, and safer” than the industry-leading LLMs. To further this goal, it has raised $7 million in pre-seed funding in a round led by Insight Partners and Microsoft’s M12 venture fund.

“[We aim] to bring the world more performant AI with task-specific capabilities,” Fastino Chief Executive and co-founder Ash Lewis states. “Whereas traditional LLMs often require thousands of GPUs, making them costly and resource-intensive, our unique architecture requires only CPUs or NPUs. This approach enhances accuracy and speed while lowering energy consumption compared to other LLMs.”

Other participating investors include NEA, Valor, and GitHub’s Chief Executive Thomas Dohmke, among others.

Founded in 2024 by Lewis and George Hurn-Maloney, Fastino aims to deliver high-performing language models. It wants to help enterprise companies better adopt and deploy generative AI technology that will solve business challenges. However, Fastino believes “conventional LLMs” aren’t fulfilling their full potential. “Traditional LLMs are incredibly difficult to fine-tune and run on-prem,” Lewis tells me. “This prevents enterprises, in many cases, from effectively integrating LLMs into their applications and workflows. Companies spend months on prompt engineering to make LLMs workflow-ready, while task-optimized models are specifically designed to stay within scope.”

The Fastino AI team. Photo credit: Fastino

He shares that his company has taken its architecture and training data and made it “highly optimized” for enterprise tasks such as textual data structuring, retrieval-augmented generation (RAG), summarization, and task planning. “We have borrowed from the traditional transformer-based architectural approach while making significant architectural breakthroughs that allow for CPU inference, better performance for enterprise tasks, significantly lower hallucination, better observability and inference speed.”

Although the company has launched, it has revealed little more about what it’s working on. Lewis declined to provide specifics about Fastino’s model family, including its name, variations, and number of parameters. “When we do the official product launch, we will provide more information on the model family,” he remarks. He did reveal that the startup plans to incorporate open source into its product strategy, though he offered no additional details.

Fastino boasts that its approach yields inference 1,000 times faster than traditional LLMs. Lewis claims his models are “currently beating [OpenAI’s] GPT-4o on certain tasks, including summarization” but says benchmarks will be released when the company officially launches its products.

Hurn-Maloney explains that Fastino will use its newfound capital to grow its research and engineering team. Because the company doesn’t use high-end GPUs for training and inference, he expects training costs for Fastino’s first model family to be lower than those of most companies.

“Fastino’s innovative architecture enables high performance while addressing critical challenges like safety, data leakage, accuracy and efficiency,” M12 Managing Partner Michael Stewart is quoted as saying. “Our investment will accelerate Fastino’s development of secure and performant Foundation AI, tunable to address enterprise challenges, from the banking to the consumer electronics sectors.”

