Big Four Consulting Firm: AirgapAI and Blockify Case Study
We automated the seemingly impossible.
Iternal Technologies is supercharging a Big Four firm’s sales force. Blockify’s patented data distillation and AirgapAI’s secure, on‑device inference shrink massive decks and white papers to roughly 2.5% of their original size, virtually eliminate hallucinations, and drive a 78x leap in LLM accuracy, all offline. Sellers now get instant, trusted answers that win clients faster.
Supporting A Big Four Professional Services and Advisory Firm’s Sales Teams through Improving LLM Accuracy by 78x with Blockify and AirgapAI
AI is rapidly becoming a strategic asset for global professional services organizations such as the “Big Four” professional services and advisory firms. To stay ahead of the competition and meet client expectations, it is more important than ever that these firms’ sales teams are able to access the right information, at the right time, in a safe, secure, and trusted manner – all while being confident in the veracity of the information being provided.
Together, Intel and Iternal Technologies have partnered to provide a unique solution for one of the “Big Four” firms, herein referred to as “The Firm”.
The solution supports The Firm’s sales enablement efforts by addressing many of the common challenges associated with AI adoption, such as hallucinations, data curation, governance, and access control provisioning of critical data.
Given the requirement to harness massive amounts of data to remain competitive in a rapidly evolving market, AI inference – the ability to generate real-time insights from large datasets – is becoming essential.
By pairing highly performant data center solutions such as Intel Gaudi 2 and Blockify® for data preparation and optimization with edge-based Intel NPUs and AirgapAI™ software for local inference and data summarization, organizations gain an end-to-end solution that delivers substantial competitive advantage.
Benchmark Results:
A corpus of 771 pages of slide decks and white papers (approximately 154,049 words) was processed in about 5 minutes. At that rate, a single Gaudi 2 core can process roughly 5 million pages of text per month, demonstrating the scalability and efficiency of this approach.
- Total Time: ≈315 seconds
- Total Responses: ≈287
This process wasn’t just about speed; it was about enhancing accuracy while minimizing issues like AI hallucinations. Blockify’s approach increased the precision of vector searches and RAG models, ensuring that the system retrieved the most relevant, contextually accurate information from The Firm’s knowledge base.
Why AI Inference Matters and Why You Should Care
AI inference is more than running a trained model against simple data queries: it extracts real-time, accurate, and contextually relevant insights from unstructured data at scale.
In the context of The Firm’s busy sales operations, data preparation and optimization is usually a complex and time-consuming process. Without an effective approach to content lifecycle management, sales engagements risk relying on an AI that will never be fully accurate. It’s a case of garbage in, garbage out – and if you can’t trust your AI to be consistently correct, you can’t trust it at all, because its mistakes are unpredictable.
Intel and Iternal used AI to bring structure to unstructured client-facing documents such as sales decks, proposals, FAQs, and knowledge bases through an advanced data ingestion and optimization approach, powered by Blockify.
The result is a highly accurate and optimized dataset that includes greater governance, control, management, and quality of data while simplifying human oversight. These improvements combine to virtually eliminate AI hallucinations commonly associated with RAG and improve LLM accuracy by approximately 78x (7800%) compared to a traditional RAG pipeline.
Blockify’s patented approach creates a single source of truth. Distilling client insights, FAQs, and knowledge down to 2.5% of the original size allows for easy content lifecycle management.
A Summary of the AI Inference Solution
Blockify Data Ingestion with Gaudi
Leveraging Intel Gaudi 2 AI accelerators and Iternal’s patented data ingestion solution, Blockify, teams at The Firm can provide documents and information sources to ingest, then optimize them for an improved large language model inferencing pipeline when paired with retrieval augmented generation (RAG).
- Processing Speed: Documents are processed at an average of 900 words per second.
- Accuracy: Retrieval-augmented generation (RAG)-based LLM accuracy increased by 40 times, and vector search precision improved by 51%.
- Inference Speed: Achieved 0.68 inferences per second, with a throughput of 5,404 bytes per second.
These results derive from an efficient pipeline that combines Iternal’s Blockify data ingestion technology with the powerful Intel Gaudi 2 AI Accelerators. Iternal’s Blockify deduplication and data cleansing solution is then applied to create an optimized data taxonomy of modular content indexed for dynamic contextual responses.
This three-pronged ingestion, distillation, and taxonomy approach drastically optimizes how a RAG-based system can interact with, search, and utilize data to power large language models.
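To make the distillation step concrete, here is a minimal, illustrative sketch of near-duplicate merging over content blocks using TF-IDF cosine similarity. Blockify’s patented distillation is proprietary and far more sophisticated; the block texts, similarity threshold, and use of scikit-learn below are assumptions for demonstration only.

```python
# Toy near-duplicate merge over content blocks (NOT Blockify's algorithm).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

blocks = [
    "Our advisory practice covers audit readiness and risk assessment.",
    "The advisory practice covers audit readiness and risk assessments.",  # near-duplicate
    "AirgapAI runs large language models locally on an AI PC.",
]

SIM_THRESHOLD = 0.6  # assumed cutoff; TF-IDF similarity of paraphrases is modest

vectors = TfidfVectorizer().fit_transform(blocks)
sims = cosine_similarity(vectors)

kept = []
for i in range(len(blocks)):
    # Keep block i only if it is not a near-duplicate of a block already kept.
    if all(sims[i, j] < SIM_THRESHOLD for j in kept):
        kept.append(i)

distilled = [blocks[i] for i in kept]
print(f"{len(blocks)} blocks -> {len(distilled)} after deduplication")
```

Applied across a full corpus, this kind of merge is what allows a knowledge base to collapse to a small fraction of its original size while remaining a single source of truth.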
This use case demonstrates how scalable AI inference hardware paired with highly capable software can solve real-world problems – particularly in fields like professional consulting where speed, accuracy, and scalability are vital for competitive sales enablement.
AirgapAI – Inferencing at the Edge with Intel NPU
An optimized dataset only reaches its full potential when applied in the field. AirgapAI is a powerful, network-independent AI solution designed to run locally on an AI PC.
In The Firm’s context, AirgapAI can be employed to empower sales teams by running large language models locally on Intel NPU chips. AirgapAI enables critical tasks in environments where fast, secure, and offline access to insights is essential.
- Speed: Vector search and inference are approximately 2.2x faster on the Intel NPU.
- Accuracy: Retrieval-augmented generation (RAG)-based LLM accuracy increased by 40 times, and vector search precision improved by 51%.
- Retrieval Speed: Achieved vector search of ≈6.6 million records in 1 second.
AirgapAI supports running any open-source large language model, including custom fine-tunes. When paired with Blockify, The Firm’s teams can utilize specialized role-based guardrails and curated datasets specific to their global practice areas, ensuring a safe and highly relevant AI experience for tasks such as responding to client FAQs, generating detailed proposal documents, and creating real-time market analyses.
After outputting the dataset from Blockify, the newly created Blockified dataset can be loaded onto an AI PC and combined with AirgapAI, where onsite inference is conducted. AirgapAI uses Intel chips to scan through up to 10 million records in the vector database in under 1.5 seconds. After identifying the right information for a user query, an LLM inference is initiated to summarize the key information into a concise, easily digestible answer.
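The scan-then-summarize loop can be pictured with a minimal brute-force cosine search over an in-memory vector store. AirgapAI’s NPU-accelerated search is not public, so the record count, embedding dimension, and NumPy implementation below are illustrative assumptions only.

```python
# Toy brute-force vector scan (illustrative; not AirgapAI's implementation).
import numpy as np

N, DIM = 100_000, 384            # assumed store size and embedding width
rng = np.random.default_rng(0)

store = rng.standard_normal((N, DIM)).astype(np.float32)
store /= np.linalg.norm(store, axis=1, keepdims=True)  # normalize once, up front

query = rng.standard_normal(DIM).astype(np.float32)
query /= np.linalg.norm(query)

scores = store @ query                   # cosine similarity as one matmul
top_k = np.argsort(scores)[-5:][::-1]    # indices of the 5 best-matching blocks
print(top_k, scores[top_k])

# In the real pipeline, the text behind these top matches is handed to the
# local LLM, which summarizes it into a concise answer for the seller.
```

Because the whole search reduces to a single normalized matrix-vector product, it parallelizes extremely well, which is why a dedicated accelerator such as an NPU can sweep millions of records in around a second.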
By operating completely offline, AirgapAI upholds stringent data protection standards – no external connection is required, minimizing potential vulnerabilities and ensuring sensitive or proprietary client information remains strictly confidential. The Firm’s sales members can rely on AirgapAI to quickly respond to both client-facing and internal FAQs, create briefing documents, produce policy references, and more.
One of the primary issues with legacy RAG is the semantic differences in how unique user questions are represented in vector space, compounded by a dilution of vector accuracy when extraneous information is incorporated into a chunk of text. Blockify eliminates these issues by delivering a 51% improvement in vector accuracy, along with a dedicated query element designed to help retrieve the right information.
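The dilution effect is easy to demonstrate with a toy example. Assuming, for illustration, that a chunk’s embedding behaves roughly like a normalized sum of its sentences’ embeddings, adding off-topic material measurably pulls the chunk away from an on-topic query:

```python
# Toy demonstration of vector dilution from extraneous chunk content.
import numpy as np

def norm(v):
    return v / np.linalg.norm(v)

query     = norm(np.array([1.0, 0.0, 0.0]))   # stand-in for the user question
on_topic  = norm(np.array([0.95, 0.3, 0.0]))  # relevant sentence embedding
off_topic = norm(np.array([0.0, 0.2, 1.0]))   # boilerplate in the same chunk

clean_chunk   = on_topic
diluted_chunk = norm(on_topic + off_topic)    # chunk mixing both sentences

print("clean similarity:  ", round(float(query @ clean_chunk), 3))    # ~0.954
print("diluted similarity:", round(float(query @ diluted_chunk), 3))  # ~0.655
```

By stripping extraneous material out of each block, Blockify keeps every stored vector pointed squarely at the content it represents, which is the intuition behind the reported 51% precision gain.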
What We Did: Breaking Down the Framework
For AI inference to perform at scale, robust infrastructure is essential. Intel Gaudi 2 was used to accelerate the inference of large language models (LLMs). Gaudi 2 is designed for deep learning workloads, and its efficiency in both training and inference makes it an ideal choice for this type of project. Its architecture allows for high throughput, perfect for tasks that require parallel processing of complex, layered data – like The Firm’s sales documents and knowledge bases.
The Firm’s sales and FAQ materials are known for their dense and varied content. To enable real-time retrieval and answer detailed client queries, the relevant documents were preprocessed and indexed using Iternal’s Blockify technology to modularize content into manageable blocks. These blocks are then deduplicated, distilled, indexed, tagged, and merged into a final output dataset. That dataset can then be packaged into a secure file that can be loaded onto an edge device running AirgapAI.
The Llama 3 LLM was fine-tuned using Low-Rank Adaptation (LoRA) and run on a single Intel Gaudi 2 core. Through extensive optimization testing across documents from multiple consulting practices, Iternal and Intel determined that the best balance of output quality and compute performance came from processing 8,000-character segments and generating 1,000 tokens per query output with 100 parallel jobs.
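For orientation, a LoRA setup of the general shape described above might look like the following sketch using Hugging Face transformers and peft. The checkpoint name, adapter hyperparameters, and helper function are assumptions; The Firm’s actual configuration and the Gaudi-specific tooling are not shown.

```python
# Illustrative LoRA setup (assumed hyperparameters; not the production config).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

MODEL = "meta-llama/Meta-Llama-3-8B-Instruct"   # assumed Llama 3 checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype=torch.bfloat16)

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,     # assumed adapter settings
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)   # only the small adapter weights train

# Shapes from the case study: ~8,000-character input segments and up to
# 1,000 generated tokens per query.
SEGMENT_CHARS, MAX_NEW_TOKENS = 8_000, 1_000

def process_segment(text: str) -> str:   # hypothetical helper
    inputs = tokenizer(text[:SEGMENT_CHARS], return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=MAX_NEW_TOKENS)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```

The 100 parallel jobs mentioned above would sit a layer higher, batching segments across the accelerator rather than inside this per-segment helper.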
The Blockify workflow steps, sketched in code after the list, included:
- Chunking the Text: The source documents were divided into smaller content chunks based on a proprietary algorithm. Those chunks were passed into the specially configured LLM, which output modular blocks of content. These blocks offer a robust taxonomy that can be reused or reassembled based on user needs.
- Embeddings: These content blocks were converted into embeddings (vector representations) to capture unique context and structure, enabling content-aware retrieval within AirgapAI.
- Retrieval and Response Generation: Based on user queries, the system retrieves relevant content from the Context-Aware Retrieval Database for accurate, contextually relevant responses.
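As a minimal end-to-end sketch of the chunking, embedding, and retrieval steps above: the fixed-size chunker, embedding model, and source file below are illustrative assumptions; Blockify’s proprietary chunking and AirgapAI’s Context-Aware Retrieval Database are not reproduced here.

```python
# Toy chunk -> embed -> retrieve loop (illustrative stand-ins throughout).
import numpy as np
from sentence_transformers import SentenceTransformer

def chunk(text: str, size: int = 800) -> list[str]:
    # Naive fixed-size chunking; Blockify uses a proprietary algorithm.
    return [text[i:i + size] for i in range(0, len(text), size)]

model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed embedding model

document = open("sales_deck.txt").read()          # hypothetical source file
blocks = chunk(document)
block_vecs = model.encode(blocks, normalize_embeddings=True)

def retrieve(question: str, k: int = 3) -> list[str]:
    q = model.encode([question], normalize_embeddings=True)[0]
    best = np.argsort(block_vecs @ q)[-k:][::-1]  # top-k cosine matches
    return [blocks[i] for i in best]

# The retrieved blocks become the context the LLM turns into a response.
print(retrieve("What services does the advisory practice offer?"))
```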
This enhanced workflow allows the system to instantly recall specific insights, marketing strategies, service offerings, and proposals, with the ability to dynamically assemble content into diverse structured outputs using large language models.
The result? Real-time expertise, engagement, and personalized content in minutes – something that would otherwise require hours of manual research and compilation.
Business Use Cases: Real-Time AI Inference in Action
The implications for broader expansion of this technology in support of The Firm’s global practice are extensive.
- Industry Intelligence and Whitepapers: By combining Intel Gaudi 2 acceleration with Iternal’s Blockify technology, The Firm’s sales teams can swiftly process vast amounts of client, industry, and organizational data to extract precise insights. AirgapAI ensures these analyses remain both secure and accurate in real-time, even in on-premises or bandwidth-limited environments.
- Sales Enablement Documents: With Blockify’s chunking and deduplication creating a single source of truth, sales briefs and client presentations can be modularized for quick updates and cross-referencing. Intel Gaudi 2 acceleration delivers near-instant recall of critical information, while AirgapAI keeps sensitive client data protected in a secure environment.
- Technical and Methodology Guides: By converting lengthy methodology papers into concise, context-rich segments, The Firm’s consultants and sales teams can instantly access the most relevant guidance. Gaudi 2’s processing power and AirgapAI’s local inference enable secure, offline retrieval that supports rapid responses to client inquiries.
- Policy and Knowledge Materials: Blockify ingests and distills detailed policies, FAQs, and knowledge content into easily searchable content blocks, significantly reducing the risk of error or misinterpretation. Paired with Gaudi 2 and AirgapAI, organizations achieve high-speed, secure retrieval of vital documents for immediate and compliant decision-making.
For The Firm, the ability to retrieve accurate, contextually rich data in real time is more than an operational advantage – it’s a competitive necessity. By eliminating the burden of constantly managing AI systems, teams can focus on delivering greater value and insights to clients.