Blockify Data Ingestion and Optimization On-Prem Technical Documentation
Document Revision History
Rev# | Date | Purpose | Pages Revised |
---|---|---|---|
1 | 2025-07-08 | First Full Release for On-Prem | All |
Overview
Blockify® is a patented data ingestion, distillation, and governance pipeline designed to optimize unstructured enterprise content for use with Retrieval-Augmented Generation (RAG) and other AI/LLM applications.
Blockify transforms text documents into small, semantically complete “IdeaBlocks,” dramatically improving both the quality and efficiency of downstream AI systems.
This document provides technical guidance for deploying and utilizing Blockify within modern enterprise architectures. To learn more about the business value, visit: https://iternal.ai/blockify-results
Blockify Licensing and Use
As a developer, you need to follow a few simple rules for how you and your organization can use Blockify, based on the licensing you have acquired.
- You can install and use Blockify (object code only) on as many Devices (servers, workstations, etc.) as you need, on your own infrastructure or with a third-party host, as long as you have paid for the required licenses for the associated number of users.
- User Licensing: Every person or AI Agent in your organization who accesses or uses data generated via Blockify, whether directly (a RAG Chatbot) or indirectly (Agentic AI, or another app or system that processes data from Blockify), requires a valid, fully paid Blockify license.
- Internal Use Only: All data processed through Blockify must be for your company's internal use only. You cannot share, resell, sublicense, or otherwise provide Blockify or its outputs to outside parties unless you have explicit written permission from Iternal or your license agreement states otherwise.
- External Use Licenses: If data generated via Blockify is consumed externally (for example, website visitors accessing a public chatbot, or a third-party AI Agent), a "Blockify External User License – Human" or "Blockify External User License – AI Agent" is required.
- For all the details, refer to your full legal license agreement.
Table of Contents
- Overview
- Blockify Licensing and Use
- Table of Contents
- Architecture Overview
- Model Details
- Prerequisites
- Deployment Steps
- Chunking Guidelines
- Sample API (OpenAI-Compatible)
- Troubleshooting
- Support & Maintenance
- Testing / Validation
- Getting Help
Architecture Overview
Blockify for on-prem deployments is made available as a fine-tuned LLAMA large language model. There are two primary LLM components to Blockify:
- Blockify Ingest
- Blockify Distill
Four versions of the Blockify LLM are available for datacenter server deployment for each of the two primary LLM components:
- LLAMA 3.2 1B
- LLAMA 3.2 3B
- LLAMA 3.1 8B
- LLAMA 3.1 70B
The Blockify LLM is intended to fit seamlessly into any AI data pipeline and be infrastructure agnostic. Blockify typically sits between the document parsing stage and vector storage/LLM retrieval layer.
Custom Versions of Blockify
If your organization requires deployment of Blockify using a different LLM or a custom fine-tuned version of Blockify designed to accommodate specific data tagging or parsing requirements, these services are available. Contact your Iternal Technologies representative for details and pricing.
Model Details
Blockify Ingest Model
Blockify Ingest receives raw source text (parsed / chunked) via LLM API request and outputs structured, optimized XML “IdeaBlocks,” repackaging the source data into a cleaned format.
The process is ≈99% lossless for numerical data, facts, and key information. We always encourage a human in the loop to review the IdeaBlock outputs.
For optimal results, we recommend sending between 1,000 and 4,000 characters of input per Blockify Ingest API request.
Blockify Distill Model
Blockify Distill receives a code-curated collection of semantically similar XML-based IdeaBlocks via LLM API request and outputs an optimized, condensed set of XML-based IdeaBlocks, removing redundant and duplicative information while preserving the unique facts.
Blockify Distill is also trained to recognize when content should be separated rather than merged. If an input combines multiple ideas that should stand alone (e.g., a single IdeaBlock that contains both your Company Mission and your Product Features), the model will intelligently separate these concepts into two unique IdeaBlocks.
The process is ≈99% lossless for numerical data, facts, and key information. We always encourage a human in the loop to review the IdeaBlock outputs.
For optimal results we recommend sending between 2 and 15 IdeaBlocks of input per Blockify Distill API request.
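The "code-curated, semantically similar collection" can be assembled in many ways. Below is a minimal sketch that greedily groups IdeaBlocks by cosine similarity before batching them into Distill requests; the embedding function, 0.8 threshold, and batch limit are illustrative assumptions, not Blockify requirements:

from typing import List

import numpy as np

def embed_text(text: str) -> np.ndarray:
    """Toy stand-in embedding (byte-frequency vector) so this sketch runs;
    in production, call your deployed embeddings model instead."""
    vec = np.zeros(256)
    for byte in text.encode("utf-8"):
        vec[byte] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

def group_similar_ideablocks(ideablocks: List[str],
                             threshold: float = 0.8,
                             max_batch: int = 15) -> List[List[str]]:
    """Greedily group IdeaBlocks by cosine similarity to each group's first
    member, yielding batches of 2-15 blocks for one Distill request each."""
    groups = []  # list of (anchor_vector, member_blocks) pairs
    for block in ideablocks:
        vec = embed_text(block)
        for anchor, members in groups:
            if len(members) < max_batch and float(anchor @ vec) >= threshold:
                members.append(block)
                break
        else:
            groups.append((vec, [block]))
    # Singletons have nothing to merge with; send only groups of 2+ to Distill.
    return [members for _, members in groups if len(members) >= 2]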
Recommended Architecture Components
- Document Ingestor: Accepts various formats (PDF, DOCX, PPTX, HTML, PNG/JPG, Markdown, etc.) and converts them into plain text for LLM input.
- Semantic Chunker: Splits text along natural semantic boundaries.
- Blockify Ingest (LLM-Based): Converts chunks to IdeaBlocks with metadata.
- Blockify Distill (LLM-Based): Merges near-duplicate, semantically similar IdeaBlocks.
- User-Added Supplemental Information: Injects user-added information into the relevant XML structures post-processing (e.g., User-Defined Tags, Keywords, or Entities).
- Integration APIs: Pushes output to vector DBs (Pinecone, Azure AI Search, Milvus, etc.); see the orchestration sketch below.
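These components compose into a single linear flow. The sketch below shows one way to orchestrate them; it is a minimal sketch, and every callable is a placeholder for whichever parser, chunker, model endpoint, and vector database you choose (none of these names are Blockify APIs):

from typing import Callable, List

def run_pipeline(
    text: str,
    chunker: Callable[[str], List[str]],        # Semantic Chunker
    ingest: Callable[[str], str],               # Blockify Ingest LLM call
    distill: Callable[[List[str]], List[str]],  # Blockify Distill LLM call
    store: Callable[[List[str]], None],         # vector DB writer
) -> None:
    """Wire the recommended architecture components into a linear flow."""
    chunks = chunker(text)                    # split parsed text semantically
    ideablocks = [ingest(c) for c in chunks]  # each chunk -> XML IdeaBlocks
    distilled = distill(ideablocks)           # merge duplicates, split mixed ideas
    store(distilled)                          # push to Pinecone, Milvus, etc.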
Prerequisites
- System Requirements
- Compute
- CPU LLM Inferencing
- Intel Xeon Scalable (4th, 5th, or 6th Gen)
- GPU LLM Inferencing
- Intel Gaudi 2 / Gaudi 3
- NVIDIA GPUs
- AMD GPUs
- Software Dependencies
- MLops or LLM Runtime Environment supporting LLAMA LLMs
- Embeddings Model
- Blockify supports any Embeddings Model
- OpenAI Embeddings
- Mistral Embeddings
- Jina Embeddings
- AWS Bedrock Embeddings
- others
- If using AirgapAI, you must use the Jina-V2 Embeddings Model: https://jina.ai/embeddings/
- Vector DB Integration:
- Any vector database – at the user’s discretion
- Milvus
- Zilliz
- Pinecone
- AWS Vector Database
- Azure Vector Database
- others
- Parsing / Chunking Integration:
- Any Parsing / Chunking system – at the user’s discretion
- Unstructured IO
- Pinecone
- LangChain
- others
- Licensing Options
- Blockify Internal Use – 1 User (Human)
- Blockify Internal Use – 1 User (AI Agent)
- Blockify External Use – 1 User (Human)
- Blockify External Use – 1 User (AI Agent)
Deployment Steps
- Download and unzip the Blockify LLM
- Upload / convert the Blockify safetensors LLM package to the required format and deploy it on your MLOps platform for inference
- Deploy the API for inference
- Test inference (see the sketch below)
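For the last step, the snippet below sends one small chunk to the deployed endpoint as a smoke test. It assumes an OpenAI-compatible chat-completions API (as in the Sample API section below); the host, model name, and API_KEY environment variable are placeholders for your deployment:

import os
import requests

# Placeholders: substitute your deployment's endpoint and model name.
ENDPOINT = "https://XXXXXXXXXX/v1/chat/completions"
MODEL = "blockify-ingest-XXXXXXXX"

payload = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "Blockify converts unstructured text into IdeaBlocks."}
    ],
    "temperature": 0.5,
    "max_completion_tokens": 8000,
}

response = requests.post(
    ENDPOINT,
    headers={
        "Authorization": f"Bearer {os.environ['API_KEY']}",
        "Content-Type": "application/json",
    },
    json=payload,
    timeout=300,
)
response.raise_for_status()
# The completion text should contain XML IdeaBlocks.
print(response.json()["choices"][0]["message"]["content"])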
Chunking Guidelines
- Default Chunk Size: For Blockify, aim for 1,000 – 4,000 characters per chunk, with 2,000 characters as the recommended default.
- Recommended: 4,000 characters for highly technical documentation
- Recommended: 4,000 characters for customer meeting transcripts and recordings
- Chunk Splitting Locations: Chunk at logical points, such as paragraphs, sentences, or sections. Avoid splitting data mid-sentence or mid-paragraph as this can confuse the model.
- Consistent Structure: If possible, keep each chunk similar in size.
- Chunk Overlap (Recommended): For continuity between chunks, include an overlap of ~10% on the boundaries between chunks (front and back); see the sketch below.
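A minimal sketch of a chunker that follows these guidelines (2,000-character default, sentence-boundary splits, ~10% overlap carried between chunks); this is an illustration, not a Blockify component:

from typing import List

def chunk_text(text: str, target: int = 2000, overlap_ratio: float = 0.10) -> List[str]:
    """Split text into ~target-character chunks at sentence boundaries,
    carrying the last ~10% of each chunk into the next for continuity."""
    # Naive sentence split; a production pipeline would use a real segmenter.
    sentences = [s.strip() for s in text.replace("\n", " ").split(". ") if s.strip()]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if current and len(current) + len(sentence) > target:
            chunks.append(current.strip())
            # Seed the next chunk with the tail of this one (the overlap).
            current = current[-int(target * overlap_ratio):]
        current += sentence + ". "
    if current.strip():
        chunks.append(current.strip())
    return chunks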
Sample API (OpenAI-Compatible)
If you are serving your Blockify LLM behind an OpenAI-compatible chat-completions API, below is a sample API payload you can use as a reference. Depending on how you deployed your model, the payload may differ. The input source text should always be sent in a single user message; do not send a multi-turn chat history.
Configuration Values
Recommended settings: ~8,000+ output tokens (max_completion_tokens) and a temperature of 0.5.
Sample API Payload (OpenAI-Compatible):
curl https://XXXXXXXXXX/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $API_KEY" \
-d '{
"model": "blockify-ingest-XXXXXXXX",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "This report presents a comprehensive analysis of the Blockify® data ingestion, distillation and optimization capabilities to support Big Four Consulting Firm, compared to traditional chunking methods. Using Blockify\"s distillation approach, the projected aggregate Enterprise Performance improvement for Big Four Consulting Firm is 68.44X. This performance includes the improvements made by Blockify® in enterprise distillation of knowledge, vector accuracy, and data volume reductions for enterprise content lifecycle management.\nAccording to IDC\"s \"Accelerating Efficiency and Driving Down IT Costs Using Data Duplication\" study, the average enterprise has between an 8:1 and 22:1 duplication frequency (\"Enterprise Performance\"). Factoring in an average Enterprise Data Duplication Factor of 15:1 (which accounts for typical data redundancy across multiple documents and systems in an enterprise setting), the aggregate performance improvement of 4.56X based on vector accuracy and data volume reductions for enterprise content lifecycle management is further increased ≈15X to a projected aggregate Enterprise Performance of 68.44X. This highlights the compounded benefits of Blockify® in a larger-scale enterprise environment."
}
]
}
],
"response_format": {
"type": "text"
},
"tools": [],
"temperature": 0.5,
"max_completion_tokens": 8000,
"top_p": 1,
"frequency_penalty": 0,
"presence_penalty": 0
}'
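Once a response returns, the IdeaBlock XML can be pulled out of the completion text. In the sketch below, the <ideablock> tag name is an illustrative assumption; substitute whatever tag names your Blockify model actually emits:

import re
from xml.etree import ElementTree

def extract_ideablocks(completion_text: str) -> list:
    """Collect well-formed IdeaBlock XML fragments from the model's reply.
    The <ideablock> tag is an assumed name, not a documented schema."""
    blocks = []
    for fragment in re.findall(r"<ideablock>.*?</ideablock>",
                               completion_text, flags=re.DOTALL):
        try:
            blocks.append(ElementTree.fromstring(fragment))
        except ElementTree.ParseError:
            # Malformed or truncated block; see Troubleshooting (output tokens).
            continue
    return blocks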
Troubleshooting
Issue | Probable Cause |
---|---|
IdeaBlocks repeat or are nonsensical | Temperature may be misconfigured |
IdeaBlock output is truncated | Output tokens may be set too low. Each IdeaBlock outputs approx. 1,300 tokens; if you cannot increase output tokens, reduce the input chunk size |
IdeaBlocks are not outputting much data | Your input text may not contain data that has facts, figures, or numbers (it may be “marketing fluff only”) |
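As a rough budgeting rule from the numbers above: with the recommended ~8,000-token output limit and ≈1,300 tokens per IdeaBlock, one response fits about six IdeaBlocks (8,000 / 1,300 ≈ 6.2), so size each input chunk to contain no more than roughly that many distinct ideas.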
Support & Maintenance
Patching & Upgrades: Download the latest Blockify LLM and run.
Testing / Validation
Blockify Ingest Input #1:
This report presents a comprehensive analysis of the Blockify® data ingestion, distillation and optimization capabilities to support Big Four Consulting Firm, compared to traditional chunking methods. Using Blockify's distillation approach, the projected aggregate Enterprise Performance improvement for Big Four Consulting Firm is 68.44X. This performance includes the improvements made by Blockify® in enterprise distillation of knowledge, vector accuracy, and data volume reductions for enterprise content lifecycle management. According to IDC's "Accelerating Efficiency and Driving Down IT Costs Using Data Duplication" study, the average enterprise has between an 8:1 and 22:1 duplication frequency ("Enterprise Performance"). Factoring in an average Enterprise Data Duplication Factor of 15:1 (which accounts for typical data redundancy across multiple documents and systems in an enterprise setting), the aggregate performance improvement of 4.56X based on vector accuracy and data volume reductions for enterprise content lifecycle management is further increased ≈15X to a projected aggregate Enterprise Performance of 68.44X. This highlights the compounded benefits of Blockify® in a larger-scale enterprise environment.
Blockify Ingest Output #1:
Blockify Ingest Input #2:
Blockify is a data optimization tool that takes messy, unstructured text, like hundreds of sales‑meeting transcripts or long proposals, and intelligently optimizes the data into small, easy‑to‑understand "IdeaBlocks." Each IdeaBlock is just a couple of sentences that capture one clear idea, plus a built‑in contextualized question and answer. With this approach, Blockify improves accuracy of LLMs (Large Language Models) by an average aggregate 78X, while shrinking the original mountain of text to about 2.5% of its size and keeping (and even improving) the important information. When Blockify's IdeaBlocks are compared with the usual method of breaking text into equal‑sized chunks, the results are dramatic. Answers pulled from the distilled IdeaBlocks are roughly 40X more accurate, and user searches return the right information with about 52% greater accuracy. In short, Blockify lets you store less data, spend less on computing, and still get better answers, turning huge documents into a concise, high‑quality knowledge base that anyone can search quickly.
Blockify Ingest Output #2:
Getting Help
- For commercial licenses, deployment references, product keys, or technical support, contact: [email protected]
© Blockify® / Iternal Technologies. All rights reserved.
(Do not remove this notice.)