Production-ready / On-premise

GSAI · 2026 · 06 · 0024

AI Solution Brief

Turn scattered enterprise knowledge into a queryable, trusted, citable private AI expert

A RAG-based platform that unifies Confluence, SharePoint, Notion, shared drives, and business databases — on-prem deployment, full citation chains, and inherited ACLs. Let your team query the company the way they query a senior expert.


10 min read · v1.0 · updated Jun 2026

92.4%

Retrieval accuracy

Top-3 hit · hybrid search

< 2.5s

End-to-end latency

P95 ≤ 4.0s

-68%

Expert queries

90 days post-launch

100%

Answer citation

Every answer is sourced

Knowledge RAG · Live

Live indexing

Sources

Confluence

Wiki · 8.2K

SharePoint

Docs · 21.0K

Notion

Teams · 4.6K

Shared drive

PDF/Word · 56.0K

Business DB

MySQL · 32.4K

Ticketing

Tickets · 2.2K

6 sources · 124K docs

RAG Hub

Pipeline

01 Parse · 02 Chunk · 03 Embed · 04 Retrieve · 05 Generate

Answers · with citations

Q: How do new hires request VPN?

A: IT portal → submit access form → manager approves → auto-provisioned

[1][2]

Q: Q3 sales rebate policy?

A: 3–5% standard · up to 8% strategic. See rebate matrix.

[1][2][3]

Q: Can we add this clause to the contract?

A: Yes. See template clause 14.2, cleared by legal in 2024.

[1][2]

100% sourced

Top-3 hit

92%

P50 latency

Citations / ans

00/BACKGROUND

Enterprise knowledge management is moving from document piles to expert systems

For two decades enterprises have piled knowledge into Wikis, SharePoint, and shared drives. Searchable ≠ usable. People still ping senior coworkers, ask in group chats — 90% of internal knowledge sits idle. RAG combines retrieval with LLM reasoning to give enterprise knowledge its first real Q&A interface.

Carriers: PDF / Word / PPT / Excel / Confluence / email / chat

Today: Full-text search + manual filtering + senior fallback

Failures: Not found / wrong answer / hard to read / no one explains

Goal: Queryable + citable + governable

01/PROBLEM

Four old problems generic LLMs can't solve

Generic LLMs (ChatGPT / Qwen / ERNIE) have zero knowledge of your private data — and they don't admit it. In legal, compliance, or technical decisions, this 'confident hallucination' is fatal.

P-01 · Pain

Knowledge scattered across a dozen systems

A single workflow's docs may span Confluence (specs) + SharePoint (templates) + group files (changes) + senior memory (tacit). Point search misses; cross-system search fails. New hires burn three months stepping on the same rakes.

P-02 · Pain

Senior staff are the search engine; attrition = knowledge loss

How the workflow really runs, why this client is special, how that bug got fixed — it's all in 1-2 people's heads. When they leave, they take not just experience but the company's Q&A capability.

P-03 · Pain

Generic LLMs hallucinate with confidence

They know nothing about your private knowledge but answer as if they do. In legal, compliance, or decision contexts, one wrong answer that sounds right is enough to cause real damage.

P-04 · Pain

On-premise vs. AI is a compliance dead zone

Customer data, contracts, IP, HR records can't leave the company. Yet teams genuinely need AI. The two demands looked irreconcilable. RAG is the third path.

Industry

Universal · sector-agnostic

Volume

100K – millions of docs

Today

Full-text search + asking around

Goal

On-prem RAG + citable

02/SCENARIOS

One stack, five immediately landable scenarios

RAG isn't a chatbot. It's the knowledge layer of the enterprise. Below are five landing points the same stack covers.

S-01

Onboarding assistant

HR · new hire

Onboarding docs + policies + workflow charts feed RAG. New hires 'ask the company' instead of pinging seniors.

Metric

Time-to-ramp -60%

S-02

IT / process self-service

Everyone

Reimbursement, VPN, hardware requests — daily process questions stop chewing IT time.

Metric

IT tickets -55%

S-03

Sales knowledge copilot

Pre-sales · BD

Product handbook + standard responses + competitor compare + case library on tap during live calls.

Metric

First-response time -70%

S-04

Tier-1 customer support

Support

RAG drafts replies from the ticket history; agents either edit the draft or send it as is.

Metric

Resolution rate +40%

S-05

Legal / compliance lookup

Legal · compliance

Templates + precedents + regulations. Ask 'is this clause OK' and get a cited answer.

Metric

Review time -50%

03/CAPABILITIES

Three core capabilities, broken down with real UI

From employee 'asks' to system 'answers', from answer to source, from permission to audit — we break down RAG's three highest-value paths so every step is visible, verifiable, and auditable.

Ask: How is the Q3 sales rebate calculated?

AI answer · Confidence 87%

Standard accounts get 3–5% [1]; strategic accounts up to 8% [2]. See rebate matrix v2026.Q3 [3].

Sources (3)

[1] Confluence · Rebate policy · 2026.06

[2] SharePoint · Strategic accounts · 2026.05

[3] Notion · Rebate matrix Q3 · 2026.07


Top-3 hit

92.4%

Avg latency

2.1s

C-01 · 03-A · Q&A + Citation Chain

Cross-source Q&A with 100% sourced answers

Every fact opens to the original passage

Users ask in natural language; the system runs hybrid search over Confluence / SharePoint / Notion / drives. The LLM is forced to cite every fact with [n] markers, hoverable to the original passage. Low-confidence queries return 'not found' instead of fabricating.

Hybrid search across Confluence + SharePoint + Notion + drives + DBs
100% enforced citation — every fact carries an [n] chip
Hover the citation to reveal the source passage + path + last modified
Answer confidence score (retrieval quality + LLM self-rating)
Feedback loop (👍 👎) feeds continual weekly iteration
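
The enforced-citation rule above can be sketched as a post-generation check. This is a minimal illustration, not the platform's implementation: the function names `check_citations` and `finalize`, the sentence-splitting rule, and the 0.5 confidence threshold are all assumptions made for the example.

```python
import re

# Matches citation markers such as [1] or [12].
CITATION = re.compile(r"\[(\d+)\]")


def check_citations(answer: str, num_sources: int) -> bool:
    """Return True only if every sentence cites at least one valid source."""
    # Hypothetical splitting rule: break after ., ;, ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.;!?])\s+", answer) if s.strip()]
    if not sentences:
        return False
    for sentence in sentences:
        cited = [int(n) for n in CITATION.findall(sentence)]
        # Reject uncited sentences and markers that point outside the source list.
        if not cited or any(n < 1 or n > num_sources for n in cited):
            return False
    return True


def finalize(answer: str, num_sources: int, confidence: float,
             threshold: float = 0.5) -> str:
    """Return the answer, or 'not found' when it is uncited or low-confidence."""
    if confidence < threshold or not check_citations(answer, num_sources):
        return "not found"
    return answer


good = ("Standard accounts get 3-5% [1]; strategic accounts get up to 8% [2]. "
        "See the rebate matrix [3].")
print(finalize(good, num_sources=3, confidence=0.87))
print(finalize("Rebates are 10%.", num_sources=3, confidence=0.9))
```

A check like this only verifies that markers are present and in range; whether the cited passage actually supports the sentence needs a separate validation step.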

Query: EVA valuation method explained · filters: dept=fin

01 · BM25 full-text

1. Finance manual · EVA (hit 0.84)
2. Valuation methods (hit 0.71)
3. Case: EVA applied (hit 0.66)

02 · Vector search (BGE-M3)

1. Finance manual · EVA (sim 0.89)
2. Valuation methods (sim 0.76)
3. Case: EVA applied (sim 0.71)

03 · Merge + dedupe · 45 candidates

04 · Cross-encoder rerank · +24pp

① EVA model explained · 0.96
② Finance manual · EVA · 0.91
③ Case: EVA applied · 0.84

Rerank lift

+24pp

Hallucination rate

< 2%

C-02 · 03-B · Hybrid Retrieval + Rerank

Hybrid retrieval with cross-encoder rerank

A visible recall pipeline, not a black box

A single query fires BM25 full-text + vector semantic in parallel; candidates merge and dedupe, then a cross-encoder reranks. The diagnostic panel visualizes hit scores, similarity scores, and filters at every step — any recall miss is traceable to a specific stage.

BM25 keyword + BGE-M3 vector search in parallel
Cross-encoder rerank lifts Top-K precision by +24pp
Multilingual embeddings align EN/ZH terminology — cross-lingual Q&A
Metadata filters (department / time / permission / doc type)
Recall diagnostics panel — every query is replayable
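
The merge-and-dedupe step can be illustrated with reciprocal rank fusion (RRF). The brief does not say how the two candidate lists are combined, so RRF is a swapped-in assumption here, as are the document IDs and the constant k = 60. The cross-encoder is represented only by a caller-supplied scoring function.

```python
from typing import Callable


def rrf_merge(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists into one deduplicated list via reciprocal rank fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document found by both retrievers accumulates score from each list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the order is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def rerank(candidates: list[str], score_fn: Callable[[str], float],
           top_k: int = 3) -> list[str]:
    """Reorder candidates with a relevance scorer (stand-in for a cross-encoder)."""
    return sorted(candidates, key=score_fn, reverse=True)[:top_k]


bm25_hits = ["finance_manual_eva", "valuation_methods", "case_eva_applied"]
vector_hits = ["finance_manual_eva", "case_eva_applied", "eva_model_explained"]

merged = rrf_merge([bm25_hits, vector_hits])
merged_ids = [doc_id for doc_id, _ in merged]
print(merged_ids)
```

Rank-based fusion sidesteps the problem that BM25 hit scores and cosine similarities sit on different scales, which is one reason it is a common choice before a rerank stage.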

ACL inheritance

User A · Sales

12,400

Visible docs

✓Confluence: Marketing

✗Confluence: Legal

✓SharePoint: Sales

✗SharePoint: Legal

✓Notion: Leads

User B · Legal

8,200

Visible docs

✗Confluence: Marketing

✓Confluence: Legal

✗SharePoint: Sales

✓SharePoint: Legal

✗Notion: Leads

Audit log · last 1h · 47 total

14:32 · User A · Customer list · answered · 2 cited

14:28 · User B · Contract tpl v3.1 · answered · 3 cited

14:21 · User C · Payroll list · blocked

14:18 · User A · Support workflow · feedback 👍

Out-of-scope events

Audit coverage

100%

C-03 · 03-C · ACL Inheritance + Audit

ACL inheritance to the field, full audit trail

Data stays in; every question is logged

RAG inherits ACLs live from source systems (Confluence Space / SharePoint Site / Notion Workspace) — what a user sees in RAG strictly equals what they see in the source. Out-of-scope access is auto-blocked + alerted. All Q&A is logged and traceable to user / question / cited docs / model version.

Live ACL inheritance from Confluence / SharePoint / Notion
Department / role / field-level granular permissions
Full Q&A audit traceable to user / docs / model version
Out-of-scope access auto-blocked + real-time alerts
Compliance export (Classified Protection Level 3 / GDPR / ISO 27001)
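
The permission filter and audit trail described above can be sketched as follows. The `Chunk` and `AuditLog` types, the group names, and the "blocked" / "answered" outcome labels are hypothetical and chosen to mirror the mock audit log; a production system would more likely push the filter into the retrieval query itself.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    space: str                  # e.g. "Confluence: Legal"
    allowed_groups: frozenset   # groups copied from the source system's ACL


@dataclass
class AuditLog:
    events: list = field(default_factory=list)

    def record(self, user: str, question: str, outcome: str, cited: list) -> None:
        self.events.append(
            {"user": user, "question": question, "outcome": outcome, "cited": cited}
        )


def retrieve_for_user(user: str, groups: set, question: str,
                      candidates: list, log: AuditLog) -> list:
    """Keep only chunks the user could open in the source system; log every query."""
    visible = [c for c in candidates if c.allowed_groups & groups]
    if candidates and not visible:
        # Everything relevant is out of scope for this user: block and record it.
        log.record(user, question, "blocked", [])
        return []
    log.record(user, question, "answered", [c.doc_id for c in visible])
    return visible


log = AuditLog()
sales_doc = Chunk("customer-list", "SharePoint: Sales", frozenset({"sales"}))
legal_doc = Chunk("contract-tpl", "Confluence: Legal", frozenset({"legal"}))
payroll_doc = Chunk("payroll-list", "SharePoint: HR", frozenset({"hr"}))

seen_by_a = retrieve_for_user("User A", {"sales"}, "Customer list",
                              [sales_doc, legal_doc], log)
seen_by_c = retrieve_for_user("User C", {"sales"}, "Payroll list",
                              [payroll_doc], log)
print([c.doc_id for c in seen_by_a], log.events[-1]["outcome"])
```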

03/ARCHITECTURE

An observable, intervenable RAG pipeline + on-prem architecture

RAG must be a pipeline, not a black box. Each step has clear inputs, outputs, and degradation strategy. Below are the 5 layers plus the 5-stage processing pipeline.

5-stage pipeline

From document ingestion to answer return, every step is observable, intervenable, and replayable. Quality drops are traceable to a specific stage.

Ingestion · multi-source

Live incremental sync

Native connectors pull from Confluence / SharePoint / Notion / shared drives / business DBs. Full bootstrap + incremental sync + webhook push keep document lifecycle aligned with source systems.

Confluence API · Graph API · Notion API · S3 / FTP · Webhook

Output: Normalized doc objects + metadata + permission tags
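
One way to picture the incremental sync is a fingerprint comparison between the source system and the index. This is a sketch under assumptions: the brief does not describe how changes are detected, and `fingerprint` and `plan_sync` are hypothetical helpers that use content hashes in place of connector-specific change feeds or webhooks.

```python
import hashlib


def fingerprint(text: str) -> str:
    """Stable content hash used to detect changed documents."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(source: dict[str, str],
              indexed: dict[str, str]) -> tuple[list[str], list[str]]:
    """Compare source docs with the index and return (upserts, deletes).

    source:  doc_id -> current document text in the source system
    indexed: doc_id -> fingerprint stored at the last sync
    """
    # New or modified documents need to be re-parsed and re-embedded.
    upserts = sorted(d for d, text in source.items()
                     if indexed.get(d) != fingerprint(text))
    # Documents that vanished from the source must leave the index too.
    deletes = sorted(d for d in indexed if d not in source)
    return upserts, deletes


indexed_state = {
    "vpn-guide": fingerprint("v1"),
    "rebate-policy": fingerprint("old text"),
    "retired-faq": fingerprint("gone"),
}
source_state = {
    "vpn-guide": "v1",
    "rebate-policy": "new text",
    "laptop-swap": "fresh doc",
}
upserts, deletes = plan_sync(source_state, indexed_state)
print(upserts, deletes)
```

Propagating deletes matters as much as upserts here: a document removed from the source should stop being citable.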

Parsing & chunking

Semantic boundaries

PDF / Word / PPT / Excel / scanned docs normalized to structured objects. Chunks split by sections / paragraphs / tables — not fixed character windows — to preserve context.

PaddleOCR · Unstructured · Markdown · Semantic chunk

Output: Boundary-aware chunks + section path + metadata
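
Boundary-aware chunking with a section path can be sketched on Markdown-style input. The assumption here is that parsing has already normalized the document to text with `#` headings and blank-line paragraph breaks; the function name and the output keys are illustrative.

```python
def chunk_by_section(text: str) -> list[dict]:
    """Split on headings and blank lines, tagging each chunk with its section path."""
    chunks: list[dict] = []
    path: list[str] = []     # current heading trail, e.g. ["VPN", "Request"]
    buffer: list[str] = []   # lines of the paragraph being collected

    def flush() -> None:
        body = " ".join(buffer).strip()
        if body:
            chunks.append({"section_path": " > ".join(path), "text": body})
        buffer.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            flush()
            level = len(stripped) - len(stripped.lstrip("#"))
            # Drop headings at the same or deeper level, then push the new one.
            del path[level - 1:]
            path.append(stripped.lstrip("#").strip())
        elif not stripped:
            flush()          # a blank line ends the current paragraph
        else:
            buffer.append(stripped)
    flush()
    return chunks


doc = """# VPN
Intro line.

## Request
Submit the form.
Manager approves.

# Rebates
Standard 3-5%."""

sections = chunk_by_section(doc)
for chunk in sections:
    print(chunk)
```

Carrying the section path with each chunk gives the retriever and the citation display some context that a fixed character window would cut away.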

Embedding & indexing

Multilingual + tuned

Multilingual embeddings (BGE-M3 / text-embedding-3) cover EN/ZH and domain-specific terms. Chunks + metadata land in the vector DB; BM25 inverted index is built in parallel for hybrid search.

BGE-M3 · Milvus / Qdrant · BM25 · Inverted index

Output: Vector + full-text dual index + metadata graph
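
The full-text half of the dual index can be illustrated with a small in-memory BM25 scorer. This is a teaching sketch, not the platform's index: whitespace tokenization, the parameters k1 = 1.5 and b = 0.75, and the sample documents are assumptions, and the vector half would live in a store such as Milvus or Qdrant.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    return text.lower().split()


class BM25Index:
    """Minimal BM25 over an in-memory corpus (doc_id -> text)."""

    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.length = {d: sum(c.values()) for d, c in self.tf.items()}
        self.avg_len = sum(self.length.values()) / len(docs)
        # Document frequency: number of documents containing each term.
        self.df = Counter(term for counts in self.tf.values() for term in counts)

    def score(self, query: str, doc_id: str) -> float:
        n = len(self.tf)
        total = 0.0
        for term in tokenize(query):
            f = self.tf[doc_id].get(term, 0)
            if f == 0:
                continue
            idf = math.log(1 + (n - self.df[term] + 0.5) / (self.df[term] + 0.5))
            norm = f + self.k1 * (1 - self.b + self.b * self.length[doc_id] / self.avg_len)
            total += idf * f * (self.k1 + 1) / norm
        return total

    def search(self, query: str, top_k: int = 3) -> list[tuple[str, float]]:
        scored = [(d, self.score(query, d)) for d in self.tf]
        hits = [pair for pair in scored if pair[1] > 0]
        return sorted(hits, key=lambda pair: -pair[1])[:top_k]


index = BM25Index({
    "finance-manual": "eva valuation method finance manual",
    "vpn-guide": "vpn access request form",
    "eva-case": "eva case applied",
})
results = index.search("eva valuation")
print([doc_id for doc_id, _ in results])
```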

Retrieval & rerank

Hybrid + cross-encoder

Queries hit the vector and BM25 indexes in parallel; candidates are merged and reranked by a cross-encoder. Metadata filters (department / time / permission) and query rewriting are also applied.

Vector search · BM25 · Cross-encoder · Metadata filter

Output: Top-K relevant chunks + recall explanation
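
The vector side of this stage, with a metadata pre-filter, might look like the sketch below. The two-dimensional toy vectors stand in for real embeddings, and the `vector_search` function and the `dept` metadata key are hypothetical; a vector database would normally run the filter and the similarity search natively.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_search(query_vec: list[float], index: list[dict],
                  filters: dict, top_k: int = 3) -> list[tuple[str, float]]:
    """Rank chunks by cosine similarity, considering only those matching every filter."""
    hits = []
    for item in index:
        # The metadata filter runs before scoring, so excluded chunks never rank.
        if all(item["meta"].get(key) == value for key, value in filters.items()):
            hits.append((item["id"], cosine(query_vec, item["vec"])))
    return sorted(hits, key=lambda hit: -hit[1])[:top_k]


toy_index = [
    {"id": "finance-manual", "vec": [1.0, 0.0], "meta": {"dept": "fin"}},
    {"id": "valuation-methods", "vec": [0.6, 0.8], "meta": {"dept": "fin"}},
    {"id": "hr-handbook", "vec": [1.0, 0.0], "meta": {"dept": "hr"}},
]
fin_hits = vector_search([1.0, 0.0], toy_index, {"dept": "fin"})
print(fin_hits)
```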

Generation & citation

Enforced citation + hallucination filter

The LLM generates over the retrieved chunks, with a citation enforced for every fact. A Self-RAG check validates that the answer is supported by the retrieved chunks. When the system is uncertain, it returns 'not found' instead of fabricating.

DeepSeek / Qwen / Llama · Self-RAG · Citation enforcement · Confidence

Output: Citable answer + sources + confidence
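
Enforced citation usually starts at the prompt, where each retrieved chunk gets a number the model can refer to. The sketch below shows one possible prompt layout; the wording of the instructions, the `build_prompt` name, and the chunk fields `path` and `text` are assumptions, not the platform's actual prompt.

```python
def build_prompt(question: str, chunks: list[dict]) -> str:
    """Assemble a generation prompt with numbered sources for [n] citations."""
    lines = [
        "Answer using only the numbered sources below.",
        "Cite every fact with its source marker, for example [1].",
        "If the sources do not contain the answer, reply exactly: not found",
        "",
        "Sources:",
    ]
    for n, chunk in enumerate(chunks, start=1):
        # The number shown here is the one the answer must cite.
        lines.append(f"[{n}] ({chunk['path']}) {chunk['text']}")
    lines += ["", f"Question: {question}"]
    return "\n".join(lines)


prompt = build_prompt(
    "How is the Q3 sales rebate calculated?",
    [
        {"path": "Confluence/Rebate policy", "text": "Standard accounts get 3-5%."},
        {"path": "SharePoint/Strategic accounts", "text": "Strategic accounts get up to 8%."},
    ],
)
print(prompt)
```

Because the numbering is fixed at prompt time, the same list can later be used to resolve each [n] marker back to its source passage.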

5-layer architecture

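Each of the five layers has its own job and can be replaced or evolved independently. Any layer can switch from a cloud service to self-hosted deployment, or from a SaaS model to a self-trained model.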

Application

Web Q&A · WeCom / DingTalk / Slack · Open API · Admin

Generation

LLM inference · enforced citation · Self-RAG · confidence · hallucination filter

Retrieval

Vector DB (Milvus / Qdrant) · hybrid search · cross-encoder rerank · metadata filter

Processing

PDF / Word / PPT / Excel parsing · OCR · semantic chunking · multilingual embedding

Ingestion

Confluence / SharePoint / Notion / S3 / MySQL / Lark · live sync · webhooks

04/SCENARIOS IN ACTION

Three representative scenarios, anonymized

Each scenario is an abstracted, anonymized representation to help you judge what RAG can land in your organization.

Legal services

Mid-size law firm · compliance + precedent

120K historical contracts + regulations + precedents in scope. Lawyers ask 'is this clause supported by precedent in X industry' and get specific case IDs + key citations + risk rating. Review time drops from ~4h to 2.2h.

First-pass review

-45%

Citation rate

100%

Clause recall

+38%

Advanced manufacturing

Manufacturing group · process + equipment library

80K SOPs + equipment manuals + fault histories. Engineers scan a QR and ask 'machine reports E-217 — how do I handle it', RAG returns the SOP + similar cases + owner. New-engineer ramp shrinks from 90 to 35 days.

Time-to-ramp

-61%

Mean fix time

-42%

Repeat questions

-58%

Internet / SaaS

Mid-size SaaS · company-wide IT self-service

IT policies + processes + FAQs + ticket archive ingested. Employees @ the assistant in Lark, ask 'how do I swap my laptop', RAG answers and auto-drafts the OA request. Monthly IT tickets drop from 1,200 to 540.

IT tickets

-55%

First response

30s → instant

Employee NPS

+22

Scenarios are representative and anonymized; actual project data is delivered separately under partner NDA.

05/SECURITY

Data stays on-prem; permissions inherit down to the field

RAG becomes the enterprise's neural hub. Its security equals the security of your digital assets. We treat security as a first-class concern, not a post-launch patch.

S-01

On-premise deployment

Full on-prem / hybrid cloud / domestic stack (Xinchuang · Kunpeng · Phytium) supported. Processing, vector DB, and model inference all run inside the enterprise network — data never leaves.

S-02

Permission inheritance

ACLs from source systems (Confluence / SharePoint / Notion) propagate into RAG. Users can't see anything they couldn't see in the source. Department / role / field-level control.

S-03

Audit & traceability

All Q&A logged in full. Every answer is traceable to user / question / cited documents / model version. Citation chains expand to the original source.

S-04

End-to-end encryption

TLS 1.3 in transit, AES-256 at rest. Vector DB and document store encrypted independently. KMS integration so keys never leave the enterprise.

S-05

Compliance

Classified Protection Level 3 / GDPR / ISO 27001 / financial regulatory frameworks supported. Compliance checklists and pen-test reports delivered with the system.

S-06

Model isolation

100% self-hosted LLM option (DeepSeek / Qwen / Llama). Zero third-party API dependency. All retrieval and generation runs inside the enterprise — no outbound calls.

If you have a concrete workflow AI hasn't solved yet, let's figure out the right approach together.

We unpack the workflow with you, judge whether AI is worth using and which approach makes the most sense, then come back within 5 business days with a practical initial plan and estimate.

Business email

contact@boilingwater.cn

Office

10F, South Tower, Kingkey Yujing Times, Longgang District, Shenzhen
