Vendor Security and Customer Requirement Questionnaires as a Retrieval Problem

If you work in or around enterprise software security, you have answered the same hundred questions many times over. The Standardized Information Gathering questionnaire (SIG) asks them one way, the CSA Consensus Assessments Initiative Questionnaire (CAIQ) another, the Higher Education Community Vendor Assessment Toolkit (HECVAT) a third, and the customer's bespoke spreadsheet yet another. Do you encrypt data at rest and in transit? Describe your incident response process. Provide evidence of SOC 2 Type II certification. What is your sub-processor list?

The standard approach is to treat each Vendor Security and Customer Requirement Questionnaire as a writing exercise: start from a blank page (or last quarter's submission, if you have one), re-derive answers from memory and scattered documentation, hunt for the right link to attach, and hope the wording aligns with what your company's InfoSec and Legal teams have already approved. It works, but it is slow and prone to inconsistency across submissions, and every novel phrasing of a familiar question consumes disproportionate time.

The structural property these questionnaires share is that the answer to any given question is largely determined by what your organisation has already said in prior approved submissions. Once you have answered the encryption question in an official SIG response, you have answered it in every future Vendor Security and Customer Requirement Questionnaire — if you can match the new phrasing to the old one. That is a retrieval problem, not a writing problem. In this post I will detail how I built a Claude Code skill that treats it as such, using a pre-built corpus of approved answers and a local TF-IDF matcher to handle the matching without any LLM round-trips or external API calls.

The components we will cover in this post are as follows.

Building a canonical answer corpus from official questionnaire submissions
Matching new questions to the corpus with offline TF-IDF
A fallback docs search for product-capability questions
Confidence scoring and link validation
Wiring the whole thing into Claude Code as a skill

Building the Answer Corpus

The answer bank is a flat JSON array of records in the shape {question, answer, source, category}. The source field is the most important design decision in the whole system — it encodes authority.

Vendor Security and Customer Requirement Questionnaire frameworks are typically submitted as pre-filled spreadsheets, with each framework having its own sheet layout and column conventions. The SIG, for instance, stores questions in column C and answers in column D of its named sheet, with headers at a specific row. CAIQ uses a different column mapping in its CAIQv* sheet. For HECVAT, the parser locates the answer column by finding the cell literally named "Answer" and treats the longest adjacent text cell as the question. Each format is mechanical to parse once you have read the layout spec. A build script recurses a folder of source documents, normalises every record into the canonical schema, and deduplicates on the first 120 characters of the combined question and answer text, so duplicate copies of the same questionnaire from re-submissions collapse automatically.
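The normalise-and-deduplicate step can be sketched in a few lines. This is a minimal illustration, not the actual build script: the function names, the whitespace collapsing, and the lowercasing of the dedup key are my assumptions, and the spreadsheet parsing that produces the input rows is left out.

```python
import re

DEDUP_PREFIX = 120  # characters of combined question + answer used as the key


def _clean(text):
    """Collapse runs of whitespace so re-exported copies compare equal."""
    return re.sub(r"\s+", " ", text or "").strip()


def normalise(question, answer, source, category):
    """Return one record in the canonical {question, answer, source, category} shape."""
    return {
        "question": _clean(question),
        "answer": _clean(answer),
        "source": source,
        "category": category,
    }


def dedup_key(record):
    """First 120 characters of the combined question and answer text.

    Lowercasing is an assumption added here so that case-only differences collapse.
    """
    combined = record["question"] + " " + record["answer"]
    return combined.lower()[:DEDUP_PREFIX]


def build_bank(rows):
    """rows: iterable of (question, answer, source, category) tuples from the parsers."""
    seen, bank = set(), []
    for row in rows:
        record = normalise(*row)
        if not record["question"] or not record["answer"]:
            continue  # skip header rows and unanswered questions
        key = dedup_key(record)
        if key not in seen:
            seen.add(key)
            bank.append(record)
    return bank
```

Writing the result with `json.dump(build_bank(rows), fh, indent=2)` gives the flat JSON array described above. Keying on a prefix rather than the full text is a trade-off: it tolerates trailing edits between re-submissions, at the cost of collapsing two long answers that differ only after the first 120 characters.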

The source hierarchy — the order of trust when multiple records match — is where the approach pays dividends over a simple search index. An approved submission to an industry framework carries legal weight; it is the company's actual recorded position. An internal content library is useful and broad but is typically uncurated, potentially stale, and carries no formal sign-off. When the matcher finds two candidate answers, the one from the official submission wins. This prevents a response from drifting toward a more optimistic library answer when the approved position is more careful.
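As a rough sketch of that tie-break, assuming each candidate carries its similarity score and the source value from the bank record: the rank table, the source labels, and the `min_score` cut-off below are hypothetical placeholders, not the values the skill necessarily uses.

```python
# Hypothetical authority ranks: lower means more trusted.
# Official framework submissions outrank the internal content library.
SOURCE_RANK = {"SIG": 0, "CAIQ": 0, "HECVAT": 0, "library": 1}
UNKNOWN_RANK = len(SOURCE_RANK)  # unrecognised sources are trusted least


def pick_answer(candidates, min_score=0.45):
    """Choose among matched records: authority first, similarity second.

    candidates: list of dicts with "score", "source" and "answer" keys.
    min_score is an illustrative cut-off for what counts as a real match.
    """
    eligible = [c for c in candidates if c["score"] >= min_score]
    if not eligible:
        return None
    return min(
        eligible,
        key=lambda c: (SOURCE_RANK.get(c["source"], UNKNOWN_RANK), -c["score"]),
    )
```

Sorting on authority before score is the whole point: a library answer with a higher similarity score still loses to an approved submission that clears the threshold.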

Figure 1 — How the answer bank is assembled from source questionnaires

Matching Questions Offline

The matching problem is that "Do you encrypt customer data at rest and in transit?" and "Describe your encryption posture for data stored at rest and during transmission" should resolve to the same answer. Exact-match lookup fails. An LLM call would work but costs tokens on every question and introduces non-determinism — for a Vendor Security and Customer Requirement Questionnaire with two hundred rows, that adds up quickly and the output can vary between runs.

TF-IDF cosine similarity handles this well. The vocabulary is technical and repetitive, which is exactly where TF-IDF excels. The matcher is a small scikit-learn script that builds a TF-IDF matrix from the corpus at startup and ranks candidates by cosine similarity to the incoming question string.

python3 matcher.py "Do you encrypt customer data at rest and in transit?"
# or batch mode:
printf "question one\nquestion two\n" | python3 matcher.py
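To make the mechanics concrete, here is a dependency-free sketch of the same idea. It is not the scikit-learn script itself: the class name and tokeniser are my own, and in the real matcher scikit-learn's TfidfVectorizer and cosine_similarity replace the hand-rolled parts. The IDF formula is the smoothed variant that scikit-learn applies by default.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens; a simplification of a real tokeniser."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfMatcher:
    def __init__(self, questions):
        self.questions = list(questions)
        docs = [Counter(tokenize(q)) for q in self.questions]
        n = len(docs)
        df = Counter(term for doc in docs for term in doc)
        # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        self.vectors = [self._vectorize(doc) for doc in docs]

    def _vectorize(self, counts):
        """TF-IDF weights, L2-normalised so a dot product is the cosine."""
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def match(self, query, top_k=3):
        """Return the top_k (score, question) pairs, best first."""
        q = self._vectorize(Counter(tokenize(query)))
        scored = [
            (sum(w * vec.get(t, 0.0) for t, w in q.items()), i)
            for i, vec in enumerate(self.vectors)
        ]
        scored.sort(key=lambda s: (-s[0], s[1]))
        return [(round(score, 3), self.questions[i]) for score, i in scored[:top_k]]
```

Terms that do not appear in the corpus are simply dropped from the query vector, which is why a question phrased with entirely new vocabulary scores near zero rather than producing a misleading match.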

The script returns the top matches with their scores. I use three bands in practice:

≥ 0.45: strong match. The answer can be used near-verbatim, subject to the date-verification caveats below.
0.25 to < 0.45: adjacent. There is likely a relevant answer, but read it carefully before using it; the question may cover a subtly different scope.
< 0.25: no real match. The bank does not have a good answer for this one; go to the docs or draft fresh.

The matcher runs entirely offline, against a local file, with no network dependency. For a 200-question questionnaire it finishes in under a second on commodity hardware.
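The three bands reduce to a small routing function. The thresholds are the ones listed above; the function name and the band labels are illustrative.

```python
STRONG_THRESHOLD = 0.45
ADJACENT_THRESHOLD = 0.25


def band(score):
    """Map a cosine similarity score to one of the three working bands."""
    if score >= STRONG_THRESHOLD:
        return "strong"    # usable near-verbatim, after verification
    if score >= ADJACENT_THRESHOLD:
        return "adjacent"  # relevant, but check the scope carefully
    return "none"          # fall back to the docs or draft fresh
```

Treating the boundaries as half-open intervals means a score of exactly 0.45 counts as strong and exactly 0.25 as adjacent, so no score falls into two bands.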

The Docs Fallback for Product-Capability Questions

The answer bank is strong for corporate-fact questions: incident history, sub-processors, BCP dates, certification status, key management approach. These have stable approved answers that change infrequently.

Product-capability questions are different. "Which MFA factors do you support?", "Do you support FAPI 2.0?", "What are your Management API rate limits?" — these describe current product behaviour, and a pre-filled questionnaire from eighteen months ago may not reflect what has shipped since. For these, the authoritative source is current product documentation, not a prior submission.

The skill handles this with a second TF-IDF index built from a sparse clone of the vendor's documentation repository. The same scoring interface applies:

python3 search_docs.py "what MFA factors does Auth0 support"
# → score 0.72  secure/multi-factor-authentication.mdx  "Multi-Factor Authentication"
#              url: https://auth0.com/docs/secure/multi-factor-authentication
#              read: assets/docs-v2/main/docs/secure/multi-factor-authentication.mdx

A strong hit returns the local MDX file path. You read the current source directly and synthesise the answer from the document rather than paraphrasing a stale submission; the citation points to the live documentation URL rather than the questionnaire record. The docs index is built once from the sparse clone and refreshed incrementally — a git fetch and a conditional index rebuild that runs in one to two seconds if nothing relevant has changed.
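Two small pieces of that flow can be sketched under stated assumptions. The rebuild check assumes "relevant" means any changed .mdx file and that the caller supplies the commit hashes and changed paths (for example from git after the fetch). The URL helper assumes the path-to-URL mapping visible in the example output above, which may not hold for every page.

```python
from pathlib import PurePosixPath

DOCS_BASE_URL = "https://auth0.com/docs"  # taken from the example output above


def needs_rebuild(indexed_commit, fetched_commit, changed_paths):
    """Rebuild only if the clone moved and at least one docs page changed.

    Assumption: only .mdx files feed the index.
    """
    if indexed_commit == fetched_commit:
        return False
    return any(PurePosixPath(p).suffix == ".mdx" for p in changed_paths)


def citation_url(relative_path):
    """Derive the live docs URL from a path relative to the docs root.

    Assumption: URLs mirror the file layout minus the .mdx extension.
    """
    return f"{DOCS_BASE_URL}/{PurePosixPath(relative_path).with_suffix('')}"
```

Keeping the decision separate from the git calls makes the cheap path (nothing changed, no rebuild) easy to test without a repository.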

Confidence Scoring and Link Validation

When a Vendor Security and Customer Requirement Questionnaire will go through InfoSec or Legal review before submission, it is useful to make uncertainty explicit rather than burying it in the prose. I assign a 1–5 confidence score to each drafted answer:

5 — corroborated by an official pre-filled questionnaire submission and all cited documentation links resolve.
4 — sourced from current product documentation (first-party authoritative, but the corporate position comes from the docs rather than a signed-off submission).
3 — supported by the internal content library only, or the cited links have not been verified.
1 — no bank corroboration, or the answer contradicts a known approved position.
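The rubric can be written down as a function, which keeps the scoring consistent from one questionnaire to the next. This is a sketch of one reading of the levels above: the source labels and parameter names are hypothetical, and treating unverified links as a cap of 3 regardless of source is my interpretation.

```python
def confidence(source_kind, links_verified, contradicts_approved=False):
    """Return the confidence score for one drafted answer.

    source_kind: "official_submission", "product_docs", "content_library",
                 or None when nothing in the bank or docs supports the answer.
    links_verified: True only if every cited link resolved.
    """
    if contradicts_approved or source_kind is None:
        return 1
    if not links_verified:
        return 3  # unverified links cap the score (interpretation)
    if source_kind == "official_submission":
        return 5
    if source_kind == "product_docs":
        return 4
    return 3  # internal content library only
```

Checking the disqualifying conditions first means a contradiction with an approved position always yields 1, however well sourced the draft looks otherwise.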

The link validation step is worth dwelling on. Documentation sites reorganise their URL structures with some regularity — pages move, sections are renamed, and a URL that was valid six months ago may now return a 404. A cited link that 404s undermines the credibility of an entire response. The skill validates every URL it intends to cite against a local index of known-good paths, then runs HTTP HEAD checks against any URL not in the index. A 404 gets flagged with the nearest valid path from the index as a suggested replacement. This catches the majority of link-rot issues before the response leaves the draft stage.
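A minimal sketch of that two-stage check, using only the standard library. The function names and result shape are illustrative, and difflib's closest-match heuristic stands in for whatever "nearest valid path" logic the skill actually uses. Unlike the description above, this version flags any error status or network failure, not just a 404.

```python
import difflib
import urllib.error
import urllib.request
from urllib.parse import urlparse


def head_status(url, timeout=5):
    """Return the HTTP status of a HEAD request, or None on a network failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 404 and other error statuses arrive as exceptions
    except (urllib.error.URLError, TimeoutError):
        return None


def validate_link(url, known_paths, check=head_status):
    """Check the local index first; fall back to a live HEAD request."""
    path = urlparse(url).path
    if path in known_paths:
        return {"url": url, "ok": True, "suggestion": None}
    status = check(url)
    if status is not None and status < 400:
        return {"url": url, "ok": True, "suggestion": None}
    # Broken or unreachable: suggest the closest known-good path, if any.
    nearest = difflib.get_close_matches(path, sorted(known_paths), n=1, cutoff=0.6)
    return {"url": url, "ok": False, "suggestion": nearest[0] if nearest else None}
```

Passing the checker in as a parameter keeps the index lookup testable offline and means the network is touched only for URLs the index does not already vouch for.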

Separately, some specific values should be marked for manual verification before any submission: dates, version numbers, uptime percentages, and sub-processor lists. These change over time, and the bank records are point-in-time snapshots. I flag them inline as [VERIFY date] so the reviewer knows exactly where to check.
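Part of that flagging can be automated with a pattern pass over the drafted answer. The sketch below is an assumption-heavy illustration: the regular expressions are deliberately narrow, the [VERIFY percentage] and [VERIFY version] labels extend the [VERIFY date] convention by analogy, and sub-processor lists still need a human eye because they do not follow a pattern.

```python
import re

# One alternation with named groups, so each value is flagged exactly once.
FLAG_RE = re.compile(
    r"(?P<date>\b\d{4}-\d{2}-\d{2}\b|\b(?:19|20)\d{2}\b)"   # ISO dates, bare years
    r"|(?P<percentage>\b\d+(?:\.\d+)?\s?%)"                 # e.g. uptime figures
    r"|(?P<version>\bv\d+(?:\.\d+)*\b|\b\d+\.\d+\.\d+\b)"   # v2.1 or 1.2.3 style
)


def flag_for_review(answer):
    """Append an inline [VERIFY kind] marker after each point-in-time value."""
    def mark(match):
        return f"{match.group(0)} [VERIFY {match.lastgroup}]"
    return FLAG_RE.sub(mark, answer)
```

The pass is intentionally conservative about versions: two-part numbers without a "v" prefix are left alone, so some values will still slip through to manual review.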

Figure 2 — How the skill processes an incoming question at runtime

Wiring It Into Claude Code

The skill is a directory under ~/.claude/skills/ containing a SKILL.md file that describes when to invoke it, what assets are bundled, and the exact workflow sequence to follow. Claude Code loads it automatically when the request matches the skill's description, which lists trigger phrases such as "fill in the RFP", "security questionnaire", "CAIQ", "HECVAT", and "vendor assessment". The context (corpus path, score thresholds, fallback sequence, link-validation commands) is then available immediately without re-explanation.

The practical benefit of packaging this as a skill rather than a saved prompt is that the workflow is versioned alongside the corpus. When new questionnaire submissions arrive and the bank is rebuilt, the skill picks up the updated JSON on the next session. When the docs refresh script runs and the index updates, the search behaviour changes automatically. Instructions and data stay in sync.

Final Thoughts

The pattern here — pre-built corpus, offline TF-IDF matching, source authority hierarchy, confidence scoring, link validation — is not specific to identity or security products. Any domain where the same questions recur across multiple customers or frameworks, and where prior approved answers carry weight, is a candidate for the same approach. If you maintain answers to recurring Vendor Security and Customer Requirement Questionnaire questions in a structured document today, the conceptual distance to a matchable corpus is smaller than it might appear.

The one thing this does not replace is human sign-off. The skill produces drafts; InfoSec and Legal review them. The confidence scoring is designed to make their review faster, not to substitute for it.

The CSA CAIQ, EDUCAUSE HECVAT, and Minimum Viable Secure Product (MVSP) templates are publicly available, and the SIG can be obtained from Shared Assessments, if you want to understand the question categories these tools are built around.