// Case study · Boring AI in production

Putting AI to work, responsibly.

How I dropped a large language model into a regulated Australian financial advice workflow without letting it generate the answers, decide the outcome, or leave the audit trail.

Cameron Carmody Sydney May 2026 14 min read case study

// at a glance

What I built

A secure adviser and client portal with AI-assisted document extraction.

Who it helped

An Australian financial advice firm operating under an AFSL.

What changed

~2.5 hours saved per annual review. ~150 hours recovered per adviser per year.

What's different

AI proposes structured data, humans approve it, code handles the calculations and the audit trail.

I designed and built an adviser and client portal for an Australian financial advisory firm operating under an Australian Financial Services Licence. The portal covers client onboarding, annual reviews, fact-find capture, document collection, and electronic signature. AI sits inside that workflow at one specific point: parsing uploaded source documents into a structured fact-find that a human then reviews.

That is a deliberately small role for a large model. The model never speaks to a client, drafts financial advice, or sees the outbound copy. It reads prose and turns it into fields, and a person checks every value before anything leaves the building. The smaller scope is the point of the project. Extraction takes hours of retyping out of a regulated process while leaving the parts that matter — judgement, calculation, sign-off — on rails the firm can audit.

Results, in production live since early 2026

~2.5 hrs

saved per existing-client annual review

~150 hrs

recovered per adviser per year

~4 weeks

from kickoff to first client live

~$0.08

per extraction at current rates

8–15 s

p95 wall-clock per extraction

claude-sonnet-4-6

pinned model, upgrades gated on regression run

Prefer to see it instead of read about it? Open the five-minute walkthrough →

“We didn't want a chatbot anywhere near our clients. We wanted software where the compliance work was structural, so the advisers could spend their time on advice. Fact-find extraction has taken hours out of every annual review, and when we need to evidence how something happened the answer is sitting in the audit log.”
GM and Adviser · Australian financial advisory firm

Figure 1 End-to-end workflow. The AI step is one box in a longer chain. Every state change writes an immutable audit row, in the same database transaction as the change itself.

01 — Guardrails

Catching the model when it gets things wrong.

The first failure mode I designed against was wrong data flowing into a regulated workflow undetected. Four choices, working together, do most of the heavy lifting.

The model writes into a fixed schema

Extraction runs through Anthropic's tool_use API. The model returns JSON conforming to a JSON Schema that mirrors the firm's fact-find. Each response is validated against a Zod schema before it touches the database; if validation fails, the extraction is marked failed and the adviser sees why. There is no regex parsing of model prose. Every downstream surface — review UI, diff engine, audit log, PDF renderer — consumes the same typed data, so a single contract carries from extraction through to the signed artefact. The model version is pinned at claude-sonnet-4-6; upgrades are gated on a regression run against a frozen extraction corpus, never auto-rolling.

Source documents are treated as untrusted input. The system prompt is isolated from the document content, and the model has no tools beyond returning the extraction schema, so a hostile PDF that asks the model to set income high cannot reach into the application. Schema validation prevents structural surprises. Values that fit the shape but are factually wrong still rely on the adviser's eye, which is what the rest of this section is about.

The model flags its own uncertainty

Every leaf in the extracted tree carries a confidence label of ● high, ● medium or ● low. The model is instructed to lower its confidence when it is inferring, guessing, or finding the value in only one place. The review UI surfaces these as small red, amber, or green dots beside each field. On an internal sample of adviser-reviewed extractions, around nine in ten high-confidence fields were correct, and most adviser corrections sat under low or medium dots.

Honest caveat These are internal estimates rather than a published evaluation, and the model's confidence is not itself calibrated against ground truth. Treated honestly, the dots are a budget for adviser attention. The adviser is still the gate.

Multiple documents read in one batched call

Real adviser fact-finds arrive as a stack of source documents: a prior statement of advice, a bank statement, a super statement, a driver's licence. The extraction pipeline accepts up to five files in a single batch and sends them as separate content blocks to the model in one call. The model can reconcile facts across documents — the name on an ID compared with the name on a bank statement, for instance — instead of producing five independent extractions that have to be merged with brittle key-matching logic.

Every field links back to its source page

Each extracted field is stamped with the source document and the page number the value was lifted from. The review panel shows that source beside the field, and clicking it opens the original document at that page in an embedded viewer. The adviser can confirm a value against its source without leaving the review screen, which makes due diligence on a stack of documents much cheaper. Provenance also persists on the record after confirmation, so months later the firm can answer “where did that number come from?” without repeating the original analysis.

02 — Approval gate

Three humans in series, then signature.

Extracted data is a proposal. The portal treats it that way through three approval stages, with the AI's involvement ending at stage zero.

Stage one is adviser review. The adviser opens the review panel and works through each section, fixing values, adding missing partners or dependents, and confirming the extraction. The state machine moves the extraction row from pending to confirmed. No client communication can fire until that stamp is in place.

Stage two is client confirmation. The portal generates a magic link bound to the client and sends it by email, with a one-time SMS code as a second factor. The client opens a pre-filled form, edits anything they want to correct, and submits. Their submitted responses become the canonical record. A server-computed diff between the adviser-confirmed pre-fill and the client-confirmed response is stored alongside, so any change a client made is visible at a glance forever.

Stage three is the adviser cockpit. Before the fact find is sent for e-signature, the adviser ticks agenda items grouped by section, walks through a money audit, sets the scope of advice, and acknowledges general warnings. On confirmation, the portal snapshots a pinned view model of what the client will sign. Later edits to underlying overrides do not change what was signed; the signed document references the snapshot. The signed artefact and the working copy can drift, intentionally, and the system always knows which is which.

UX earns the human checks

Every one of those stages has to feel natural or it gets skipped. Pre-filled forms with stable step IDs and draft auto-save so a client can come back days later and resume. Magic-link entry with a fresh six-digit SMS code, no password to remember and no app to install. Confidence dots that take a quarter of a second to read. The controls work because they are not aversive to the people who have to use them.

Numbers don't come from the model

The model handles one thing: turning messy documents into structured fields. Arithmetic, consistency across a long document, and anything that has to look the same in a year run in TypeScript instead. A deterministic view-model builder merges client responses with adviser overrides, and the same view model feeds the money audit panel on screen and the signed PDF, so what the adviser sees is what the client signs. The diff between pre-fill and submitted values is also computed in code, by a single computeDiff() function, and it powers four downstream surfaces consistently. The deterministic side is also where it is cheap to test, which matters for anything the firm has to defend later.

The model is allowed to

Read PDFs and DOCX documents
Propose structured field values
Self-report a confidence label per field
Reconcile facts across multiple source documents

Code does the rest

All calculations (totals, ratios, surplus, money audit)
Diff between pre-fill and submitted responses
PDF rendering from React-PDF templates
Authentication, authorisation, state transitions
Audit log writes, retention policy, snapshotting

Figure 2 · The model reads and proposes. Everything that becomes part of the regulated record — numbers, decisions, history — is produced by code.

Things still break. The first day a new feature launched, two latent defects surfaced together: a body-size limit on Server Actions that had been fine for the older endpoints, and a missing worker resolution in a PDF library upgrade. Five errors hit inside seven minutes. Both were diagnosed and shipped the same day. The 30-day SLO dipped to 98.4% and was back at 99.8% inside a week. The error-budget burn alert didn't fire, because the affected endpoint sat below its minimum request floor, and that tuning gap is now itself in the runbook. The point of designing for auditability is that working out what had happened took minutes, with the audit trail doing the work.

03 — Compliance

Designed around the firm's obligations.

The firm operates under an Australian Financial Services Licence. The obligations that shaped this part of the system are the best-interest duty (s961B), the ongoing record-keeping obligation (s912G), and the general AFSL obligations under s912A, with ASIC's RG 175 and RG 244 as the operative guides.

Note This is a technical case study, not legal advice. The firm's compliance position was reviewed in context by the business and its advisers; nothing here should be read as a generic compliance template.

The portal is built and structured to be auditable by design, so day-to-day compliance is something the team does by working normally, and evidencing it for an internal review is reduced to running a query. That was a business driver in its own right: the previous tooling made fact-find capture and ongoing record-keeping a manual chore, and designing for auditability has been the lever to lift the firm's internal systems and the client experience at the same time.

Scope note AML/CTF obligations were assessed separately and were outside this portal's implemented scope.

Immutable audit log, written in-transaction

Every state transition writes a row to an audit_log table: who acted, what they did, on which entity, and the request metadata that prove it. Audit writes happen in the same database transaction as the state change wherever possible, so the trail cannot drift from reality through partial failure. An in-portal audit viewer queries the same table, which means an internal compliance review is a filter and a date range rather than a forensic exercise.

Append-only data model

Submissions are versioned and never overwritten. When an adviser asks a client to revise a field, a new submission row is inserted with a parent_submission_id pointer to the previous one. The seven-year retention obligation under s912G becomes a query, with no parallel backup pipeline to maintain or trust.

Australian-hosted, with one disclosed offshore step

App Service, Postgres Flexible, Blob Storage, and Microsoft Defender for Storage all sit in Azure Australia East. SharePoint runs in the firm's Microsoft 365 tenancy, also in Australia. The one cross-border step in the workflow is the extraction call to the Anthropic API, processed in the United States under the firm's Anthropic Zero Data Retention arrangement. The implementation is restricted to features covered by that arrangement — so customer prompts and outputs are not stored at rest beyond the response and are not used for training, subject to the usual legal-and-misuse exceptions. That offshore step is named in the firm's APP 5 collection notice and treated under APP 8 of the Privacy Act.

Figure 3 The Australian boundary contains data, application, and audit log. The one egress is the extraction call to the Anthropic API, processed in the United States under zero-retention — named in the firm's APP 5 collection notice.

Where the AI sits in the firm's governance map

Anthropic sits on the firm's outsourcing register, with the zero-retention configuration recorded against the entry. That treatment is consistent with the framing in ASIC's RG 104 for material service providers. Extraction errors that touch a client record route into the firm's breach-assessment process, aligned to the reportable-situation regime under s912DAA and RG 78. Any client complaint that arises from an extraction defect is handled inside the firm's internal dispute-resolution scheme under RG 271. The portal makes those workflows lighter on the firm by being the source of the underlying record.

Scan first, store second, push third

Every client upload is scanned by Defender for Storage before it can be released to the firm's SharePoint folder. The verdict arrives via Event Grid and drives a state machine on the documents row. A pending verdict blocks the SharePoint push. An infected file is quarantined and the blob is deleted, the whole quarantine action being one atomic step.

AuditA compliance review is a filter and a date range, not a forensic exercise.
RetentionSeven years of records lives as a database query, with no parallel backup pipeline to trust.
Data locationClient records stay in Australia. The one cross-border step is named in the privacy notice and disclosed up-front.
If it breaksThe audit log already has what happened, who did it, and when. Working out the story takes minutes, not days.

— Closing

Why software, not a chatbot.

A chat interface would have shipped sooner. None of the controls above survive a free chat with a language model: confidence ratings only mean something against a known schema, the three-stage approval gate depends on routes and state transitions, the audit log needs structured action names, and the pinned view model only makes sense if there is a defined thing to pin.

There are two places AI shows up in this project. The first is the build — I wrote this software alongside an AI coding assistant, with the same gate I applied to the extraction feature: it proposed, I reviewed, the deterministic code did the work that had to be defensible. The second is inside the running system, in the one bounded place described above. Together they let one person take on a project of this scope at small-business consulting rates.

Australia has a long tail of small firms whose compliance work is still being done out of file folders, spreadsheets, and software that does not quite fit. The kind of system the big firms take for granted was, until very recently, out of reach for them on cost, time, or expertise alone. It is not out of reach now. That is the work I want to do, and I think it is what putting AI to work responsibly looks like.

Want to see how it actually works?

I've put together a clickable walkthrough demo of the portal — the adviser review, the client confirmation flow, and the audit trail. Five minutes, no signup.

Open the walkthrough → Or talk to me about your workflow

Cameron Carmody builds AI-assisted internal tools for small firms — document intake, workflow automation, client portals, reporting, approval flows, and audit-ready systems where humans stay in control. For consulting enquiries please reach out via the contact page.