From an Audio Recording to a Signed Legal Document: How to Automate What Takes Hours Today

A notary records a regular shareholders' meeting. The session lasts an hour and a half. Afterwards, an assistant sits down to listen to the entire recording, takes notes, identifies who said what, extracts the agreements, the participants' data, the votes, and the relevant points. They draft the minutes in Word using the notary's template. The notary reviews, corrects, signs, and files.

Total time from recording to final document: between 4 hours and 2 days, depending on session complexity and the notary's workload.

This process hasn't changed in decades. The recorder went from cassette to digital, the word processor from typewriter to computer, but the workflow remains the same: someone listens, someone transcribes, someone drafts, someone reviews. Four manual steps, each error-prone and dependent on people's availability.

The bottleneck is transcription

Of the four steps, transcription is the slowest and adds the least value. It requires patience, not legal judgment. An experienced assistant transcribes at a 4:1 ratio (four hours of work per hour of audio); a less experienced one takes longer.

And it's not just transcribing. It's identifying who's speaking at each moment. In a meeting with 8 participants, distinguishing voices is an exercise in memory and concentration. A misattribution — assigning a statement to the wrong shareholder — can have serious legal consequences.

Automatic transcription with diarization

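Automatic transcription technology has matured considerably. Tools like WhisperX, which builds on OpenAI's Whisper speech recognition models, can transcribe Spanish audio with over 95% accuracy under reasonable recording conditions, such as clear microphones and limited crosstalk.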

But transcription alone isn't enough for legal documents. What's needed is diarization: the ability to identify who's speaking in each audio segment. "Approved unanimously" carries a different legal weight when said by the meeting chair than when said by a minority shareholder expressing a wish.

Diarization assigns each transcription fragment to the corresponding speaker. The result is not a continuous text block but a structured conversation with timestamps and speaker labels.
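
As a rough illustration of that structure, the sketch below labels each transcribed fragment with the speaker whose diarization turns overlap it the most, then prints the result as a timestamped conversation. The `Turn` and `Segment` types, the `SPEAKER_00` style labels, and the sample sentences are assumptions made for this example. They are not the output format of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """A diarization turn: one speaker talking from start to end (seconds)."""
    speaker: str
    start: float
    end: float


@dataclass
class Segment:
    """A transcribed fragment with timestamps; the speaker is filled in later."""
    start: float
    end: float
    text: str
    speaker: str = "UNKNOWN"


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: list[Segment], turns: list[Turn]) -> list[Segment]:
    """Label each segment with the speaker whose turns overlap it the most."""
    labeled = []
    for seg in segments:
        totals: dict[str, float] = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        # Segments that overlap no turn stay UNKNOWN so a reviewer can resolve them.
        speaker = max(totals, key=totals.get) if totals else "UNKNOWN"
        labeled.append(Segment(seg.start, seg.end, seg.text, speaker))
    return labeled


def format_transcript(segments: list[Segment]) -> str:
    """Render labeled segments as one line per fragment: [mm:ss] SPEAKER: text."""
    lines = []
    for seg in segments:
        minutes, seconds = divmod(int(seg.start), 60)
        lines.append(f"[{minutes:02d}:{seconds:02d}] {seg.speaker}: {seg.text}")
    return "\n".join(lines)


# Illustrative data only: two speaker turns and two transcribed fragments.
turns = [Turn("SPEAKER_00", 0.0, 6.0), Turn("SPEAKER_01", 6.0, 11.0)]
segments = [
    Segment(0.4, 5.8, "The motion is approved unanimously."),
    Segment(5.9, 10.5, "I would like my objection recorded."),
]
labeled = assign_speakers(segments, turns)
print(format_transcript(labeled))
```

Leaving unmatched fragments as `UNKNOWN` instead of guessing is a deliberate choice here, since a misattributed statement is the costly error in a legal record.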

From transcription to document: AI extraction

Once you have the speaker-structured transcription, a language model can extract relevant information according to the document type you need to generate.

For shareholders' meeting minutes: date, time and place, participants' names and roles, agenda, agreements with voting details, relevant statements.

For a statement of facts: description of what was observed; circumstances of manner, time, and place; persons present.

For corporate meeting minutes: attendees, topics discussed, agreements, responsible parties, commitment dates.
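
One way to picture these per-document field lists is as a small schema that the extraction step is checked against. The sketch below is an assumption for illustration: the document type keys and field names are invented here from the lists above, and they do not describe any real system's configuration.

```python
# Hypothetical field lists, derived from the three document types described above.
REQUIRED_FIELDS = {
    "shareholders_meeting_minutes": [
        "date", "time", "place", "participants", "agenda",
        "agreements", "relevant_statements",
    ],
    "statement_of_facts": [
        "observations", "manner", "time", "place", "persons_present",
    ],
    "corporate_meeting_minutes": [
        "attendees", "topics", "agreements",
        "responsible_parties", "commitment_dates",
    ],
}


def missing_fields(document_type: str, extracted: dict) -> list[str]:
    """Return the required fields that are absent or empty in the extracted data."""
    required = REQUIRED_FIELDS[document_type]
    return [name for name in required if not extracted.get(name)]


# Illustrative extraction result with gaps: no agreements, owners, or dates found.
extracted = {"attendees": ["A. Ruiz", "M. Soto"], "topics": ["Budget"], "agreements": []}
print(missing_fields("corporate_meeting_minutes", extracted))
```

Reporting the gaps, rather than filling them, keeps the decision about missing information with the person who reviews the document.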

The model's task is to extract and structure what is in the transcription, not to invent content. If something wasn't said in the recording, it should not appear in the document, and the professional's review of the generated draft is the final check on that.
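
A simple safeguard for that rule can be sketched as follows, assuming the model is asked to return a supporting quote alongside each extracted item. Any item whose quote cannot be found in the transcription is flagged for review. The `quote` field, the item layout, and the sample transcript are assumptions made for this example.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences do not matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def ungrounded_items(items: list[dict], transcript: str) -> list[dict]:
    """Return extracted items whose supporting quote is not found in the transcript."""
    haystack = normalize(transcript)
    flagged = []
    for item in items:
        quote = normalize(item.get("quote", ""))
        # An item with no quote at all is treated as ungrounded.
        if not quote or quote not in haystack:
            flagged.append(item)
    return flagged


# Illustrative transcript and extraction output.
transcript = (
    "SPEAKER_00: We vote on the dividend proposal.\n"
    "SPEAKER_01: I vote in favor.\n"
    "SPEAKER_00: The proposal is approved."
)
items = [
    {"field": "agreement", "value": "Dividend proposal approved",
     "quote": "The proposal  is approved."},
    {"field": "agreement", "value": "Capital increase approved",
     "quote": "The capital increase is approved."},
]
flagged = ungrounded_items(items, transcript)
print([item["value"] for item in flagged])  # prints ['Capital increase approved']
```

An exact-match check like this is strict on purpose. It will flag paraphrased quotes too, which errs toward sending more items to human review.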

Customizable templates per organization

Each notary office, law firm, and corporation has its own document templates. A well-designed system lets each organization define its own DOCX templates with markers that the system fills automatically. The notary keeps using their usual format — only the variable data is filled by AI instead of an assistant.

This matters because technology adoption in legal environments has a high barrier: professionals don't want to change their format or workflow. If the system adapts to them (not the other way around), adoption is much more natural.
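
The marker-filling idea can be sketched on plain text as shown below. The `{{name}}` marker syntax and the sample values are assumptions for illustration. A real DOCX file would also need a library such as python-docx, plus handling for markers that Word splits across several text runs.

```python
import re

# Hypothetical marker syntax: {{field_name}}, with optional inner spaces.
MARKER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def fill_template(template: str, values: dict[str, str]) -> tuple[str, list[str]]:
    """Replace {{name}} markers with values; return the text and any unfilled marker names."""
    unfilled: list[str] = []

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name in values:
            return values[name]
        unfilled.append(name)
        return match.group(0)  # leave the marker visible for the reviewer

    return MARKER.sub(substitute, template), unfilled


# Illustrative template line and extracted values (made up for this example).
template = "On {{date}}, in {{place}}, the meeting chaired by {{chair}} was called to order."
text, unfilled = fill_template(template, {"date": "12 March 2025", "place": "Monterrey"})
print(text)
print(unfilled)  # prints ['chair']
```

Markers without a value are left in place and reported, so an incomplete extraction shows up as a visible gap in the draft instead of a silently blank field.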

Blockchain sealing of the final document

Once the document is generated, the final step is to protect it with a certain date and an integrity guarantee. The document's digital fingerprint (its SHA-256 hash) is registered on a public blockchain, which proves that a document with exactly that content existed no later than the registration date. Any subsequent modification, even a single character, produces a different fingerprint, so tampering is detected as soon as the document is verified.
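
The fingerprint itself is easy to demonstrate with Python's standard `hashlib` module. This minimal sketch covers only the hashing step. Registering the hash on a blockchain is outside its scope, and the sample document text is invented for the example.

```python
import hashlib


def fingerprint(document_bytes: bytes) -> str:
    """Return the SHA-256 digest of the document as a 64-character hex string."""
    return hashlib.sha256(document_bytes).hexdigest()


# Two versions of the same illustrative document, differing in a single word.
original = b"Minutes of the shareholders' meeting. Agreement 1: approved."
tampered = b"Minutes of the shareholders' meeting. Agreement 1: rejected."

print(fingerprint(original))
print(fingerprint(original) == fingerprint(tampered))  # prints False
```

Only the hash needs to be published. The document's content cannot be reconstructed from it, so the file itself can stay private.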

The sealed document includes a QR code that lets anyone verify its authenticity without uploading the file to any server.
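
Verification without an upload can work roughly like this: the verifier recomputes the file's hash on their own device and compares it with the hash carried in the QR code. The sketch below assumes a hypothetical QR payload, a JSON string with `sha256` and `tx` fields. It is not the actual SureSeal format, and it does not cover the separate step of confirming that the hash is recorded on the blockchain.

```python
import hashlib
import hmac
import json


def verify_locally(document_bytes: bytes, qr_payload: str) -> bool:
    """Recompute the file's SHA-256 on this machine and compare it with the hash in the QR payload."""
    expected = json.loads(qr_payload)["sha256"]
    actual = hashlib.sha256(document_bytes).hexdigest()
    # compare_digest performs a constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, expected.lower())


# Illustrative document and payload; the "tx" value is a placeholder, not a real transaction.
document = b"Sealed minutes, final version."
payload = json.dumps({
    "sha256": hashlib.sha256(document).hexdigest(),
    "tx": "hypothetical-transaction-id",
})

print(verify_locally(document, payload))         # prints True
print(verify_locally(document + b" ", payload))  # prints False
```

Because only the hash is compared, the file never has to leave the verifier's device.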

Concrete use cases

Notary offices. Meeting minutes, statements of fact, powers of attorney, deeds. The notary records, uploads, reviews the generated document, and seals it.

Law firms. Client meeting minutes, declarations, certifications. Any recorded session becomes a structured document.

Corporations. Board minutes, committee records, and resolutions of the board of directors. Formal documentation generated automatically from session recordings.

Medical offices. Consultation notes and informed consent from the doctor-patient conversation. A particular case where diarization (distinguishing doctor from patient) is essential.

The difference between dictation and comprehension

A dictation system transcribes what one person deliberately says. A transcription system with extraction understands a natural conversation between multiple people and extracts structured information. The doctor talks with their patient normally, and the system understands that the onset of symptoms was Friday and structures everything in the correct format without anyone having dictated anything.

This difference is fundamental for adoption. No professional wants to change their way of working to adapt to a dictation system. Most will readily accept a system that turns a recording of their normal session into a document automatically.

Scriba: from audio to sealed legal document

At Leeuwwolk we developed Scriba, a system that automates the complete flow: audio → transcription with diarization → AI extraction → document in custom template → blockchain sealing via SureSeal.

Each organization configures its own DOCX templates and document types. The system supports multiple speakers, works in Spanish and English, and can be used from desktop or mobile (it's an installable PWA).

Leeuwwolk guarantees the privacy of your recordings and documents: encryption in transit and at rest, no sharing with third parties, no sending audio to public AI services like ChatGPT or Gemini.

→ Learn about Scriba and automate your documentation

Leeuwwolk is a Mexican company specializing in private AI for legal documentation. We guarantee your information's privacy: encryption in transit and at rest, no data shared with third parties or used for model training.