GOVERNMENT AGENCY
Question Time: securing interview data


Over 50% Workload Reduction, Prototyped in Two Weeks
Kablamo built a secure interview data platform for a government agency that reduced transcription workload by over 50% using ML-powered speech recognition with custom vocabulary, encrypted storage, and full-scale security compliance.
“This lays the groundwork for a living interview solution that could support positive investigative outcomes.”
Government Agency, Interview Data Platform
The Challenge
Government interviews are valuable digital assets requiring careful management, transcription, storage, and retrieval. The existing system required administrative personnel or officers to transcribe interviews manually. With a high degree of variation in audio quality and an average interview length of one hour, the process was time-consuming, expensive, and, given the sensitive nature of the content, emotionally difficult for staff.
A one-hour interview could take five to ten hours to transcribe, depending on the quality of the recording. A backlog of transcriptions was building up across the agency. Interviews were stored on physical media in a centralised warehouse, and search and retrieval times were lengthening as the archive grew. Legacy systems made it difficult to cross-reference or search across cases.
The solution needed to address four priorities with equal weight: speech-to-text optimisation, transcription editing, storage and retrieval, and encryption and security. Given the sensitivity of the data, security was not an afterthought but a first-class requirement from day one.


The Approach
Kablamo built a secure digital asset management platform with equal priorities across the four areas. Initial prototypes of the system and editor were delivered in under two weeks and given to agency teams to trial. User feedback from those trials informed the optimisation of the final implementation.
The speech-to-text research phase tested Amazon Transcribe with real interview data of varying quality, including different accents, slang usage, diction, recording equipment, and file types. Success rates were monitored across multiple configurations to inform the ML solution development.
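The configurations exercised in this research phase can be sketched as Amazon Transcribe job parameters. The job name, media URI, and vocabulary name below are illustrative assumptions rather than the agency's actual setup; the dict follows the shape that boto3's `start_transcription_job` accepts:

```python
def build_transcription_request(job_name, media_uri, vocabulary_name=None):
    """Build kwargs for boto3's transcribe.start_transcription_job().

    All names here (job, bucket, vocabulary) are illustrative; the case
    study does not publish the agency's actual configuration.
    """
    params = {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "MediaFormat": "wav",
        "LanguageCode": "en-AU",  # Australian English, for local accents
        "Settings": {
            # separate interviewer/interviewee audio channels
            "ChannelIdentification": True,
        },
    }
    if vocabulary_name:
        # attach a custom vocabulary of domain-specific terms
        params["Settings"]["VocabularyName"] = vocabulary_name
    return params
```

Monitoring success rates across configurations then reduces to sweeping these parameters (vocabulary on/off, language variant, media format) over the same test recordings.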
A significant finding came from the audio preprocessing experiments. The team tested various methods to improve transcription of low-quality recordings, including volume normalisation, noise reduction, and speed adjustment. Counter-intuitively, audio preprocessing reduced accuracy in most cases. The ML pipeline had been trained to compensate for imperfections in raw audio, and altering the recordings introduced artefacts that caused more errors than they prevented. The decision to use raw audio as the primary input improved results across the board.
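Comparisons like "raw audio vs preprocessed audio" are typically scored with word error rate (WER) against a reference transcript. The agency's exact evaluation harness is not published; a minimal WER implementation, which is enough to rank configurations, looks like this:

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance divided by
    reference length. Lower is better; 0.0 is a perfect transcript."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edits needed to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / max(len(ref), 1)
```

Scoring the same interview transcribed from raw and from preprocessed audio against one human reference makes the "preprocessing hurts" finding directly measurable.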
For speaker identification, the team tested channel-based diarisation approaches. Multi-channel separation with clear channel assignment achieved near-perfect accuracy when each channel contained a single speaker. Accuracy dropped when speakers talked simultaneously or alternated quickly, which informed the design of the transcription editor's manual correction interface.
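With channel identification enabled, each transcript item carries the channel it came from, and speaker attribution becomes a simple mapping. The item shape below is a simplified stand-in for Transcribe's channel output (real responses nest items under `results.channel_labels`), and the speaker names are illustrative:

```python
def label_speakers(items, channel_map):
    """Attach speaker labels to transcript items based on audio channel.

    channel_map is e.g. {"ch_0": "Interviewer", "ch_1": "Interviewee"};
    a channel not in the map falls back to "Unknown" for manual review
    in the editor, mirroring the correction workflow described above.
    """
    return [
        {
            "speaker": channel_map.get(item["channel"], "Unknown"),
            "start": item["start_time"],
            "text": item["content"],
        }
        for item in items
    ]
```

When speakers overlap or alternate quickly, channel assignment alone is not enough, which is why the editor surfaces these segments for manual correction.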
Custom vocabulary training achieved strong results for domain-specific terms and phrases that appeared frequently in interviews. The system handled specialised legal and procedural language well after training. Proper nouns, including local place names, produced mixed results: the system sometimes replaced common words with similarly-sounding place names, a trade-off that was managed through the editing interface rather than attempting to eliminate it programmatically.
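A custom vocabulary in Amazon Transcribe is supplied as a phrase list, with multi-word phrases hyphenated per the Transcribe documentation. The terms and vocabulary name below are hypothetical examples, not the agency's actual word list:

```python
def to_vocabulary_phrases(terms):
    """Format terms for Transcribe's Phrases list: the service requires
    spaces within a phrase to be replaced with hyphens."""
    return [term.strip().replace(" ", "-") for term in terms]

def build_vocabulary_request(name, terms):
    """Build kwargs for boto3's transcribe.create_vocabulary().
    The vocabulary name and terms here are illustrative only."""
    return {
        "VocabularyName": name,
        "LanguageCode": "en-AU",
        "Phrases": to_vocabulary_phrases(terms),
    }
```

Place names added this way bias recognition toward those terms, which is the root of the trade-off noted above: the model may substitute a trained place name for a similar-sounding common word.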
The platform included a transcription editor with audio waveform visualisation, speaker diarisation display, and stereo audio channel separation. The editor was designed for efficiency, allowing staff to review and correct transcriptions while listening to the original audio, with the interface highlighting areas of lower confidence for prioritised review.
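The "lower confidence" highlighting can be driven directly from the per-word confidence scores in the transcription output. The item shape below mirrors Transcribe's `results.items` entries (confidence arrives as a string); the threshold value is an assumption for illustration:

```python
def low_confidence_spans(items, threshold=0.85):
    """Return transcript words whose ASR confidence falls below the
    threshold, so the editor can highlight them for prioritised review."""
    flagged = []
    for item in items:
        # punctuation items carry no meaningful confidence or timestamps
        if item.get("type") != "pronunciation":
            continue
        best = item["alternatives"][0]
        confidence = float(best["confidence"])
        if confidence < threshold:
            flagged.append({
                "word": best["content"],
                "start": float(item["start_time"]),  # seconds into audio
                "confidence": confidence,
            })
    return flagged
```

Sorting the flagged spans by start time lets a reviewer jump through the waveform from one doubtful word to the next instead of re-listening to the whole recording.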
The Results
The speech-to-text automation reduced administrative staff workload by 50% or more for most audio recordings. This delivered both efficiency gains and mental health benefits for staff who previously had to repeatedly listen to sensitive interview content to produce manual transcriptions.
The editing platform proved intuitive to use through its UX-focused design. Staff could review ML-generated transcriptions, correct errors, and annotate content without specialised training. The custom vocabulary training meant that domain-specific language was handled accurately from the first pass, reducing the volume of corrections needed.
The platform provided a secure cloud-based system for storing, searching, and editing interview data, replacing the legacy physical media warehouse with a digital asset repository. Encryption at rest and in transit, combined with comprehensive audit logging, met the agency's security requirements for handling sensitive material. Every access, edit, and export is logged with full attribution, providing the compliance trail that the agency's governance framework requires.
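An audit trail with full attribution is commonly made tamper-evident by hash-chaining entries, so that altering any historical record breaks every hash after it. The field names below are illustrative, not the agency's actual schema:

```python
import hashlib
import json
import time

def append_audit_event(log, actor, action, asset_id):
    """Append a hash-chained audit record to an in-memory log.

    Each entry commits to the hash of the previous entry, so any later
    tampering with a record invalidates the rest of the chain.
    """
    prev_hash = log[-1]["hash"] if log else "0" * 64  # genesis sentinel
    entry = {
        "actor": actor,       # who accessed the asset
        "action": action,     # e.g. "view", "edit", "export"
        "asset": asset_id,    # which interview was touched
        "ts": time.time(),
        "prev": prev_hash,
    }
    # hash over a canonical serialisation of the entry itself
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()).hexdigest()
    log.append(entry)
    return entry
```

In a production system the chain would be persisted to append-only storage; the in-memory list here just keeps the sketch self-contained.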
Search and retrieval times improved dramatically compared to the physical archive system. Staff can now locate specific interviews, search within transcriptions for keywords or phrases, and retrieve historical records in seconds rather than waiting for physical media to be located and delivered from the warehouse.
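Keyword search across transcriptions reduces, at its simplest, to an inverted index from tokens to interview IDs. A production deployment would use a managed search service; this minimal sketch (with made-up interview IDs) just shows the mechanism:

```python
from collections import defaultdict

def build_index(transcripts):
    """Minimal inverted index: token -> set of interview IDs.
    transcripts maps an interview ID to its transcript text."""
    index = defaultdict(set)
    for doc_id, text in transcripts.items():
        for token in text.lower().split():
            index[token.strip(".,?!")].add(doc_id)
    return index

def search(index, *keywords):
    """Return the interview IDs containing every keyword."""
    hits = [index.get(keyword.lower(), set()) for keyword in keywords]
    return set.intersection(*hits) if hits else set()
```

Retrieval becomes a set intersection over in-memory postings, which is why lookups complete in seconds rather than the hours a physical archive required.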
Looking Forward
This solution goes well beyond transcription efficiency. It opens up a powerful ML-based future for the agency. With interview data now stored digitally in a searchable, structured format, the platform supports capabilities that were impossible with the physical archive: meta-tagging interviews, cross-referencing data across cases, and identifying patterns across large volumes of interview material.
Future capabilities under consideration include hardware integration for editing efficiency, expanded cross-referencing tools, and additional ML models trained on the growing corpus of transcribed interview data. As the platform continues to evolve, each new interview that enters the system improves the ML model's accuracy for the agency's specific vocabulary and audio conditions. What started as a transcription efficiency project has become the foundation for a searchable, secure, ML-enhanced intelligence capability.