Native Voice Input Widget with Microphone Button for Custom Interfaces

The Need: For mobile-first workflows, users need to quickly capture information via voice while on the go - whether driving, walking, or in situations where typing is impractical.

Current Workaround (Clunky): Yes, users can technically use their device’s native keyboard microphone for speech-to-text (iOS dictation, Android voice typing), but this creates friction:

  • Extra step to find keyboard mic button

  • Not obvious to users

  • Breaks the app flow

  • Feels like a workaround, not a feature

Requested Feature: A native microphone button component directly in custom interfaces that:

  • Prominent mic icon users can tap to start/stop recording

  • Records audio directly in-browser (Web Speech API)

  • Transcribes in real-time as they speak

  • Seamlessly fills the text variable

  • Professional, integrated experience

  • Works on mobile (iOS Safari, Chrome/Android)
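The in-browser piece of this is feasible today: the Web Speech API's `SpeechRecognition` interface (prefixed as `webkitSpeechRecognition` in Chrome and Safari) can be feature-detected before showing a mic button. A minimal sketch, with the window object passed in as a parameter so it can be tested outside a browser; the helper name and parameter are illustrative, not an existing API:

```typescript
// Shape of the properties we probe for; both are optional because
// support varies by browser.
type MaybeSpeechWindow = {
  SpeechRecognition?: unknown;
  webkitSpeechRecognition?: unknown;
};

// Returns true when either the standard or the webkit-prefixed
// constructor is present on the given window-like object.
function supportsSpeechRecognition(win: MaybeSpeechWindow): boolean {
  return 'SpeechRecognition' in win || 'webkitSpeechRecognition' in win;
}
```

In a real page you would call `supportsSpeechRecognition(window)` and hide or disable the mic button when it returns false, rather than surprising the user with a runtime error.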

Why This Matters: The difference between “use your keyboard’s mic button” vs “tap the big microphone icon in the app” is the difference between a workaround and a polished product.

For any mobile-first use case, a dedicated in-app voice button is essential for professional deployment.

Business Impact: MindStudio already has Audio Transcribe for uploaded files - extending this to a native recording widget would unlock voice-first AI applications across industries including healthcare, field services, sales, and any mobile workforce.


Through the miracle of custom MindStudio interfaces, React, and Claude Code, I was able to figure this out by plugging in Claude-generated code.

React Changes to Add Voice Input to MindStudio Custom Interface


1. Update the Import Statement

Change from:

```typescript
import { useState } from 'react';
```

Change to:

```typescript
import { useState, useEffect, useRef } from 'react';
```

2. Add New Styled Components

Add these after your existing styled components (before the App function):

```typescript
// Typed props so TypeScript accepts `props.isRecording` in the template literals
const VoiceButton = styled.button<{ isRecording: boolean }>`
  width: 56px;
  height: 56px;
  border-radius: 50%;
  border: none;
  background: ${props => props.isRecording 
    ? 'linear-gradient(135deg, #D64545 0%, #B93838 100%)'
    : 'linear-gradient(135deg, #9CAF88 0%, #89A076 100%)'};
  color: #FFFFFF;
  font-size: 24px;
  cursor: pointer;
  transition: all 0.3s ease;
  box-shadow: 0 4px 16px rgba(156, 175, 136, 0.25);
  margin-bottom: 16px;
  display: flex;
  align-items: center;
  justify-content: center;

  &:hover {
    transform: scale(1.05);
    box-shadow: 0 6px 24px rgba(156, 175, 136, 0.35);
  }

  &:active {
    transform: scale(0.95);
  }

  @media (max-width: 640px) {
    width: 48px;
    height: 48px;
    font-size: 20px;
  }
`;

const VoiceControls = styled.div`
  display: flex;
  align-items: center;
  gap: 12px;
  margin-bottom: 16px;
`;

const RecordingStatus = styled.div<{ isRecording: boolean }>`
  font-size: 14px;
  color: ${props => props.isRecording ? '#D64545' : '#8B8680'};
  font-weight: 600;
`;
```

3. Add State and Refs to Your App Function

Add these inside your App function, after the existing useState declarations:

```typescript
const [isRecording, setIsRecording] = useState(false);
// `any` because the SpeechRecognition constructor isn't in TypeScript's DOM typings
const [recognition, setRecognition] = useState<any>(null);
const lastSpeechTimeRef = useRef(Date.now());
const checkIntervalRef = useRef<ReturnType<typeof setInterval> | null>(null);
```

4. Add Speech Recognition useEffect

Add this after your state declarations:

```typescript
// Initialize speech recognition
useEffect(() => {
  if ('webkitSpeechRecognition' in window || 'SpeechRecognition' in window) {
    // Cast through `any`: the (prefixed) constructor isn't in TypeScript's DOM typings
    const SpeechRecognition =
      (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
    const recognitionInstance = new SpeechRecognition();
    
    recognitionInstance.continuous = true;
    recognitionInstance.interimResults = true;
    recognitionInstance.lang = 'en-US';
    
    recognitionInstance.onresult = (event: any) => {
      // Update last speech time immediately
      lastSpeechTimeRef.current = Date.now();
      
      let finalTranscript = '';
      
      for (let i = event.resultIndex; i < event.results.length; i++) {
        const transcript = event.results[i][0].transcript;
        if (event.results[i].isFinal) {
          finalTranscript += transcript + ' ';
        }
      }
      
      if (finalTranscript) {
        setInteractionNotes(prev => prev + finalTranscript);
      }
    };

    recognitionInstance.onerror = (event: any) => {
      if (event.error !== 'no-speech' && event.error !== 'aborted') {
        console.error('Speech recognition error:', event.error);
      }
      setIsRecording(false);
    };

    recognitionInstance.onend = () => {
      if (checkIntervalRef.current) {
        clearInterval(checkIntervalRef.current);
        checkIntervalRef.current = null;
      }
      setIsRecording(false);
    };
    
    setRecognition(recognitionInstance);
  }

  return () => {
    if (checkIntervalRef.current) {
      clearInterval(checkIntervalRef.current);
    }
  };
}, []);
```

5. Add Recording State Management useEffect

Add this after the previous useEffect:

```typescript
// Handle recording state changes with auto-stop after 4 seconds of silence
useEffect(() => {
  if (isRecording && recognition) {
    // Reset speech time when starting
    lastSpeechTimeRef.current = Date.now();
    
    // Start checking for silence every 500ms
    checkIntervalRef.current = setInterval(() => {
      const silenceDuration = Date.now() - lastSpeechTimeRef.current;
      if (silenceDuration > 4000) {
        recognition.stop();
        if (checkIntervalRef.current) {
          clearInterval(checkIntervalRef.current);
          checkIntervalRef.current = null;
        }
      }
    }, 500);
  } else {
    // Clean up interval when not recording
    if (checkIntervalRef.current) {
      clearInterval(checkIntervalRef.current);
      checkIntervalRef.current = null;
    }
  }

  return () => {
    if (checkIntervalRef.current) {
      clearInterval(checkIntervalRef.current);
      checkIntervalRef.current = null;
    }
  };
}, [isRecording, recognition]);
```

6. Add Toggle Recording Function

Add this function before your return statement:

```typescript
const toggleRecording = () => {
  if (!recognition) {
    alert('Speech recognition is not supported in your browser. Please use Chrome or Safari.');
    return;
  }

  if (isRecording) {
    recognition.stop();
    setIsRecording(false);
  } else {
    try {
      recognition.start();
      setIsRecording(true);
    } catch (e) {
      // start() throws InvalidStateError if recognition is already running
      console.error('Could not start speech recognition:', e);
    }
  }
};
```

7. Add Voice Button to Your JSX

Add this inside your <FormGroup>, before your <TextArea>:

```typescript
<VoiceControls>
  <VoiceButton 
    type="button"
    onClick={toggleRecording} 
    isRecording={isRecording}
    title={isRecording ? "Stop recording" : "Start voice input"}
  >
    {isRecording ? '⏹' : '🎤'}
  </VoiceButton>
  <RecordingStatus isRecording={isRecording}>
    {isRecording ? 'Recording... (stops after 4 sec pause)' : 'Tap mic to speak'}
  </RecordingStatus>
</VoiceControls>
```

Configuration Notes

  • Auto-stop timing: Change 4000 in the second useEffect to adjust silence duration (in milliseconds)

  • Language: Change recognitionInstance.lang = 'en-US' to other languages as needed

  • Colors: Adjust the gradient colors in VoiceButton to match your brand

  • Browser support: Works in Chrome (desktop and Android) and Safari (macOS and iOS). Firefox doesn’t support the SpeechRecognition part of the Web Speech API.
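The tunables from the notes above can be gathered into one place so they're easy to adjust. A minimal sketch; the `VOICE_CONFIG` name and `shouldAutoStop` helper are illustrative additions, not part of the component code above:

```typescript
// All the knobs from the configuration notes, in one object
const VOICE_CONFIG = {
  silenceTimeoutMs: 4000, // auto-stop after this much silence
  silenceCheckMs: 500,    // how often the interval polls for silence
  lang: 'en-US',          // BCP 47 language tag, e.g. 'es-ES' or 'fr-FR'
};

// Pure helper mirroring the interval's silence check, extracted so the
// threshold logic is testable without timers or a browser.
function shouldAutoStop(lastSpeechTime: number, now: number): boolean {
  return now - lastSpeechTime > VOICE_CONFIG.silenceTimeoutMs;
}
```

With this in place, the second `useEffect` would read `VOICE_CONFIG.silenceCheckMs` for the interval and call `shouldAutoStop(lastSpeechTimeRef.current, Date.now())` instead of hard-coding `4000` and `500`.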


That’s it! Your MindStudio custom interface now has native voice input with auto-stop functionality. 🎤


This is awesome, Mark! Would you be willing to send this to me as a simple template workflow? If so, I would be very grateful! And if you could also share that on the Community Agents page, I’m sure many others would also find this very useful. Thank you!!

  • Ken

Had some issues with different iOS versions, so I've made several tweaks since posting. I'll post something in Community once I've finished my current project. I see there's a Telegram connection now, which might give more options.
