Perhaps text-to-speech would solve many of the perceived problems. It would bring advantages such as female avatars with female voices, profanity filters, no unwanted background noise, and less work and bandwidth for the servers. Players could choose from a dozen or so voices and tweak them slightly with sliders at avatar creation, which limits the voice assets required on each client. It would also provide a text log on the client in case you were AFK during a conversation attempt, and the client could set a distance range for hearing conversations.
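
To make the idea a bit more concrete, here is a rough sketch of how the client side might look. Everything in it is an assumption on my part for illustration: the names `VoiceProfile` and `ChatClient`, the slider fields, and the word-list filter are all hypothetical, and the actual speech synthesis step is left out entirely.

```python
import math
from dataclasses import dataclass, field


@dataclass
class VoiceProfile:
    """Hypothetical voice settings picked at avatar creation."""
    preset: int          # index into the dozen or so built-in voices
    pitch: float = 0.0   # slider offset (assumed range -1.0 to 1.0)
    speed: float = 0.0   # slider offset (assumed range -1.0 to 1.0)


@dataclass
class ChatClient:
    """Hypothetical client that receives chat as text, not audio."""
    position: tuple
    hear_range: float                      # client-chosen hearing distance
    banned_words: set = field(default_factory=set)
    log: list = field(default_factory=list)

    def filter_text(self, text):
        # Replace each banned word with asterisks of the same length.
        words = []
        for word in text.split():
            bare = word.strip(".,!?").lower()
            words.append("*" * len(word) if bare in self.banned_words else word)
        return " ".join(words)

    def receive(self, speaker, speaker_pos, voice, text):
        # Ignore speakers outside the range this client chose.
        if math.dist(self.position, speaker_pos) > self.hear_range:
            return None
        clean = self.filter_text(text)
        # Keep a text log so an AFK player can read it later.
        self.log.append(f"{speaker}: {clean}")
        # A real client would pass this pair to its speech engine.
        return (voice, clean)


client = ChatClient(position=(0.0, 0.0), hear_range=10.0, banned_words={"darn"})
voice = VoiceProfile(preset=3, pitch=0.2)
client.receive("Ann", (3.0, 4.0), voice, "darn it")    # in range, filtered, logged
client.receive("Bob", (30.0, 40.0), voice, "hello")    # out of range, ignored
```

The point of the sketch is that the server would only relay a short text message plus a few voice numbers, and the filtering, range check, and logging could all happen on the client.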