Imagine if on your iPhone you had to type a whole paragraph, and then wait a few seconds for it to get sent to Apple’s server, and then get the text back to see if any words were mistyped or miscorrected.
That is how speech recognition works today on mobile devices. It is all done server side (to try a state-of-the-art example, download the Dragon Dictation app on your iPhone, or try the built-in speech rec on an Android device). Perhaps Apple’s Siri will improve this (I hope!). But until speech recognition gets very close to 100% accuracy, the best way to improve the user experience will be to show each word and sentence as you speak, letting you correct as you go without waiting for the round trip to web servers.
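A toy sketch of that incremental-correction idea: a streaming recognizer emits partial hypotheses as audio arrives, and the UI renders each one immediately so the user can fix errors before the utterance even finishes. Everything here is hypothetical (the canned `fake_partial_hypotheses` generator stands in for a real recognizer); it just illustrates the interaction pattern, not any actual speech API.

```python
from typing import Iterator, List

def fake_partial_hypotheses() -> Iterator[str]:
    """Stand-in for a streaming recognizer's partial results.

    A real client-side engine would emit these as audio frames
    are decoded, refining earlier words as context accumulates.
    """
    yield "the"
    yield "the quick"
    yield "the quick brown fax"   # an early misrecognition the user could fix now
    yield "the quick brown fox"   # refined once more audio arrives

def stream_transcript(partials: Iterator[str]) -> List[str]:
    """Collect each partial hypothesis as the UI would render it."""
    shown = []
    for hyp in partials:
        shown.append(hyp)  # in a real app: redraw the text field in place
    return shown

updates = stream_transcript(fake_partial_hypotheses())
```

The point of the pattern is that the user sees `"the quick brown fax"` seconds before the final result would come back from a server, and can tap to correct it while still speaking.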
Open source projects like CMU’s PocketSphinx seem to provide sophisticated client-side mobile speech rec. My understanding is that modern mobile devices don’t have the resources (processing/memory) for client-side speech rec to come anywhere near the accuracy you can get on the server side. At least not yet.