Always record at the highest bit depth you can, so you have the best dynamic range to work with. After you've mixed and gotten the levels right, then you can convert to 16-bit.
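To put rough numbers on "best dynamic range": the usual rule of thumb is about 6 dB per bit, because the ratio between full scale and one quantization step is 20·log10(2^bits). Here's a quick sketch. The function name is just for illustration, and it assumes an ideal converter, so real-world figures come out lower once converter noise is counted:

```python
import math

def dynamic_range_db(bits):
    """Ideal ratio of full scale to one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 24):
    print(bits, "bits:", round(dynamic_range_db(bits), 1), "dB")
# 16 bits: 96.3 dB
# 24 bits: 144.5 dB
```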
Dithering is used when converting to a lower bit depth. It adds a tiny bit of random noise before rounding, which trades nasty things like quantization distortion for a slightly higher, but much more benign, noise floor.
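Here's a rough illustration of what dither buys you. It assumes TPDF (triangular) dither, which is a common choice but not the only one, and the function names are made up for the example. A 1 kHz sine that's only 0.4 LSB tall disappears completely when rounded to 16 bits without dither, but with dither it survives underneath the noise:

```python
import math
import random

def quantize(x, bits):
    """Round a float in [-1, 1) to the nearest step of a `bits`-bit grid."""
    step = 2.0 / (2 ** bits)
    return round(x / step) * step

def tpdf_dither(bits, rng):
    """Triangular-PDF dither: sum of two independent uniforms, each +/- half an LSB."""
    step = 2.0 / (2 ** bits)
    return (rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)) * step

def reduce_bit_depth(samples, bits, dither=True, seed=0):
    """Requantize samples to `bits` bits, optionally adding TPDF dither first."""
    rng = random.Random(seed)
    out = []
    for x in samples:
        d = tpdf_dither(bits, rng) if dither else 0.0
        out.append(quantize(x + d, bits))
    return out

# One second of a 1 kHz sine at 44.1 kHz, only 0.4 LSB (16-bit) in amplitude.
lsb = 2.0 / 2 ** 16
ref = [0.4 * lsb * math.sin(2 * math.pi * 1000 * n / 44100) for n in range(44100)]

plain = reduce_bit_depth(ref, 16, dither=False)
dithered = reduce_bit_depth(ref, 16, dither=True)

# Without dither every sample rounds to zero: the signal is simply gone.
print(all(v == 0 for v in plain))
# With dither, projecting the output onto the original sine recovers it (gain near 1).
gain = sum(a * b for a, b in zip(dithered, ref)) / sum(r * r for r in ref)
print(round(gain, 2))
```

The dithered output is noisier sample by sample, but the error is no longer tied to the signal, which is exactly the trade described above.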
As for sampling rate, I'd record at 88.2 kHz. It's not a matter of capturing what humans can hear; 44.1 kHz is plenty for that. Instead, it's a matter of avoiding aliasing. All converters have some amount of aliasing. Recording at a higher sampling rate pushes that aliasing up into the ultrasonic range, and then you can get rid of the ultrasonics with a really good digital filter on the computer instead of relying on the reasonably good digital filter in the oversampling ADC.
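To see why the higher rate helps, here's a small sketch of where a tone ends up after sampling if nothing filters it out first: anything above half the sample rate folds back around it. The helper name is hypothetical, and it ignores the converter's own anti-aliasing filter:

```python
def alias_frequency(f_in, fs):
    """Apparent frequency (Hz) of a tone at f_in after sampling at fs.
    Content above the Nyquist frequency (fs / 2) folds back below it."""
    f = f_in % fs
    return fs - f if f > fs / 2 else f

print(alias_frequency(30000, 44100))  # 14100: folds right into the audible band
print(alias_frequency(30000, 88200))  # 30000: below Nyquist, no aliasing at all
print(alias_frequency(50000, 88200))  # 38200: aliased, but still ultrasonic
```

At 88.2 kHz, anything that leaks through between 44.1 and about 66 kHz folds to somewhere above 22.05 kHz, so a steep filter applied later can remove it before you go down to 44.1.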
And 88.2 instead of 96 because 88.2 is exactly twice 44.1. To downsample, you just apply the digital low-pass filter and then drop every other sample. To convert 96 to 44.1, you need to, uh... do more than that. The ratio is 44.1/96 = 147/320, and I'm sure they don't literally upsample by 147x and then downsample by 320x, but that's effectively what the algorithm is doing. It's more complicated than the 88.2 case, so the processing takes longer; even if the converter is designed well enough to produce the same output quality, I see no benefit to 96.
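A minimal sketch of both halves of that argument: the reduced up/down factors for each conversion, and the simple 88.2-to-44.1 case done as "filter, then drop every other sample". The filter is a short Hann-windowed sinc with an arbitrarily chosen cutoff, and all function names are hypothetical. Real sample-rate converters use far longer, carefully designed filters (plus polyphase tricks for ratios like 147/320), so treat this as an illustration only:

```python
import math
from fractions import Fraction

def resampling_factors(src_rate, dst_rate):
    """Return (L, M): upsample by L, then downsample by M, to go src_rate -> dst_rate."""
    ratio = Fraction(dst_rate, src_rate)  # Fraction reduces to lowest terms
    return ratio.numerator, ratio.denominator

def lowpass_taps(cutoff, num_taps=63):
    """Hann-windowed sinc low-pass FIR; cutoff is a fraction of the sample rate (< 0.5)."""
    mid = (num_taps - 1) / 2
    taps = []
    for n in range(num_taps):
        x = n - mid
        sinc = 2 * cutoff if x == 0 else math.sin(2 * math.pi * cutoff * x) / (math.pi * x)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    total = sum(taps)
    return [t / total for t in taps]  # normalize to unity gain at DC

def decimate_by_2(samples, taps):
    """Low-pass filter, then keep every other sample (e.g. 88.2 kHz -> 44.1 kHz).
    Only the outputs we keep are actually computed."""
    half = len(taps) // 2
    out = []
    for i in range(0, len(samples), 2):
        acc = 0.0
        for k, t in enumerate(taps):
            j = i + half - k  # centered filter, so no output delay
            if 0 <= j < len(samples):
                acc += t * samples[j]
        out.append(acc)
    return out

print(resampling_factors(88200, 44100))  # (1, 2)
print(resampling_factors(96000, 44100))  # (147, 320)
```

For 88.2 kHz input, the new Nyquist frequency is 22.05 kHz, which is 0.25 of the input rate, so a cutoff a bit below that (say `lowpass_taps(0.23)`) is a reasonable starting point for `decimate_by_2`.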