"I was told to use a sound file that is a click followed by a sine sweep, followed by another click."
Hmmm... Who told you that? If you record the response to a click, then that is the impulse response. It's literally the response of a room to an impulsive noise, like a click. No deconvolution or other processing necessary. You then convolve your music with the impulse response, and it will sound as if you had played the music in the space instead of the click.
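As a rough sketch of that convolution step, assuming NumPy and that both the dry music and the recorded impulse response are already loaded as mono float arrays at the same sample rate (the function name `apply_room` and the toy arrays are only illustrative):

```python
import numpy as np

def apply_room(dry, impulse_response):
    """Convolve a dry mono signal with a recorded impulse response."""
    dry = np.asarray(dry, dtype=float)
    ir = np.asarray(impulse_response, dtype=float)
    # Full linear convolution: output length is len(dry) + len(ir) - 1,
    # so the reverb tail after the last note is kept.
    wet = np.convolve(dry, ir)
    # Scale down only if the result would clip in a [-1, 1] float format.
    peak = np.max(np.abs(wet))
    if peak > 1.0:
        wet = wet / peak
    return wet

# Toy data: a made-up "room" with a direct sound and two decaying echoes.
ir = np.array([1.0, 0.0, 0.5, 0.0, 0.25])
# A single click as the "music": the output should just be the room itself.
dry = np.array([1.0, 0.0, 0.0])
wet = apply_room(dry, ir)
```

For real recordings, which are many seconds long, an FFT-based convolution would be much faster than `np.convolve`, but the result is the same.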
You probably want to use a dedicated clicker, though, rather than a speaker. You want the sound as close to an ideal impulse as possible, not filtered through a speaker's frequency and phase response. You want it to sound as if your performer is actually in the cathedral, not like you're playing a boombox of their CD inside a cathedral. You also want it as loud as possible (without clipping), so that the signal-to-noise ratio of your recording is high.
You can also derive the impulse response from a sine sweep, a maximum-length sequence, or some other known signal by recording the room's response to it and then deconvolving. This improves the signal-to-noise ratio, but (ideally) it's going to produce exactly the same thing as the straight impulse response. In practice, one method might produce better results than the other. See Wikipedia.
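Here is one possible sketch of the deconvolution, assuming NumPy and using a regularized division in the frequency domain. This is just one common way to do it, not necessarily what your source had in mind. The function name `deconvolve`, the `eps` value, the sweep parameters, and the simulated "room" are all made up for illustration; in real life `recorded` would come from your microphone.

```python
import numpy as np

def deconvolve(recorded, excitation, ir_length, eps=1e-10):
    """Estimate an impulse response from a recording of a known excitation."""
    n = len(recorded)  # recording must include the full reverb tail
    R = np.fft.rfft(recorded, n)
    S = np.fft.rfft(excitation, n)  # zero-padded to the same length
    power = np.abs(S) ** 2
    # Regularized spectral division: avoids blowing up where the
    # excitation has almost no energy.
    H = R * np.conj(S) / (power + eps * power.max())
    return np.fft.irfft(H, n)[:ir_length]

# Linear sine sweep from 0 Hz to Nyquist (illustrative parameters).
fs = 8000
duration = 1.0
t = np.arange(int(fs * duration)) / fs
f1 = fs / 2
sweep = np.sin(2 * np.pi * (f1 / (2 * duration)) * t ** 2)

# Simulated room: direct sound plus two echoes (made-up values).
true_ir = np.zeros(400)
true_ir[0] = 1.0
true_ir[120] = 0.5
true_ir[300] = -0.25

# What the microphone would capture in a noise-free world.
recorded = np.convolve(sweep, true_ir)

estimated_ir = deconvolve(recorded, sweep, len(true_ir))
```

With no noise, `estimated_ir` should come out very close to `true_ir`. With a real recording, the sweep's longer duration is what buys the better signal-to-noise ratio, and `eps` trades a little accuracy for stability at frequencies the sweep barely excites.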
And remember you can record from two microphones at once to get a stereo image of the response.
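For the stereo case, a minimal sketch (again assuming NumPy, with illustrative names and toy values) is to convolve the same dry mono signal with the impulse response from each microphone and put the results in the left and right channels:

```python
import numpy as np

def convolve_stereo(dry_mono, ir_left, ir_right):
    """Return an (n_samples, 2) array: dry signal convolved with each mic's IR."""
    n = len(dry_mono) + max(len(ir_left), len(ir_right)) - 1
    out = np.zeros((n, 2))
    for ch, ir in enumerate((ir_left, ir_right)):
        wet = np.convolve(dry_mono, ir)
        out[:len(wet), ch] = wet  # shorter channel is zero-padded at the end
    return out

# Toy data: the right mic hears the click one sample later and quieter.
dry = np.array([1.0, 0.5])
ir_l = np.array([1.0, 0.0, 0.3])
ir_r = np.array([0.0, 0.8])
stereo = convolve_stereo(dry, ir_l, ir_r)
```

The small timing and level differences between the two impulse responses are what give the result its stereo image.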