What statistical method should I be using?
I have 47,000 sound files from a hydrophone, recorded over a period of months, some of which contain boat noise. I have an algorithm in MATLAB that gives me a list of files that might contain boat noise. I listen to these files and record which ones really do. I have also listened to a random 2% of the total number of files and recorded how many of them contain boats. I would like to know how I can use the results of listening to the random files to estimate how many boats I have missed in the rest of the files that I haven't listened to.
We want to find how frequently boats use a particular area.
As an example, let's say my MATLAB analysis routine finds 50 boats in the 47,000 files. I then listen to 2% of the files, selected at random, and find 3 more boats. There must be a way to estimate how many more boats I would find if I listened to the rest of the files. I'm thinking along the lines of capture/recapture.
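To make the example concrete, here is a minimal Python sketch of the naive scale-up, not the capture/recapture approach and not the MATLAB routine itself. It assumes the 2% sample is drawn at random from all 47,000 files and that the 3 boats found in it were not on the algorithm's list. The function name `estimate_missed_boats` and the choice of a Wilson score interval for the uncertainty are illustrative assumptions, and the interval ignores the finite-population correction.

```python
import math


def estimate_missed_boats(total_files, sample_size, missed_in_sample, z=1.96):
    """Scale the missed-boat rate in a random sample up to all files.

    Returns (point_estimate, lower, upper), where lower/upper come from an
    approximate 95% Wilson score interval on the sample proportion.
    Hypothetical helper for illustration only.
    """
    p_hat = missed_in_sample / sample_size

    # Point estimate: sample proportion times the number of files.
    point = missed_in_sample * total_files / sample_size

    # Wilson score interval for the proportion (behaves better than the
    # normal approximation when the count is as small as 3).
    z2 = z * z
    denom = 1 + z2 / sample_size
    centre = (p_hat + z2 / (2 * sample_size)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / sample_size + z2 / (4 * sample_size ** 2)
    ) / denom

    lower = max(0.0, centre - half_width) * total_files
    upper = min(1.0, centre + half_width) * total_files
    return point, lower, upper


total_files = 47_000
sample_size = round(0.02 * total_files)  # 2% of the files = 940
missed_in_sample = 3                     # boats heard that the algorithm did not flag

point, lower, upper = estimate_missed_boats(total_files, sample_size, missed_in_sample)
print(f"Estimated boats missed by the algorithm: {point:.0f}")
print(f"Approximate 95% interval: {lower:.0f} to {upper:.0f}")
```

The point estimate counts the 3 boats already heard in the sample, so the number still hidden in the files not yet listened to would be 3 fewer. With only 3 boats in the sample the interval is wide, which is why a more principled method would be useful.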