What statistical method should I be using?

I have 47,000 sound files from a hydrophone, recorded over a period of months, some of which contain boat noise. I have an algorithm in MATLAB that gives me a list of files that might contain boat noise. I listen to these files and record which ones really do. I have also listened to a random 2% of the total number of files and recorded how many of them contain boats. I would like to know how I can use the results from the random files to estimate how many boats I have missed in the rest of the files that I haven't listened to.

2010-10-10T16:15:04Z

We want to find how frequently boats use a particular area.

2010-10-11T02:58:44Z

As an example, let's say my MATLAB analysis routine finds 50 boats in the 47,000 files. I then listen to 2% of the files, selected at random, and find 3 more boats. There must be a way to estimate how many more boats I would find if I listened to the rest of the files. I'm thinking along the lines of capture/recapture.
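One rough way to put numbers on this example is to scale up the miss rate seen in the random sample. The sketch below is only an illustration, not a full capture/recapture analysis. It assumes the routine flagged exactly 50 files and all were confirmed as boats, that 2% of 47,000 is 940 files drawn as a simple random sample, and that the 3 extra boats were in files the routine did not flag. The function names are made up for this example, and the Wilson score interval is just one common choice of interval for a small proportion.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def estimate_missed_boats(total_files, flagged_files, sampled_files, missed_in_sample):
    """Scale the sample miss rate up to all files the routine did not flag.

    Assumption: the sample behaves like a simple random sample of unflagged files.
    Returns (point estimate, lower bound, upper bound) as counts of missed boats,
    including the ones already found in the sample.
    """
    unflagged = total_files - flagged_files
    rate = missed_in_sample / sampled_files
    low, high = wilson_interval(missed_in_sample, sampled_files)
    return rate * unflagged, low * unflagged, high * unflagged


if __name__ == "__main__":
    # Numbers from the example above; 940 = 2% of 47,000.
    est, low, high = estimate_missed_boats(47000, 50, 940, 3)
    print(f"estimated missed boats: {est:.0f} (rough 95% range {low:.0f} to {high:.0f})")
```

With these numbers the point estimate is about 150 missed boats, which already includes the 3 found in the sample. Because only 3 boats were found, the range around that estimate is wide, so listening to a larger random sample would tighten it.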

R G 2010-10-11T02:04:03Z

Favorite Answer

Hmm... what's the typical noise of a boat? Just some noisy frequency (the boat motor)? Convert the noise to the frequency domain and try to find some peaks, or a function that fits them closely (remember that the motor can accelerate). Then compare all your files to it, and if the difference is, let's say, more than 10%, you consider that it's not a boat.

Note that I'm assuming the noise you get is the motor.
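The comparison described above could be sketched roughly like this. It is only an illustration under several assumptions: the recordings are short lists of samples of equal length, the "difference" is measured as half the summed absolute difference between normalised magnitude spectra (so it runs from 0 to 1), and the 10% threshold is taken literally as 0.10 on that scale. All function names are hypothetical, the plain DFT is slow and only suitable for small inputs, and nothing here handles a motor that changes speed.

```python
import cmath
import math


def magnitude_spectrum(samples):
    """Normalised magnitude spectrum via a plain DFT (fine for short signals only)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc))
    total = sum(spectrum) or 1.0  # avoid dividing by zero for an all-zero signal
    return [m / total for m in spectrum]


def spectral_difference(spec_a, spec_b):
    """Half the L1 distance between two normalised spectra: 0 = identical, 1 = disjoint."""
    return 0.5 * sum(abs(a - b) for a, b in zip(spec_a, spec_b))


def looks_like_boat(samples, template_spectrum, max_difference=0.10):
    """True if the file's spectrum is within max_difference of the boat template."""
    return spectral_difference(magnitude_spectrum(samples), template_spectrum) <= max_difference


if __name__ == "__main__":
    n = 64
    # Synthetic stand-ins for a motor tone and an unrelated tone.
    motor = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
    other = [math.sin(2 * math.pi * 20 * t / n) for t in range(n)]
    template = magnitude_spectrum(motor)
    print(looks_like_boat(motor, template), looks_like_boat(other, template))
```

In practice the template would come from files already confirmed by ear as boats, and an FFT routine would replace the slow loop.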

Anonymous 2010-10-10T22:26:19Z

Why were you recording boat noises in the first place?