Cleaning harvested URLs

Hello,

I'm sure there is an option for this, but I'm not sure what it is or how to use it. We harvest many URLs, and the first step in our process is to eliminate duplicates.

After that, I want to delete the URLs that contain certain words, such as:

youtube
wiki
cnn
bbc

What I tried was to create a file, or find the blacklist file and edit it, put those words in it, and then remove the matching URLs. But those URLs still remain, so maybe there is something wrong with how I am doing it.
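To make clear what result I am after, here is a minimal Python sketch of the filtering step, written outside of any particular harvesting tool. The function name filter_urls and the sample URLs are made up for illustration; the matching is a plain case-insensitive substring check, which is only my assumption about how a blacklist should behave.

```python
def filter_urls(urls, blacklist):
    """Remove duplicates, then drop every URL containing a blacklisted word.

    Hypothetical helper for illustration: matching is a case-insensitive
    substring test, and the order of the surviving URLs is preserved.
    """
    # Normalize the blacklist once: trim whitespace, lowercase, skip blanks.
    words = [w.strip().lower() for w in blacklist if w.strip()]

    seen = set()
    kept = []
    for url in urls:
        url = url.strip()
        # Skip empty lines and exact duplicates.
        if not url or url in seen:
            continue
        seen.add(url)
        # Drop the URL if any blacklisted word appears anywhere in it.
        lowered = url.lower()
        if any(word in lowered for word in words):
            continue
        kept.append(url)
    return kept


# Made-up sample data, not real harvest output.
sample_urls = [
    "https://www.youtube.com/watch?v=abc",
    "https://example.com/page",
    "https://en.wikipedia.org/wiki/URL",
    "https://example.com/page",
    "https://www.BBC.co.uk/news",
    "https://edition.cnn.com/world",
    "https://example.org/blog",
]
stop_words = ["Youtube", "wiki", "cnn", "bbc"]

cleaned = filter_urls(sample_urls, stop_words)
for url in cleaned:
    print(url)
```

With this sample, only the two example.com and example.org URLs should be left. A substring check like this is blunt: a short word such as "cnn" would also remove an unrelated URL that happens to contain those letters, so matching against the domain only might be safer.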

It would also be nice if someone could guide me on how to set up the harvest so that URLs containing those stop words are not harvested in the first place.

Thanks again