Node.js: how do I import large documents into Elasticsearch?

I'm new to Elasticsearch and I have two questions. Suppose I want to index pages the way Google does. Assume the pages are pure HTML with no images, just a lot of text content, e.g. a Wikipedia page.

Do I have to process the data before putting it into Elasticsearch? If so, is there a tool that helps with this?
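If the pages are indexed as-is, Elasticsearch will also index the raw markup, so some preprocessing is usually worthwhile. Below is a minimal, dependency-free Node.js sketch of one way to do it: pull out the `<title>`, drop tags, scripts and styles, and build a request body for the `_bulk` API, which expects NDJSON (an action line followed by the document, each terminated by a newline). The function names `htmlToDoc` and `toBulkBody`, the index name `pages`, and using the URL as `_id` are illustrative assumptions. The regex-based stripping is deliberately naive; a real HTML parser (e.g. cheerio) is safer for arbitrary pages. As an alternative, Elasticsearch's `html_strip` character filter can drop tags at analysis time, although the stored `_source` still keeps the original HTML.

```javascript
// Naive HTML-to-document conversion (assumption: a regex is good enough
// for this sketch; it is NOT a real HTML parser).
function htmlToDoc(html, url) {
  // Use the <title> text, if present, as a separate field.
  const titleMatch = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  const title = titleMatch ? titleMatch[1].trim() : "";

  const body = html
    // Remove script/style/title elements together with their contents.
    .replace(/<(script|style|title)[^>]*>[\s\S]*?<\/\1>/gi, " ")
    // Replace every remaining tag with a space.
    .replace(/<[^>]+>/g, " ")
    // Decode a few common entities; &amp; goes last to avoid double-decoding.
    .replace(/&nbsp;/g, " ")
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&#39;/g, "'")
    .replace(/&amp;/g, "&")
    // Collapse runs of whitespace.
    .replace(/\s+/g, " ")
    .trim();

  return { url, title, body };
}

// Build an NDJSON body for the _bulk API: one action line plus one source
// line per document, every line ending in "\n" (including the last one).
function toBulkBody(index, docs) {
  return docs
    .map(
      (doc) =>
        JSON.stringify({ index: { _index: index, _id: doc.url } }) +
        "\n" +
        JSON.stringify(doc) +
        "\n"
    )
    .join("");
}

// Usage (hypothetical page):
const doc = htmlToDoc(
  "<html><head><title>Example</title><style>p{color:red}</style></head>" +
    "<body><p>Hello &amp; welcome</p><script>alert(1)</script></body></html>",
  "https://example.org/page"
);
// doc.title is "Example", doc.body is "Hello & welcome"
const bulkBody = toBulkBody("pages", [doc]);
```

The resulting string could then be sent as the body of a `POST /_bulk` request with `Content-Type: application/x-ndjson`, either over plain HTTP or through the official Node.js client.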

If I don't process it and instead just import the data with a crawler, wouldn't my documents inside Elasticsearch end up too big?
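On document size: a whole page can go into a single `body` field, but very large requests run into limits such as `http.max_content_length` (100mb by default), and highlighting very long fields can hit `index.highlight.max_analyzed_offset`. One common workaround, sketched below under the assumption that splitting by character count is acceptable for your search use case, is to break each page into smaller documents that share the page URL. `chunkText`, `toChunkDocs`, and the `chunk` field are hypothetical names, and the chunk size is a tuning choice rather than a recommended value.

```javascript
// Split text into chunks of at most maxChars characters, breaking on
// whitespace where possible. A single word longer than maxChars is hard-split.
function chunkText(text, maxChars) {
  if (!Number.isInteger(maxChars) || maxChars <= 0) {
    throw new RangeError("maxChars must be a positive integer");
  }
  const chunks = [];
  let current = "";
  for (const word of text.split(/\s+/).filter(Boolean)) {
    // Walk the word in maxChars-sized pieces (usually just one piece).
    for (let i = 0; i < word.length; i += maxChars) {
      const piece = word.slice(i, i + maxChars);
      const candidate = current ? current + " " + piece : piece;
      if (candidate.length <= maxChars) {
        current = candidate; // still fits in the current chunk
      } else {
        chunks.push(current); // close the current chunk, start a new one
        current = piece;
      }
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Turn one page ({ url, title, body }) into several smaller documents that
// keep the page URL and title so results can be grouped back by page.
function toChunkDocs(doc, maxChars) {
  return chunkText(doc.body, maxChars).map((body, i) => ({
    url: doc.url,
    title: doc.title,
    chunk: i,
    body,
  }));
}

// Usage:
const parts = chunkText("one two three four", 9);
// parts is ["one two", "three", "four"]
```

Each chunk document could be given an `_id` such as the URL plus the chunk number, so re-importing a page overwrites its chunks instead of duplicating them.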

Any pointers are appreciated. Thanks!