FUSE’s first choice for reaching content is an API, since that gives us the taxonomy and other metadata that feed our powerful filters. This isn’t always possible, however. As a last resort, we’ll crawl a content source the way Google does. We avoid crawling (also called “scraping”) when we can, because it often produces messy, inconsistent output that we then have to parse and clean up.
Authentication isn’t a concern: whether the resource sits behind an authentication wall or is a mixture of public and private content, we can handle both scenarios. Crawling sometimes makes sense when there’s a source you want to index but you aren’t interested in its metadata, or when there’s no metadata to be had.
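
To make the trade-off concrete, here is a minimal Python sketch of what a crawl typically yields. It is an illustration only, not FUSE’s actual crawler; the names `TextExtractor` and `scrape_record` and the shape of the returned record are assumptions made for this example. It uses only the standard library’s `html.parser` to pull the title and visible text out of a page, and the `metadata` field comes back empty because raw HTML carries no taxonomy to extract.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects the page title and visible text."""

    SKIP = {"script", "style"}  # tags whose contents are not visible text

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.text_parts = []
        self._in_title = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # ignore script and style contents
        chunk = " ".join(data.split())  # collapse runs of whitespace
        if not chunk:
            return
        if self._in_title:
            self.title_parts.append(chunk)
        else:
            self.text_parts.append(chunk)


def scrape_record(html):
    """Build an indexable record from raw HTML (assumed record shape)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "title": " ".join(parser.title_parts),
        "body": " ".join(parser.text_parts),
        "metadata": {},  # nothing to fill in: a crawl sees text, not taxonomy
    }


if __name__ == "__main__":
    page = (
        "<html><head><title>Setup  Guide</title></head>"
        "<body><p>Install the\n agent.</p><script>var x = 1;</script>"
        "<p>Then restart.</p></body></html>"
    )
    print(scrape_record(page))
```

Even in this tiny case the extractor has to skip scripts and normalize whitespace by hand, which is the kind of extra processing an API-based connection avoids.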