Technology

Frequently Asked Questions

Our content does not have an API. Will FUSE work?

FUSE’s numero uno choice for reaching content is via API, since that’s how we get all of the nice juicy taxonomy and other metadata (which feeds our powerful filters). This isn’t always possible, however. As a last resort, we’ll crawl a content source just like Google does. Also called “scraping,” this is something we avoid when we can, because it often produces messy and inconsistent output that we then have to parse and process further.

No worries about authentication if the resource in question is behind a wall or is a mixture of public and private: we can handle both scenarios, no problem. Scraping/crawling sometimes makes sense when there’s a source you want to index but you aren’t interested in the metadata, or there’s simply no metadata to be had.

What happens if a connector goes down?

Monitoring is part of the ongoing service FUSE provides. A problem can occur, for example, if a web service isn’t available or a vendor updates their API in a way that breaks FUSE’s connection to your content.

First, no worries: your users won’t notice anything amiss. FUSE will continue to seamlessly serve the latest successful index of that content. The index won’t include any content added between the connector outage and the time the search is conducted, however.

Next, we get alerted when this type of thing happens. Nine times out of ten, the problem fixes itself at the next indexing attempt. If something really is down, however, it’s up to the FUSE team to get it running again; there are no fees or development costs. This is part of the ongoing service you’re paying for.


How is performance of our website or our sources affected?

When FUSE first indexes your website or external content source, there will be a performance hit, since we’re indexing potentially thousands of content items going back several years. This is usually moot, though, since it’s a one-time event never to be seen again (unless a complete re-indexing is required down the road).

After the initial indexing, FUSE only indexes new items, with no discernible performance penalty, since each pull is essentially the same as a web user requesting a webpage. We figure out what’s new by performing a single-item query at the next scheduled pull: get the most recent item from the content source. Does it match the latest item we already have in our index? If yes, stop. If no, get the ten latest items. Now do we have them all? Rinse and repeat.
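To make that incremental pull concrete, here is a minimal sketch of the logic, assuming the feed and our index are simple newest-first lists of records; the function and field names are illustrative, not actual FUSE code.

```python
from typing import Dict, List

def pull_new_items(feed: List[Dict], indexed: List[Dict], batch: int = 10) -> None:
    """Pull only the items our index doesn't have yet.

    `feed` is the remote source and `indexed` is our local index, both
    newest-first lists of {"id": ...} records (shapes are illustrative).
    """
    latest_id = indexed[0]["id"] if indexed else None

    # Single-item query: if the newest remote item matches the newest
    # item we already hold, nothing is new; stop immediately.
    if not feed or feed[0]["id"] == latest_id:
        return

    # Otherwise widen the window (10, 20, 30, ...) until it reaches back
    # to an item we already have, or we've pulled the whole feed.
    count = batch
    while count < len(feed) and not any(i["id"] == latest_id for i in feed[:count]):
        count += batch

    # Index everything newer than the item we already hold.
    new_items = []
    for item in feed[:count]:
        if item["id"] == latest_id:
            break
        new_items.append(item)
    indexed[:0] = new_items  # prepend, keeping newest-first order
```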

As for your website, there is no performance concern, since we store your index in AWS and all of the computing power to run queries lives there. A good analogy is YouTube: you can embed a five-hour YouTube video on a page of your site with no performance issues, since the streaming and computing power comes from YouTube’s infrastructure.

We want to build our own feed for FUSE - what format should we use?

Baseline FUSE Search Fields-based Web Service Reader Requirements

  1. Service must be accessible via a single URL.

  2. Supported authentication mechanisms: IP-based, BASIC AUTH, and URL-based (a secret embedded in the URL itself).

  3. URL must return valid XML or JSON format. JSON is preferred.

  4. The XML or JSON must contain an array of objects identifiable via a single parent.

  5. Each object in the array should represent a document and should have attributes such as link, last-updated date-time, title, content (for indexing), PDF link reference, etc.

  6. All data must be returned in one request (no pagination, although we can handle pagination if we must).

Grab FUSE JSON and XML code examples here.
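For illustration only, a minimal JSON feed meeting these requirements might look like the sketch below; take the exact field names from the official examples above rather than from this sketch. Here, “documents” plays the role of the single parent containing the array (requirement 4).

```json
{
  "documents": [
    {
      "title": "Example Whitepaper",
      "link": "https://example.com/docs/whitepaper",
      "last_updated": "2024-01-15T09:30:00Z",
      "content": "Full body text of the document, used for indexing.",
      "pdf_link": "https://example.com/docs/whitepaper.pdf"
    },
    {
      "title": "A Second Document",
      "link": "https://example.com/docs/second",
      "last_updated": "2024-01-10T14:00:00Z",
      "content": "Body text of the second document.",
      "pdf_link": null
    }
  ]
}
```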


Can we use a development environment for testing?

Yes. Our embed code is platform-agnostic: we’ll send you the code to embed and you can put it anywhere you’d like. Our billing cycle commences once the embed is ready, after two rounds of revision. When you’re ready to go live, just place it on a page in your production environment.

Where is FUSE hosted?

Amazon Web Services (AWS). AWS is the dominant cloud provider, running roughly a third of all cloud computing, with over $17 billion in revenue.


What's an uploader?

An uploader is a way for a user to launch Processors that operate on one or more files. Most FUSE clients do NOT use an uploader, since we typically get to content using a combination of code and Schedules. FUSE sets up an Uploader Preset (see the sketch after this list) by:

  1. associating it with a processor,

  2. giving permissions to users or teams to use it,

  3. overriding any processor settings for when this uploader is used to launch it,

  4. adding descriptive language for users of the uploader.
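For illustration, the four steps above might translate into a preset along these lines; every field name here is an assumption, not the actual FUSE configuration schema.

```python
# Hypothetical Uploader Preset, mirroring the four setup steps above.
# All field names are assumptions, not the actual FUSE schema.
uploader_preset = {
    "name": "Quarterly report upload",
    "processor": "quarterly_report_parser",  # 1. associated processor
    "allowed_teams": ["research"],            # 2. permissions for teams...
    "allowed_users": ["jdoe"],                #    ...and individual users
    "setting_overrides": {"delimiter": ";"},  # 3. overrides applied when launched this way
    "description": "Upload the quarterly CSV export from the reporting tool.",  # 4.
}
```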

A user then runs the uploader through FUSE Web by:

  1. opening the uploader tool,

  2. choosing the Uploader Preset,

  3. uploading the file(s) they want to use,

  4. giving the upload job a descriptive name.

Keep in mind that uploaders launch processors that write series through Jobs. All data must be written through a Job, and uploaders are no exception.

How do schedules work?

Processors can be run manually, through Uploaders, or, most frequently, through Schedules. Each processor can have multiple Schedules. Each Schedule specifies the frequency it should run, using CRON syntax, as well as how it should run: as a Job or a Trigger. As the name suggests, a Job schedule immediately starts a Job and tries to import data. A Trigger checks whether a Job should be started and, based on the logic in its code, may start the Job or wait until the next time the Trigger runs.
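As a rough sketch of the two modes (the field names and the trigger hook are assumptions, not actual FUSE APIs):

```python
# Hypothetical Schedule definitions illustrating the two run modes.
schedule_as_job = {
    "processor": "vendor_feed_import",
    "cron": "0 */6 * * *",   # standard CRON syntax: every six hours
    "mode": "job",           # start a Job immediately on every tick
}

schedule_as_trigger = {
    "processor": "vendor_feed_import",
    "cron": "*/15 * * * *",  # check every 15 minutes
    "mode": "trigger",       # run the check below; maybe start a Job
}

def should_start_job(source_has_new_data: bool) -> bool:
    """Trigger logic: decide whether to start the Job on this tick,
    or wait until the next time the Trigger runs."""
    return source_has_new_data
```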


How do you prevent version conflicts?

Series’ points or fields can only be changed by a job with the same or higher job ID than the previous one. This ensures that newer jobs always get preferential treatment in a conflict. Consider a case when two jobs that started seconds apart try to import the same 10 series, but the second one has a slightly more updated version of the data. FUSE guarantees that by the time both jobs are done, all 10 series will have the values set by the second job (the one with the higher job ID) even if the first job wrote the series last (because of slower processing, for example).
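A minimal sketch of that rule, assuming per-field tracking of the last writing job (illustrative, not actual FUSE internals): a write lands only if its job ID is the same as or higher than the ID that last wrote the field.

```python
def write_field(series: dict, field: str, value, job_id: int) -> bool:
    """Apply a write only if this job is at least as new as the last writer."""
    last_writer = series.setdefault("_writer_ids", {}).get(field, -1)
    if job_id < last_writer:
        return False  # stale write from an older job; discard it
    series[field] = value
    series["_writer_ids"][field] = job_id
    return True

# Two overlapping jobs: job 2 carries newer data. Even if job 1 writes
# last (slower processing, for example), job 2's value survives.
s = {}
write_field(s, "price", 101.0, job_id=2)  # newer job happens to write first
write_field(s, "price", 100.5, job_id=1)  # older job's late write is ignored
assert s["price"] == 101.0
```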

What's a processor?

A processor is an ETL-like concept highly optimized for getting data into FUSE. Processors extract data from external sources or from existing FUSE series, transform the data into the FUSE Series concept, and load it into FUSE. JSON-formatted configuration objects store key things like URLs, passwords, tokens, etc. Processor code is written in Python and is generally 10 to 500 lines long, with the average around 150. Most processors use Schedules to run at the appropriate time, while others are run manually or through Uploaders. Processors write series through Jobs.
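For a feel of the shape, here is a minimal processor sketch; the feed URL, field names, and the fuse_write_series() stub are illustrative stand-ins, not the actual FUSE SDK.

```python
import json
from urllib.request import urlopen

# JSON-formatted configuration holds key things like URLs, tokens, etc.
CONFIG = {"url": "https://example.com/feed.json"}  # illustrative source

def extract(config: dict) -> dict:
    """Pull raw data from the external source."""
    with urlopen(config["url"]) as resp:
        return json.load(resp)

def transform(raw: dict) -> list:
    """Map source records into the FUSE Series shape (fields illustrative)."""
    return [{"title": d["title"], "content": d["content"]} for d in raw["documents"]]

def fuse_write_series(series: list) -> None:
    """Stand-in for the real write path; all writes go through a Job."""
    print(f"Writing {len(series)} series through a Job")

if __name__ == "__main__":
    fuse_write_series(transform(extract(CONFIG)))
```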


What search engine is FUSE based on?

Licensed Elasticsearch. Elasticsearch is a search engine based on Lucene, and it is the most popular enterprise search engine, followed by Apache Solr (which is also based on Lucene).