The Criteria is one of the components that make up a Workflow. It specifies exactly what data you want to be collected from web pages.
A Criteria has two parts:
The Page Validator is an English description that tells your Workflow whether a page it encounters is relevant to your search.
Each Field of a Criteria describes the specific type of information you want to extract from web pages. It consists of a Name and Description.
The Name of a field is how this field will be referred to when data is collected.
The Description is a simple one-sentence English explanation of what data your Workflow should search for.
Some fields can optionally be marked as Unique Identifiers”. This is used to combine duplicated data. For example, if I had a field called “name”, and I marked it as a unique identifier, then any data which had the same value for the name field and doesn’t have any other conflicts will be merged. An important note is that if you add any unique identifiers, any data that does not populate at least one unique identifier will be discarded.
Helpful Tip
When creating your Fields, think of a spreadsheet containing the data output. The Name part of the Field will be the column header, and the Description will be the type of data that would populate that column.
Looking for information about restaurants in Edinburgh from a set of tourist websites.
Restaurant in Edinburgh
Now let’s say you are looking for the names of the restaurant and their descriptions:
Name:
Restaurant name
Description:
What is this restaurant called?
Name:
Restaurant type
Description:
What is a one-sentence summary of this restaurant?
Helpful Tip
Once your Criteria is created, you can reuse it when making new Workflows. This saves time if you want to collect the same types of information from different websites.