Harvests are configured in the sources.json file in the harvest folder, which provides a list of harvest sources. Each entry in the list uses the following format:
- id: The human-readable identifier for the harvest source.
- source: The source of the harvest. Can be a remote https://source or a local source.
- type: The type of source. Currently DataJSON is the only option.
- filters: Filters source documents by a key and value. Only documents that contain the matching key and value are included in the harvest.
- exclude: The opposite of filters. Documents that contain the matching key and value are left out of the harvest.
- overrides: Overrides a value in each doc.
- defaults: Provides a default value, applied only if that value is missing from a source doc.
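The interplay of filters, exclude, overrides, and defaults can be sketched in a few lines of JavaScript. This is an illustration of the semantics described above, not the project's implementation: the entry values, the object shapes used for the four rule fields, and the helper names (matchesAll, matchesAny, applySourceRules) are all assumptions made for the example.

```javascript
// Hypothetical sources.json entry. The field names come from the list above;
// the value shapes for filters, exclude, overrides, and defaults are assumed.
const source = {
  id: "example-source",
  source: "https://source",
  type: "DataJSON",
  filters: { publisher: "Example Agency" },
  exclude: { accessLevel: "non-public" },
  overrides: { license: "cc-by" },
  defaults: { language: "en" },
};

// True when every key in `rules` appears in `doc` with the same value.
// An empty rule set matches every doc.
function matchesAll(doc, rules = {}) {
  return Object.entries(rules).every(([key, value]) => doc[key] === value);
}

// True when at least one key in `rules` appears in `doc` with the same value.
// An empty rule set matches no doc.
function matchesAny(doc, rules = {}) {
  return Object.entries(rules).some(([key, value]) => doc[key] === value);
}

// Returns the transformed doc, or null when the doc is left out of the harvest.
function applySourceRules(doc, src) {
  if (!matchesAll(doc, src.filters)) return null; // filters: must match
  if (matchesAny(doc, src.exclude)) return null; // exclude: must not match
  // Spread order: defaults fill in keys absent from the doc, values present
  // in the doc win over defaults, and overrides win over everything.
  return { ...src.defaults, ...doc, ...src.overrides };
}

const doc = { title: "Air quality", publisher: "Example Agency", license: "other" };
console.log(applySourceRules(doc, source));
```

In this sketch a doc that fails the filters or matches exclude is dropped before any values are rewritten, so overrides and defaults only ever touch docs that end up in the harvest.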
Harvest sources are first cached to local files before processing, which makes remote source timeouts easier to deal with. Cached sources are stored in the harvest/SOURCE-NAME/SOURCE-TYPE folder. To cache the sources, type:
node cli.js harvest-cache SITE-NAME
Once the files are cached, type the following to run a harvest:
node cli.js harvest-run SITE-NAME
Harvest sources are now stored in the site’s collections folder as site documents. The harvest source is added to the interra object in each doc.