Harvesting¶
Harvests are configured and stored in the sites/SITE-NAME/harvest folder.
Harvest Sources¶
The sources.json file in the harvest folder provides a list of harvest sources. The list uses the following format:
- id The human-readable identifier for the harvest source.
- source The source of the harvest. Can be a remote http:// or https:// source or a local file:// source.
- type The type of source. Currently DataJSON is the only option.
- filters Filters source documents by key and value. Only documents that contain the given key and value are included in the harvest.
- exclude The opposite of filters. Documents that contain the given key and value are left out of the harvest.
- overrides Overrides a value in each source doc.
- defaults Provides a default value only if that value is missing from each source doc.
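As a rough illustration of how these fields relate, the sketch below defines one hypothetical source entry and applies its filters, exclude, overrides and defaults to a few in-memory documents. The field names come from the list above. Everything else is an assumption made for the example: the id, the URL, the sample documents, the shape of each value (a flat key/value object), the rule that all filters must match while any exclude match drops a document, and the `applySource` helper itself. It is not the harvester's own code.

```javascript
// Hypothetical entry for sources.json. Field names are from the docs;
// the values and their shapes are assumptions for illustration only.
const source = {
  id: "example-agency",
  source: "https://example.com/data.json",
  type: "DataJSON",
  filters: { accessLevel: "public" },
  exclude: { theme: "internal" },
  overrides: { publisher: "Example Agency" },
  defaults: { license: "unknown" },
};

// Hypothetical helper, not part of the harvester.
function applySource(docs, src) {
  const filters = Object.entries(src.filters || {});
  const exclude = Object.entries(src.exclude || {});
  return docs
    // Keep a doc only if every filter key/value pair appears in it.
    .filter((doc) => filters.every(([key, value]) => doc[key] === value))
    // Drop a doc if any exclude key/value pair appears in it.
    .filter((doc) => !exclude.some(([key, value]) => doc[key] === value))
    // defaults fill in missing keys; overrides always replace the doc's value.
    .map((doc) => ({ ...src.defaults, ...doc, ...src.overrides }));
}

// Made-up source documents.
const docs = [
  { title: "Budget", accessLevel: "public", theme: "finance" },
  { title: "Payroll", accessLevel: "non-public" },
  { title: "Drafts", accessLevel: "public", theme: "internal" },
];

const harvested = applySource(docs, source);
console.log(harvested);
```

Under these assumptions only the "Budget" document survives: "Payroll" fails the filter, "Drafts" matches the exclude rule, and the remaining document gains the default license and the overridden publisher.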
Running Harvests¶
Caching¶
Harvest sources are first cached to local files before processing. This makes it easier to deal with timeouts from remote sources. Cached sources are stored in the harvest/SOURCE-NAME/SOURCE-TYPE folder. To run the cache, type:
node cli.js harvest-cache SITE-NAME
Running¶
Once the files are cached, type the following to run a harvest:
node cli.js harvest-run SITE-NAME
The harvested source documents are now stored in the site’s collections folder as site documents. The harvest source is added to the interra object in each doc.