Harvesting
Harvests are configured and stored in the sites/SITE-NAME/harvest
folder.
Harvest Sources
The sources.json file in the harvest folder provides a list of harvest sources. The list uses the following format:
- id: The human-readable identifier for the harvest source.
- source: The location of the harvest source. This can be a remote http:// or https:// source, or a local file:// source.
- type: The type of source. Currently DataJSON is the only option.
- filters: A key and value that must appear in a source document for that document to be included in the harvest.
- exclude: The opposite of filters. Source documents that contain the given key and value are left out of the harvest.
- overrides: Overrides a value in every source document.
- defaults: Provides a default value that is used only when that value is missing from a source document.
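The field semantics above can be sketched in plain Node.js. This is a minimal illustration, not the harvester's actual code: the function names (hasAll, hasAny, applySourceRules), the example entry, and the value shapes (plain key/value objects compared with strict equality on top-level keys) are all assumptions. It also assumes a document is excluded when it matches any exclude pair, and that defaults are applied before overrides.

```javascript
// Hypothetical sources.json entry. Only the field names come from the list
// above; the value shapes (plain key/value objects) are assumptions.
const exampleSource = {
  id: "example-agency",
  source: "https://example.com/data.json",
  type: "DataJSON",
  filters: { accessLevel: "public" },
  exclude: { title: "Deprecated dataset" },
  overrides: { theme: ["Transportation"] },
  defaults: { license: "https://creativecommons.org/publicdomain/zero/1.0/" },
};

// True if `doc` has every key/value pair in `pairs` (top-level keys, strict equality).
function hasAll(doc, pairs) {
  return Object.entries(pairs).every(([key, value]) => doc[key] === value);
}

// True if `doc` has at least one key/value pair in `pairs`.
function hasAny(doc, pairs) {
  return Object.entries(pairs).some(([key, value]) => doc[key] === value);
}

// Apply filters, exclude, defaults and overrides to the documents of one source.
function applySourceRules(docs, source) {
  const { filters = {}, exclude = {}, overrides = {}, defaults = {} } = source;
  return docs
    .filter((doc) => hasAll(doc, filters)) // keep only docs matching every filter
    .filter((doc) => !hasAny(doc, exclude)) // drop docs matching any exclude pair
    .map((doc) => {
      const out = { ...doc };
      // Defaults only fill keys that are missing (undefined) in the source doc.
      for (const [key, value] of Object.entries(defaults)) {
        if (out[key] === undefined) out[key] = value;
      }
      // Overrides always replace whatever the source doc had.
      return { ...out, ...overrides };
    });
}

const docs = [
  { title: "Roads", accessLevel: "public" },
  { title: "Deprecated dataset", accessLevel: "public" },
  { title: "Internal budget", accessLevel: "non-public" },
];
console.log(applySourceRules(docs, exampleSource));
```

With this example entry only the "Roads" document survives: "Internal budget" fails the filter, "Deprecated dataset" is excluded, and the remaining document gains the default license and the overridden theme.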
Running Harvests
Caching
Harvest sources are first cached to local files before processing, which makes it easier to deal with timeouts from remote sources. Cached sources are stored in the harvest/SOURCE-NAME/SOURCE-TYPE folder. To cache the sources for a site, run:
node cli.js harvest-cache SITE-NAME
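The caching step can be pictured with a short Node.js sketch (Node 18 or later, for the global fetch). It is not the implementation behind harvest-cache: the function names, the rootDir parameter, the assumption that the harvest/SOURCE-NAME/SOURCE-TYPE folder sits under sites/SITE-NAME, and the cache file name cache.json are all hypothetical. It only shows the idea of copying an http://, https:// or file:// source to a local file before processing.

```javascript
const fs = require("node:fs/promises");
const path = require("node:path");
const { fileURLToPath } = require("node:url");

// Assumed layout: sites/SITE-NAME/harvest/SOURCE-ID/SOURCE-TYPE
function cacheDir(siteName, source) {
  return path.join("sites", siteName, "harvest", source.id, source.type);
}

// Read the raw source text: file:// from disk, http(s):// via the global fetch.
async function readSource(uri) {
  if (uri.startsWith("file://")) {
    return fs.readFile(fileURLToPath(uri), "utf8");
  }
  if (uri.startsWith("http://") || uri.startsWith("https://")) {
    const res = await fetch(uri);
    if (!res.ok) throw new Error(`Fetching ${uri} failed: HTTP ${res.status}`);
    return res.text();
  }
  throw new Error(`Unsupported source scheme: ${uri}`);
}

// Copy one source into its cache folder and return the cached file path.
// "cache.json" is a hypothetical file name.
async function cacheSource(rootDir, siteName, source) {
  const dir = path.join(rootDir, cacheDir(siteName, source));
  await fs.mkdir(dir, { recursive: true });
  const target = path.join(dir, "cache.json");
  await fs.writeFile(target, await readSource(source.source), "utf8");
  return target;
}
```

Separating the download from processing means a slow or failing remote server only affects the cache step; later runs can work from the local copy.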
Running
Once the sources are cached, run the harvest with:
node cli.js harvest-run SITE-NAME
The harvested documents are then stored in the site’s collections folder as site documents, and the harvest source is added to the interra object in each document.
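As a rough illustration of the run step, the sketch below turns a cached DataJSON file into site documents tagged with their harvest source. It assumes the cached file is a Project Open Data style catalog whose datasets live in a top-level "dataset" array; the function name toSiteDocs and the "source" key inside the interra object are hypothetical, and writing the documents into the collections folder is left out.

```javascript
// Parse a cached DataJSON catalog and tag each dataset with its harvest source.
function toSiteDocs(cachedText, source) {
  const catalog = JSON.parse(cachedText);
  // Datasets are assumed to sit in the catalog's top-level "dataset" array.
  const datasets = Array.isArray(catalog.dataset) ? catalog.dataset : [];
  return datasets.map((doc) => ({
    ...doc,
    // Merge into any existing interra object; the "source" key is an assumption.
    interra: { ...(doc.interra || {}), source: source.id },
  }));
}

const cached = JSON.stringify({ dataset: [{ identifier: "roads", title: "Roads" }] });
const docs = toSiteDocs(cached, { id: "example-agency" });
console.log(docs[0].interra); // { source: 'example-agency' }
```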