
Class IndexDocument.Builder

    • Method Summary

      Modifier and Type Method Description
      final IndexDocument.Builder with(IndexingFilters indexingFilters)
      final IndexDocument.Builder with(ScoringFilters scoringFilters)
      final IndexDocument build(String key, WebPage page) Indexes a WebPage, adding the id, digest, batchId and boost fields.

      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • IndexDocument.Builder

        IndexDocument.Builder(ImmutableConfig conf)
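The builder is constructed from a configuration, given its filters through the two overloaded with(...) methods, and then asked to build a document. The sketch below mirrors only that call shape. Config, IndexingFilters, ScoringFilters and Page are hypothetical stand-ins for ImmutableConfig, IndexingFilters, ScoringFilters and WebPage, and the field map returned by build() is an assumption, not the real IndexDocument (Java 17+).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BuilderSketch {
    // Hypothetical stand-ins for ImmutableConfig, IndexingFilters, ScoringFilters and WebPage.
    record Config(Map<String, String> values) {}
    record IndexingFilters(String name) {}
    record ScoringFilters(String name) {}
    record Page(String url, String batchId) {}

    // Mirrors only the call shape of IndexDocument.Builder: constructor, two with() overloads, build().
    static final class Builder {
        private final Config conf;
        private IndexingFilters indexingFilters;
        private ScoringFilters scoringFilters;

        Builder(Config conf) { this.conf = conf; }

        // Overload resolution picks the method by argument type; each returns this for chaining.
        Builder with(IndexingFilters filters) { this.indexingFilters = filters; return this; }
        Builder with(ScoringFilters filters) { this.scoringFilters = filters; return this; }

        // A real build() would also run the indexing and scoring filters; this sketch only
        // copies the key and the batch id into a field map.
        Map<String, Object> build(String key, Page page) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("id", key);
            doc.put("batchId", page.batchId());
            return doc;
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new Builder(new Config(Map.of()))
                .with(new IndexingFilters("indexing"))
                .with(new ScoringFilters("scoring"))
                .build("com.example:http/", new Page("http://example.com/", "batch-1"));
        System.out.println(doc); // prints {id=com.example:http/, batchId=batch-1}
    }
}
```

Returning this from each with(...) is what allows the calls to be chained before build().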
    • Method Detail

      • build

        final IndexDocument build(String key, WebPage page)

        Indexes a WebPage, adding the following fields:

        • id: the default uniqueKey for the IndexDocument.

        • digest: identifies pages (like a unique ID) and is used to remove duplicates during deduplication. It is calculated

        • batchId: the identifier of the unique batch to which the page belongs.

        • boost: used to calculate the document (field) score, which queries submitted to the underlying indexing library can use to find the best results. It is part of the scoring algorithms; see scoring.link, scoring.opic, scoring.tld, etc.
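To illustrate how a content digest lets deduplication spot identical pages, the sketch below fills the four fields into a map and uses a hex-encoded MD5 of the page body as the digest. MD5 is an assumption chosen for the example; this page does not say how the digest is actually calculated. The helpers md5Hex and fields are hypothetical (Java 17+ for HexFormat).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.LinkedHashMap;
import java.util.Map;

public class IndexFieldsSketch {
    // Hypothetical helper: hex-encoded MD5 of the content, used here as the digest (assumption).
    static String md5Hex(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return HexFormat.of().formatHex(md.digest(content));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Hypothetical helper: the four fields listed above, keyed by name.
    static Map<String, Object> fields(String key, byte[] content, String batchId, float boost) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", key);
        doc.put("digest", md5Hex(content));
        doc.put("batchId", batchId);
        doc.put("boost", boost);
        return doc;
    }

    public static void main(String[] args) {
        byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
        Map<String, Object> a = fields("com.example:http/a", body, "batch-1", 1.0f);
        Map<String, Object> b = fields("com.example:http/b", body, "batch-1", 0.5f);
        // Different ids, same content: equal digests mark b as a duplicate of a.
        System.out.println(a.get("digest").equals(b.get("digest"))); // prints true
    }
}
```

The id differs per page while the digest depends only on content, which is why the digest rather than the id is the field to deduplicate on.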

        Parameters:
        key - The key of the page (reversed URL).
        page - The WebPage.
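The key parameter is a reversed URL. The exact format used by this class is not stated here; the sketch below assumes the reversed-host layout familiar from Apache Nutch 2.x row keys: the host labels in reverse order, then ':' and the scheme, an optional ':' and port, then the path and query. The class and method names are hypothetical, and the empty-path-to-"/" rule is a choice made for this example.

```java
import java.net.URI;

public class ReverseUrlSketch {
    // Hypothetical helper: builds a reversed-URL key such as "com.foo.bar:http:8983/to/index.html?a=b".
    static String reverseUrl(String url) {
        URI uri = URI.create(url);
        String[] hostParts = uri.getHost().split("\\.");
        StringBuilder key = new StringBuilder();
        // Host labels in reverse order, so pages of one domain sort next to each other.
        for (int i = hostParts.length - 1; i >= 0; i--) {
            key.append(hostParts[i]);
            if (i > 0) key.append('.');
        }
        key.append(':').append(uri.getScheme());
        if (uri.getPort() != -1) key.append(':').append(uri.getPort());
        String path = uri.getRawPath();
        key.append(path == null || path.isEmpty() ? "/" : path);
        if (uri.getRawQuery() != null) key.append('?').append(uri.getRawQuery());
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(reverseUrl("http://bar.foo.com:8983/to/index.html?a=b"));
        // prints com.foo.bar:http:8983/to/index.html?a=b
    }
}
```

Reversing the host groups all pages of a domain and its subdomains under a common key prefix, which suits sorted key-value stores.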