Package ai.platon.pulsar.crawl.index

Interface IndexingFilter

  • All Superinterfaces:
    ai.platon.pulsar.common.config.KConfigurable, ai.platon.pulsar.common.config.Parameterized, ai.platon.pulsar.crawl.common.JobInitialized

    
    public interface IndexingFilter
     extends Parameterized, JobInitialized, KConfigurable
                        

    Extension point for indexing. Lets a plugin add metadata to the fields being indexed. All plugins that implement this extension point are run sequentially on the parse.
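    The sequential run can be sketched in plain Java. This is a minimal sketch, not Pulsar's actual runner: SimpleIndexDocument, SimplePage, SimpleIndexingFilter and IndexingFilterChain are hypothetical stand-ins for IndexDocument, WebPage and the plugin loader, and the chain is assumed to stop at the first filter that returns null.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for IndexDocument: an ordered map of field name to value.
class SimpleIndexDocument {
    final Map<String, String> fields = new LinkedHashMap<>();
}

// Hypothetical stand-in for WebPage: only the parsed title and text.
class SimplePage {
    final String title;
    final String text;

    SimplePage(String title, String text) {
        this.title = title;
        this.text = text;
    }
}

// Mirrors the shape of IndexingFilter.filter(); returning null drops the document.
interface SimpleIndexingFilter {
    SimpleIndexDocument filter(SimpleIndexDocument doc, String url, SimplePage page);
}

// Hypothetical runner: applies every registered filter in registration order.
class IndexingFilterChain {
    private final List<SimpleIndexingFilter> filters = new ArrayList<>();

    IndexingFilterChain add(SimpleIndexingFilter filter) {
        filters.add(filter);
        return this;
    }

    SimpleIndexDocument run(SimpleIndexDocument doc, String url, SimplePage page) {
        for (SimpleIndexingFilter filter : filters) {
            doc = filter.filter(doc, url, page);
            if (doc == null) {
                return null; // this filter rejected the document; skip the rest
            }
        }
        return doc;
    }
}
```

    Each filter receives the document returned by the previous one, so later plugins see the fields added by earlier ones.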

    • Method Summary

      Modifier and Type          Method                                                 Description
      abstract IndexDocument     filter(IndexDocument doc, String url, WebPage page)    Adds fields or otherwise modifies the document that will be indexed for a parse.
      abstract ImmutableConfig   getConf()
      abstract Unit              setConf(ImmutableConfig conf)
      • Methods inherited from interfaces ai.platon.pulsar.common.config.Parameterized, ai.platon.pulsar.crawl.common.JobInitialized

        getParams, setup
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • filter

         abstract IndexDocument filter(IndexDocument doc, String url, WebPage page)

        Adds fields or otherwise modifies the document that will be indexed for a parse. Unwanted documents can be removed from indexing by returning a null value.

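        Parameters:
        doc - document instance for collecting fields
        url - page url
        page - the web page being indexed
        Returns:
        the modified document, or null to exclude it from indexing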
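        A filter implementation might look like the following. It is a hypothetical example built on stand-in types (SimpleIndexDocument, SimplePage, SimpleIndexingFilter, BasicMetadataFilter) rather than Pulsar's IndexDocument and WebPage. It adds host and title fields and returns null for pages with no text, which removes them from indexing.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for IndexDocument: an ordered map of field name to value.
class SimpleIndexDocument {
    final Map<String, String> fields = new LinkedHashMap<>();
}

// Hypothetical stand-in for WebPage: only the parsed title and text.
class SimplePage {
    final String title;
    final String text;

    SimplePage(String title, String text) {
        this.title = title;
        this.text = text;
    }
}

// Mirrors the shape of IndexingFilter.filter(); returning null drops the document.
interface SimpleIndexingFilter {
    SimpleIndexDocument filter(SimpleIndexDocument doc, String url, SimplePage page);
}

// Hypothetical plugin: adds "host" and "title" fields, drops pages without text.
class BasicMetadataFilter implements SimpleIndexingFilter {
    @Override
    public SimpleIndexDocument filter(SimpleIndexDocument doc, String url, SimplePage page) {
        if (page.text == null || page.text.trim().isEmpty()) {
            return null; // nothing worth indexing: remove the document
        }
        // URI.create throws IllegalArgumentException if the url is malformed
        String host = URI.create(url).getHost();
        if (host != null) {
            doc.fields.put("host", host);
        }
        if (page.title != null) {
            doc.fields.put("title", page.title);
        }
        return doc;
    }
}
```

        Returning the same doc instance keeps any fields added by earlier filters in the sequence.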
      • getConf

         abstract ImmutableConfig getConf()