Package 

Class RobotRulesParser

  • All Implemented Interfaces:
    ai.platon.pulsar.common.config.Configurable

    
    public abstract class RobotRulesParser
     implements Configurable
                        

    This class uses crawler-commons to parse robots.txt files. It emits SimpleRobotRules objects, which describe download permissions as defined by SimpleRobotRulesParser.

    • Method Summary

      Modifier and Type | Method | Description
      ImmutableConfig | getConf() | Get the Configuration object
      Unit | setConf(ImmutableConfig jobConf) | Set the Configuration object
      final BaseRobotRules | parseRules(String url, ByteArray content, String contentType, String robotName) | Parses the robots.txt content using the SimpleRobotRulesParser from crawler-commons
      final BaseRobotRules | getRobotRulesSet(Protocol protocol, String url) |
      abstract BaseRobotRules | getRobotRulesSet(Protocol protocol, URL url) |
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Constructor Detail

      • RobotRulesParser

        RobotRulesParser()
      • RobotRulesParser

        RobotRulesParser(ImmutableConfig conf)
    • Method Detail

      • getConf

        @NotNull() ImmutableConfig getConf()

        Get the Configuration object

      • setConf

         Unit setConf(ImmutableConfig jobConf)

        Set the Configuration object

      • parseRules

        @NotNull() final BaseRobotRules parseRules(String url, ByteArray content, String contentType, String robotName)

        Parses the robots.txt content using the SimpleRobotRulesParser from crawler-commons.

        Parameters:
        url - The URL, as a string
        content - The contents of the robots file, as a byte array
        contentType - The content type of the robots file
        robotName - A string containing all the robot agent names that the parser uses for matching
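
        The four parameters above can be illustrated with a minimal, self-contained sketch. It does not call crawler-commons or this class: RobotsSketch, its nested Rules type, and the simplified matching logic are hypothetical stand-ins that mirror only the shape of parseRules. The sketch assumes UTF-8 content, a single agent name, and prefix-only Disallow matching, and it merges the "*" group with the named group rather than applying the precedence rules a real parser would.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RobotsSketch {

    /** Hypothetical stand-in for BaseRobotRules: stores Disallow path prefixes only. */
    static final class Rules {
        private final List<String> disallowed = new ArrayList<>();

        boolean isAllowed(String path) {
            for (String prefix : disallowed) {
                if (path.startsWith(prefix)) {
                    return false;
                }
            }
            return true;
        }
    }

    /**
     * Simplified parse: collects Disallow prefixes from every group whose
     * User-agent is "*" or equals robotName (case-insensitive).
     * url and contentType are unused here; they are kept only to mirror
     * the documented signature.
     */
    static Rules parseRules(String url, byte[] content, String contentType, String robotName) {
        Rules rules = new Rules();
        String text = new String(content, StandardCharsets.UTF_8);
        boolean groupApplies = false;
        boolean lastWasAgent = false;
        for (String rawLine : text.split("\r?\n")) {
            String line = rawLine;
            int hash = line.indexOf('#'); // drop trailing comments
            if (hash >= 0) {
                line = line.substring(0, hash);
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                boolean matches = value.equals("*") || value.equalsIgnoreCase(robotName);
                // Consecutive User-agent lines belong to the same group.
                groupApplies = lastWasAgent ? (groupApplies || matches) : matches;
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                if (field.equals("disallow") && groupApplies && !value.isEmpty()) {
                    rules.disallowed.add(value);
                }
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        String robotsTxt = "User-agent: *\n"
                + "Disallow: /private/\n"
                + "\n"
                + "User-agent: other-bot\n"
                + "Disallow: /\n";
        byte[] content = robotsTxt.getBytes(StandardCharsets.UTF_8);

        Rules rules = parseRules("https://example.com/robots.txt", content, "text/plain", "my-crawler");

        System.out.println(rules.isAllowed("/index.html"));   // true
        System.out.println(rules.isAllowed("/private/data")); // false
    }
}
```

        In the sketch, the "Disallow: /" rule is ignored because its group names other-bot, while the "*" group applies to my-crawler. The URL https://example.com/robots.txt and the agent names are illustrative values only.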