All Implemented Interfaces:
ai.platon.pulsar.common.config.Configurable
public abstract class RobotRulesParser implements Configurable

This class uses crawler-commons to parse robots.txt files. It emits SimpleRobotRules objects, which describe download permissions as documented in SimpleRobotRulesParser.
-
-
Nested Class Summary
Nested Classes
Modifier and Type: public class
Class: RobotRulesParser.Companion
-
Constructor Summary
Constructors
RobotRulesParser()
RobotRulesParser(ImmutableConfig conf)
-
Method Summary
Modifier and Type, Method, Description
ImmutableConfig getConf(): Get the Configuration object
Unit setConf(ImmutableConfig jobConf): Set the Configuration object
final BaseRobotRules parseRules(String url, ByteArray content, String contentType, String robotName): Parses the robots content using the SimpleRobotRulesParser from crawler-commons
final BaseRobotRules getRobotRulesSet(Protocol protocol, String url)
abstract BaseRobotRules getRobotRulesSet(Protocol protocol, URL url)
-
Method Detail
-
getConf
@NotNull() ImmutableConfig getConf()
Get the Configuration object
-
parseRules
@NotNull() final BaseRobotRules parseRules(String url, ByteArray content, String contentType, String robotName)
Parses the robots.txt content using the SimpleRobotRulesParser from crawler-commons.
- Parameters:
url - A string containing the URL
content - Contents of the robots file as a byte array
contentType - The content type of the robots file
robotName - A string containing all the robot agent names used by the parser for matching
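
To make the idea of turning robots file bytes into rules more concrete, here is a minimal, self-contained Java sketch. It is not the crawler-commons implementation and does not use RobotRulesParser: the class RobotsSketch and its methods parse and isAllowed are hypothetical names introduced only for illustration. As simplifying assumptions, it accepts a single agent name rather than a list, decodes the content as UTF-8, handles only User-agent and Disallow lines, and merges rules from the "*" group with rules from a group naming the agent.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical, simplified stand-in for robots.txt parsing; not crawler-commons. */
final class RobotsSketch {
    /** Disallowed path prefixes collected for the matched agent. */
    private final List<String> disallowed = new ArrayList<>();

    /** Returns true unless the path starts with a disallowed prefix. */
    boolean isAllowed(String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    /** Collects Disallow rules from groups whose User-agent is "*" or equals robotName. */
    static RobotsSketch parse(byte[] content, String robotName) {
        RobotsSketch rules = new RobotsSketch();
        String agent = robotName.toLowerCase(Locale.ROOT);
        boolean groupApplies = false;  // does the current group target our agent?
        boolean inAgentLines = false;  // are we inside a run of User-agent lines?
        for (String raw : new String(content, StandardCharsets.UTF_8).split("\\R")) {
            // Strip comments and surrounding whitespace.
            int hash = raw.indexOf('#');
            String line = (hash >= 0 ? raw.substring(0, hash) : raw).trim();
            int colon = line.indexOf(':');
            if (colon < 0) continue;  // blank or malformed line
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                String ua = value.toLowerCase(Locale.ROOT);
                boolean matches = ua.equals("*") || ua.equals(agent);
                // Consecutive User-agent lines belong to the same group.
                groupApplies = inAgentLines ? (groupApplies || matches) : matches;
                inAgentLines = true;
            } else {
                inAgentLines = false;
                if (field.equals("disallow") && groupApplies && !value.isEmpty()) {
                    rules.disallowed.add(value);
                }
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        byte[] robots = "User-agent: *\nDisallow: /private/\n".getBytes(StandardCharsets.UTF_8);
        RobotsSketch rules = parse(robots, "mybot");
        System.out.println(rules.isAllowed("/private/data.html"));
        System.out.println(rules.isAllowed("/public/index.html"));
    }
}
```

A production parser such as the one in crawler-commons also deals with Allow lines, wildcards, crawl delays, sitemaps, and agent precedence, which this sketch deliberately leaves out.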
-
getRobotRulesSet
@NotNull() final BaseRobotRules getRobotRulesSet(Protocol protocol, String url)
-
getRobotRulesSet
@NotNull() abstract BaseRobotRules getRobotRulesSet(Protocol protocol, URL url)
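
The pairing of a final String overload with an abstract URL overload suggests a common pattern: the convenience method converts its argument and delegates to the method that subclasses implement. The sketch below illustrates that pattern only. It is an assumption about the design, not the actual RobotRulesParser code, and every name in it (RulesLookupSketch, HostEchoLookup, getRulesFor, EMPTY_RULES) is hypothetical. A plain String stands in for BaseRobotRules, and the Protocol parameter is omitted.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

/** Hypothetical sketch: a final String overload delegating to an abstract URL overload. */
abstract class RulesLookupSketch {
    /** Placeholder returned when the URL string cannot be parsed (an assumption). */
    static final String EMPTY_RULES = "EMPTY_RULES";

    /** Convenience overload: converts the string to a URL, then delegates. */
    final String getRulesFor(String url) {
        try {
            return getRulesFor(new URI(url).toURL());
        } catch (URISyntaxException | MalformedURLException | IllegalArgumentException e) {
            // Unparseable or non-absolute URL: fall back to the placeholder.
            return EMPTY_RULES;
        }
    }

    /** Protocol-specific subclasses supply the actual lookup. */
    abstract String getRulesFor(URL url);
}

/** Toy subclass that only reports the host whose robots file it would fetch. */
final class HostEchoLookup extends RulesLookupSketch {
    @Override
    String getRulesFor(URL url) {
        return "rules for " + url.getHost();
    }
}
```

Keeping the String overload final means every subclass shares the same conversion and error handling, while only the URL overload varies per protocol.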