All Implemented Interfaces:
ai.platon.pulsar.common.config.Configurable
public class HttpRobotRulesParser extends RobotRulesParser
This class parses robots.txt rules for URLs that use the HTTP protocol. It extends the generic RobotRulesParser class and supplies the HTTP-specific logic for fetching the robots.txt file.
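The HTTP-specific part is locating the robots file for a given page. As a rough illustration, the sketch below derives the robots.txt location from a page URL using the protocol://host:port/robots.txt pattern described for getRobotRulesSet. RobotsLocation and robotsTxtFor are hypothetical names, not PulsarR API, and the sketch uses java.net.URI rather than java.net.URL only to avoid checked exceptions.

```java
import java.net.URI;
import java.util.Locale;

/**
 * Hypothetical helper (not part of PulsarR): derives the robots.txt location
 * for an HTTP(S) page URL using the protocol://host:port/robots.txt pattern.
 */
public final class RobotsLocation {

    private RobotsLocation() {}

    /** Returns the robots.txt URI that governs the given page URI. */
    public static URI robotsTxtFor(URI page) {
        String scheme = page.getScheme();
        if (scheme == null
                || !(scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))) {
            throw new IllegalArgumentException("Not an HTTP(S) URL: " + page);
        }
        String host = page.getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + page);
        }
        // Path, query and fragment of the page are dropped: robots.txt always
        // sits at the root of the (protocol, host, port) origin.
        int port = page.getPort(); // -1 when no explicit port is given
        String authority = port == -1 ? host : host + ":" + port;
        return URI.create(scheme.toLowerCase(Locale.ROOT) + "://" + authority + "/robots.txt");
    }

    public static void main(String[] args) {
        System.out.println(robotsTxtFor(URI.create("https://example.com:8443/a/b?q=1")));
        // prints https://example.com:8443/robots.txt
    }
}
```

Only the protocol, host, and port of the page survive; a single robots.txt file covers every path on that origin.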
Nested Class Summary
Nested Classes
Modifier and Type    Class                             Description
public class         HttpRobotRulesParser.Companion
Constructor Summary
Constructors
Constructor                                    Description
HttpRobotRulesParser(ImmutableConfig conf)
Method Summary
Modifier and Type    Method                                          Description
BaseRobotRules       getRobotRulesSet(Protocol protocol, URL url)    Get the rules from robots.txt which apply to the given url.
Method Detail
getRobotRulesSet
BaseRobotRules getRobotRulesSet(Protocol protocol, URL url)
Get the rules from robots.txt which apply to the given url. Robot rules are cached for each unique combination of host, protocol, and port. If no rules are found in the cache, an HTTP request is sent to fetch protocol://host:port/robots.txt. The robots.txt file is then parsed and the rules are cached so that it is not fetched and parsed again.
Parameters:
protocol - the Protocol object
url - the URL for which the robots.txt rules are requested
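The caching contract above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PulsarR implementation: RobotRulesCache and getRobotRules are hypothetical names, the type parameter R stands in for BaseRobotRules, and fetchAndParse is a hypothetical callback assumed to fetch and parse protocol://host:port/robots.txt for the URL it receives. Filling in the default port (80 or 443), so that http://example.com/ and http://example.com:80/ share one cache entry, is also an assumption of the sketch.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/**
 * Hypothetical sketch (not the PulsarR implementation) of the caching contract
 * described for getRobotRulesSet: rules are fetched and parsed at most once per
 * unique (protocol, host, port) combination.
 *
 * @param <R> the parsed rules type (a stand-in for BaseRobotRules)
 */
public final class RobotRulesCache<R> {

    private final Map<String, R> cache = new ConcurrentHashMap<>();
    // Hypothetical callback: fetches and parses protocol://host:port/robots.txt for a URL.
    private final Function<URI, R> fetchAndParse;

    public RobotRulesCache(Function<URI, R> fetchAndParse) {
        this.fetchAndParse = fetchAndParse;
    }

    /** Builds "protocol:host:port", filling in the default port when none is given. */
    static String cacheKey(URI url) {
        if (url.getScheme() == null || url.getHost() == null) {
            throw new IllegalArgumentException("URL needs a protocol and a host: " + url);
        }
        String protocol = url.getScheme().toLowerCase(Locale.ROOT);
        String host = url.getHost().toLowerCase(Locale.ROOT);
        int port = url.getPort();
        if (port == -1) {
            // Assumption: only http and https are handled here.
            port = protocol.equals("https") ? 443 : 80;
        }
        return protocol + ":" + host + ":" + port;
    }

    /** Returns cached rules, invoking the fetcher only on a cache miss. */
    public R getRobotRules(URI url) {
        // computeIfAbsent applies the function at most once per key, even when
        // several threads ask for the same host concurrently.
        return cache.computeIfAbsent(cacheKey(url), key -> fetchAndParse.apply(url));
    }

    public static void main(String[] args) {
        AtomicInteger fetches = new AtomicInteger();
        RobotRulesCache<String> rules = new RobotRulesCache<>(url -> {
            fetches.incrementAndGet();
            return "rules for " + url.getHost(); // stand-in for a real fetch + parse
        });
        rules.getRobotRules(URI.create("http://Example.com/a"));
        rules.getRobotRules(URI.create("http://example.com:80/b?x=1")); // same key as above
        rules.getRobotRules(URI.create("https://example.com/c"));       // different protocol and port
        System.out.println(fetches.get()); // prints 2
    }
}
```

Using ConcurrentHashMap.computeIfAbsent keeps concurrent requests for the same host from triggering duplicate fetches; the trade-off is that some other updates to the map may block while a fetch is in progress, so a production crawler might cache a future instead of the finished rules.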