All Implemented Interfaces:
ai.platon.pulsar.common.config.Parameterized, ai.platon.pulsar.crawl.common.JobInitialized
public final class GenerateComponent implements Parameterized, JobInitialized
A parser checker, useful for testing a parser. It also accurately reports possible fetching and parsing failures and presents protocol status signals to aid debugging. The tool enables us to retrieve the following data from any
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
public class    GenerateComponent.Companion
-
Field Summary
Fields
Modifier and Type    Field    Description
private final Logger    LOG
private final CrawlFilters    crawlFilters
private final WebDb    webDb
private final CrawlUrlFilters    urlFilters
private final CrawlUrlNormalizers    urlNormalizers
private final FetchSchedule    fetchSchedule
private final MiscMessageWriter    messageWriter
private final ImmutableConfig    conf
-
Constructor Summary
Constructors
Constructor    Description
GenerateComponent(CrawlFilters crawlFilters, WebDb webDb, CrawlUrlFilters urlFilters, CrawlUrlNormalizers urlNormalizers, FetchSchedule fetchSchedule, MiscMessageWriter messageWriter, ImmutableConfig conf)
-
Method Summary
Modifier and Type    Method    Description
final Logger    getLOG()
final CrawlFilters    getCrawlFilters()
final WebDb    getWebDb()
final CrawlUrlFilters    getUrlFilters()
final CrawlUrlNormalizers    getUrlNormalizers()
final FetchSchedule    getFetchSchedule()
final MiscMessageWriter    getMessageWriter()
final ImmutableConfig    getConf()
Unit    setup(ImmutableConfig jobConf)
Params    getParams()
final Pair<Boolean, String>    shouldFetch(String url, String reversedUrl, WebPage page)    TODO: We may move some filters to HBase query filters directly. TODO: Move to CrawlFilter.
-
Constructor Detail
-
GenerateComponent
GenerateComponent(CrawlFilters crawlFilters, WebDb webDb, CrawlUrlFilters urlFilters, CrawlUrlNormalizers urlNormalizers, FetchSchedule fetchSchedule, MiscMessageWriter messageWriter, ImmutableConfig conf)
-
-
Method Detail
-
getLOG
final Logger getLOG()
-
getCrawlFilters
final CrawlFilters getCrawlFilters()
-
getWebDb
final WebDb getWebDb()
-
getUrlFilters
final CrawlUrlFilters getUrlFilters()
-
getUrlNormalizers
final CrawlUrlNormalizers getUrlNormalizers()
-
getFetchSchedule
final FetchSchedule getFetchSchedule()
-
getMessageWriter
final MiscMessageWriter getMessageWriter()
-
getConf
final ImmutableConfig getConf()
-
getParams
Params getParams()
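
The Method Summary lists `shouldFetch(String url, String reversedUrl, WebPage page)` as returning a `Pair<Boolean, String>`, but this page does not describe how the pair is populated. The following is a minimal, self-contained sketch of one plausible reading of that return shape: a decision flag plus a short reason string. Everything in it is an assumption made for illustration: the class `ShouldFetchSketch`, the `blockedPrefixes` parameter, the reason texts, and the use of `Map.Entry` as a stand-in for `Pair`. It does not use or reproduce the actual filtering logic of `GenerateComponent`.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only: not part of the documented API.
public class ShouldFetchSketch {

    // Map.Entry<Boolean, String> stands in for Pair<Boolean, String>:
    // the key is the decision, the value is an assumed human-readable reason.
    static Map.Entry<Boolean, String> shouldFetch(String url, Set<String> blockedPrefixes) {
        if (url == null || url.isEmpty()) {
            return Map.entry(false, "empty url");
        }
        for (String prefix : blockedPrefixes) {
            if (url.startsWith(prefix)) {
                return Map.entry(false, "blocked prefix: " + prefix);
            }
        }
        // Accepted: the reason is left empty in this sketch.
        return Map.entry(true, "");
    }

    public static void main(String[] args) {
        Set<String> blocked = Set.of("http://spam.example/");

        Map.Entry<Boolean, String> accepted = shouldFetch("http://news.example/a", blocked);
        Map.Entry<Boolean, String> rejected = shouldFetch("http://spam.example/x", blocked);

        System.out.println(accepted.getKey() + " / " + accepted.getValue());
        System.out.println(rejected.getKey() + " / " + rejected.getValue());
    }
}
```

Returning the reason alongside the flag lets a caller log or count why a URL was skipped without a second lookup. Whether the real method uses the string this way is not stated on this page.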