All Implemented Interfaces:
ai.platon.pulsar.common.config.Parameterized

public interface ScoringFilter implements Parameterized

A contract defining behavior of scoring plugins.
A scoring filter manipulates scoring variables in CrawlDatum and in the resulting search indexes. Filters can be chained in a specific order to provide multi-stage scoring adjustments.
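The chaining described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `Stage`, `runChain`, and the two example stages are hypothetical names, and the real `ScoringFilter` methods take `WebPage` and related types rather than a bare `float`. It only shows the idea that each filter receives the value produced by the previous filter in the chain.

```java
import java.util.List;

public class ScoringChainSketch {
    /** Hypothetical, simplified stand-in for one scoring filter in a chain. */
    interface Stage {
        float indexerScore(String url, float initScore);
    }

    /** Runs the stages in order; each stage receives the previous stage's result. */
    static float runChain(List<Stage> stages, String url, float initScore) {
        float score = initScore;
        for (Stage stage : stages) {
            score = stage.indexerScore(url, score);
        }
        return score;
    }

    public static void main(String[] args) {
        // Two illustrative stages (assumptions, not part of the documented API).
        Stage doubleBoost = (url, s) -> s * 2.0f;
        Stage httpsBonus = (url, s) -> url.startsWith("https://") ? s + 0.5f : s;

        float result = runChain(List.of(doubleBoost, httpsBonus), "https://example.com/", 1.0f);
        System.out.println(result); // prints 2.5
    }
}
```

Because each stage consumes the output of the one before it, the order of the stages matters: swapping the two stages above would give (1.0 + 0.5) * 2.0 = 3.0 instead of 2.5.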
-
-
Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
| --- | --- | --- |
| public class | ScoringFilter.Companion | |
-
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| Unit | injectedScore(WebPage page) | Set an initial score for newly injected pages. |
| Unit | initialScore(WebPage page) | Set an initial score for newly discovered pages. |
| ScoreVector | generatorSortValue(WebPage page, ScoreVector initSort) | This method prepares a sort value for the purpose of sorting and selecting top N scoring pages during fetchlist generation. |
| Unit | distributeScoreToOutlinks(WebPage page, WebGraph graph, Collection<WebEdge> outgoingEdges, Integer allCount) | Distribute score value from the current page to all its outlinked pages. |
| Unit | updateScore(WebPage page, WebGraph graph, Collection<WebEdge> incomingEdges) | This method calculates a new score during table update, based on the values contributed by inlinked pages. |
| Unit | updateContentScore(WebPage page) | |
| Float | indexerScore(String url, IndexDocument doc, WebPage page, Float initScore) | This method calculates a Lucene document boost. |
-
Method Detail
-
injectedScore
Unit injectedScore(WebPage page)
Set an initial score for newly injected pages. Note: newly injected pages may have no inlinks, so filter implementations may wish to set this score to a non-zero value, to give newly injected pages some initial credit.
- Parameters:
page - new page.
-
initialScore
Unit initialScore(WebPage page)
Set an initial score for newly discovered pages. Note: newly discovered pages have at least one inlink with its score contribution, so filter implementations may choose to set initial score to zero (unknown value), and then the inlink score contribution will set the "real" value of the new page.
- Parameters:
page - page row.
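The difference between injectedScore and initialScore can be illustrated with a small sketch. `Page` is a hypothetical stand-in for `WebPage` that holds only a score, and the injected value of 1.0 is an assumption chosen for illustration; the source only says that the injected score may be non-zero and the initial score may be zero.

```java
public class InitialScoreSketch {
    /** Hypothetical stand-in for WebPage: only a mutable score. */
    static class Page {
        float score;
    }

    /** Assumed starting credit for injected (seed) pages. */
    static final float INJECTED_SCORE = 1.0f;

    /** Injected pages may have no inlinks, so they get some initial credit. */
    static void injectedScore(Page page) {
        page.score = INJECTED_SCORE;
    }

    /** Discovered pages start at zero; inlink contributions set the real value later. */
    static void initialScore(Page page) {
        page.score = 0.0f;
    }

    public static void main(String[] args) {
        Page seed = new Page();
        injectedScore(seed);
        Page discovered = new Page();
        initialScore(discovered);
        System.out.println(seed.score + " " + discovered.score); // prints 1.0 0.0
    }
}
```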
-
generatorSortValue
ScoreVector generatorSortValue(WebPage page, ScoreVector initSort)
This method prepares a sort value for the purpose of sorting and selecting top N scoring pages during fetchlist generation.
- Parameters:
page - page row.
initSort - initial sort value, or a value from previous filters in chain
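How a sort value might be used to select the top N pages can be sketched as below. This is an illustration under simplifying assumptions: a plain `Float` stands in for `ScoreVector`, pages are identified by URL strings, and `topN` is a hypothetical helper, not a method of the library.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TopNSketch {
    /** Returns the URLs of the n pages with the highest sort values, best first. */
    static List<String> topN(Map<String, Float> sortValues, int n) {
        return sortValues.entrySet().stream()
                .sorted(Map.Entry.<String, Float>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sort values as a filter chain might have produced them (made-up numbers).
        Map<String, Float> sortValues = new LinkedHashMap<>();
        sortValues.put("a", 0.2f);
        sortValues.put("b", 0.9f);
        sortValues.put("c", 0.5f);

        System.out.println(topN(sortValues, 2)); // prints [b, c]
    }
}
```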
-
distributeScoreToOutlinks
Unit distributeScoreToOutlinks(WebPage page, WebGraph graph, Collection<WebEdge> outgoingEdges, Integer allCount)
Distribute score value from the current page to all its outlinked pages.
- Parameters:
page - page row
allCount - number of all collected outlinks from the source page
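One possible distribution policy is an even split of the page score across all collected outlinks. The sketch below assumes that policy; the source does not say which policy an implementation uses. `distribute` is a hypothetical helper, outlinks are plain URL strings rather than `WebEdge` objects, and the sketch assumes the outgoing edges passed in may be fewer than allCount.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DistributeScoreSketch {
    /**
     * Gives each listed outlink an equal share of the page score.
     * The share is computed from allCount, so outlinks that were
     * collected but are not in the list still "use up" their share.
     */
    static Map<String, Float> distribute(float pageScore, List<String> outlinks, int allCount) {
        Map<String, Float> contributions = new LinkedHashMap<>();
        if (allCount <= 0) {
            return contributions; // nothing to distribute to
        }
        float share = pageScore / allCount;
        for (String url : outlinks) {
            contributions.put(url, share);
        }
        return contributions;
    }

    public static void main(String[] args) {
        // Page score 1.0, four outlinks collected, two of them passed on.
        System.out.println(distribute(1.0f, List.of("a", "b"), 4)); // prints {a=0.25, b=0.25}
    }
}
```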
-
updateScore
Unit updateScore(WebPage page, WebGraph graph, Collection<WebEdge> incomingEdges)
This method calculates a new score during table update, based on the values contributed by inlinked pages.
- Parameters:
page - page row
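A simple way to combine inlink contributions is to add them to the page's current score. The sketch below assumes that additive rule, which the source does not specify. `updateScore` here is a hypothetical helper working on plain floats rather than on `WebPage`, `WebGraph`, and `WebEdge`.

```java
import java.util.Collection;
import java.util.List;

public class UpdateScoreSketch {
    /** Adds every inlink contribution to the current score and returns the result. */
    static float updateScore(float currentScore, Collection<Float> inlinkContributions) {
        float score = currentScore;
        for (float contribution : inlinkContributions) {
            score += contribution;
        }
        return score;
    }

    public static void main(String[] args) {
        // A newly discovered page starts at 0.0 and receives two contributions.
        System.out.println(updateScore(0.0f, List.of(0.25f, 0.5f))); // prints 0.75
    }
}
```

With this rule, a newly discovered page whose initial score is zero ends up with exactly the sum of what its inlinking pages contributed.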
-
updateContentScore
Unit updateContentScore(WebPage page)
-
indexerScore
Float indexerScore(String url, IndexDocument doc, WebPage page, Float initScore)
This method calculates a Lucene document boost.
- Parameters:
url - url of the page
doc - document.
page - page row
initScore - initial boost value for the Lucene document.
-