-
public final class JsoupExtractor extends EntityOptions.Builder
Created by vincent on 16-9-14.
General-purpose parser. CSS selectors, XPath selectors, regular expressions, and Scent selectors are supported.
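To make the idea of selector-driven field extraction concrete, here is a minimal sketch that uses only `java.util.regex` from the JDK. It is not the JsoupExtractor API: the class `RegexFieldSketch` and the method `extractFields` are hypothetical names invented for illustration, and the plain `Map` merely stands in for the role that `OpenMapFields` appears to play. Each rule is assumed to contain exactly one capture group.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in, NOT the JsoupExtractor API: it only illustrates
// the "named rule -> extracted field" idea with regular expressions.
final class RegexFieldSketch {

    // Applies each named regex to the text and stores capture group 1
    // under the rule's name, or null when the pattern does not match.
    // Assumption: every rule contains at least one capture group.
    static Map<String, String> extractFields(String text, Map<String, String> rules) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            Matcher m = Pattern.compile(rule.getValue()).matcher(text);
            fields.put(rule.getKey(), m.find() ? m.group(1) : null);
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> rules = new LinkedHashMap<>();
        rules.put("title", "<h1>(.*?)</h1>");
        rules.put("price", "price:\\s*(\\d+\\.\\d+)");
        String html = "<h1>Sample item</h1><p>price: 12.50</p>";
        // Prints the extracted fields in rule order.
        System.out.println(extractFields(html, rules));
    }
}
```

Regular expressions are only one of the selector kinds listed above. CSS and XPath selectors need a parsed document, which is presumably why the real class works on a `FeaturedDocument` rather than on raw text.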
-
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
public class | JsoupExtractor.Companion |
-
Field Summary
Fields
Modifier and Type | Field | Description
private FeaturedDocument | document |
-
Constructor Summary
Constructors
Constructor | Description
JsoupExtractor(WebPage page, ImmutableConfig conf) |
-
Method Summary
Modifier and Type | Method | Description
final FeaturedDocument | getDocument() |
final Unit | setDocument(FeaturedDocument document) |
final FeaturedDocument | parse() |
final List<OpenMapFields> | extractAll(EntityOptions options) | Extract all fields using EntityOptions
final List<OpenMapFields> | extractAll() | Extract all fields using EntityOptions
final OpenMapFields | extract(EntityOptions options) | Parse entity
final OpenMapFields | extract() | Parse entity
final List<OpenMapFields> | extract(CollectionOptions rules) | Parse sub-entity collection
-
Methods inherited from class EntityOptions.Builder
as, build, c_css, c_css, c_css, c_item, c_name, c_re, c_re, c_re, c_root, c_xpath, c_xpath, css, css, css, cxpath, name, re, re, re, root, xpath, xpath, xpath
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
-
Method Detail
-
getDocument
final FeaturedDocument getDocument()
-
setDocument
final Unit setDocument(FeaturedDocument document)
-
parse
final FeaturedDocument parse()
-
extractAll
@JvmOverloads() final List<OpenMapFields> extractAll(EntityOptions options)
Extract all fields using EntityOptions
-
extractAll
@JvmOverloads() final List<OpenMapFields> extractAll()
Extract all fields using EntityOptions
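The two `extractAll` signatures carry `@JvmOverloads`, the Kotlin annotation that generates Java overloads for a function with default parameter values. The sketch below shows, in plain Java, what such a pair typically amounts to: the no-argument form fills in a default and delegates. Everything in it is a hypothetical stand-in (`OverloadSketch`, `Options`, `DEFAULT`). The source does not say what default the real `extractAll()` uses, so the default here is an assumption.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-ins; the real EntityOptions and JsoupExtractor differ.
final class OverloadSketch {

    static final class Options {
        final String name;
        Options(String name) { this.name = name; }
    }

    // Assumed default value; the source does not document the real one.
    static final Options DEFAULT = new Options("default");

    // The full form: the caller supplies the options explicitly.
    static List<String> extractAll(Options options) {
        return Collections.singletonList(options.name);
    }

    // The overload a default parameter gives Java callers:
    // it substitutes the default and delegates to the full form.
    static List<String> extractAll() {
        return extractAll(DEFAULT);
    }
}
```

From Java, both call forms are then available, which matches the pair of signatures listed for `extractAll` and for `extract`.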
-
extract
@JvmOverloads() final OpenMapFields extract(EntityOptions options)
Parse entity
-
extract
@JvmOverloads() final OpenMapFields extract()
Parse entity
-
extract
final List<OpenMapFields> extract(CollectionOptions rules)
Parse sub-entity collection
-