public class HTMLPageParser extends java.lang.Object implements PageParser
Builds an HTMLPage object from an HTML document. It behaves similarly to FastPageParser, but it is a complete rewrite that makes it simpler to add custom features such as extraction and transformation of elements.
To customize the rules used, extend this class and override the addUserDefinedRules() method.
See Also: HTMLProcessor
Constructor and Description |
---|
HTMLPageParser() |
Modifier and Type | Method and Description |
---|---|
protected void | addUserDefinedRules(State html, PageBuilder page) |
Page | parse(char[] buffer) Parse the given buffer into a Page object. |
Page | parse(SitemeshBuffer buffer) Parse the given buffer into a Page object. |
public Page parse(char[] buffer) throws java.io.IOException
Description copied from interface: PageParser
Parse the given buffer into a Page object.
Specified by: parse in interface PageParser
Parameters: buffer - The buffer for the page.
Throws: java.io.IOException - if an error occurs

public Page parse(SitemeshBuffer buffer) throws java.io.IOException
Description copied from interface: PageParser
Parse the given buffer into a Page object. DefaultSitemeshBuffer is the appropriate implementation of this interface to pass in.
Specified by: parse in interface PageParser
Parameters: buffer - The buffer for the page.
Throws: java.io.IOException - if an error occurs

protected void addUserDefinedRules(State html, PageBuilder page)
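
The addUserDefinedRules(State html, PageBuilder page) hook is the extension point mentioned in the class description. The sketch below shows only the shape of that pattern: a base class registers its built-in rules and then calls a protected hook that a subclass overrides. Every type in it (SketchState, SketchPageParser, FootnotePageParser, HookDemo) is a hypothetical stand-in written for this example, not a SiteMesh class. The "title", "body" and "footnote" rule names are likewise assumptions, and the hook is simplified to one parameter, leaving out the PageBuilder argument.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a rule registry; NOT the real State class. */
class SketchState {
    private final List<String> handledTags = new ArrayList<>();

    /** Registers interest in a tag name. */
    void addRule(String tagName) {
        handledTags.add(tagName);
    }

    /** Returns the registered tag names in registration order. */
    List<String> handledTags() {
        return handledTags;
    }
}

/** Hypothetical stand-in base parser; it mirrors the hook structure only. */
class SketchPageParser {
    /** Builds the rule set: built-in rules first, then the subclass hook. */
    final SketchState buildRules() {
        SketchState html = new SketchState();
        html.addRule("title"); // assumed built-in rule, for illustration
        html.addRule("body");  // assumed built-in rule, for illustration
        addUserDefinedRules(html);
        return html;
    }

    /** Extension point; does nothing unless a subclass overrides it. */
    protected void addUserDefinedRules(SketchState html) {
    }
}

/** A subclass that contributes one extra rule through the hook. */
class FootnotePageParser extends SketchPageParser {
    @Override
    protected void addUserDefinedRules(SketchState html) {
        html.addRule("footnote");
    }
}

class HookDemo {
    public static void main(String[] args) {
        System.out.println(new SketchPageParser().buildRules().handledTags());   // prints [title, body]
        System.out.println(new FootnotePageParser().buildRules().handledTags()); // prints [title, body, footnote]
    }
}
```

Because the hook runs after the built-in rules are registered, a subclass only has to state what it adds. With the real class, the same override would receive the State and PageBuilder instances that the parser supplies.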