Automatic wrappers for large scale web extraction

We present a generic framework to make wrapper induction algorithms tolerant to noise in the training data. This enables us to learn wrappers in a completely unsupervised manner from automatically and cheaply obtained noisy training data, e.g., using dictionaries and regular expressions. By removing the site-level supervision that wrapper-based techniques …