Web data extraction, applications and techniques: A survey

Type: Article

Publication Date: 2014-07-24

Citations: 357

DOI: https://doi.org/10.1016/j.knosys.2014.07.007


Locations

  • Knowledge-Based Systems
  • arXiv (Cornell University)
  • DataCite API

Similar Works

  • Dictionary-based methods for information extraction (2004). Andrea Baronchelli, Emanuele Caglioti, Vittorio Loreto, Elisabetta Pizzi
  • Web Content Extraction - a Meta-Analysis of its Past and Thoughts on its Future (2015). Tim Weninger, Rodrigo E. Palacios, Valter Crescenzi, Thomas Gottron, Paolo Merialdo
  • An analysis of structured data on the web (2012). Nilesh Dalvi, Ashwin Machanavajjhala, Bo Pang
  • Web Data Knowledge Extraction (2016). Juan M. Tirado, Ovidiu Șerban, Qiang Guo, Eiko Yoneki
  • Mining user queries with information extraction methods and linked data (2018). Anne Chardonnens, Ettore Rizza, Mathias Coeckelbergs, Seth van Hooland
  • Mining User Queries with Information Extraction Methods and Linked Data (2017). Anne Chardonnens, Ettore Rizza, Mathias Coeckelbergs, Seth van Hooland
  • Web Mining Research: A Survey (2000). Raymond Kosala, Hendrik Blockeel
  • Design of Automatically Adaptable Web Wrappers (2011). Emilio Ferrara, Robert Baumgartner
  • An Effective System for Multi-format Information Extraction (2021). Yaduo Liu, Longhui Zhang, Shujuan Yin, Xiaofeng Zhao, Feiliang Ren
  • WebIE: Faithful and Robust Information Extraction on the Web (2023). Chenxi Whitehouse, Clara Vania, Alham Fikri Aji, Christos Christodoulopoulos, Andrea Pierleoni
  • A Benchmark Suite for Template Detection and Content Extraction (2014). Julián Alarte, Josep Silva
  • A Survey on Open Information Extraction (2018). Christina Niklaus, Matthias Cetto, André Freitas, Siegfried Handschuh
  • A Semi-automatic Data Extraction System for Heterogeneous Data Sources: a Case Study from Cotton Industry (2021). Richi Nayak, Thirunavukarasu Balasubramaniam, Sangeetha Kutty, Sachindra Banduthilaka, Erin E. Peterson
  • A Benchmark Suite for Template Detection and Content Extraction (2014). Julián Alarte, David Insa, Josep Silva, Salvador Tamarit
  • DATA:SEARCH'18 -- Searching Data on the Web (2018). Paul Groth, Laura Koesten, Philipp Mayr, Maarten de Rijke, Elena Simperl

Cited by (26)

  • A Survey on Data Collection for Machine Learning: A Big Data - AI Integration Perspective (2019). Yuji Roh, Geon Heo, Steven Euijong Whang
  • Measuring Social Spam and the Effect of Bots on Information Diffusion in Social Media (2018). Emilio Ferrara
  • Sneak into Devil's Colony- A study of Fake Profiles in Online Social Networks and the Cyber Law (2018). Mudasir Ahmad Wani, Suraiya Jabin, Ghulam Yazdani, Nehaluddin Ahmad
  • FreeDOM: A Transferable Neural Architecture for Structured Information Extraction on Web Documents (2020). Bill Yuchen Lin, Ying Sheng, Nguyen Vo, Sandeep Tata
  • A large-scale community structure analysis in Facebook (2012). Emilio Ferrara
  • A Survey on Data Collection for Machine Learning: a Big Data -- AI Integration Perspective (2018). Yuji Roh, Geon Heo, Steven Euijong Whang
  • ZeroShotCeres: Zero-Shot Relation Extraction from Semi-Structured Webpages (2020). Colin Lockard, Prashant Shiralkar, Dong Xin, Hannaneh Hajishirzi
  • On extracting data from tables that are encoded using HTML (2019). Juan C. Roldán, Patricia Jiménez, Rafael Corchuelo
  • SpEnD: Linked Data SPARQL Endpoints Discovery Using Search Engines (2017). Semih Yumuşak, Erdoğan Doğdu, Halife Kodaz, Andreas Kamilaris, Pierre-Yves Vandenbussche
  • Intelligent Self-repairable Web Wrappers (2011). Emilio Ferrara, Robert Baumgartner
  • Design of iMacros-based Data Crawler and the Behavioral Analysis of Facebook Users (2018). Mudasir Ahmad Wani, Nancy Agarwal, Suraiya Jabin, Syed Zeeshan Hussain
  • Unlocking Social Media and User Generated Content as a Data Source for Knowledge Management (2019). James Meneghello, Nik Thompson, Kevin Lee, Kok Wai Wong, Bilal Abu-Salih
  • ZeroShotCeres: Zero-Shot Relation Extraction from Semi-Structured Webpages (2020). Colin Lockard, Prashant Shiralkar, Dong Xin, Hannaneh Hajishirzi
  • GROWN+UP (2022). Benedict Yeoh, Huijuan Wang
  • Crawling Facebook for social network analysis purposes (2011). Salvatore Catanese, Pasquale De Meo, Emilio Ferrara, Giacomo Fiumara, Alessandro Provetti
  • Knowledge Graphs (2021). Aidan Hogan, Eva Blomqvist, Michael Cochez, Claudia d’Amato, Gerard de Melo, Claudio Gutiérrez, Sabrina Kirrane, José Emilio Labra Gayo, Roberto Navigli, Sebastian Neumaier
  • Design and Implementation of iMacros-based Data Crawler for Behavioral Analysis of Facebook Users (2018). Mudasir Ahmad Wani, Nancy Agarwal, Suraiya Jabin, Syed Zeeshan Hussain
  • Automatically Extracting Web API Specifications from HTML Documentation (2018). Jinqiu Yang, Erik Wittern, Annie T. T. Ying, Julian Dolby, Lin Tan
  • Wextractor: Follow-up of the evolution of prices in web pages (2017). Jorge Lloret-Gazo
  • Unlocking Analytical Value from Social Media and User Generated Content (2019). James Meneghello, Nik Thompson, Kevin Lee, Kok Wai Wong, Bilal Abu-Salih
  • Unlocking Social Media and User Generated Content as a Data Source for Knowledge Management (2019). James Meneghello, Nik Thompson, Kevin Lee, Kok Wai Wong, Bilal Abu-Salih
  • Machine Knowledge: Creation and Curation of Comprehensive Knowledge Bases (2021). Gerhard Weikum, Xin Dong, Simon Razniewski, Fabian M. Suchanek
  • Automatic Wrapper Adaptation by Tree Edit Distance Matching (2011). Emilio Ferrara, Robert Baumgartner
  • Analyzing the Facebook Friendship Graph (2010). Salvatore Catanese, Pasquale De Meo, Emilio Ferrara, Giacomo Fiumara
  • Design of Automatically Adaptable Web Wrappers (2011). Emilio Ferrara, Robert Baumgartner
  • Service Wrapper: a system for converting web data into web services (2019). Naibo Wang, Zhiling Luo, Xiya Lyu, Zitong Yang, Jianwei Yin

Citing (18)

  • Analyzing the Facebook Friendship Graph (2010). Salvatore Catanese, Pasquale De Meo, Emilio Ferrara, Giacomo Fiumara
  • How Unique and Traceable Are Usernames? (2011). Daniele Perito, Claude Castelluccia, Mohamed Ali Kâafar, Pere Manils
  • Traveling trends: social butterflies or frequent fliers? (2013). Emilio Ferrara, Onur Varol, Filippo Menczer, Alessandro Flammini
  • The Geospatial Characteristics of a Social Movement Communication Network (2013). Michael Conover, Clayton A. Davis, Emilio Ferrara, Karissa McKelvey, Filippo Menczer, Alessandro Flammini
  • Monadic datalog and the expressive power of languages for Web information extraction (2004). Georg Gottlob, Christoph Koch
  • Automatic wrappers for large scale web extraction (2011). Nilesh Dalvi, Ravi Kumar, Mohamed A. Soliman
  • The power of a good idea: Quantitative modeling of the spread of ideas from epidemiological models (2005). Luís M. A. Bettencourt, Ariel Cintrón-Arias, David Kaiser, Carlos Castillo‐Chávez
  • Folks in Folksonomies: social link prediction from shared metadata (2010). Rossano Schifanella, Alain Barrat, Ciro Cattuto, Benjamin Markines, Filippo Menczer
  • A large-scale community structure analysis in Facebook (2012). Emilio Ferrara
  • A new statistical parser based on bigram lexical dependencies (1996). Michael Collins
  • Machine learning in automated text categorization (2002). Fabrizio Sebastiani
  • Automatic Wrapper Adaptation by Tree Edit Distance Matching (2011). Emilio Ferrara, Robert Baumgartner
  • Crawling Facebook for social network analysis purposes (2011). Salvatore Catanese, Pasquale De Meo, Emilio Ferrara, Giacomo Fiumara, Alessandro Provetti
  • The Structure and Function of Complex Networks (2003). Michael Newman
  • The Directed Closure Process in Hybrid Social-Information Networks, with an Analysis of Link Formation on Twitter (2010). Daniel M. Romero, Jon Kleinberg
  • Analyzing user behavior across social sharing environments (2013). Pasquale De Meo, Emilio Ferrara, Fabian Abel, Lora Aroyo, Geert‐Jan Houben
  • Intelligent Self-repairable Web Wrappers (2011). Emilio Ferrara, Robert Baumgartner
  • Clustering memes in social media (2013). Emilio Ferrara, Mohsen JafariAsbagh, Onur Varol, Vahed Qazvinian, Filippo Menczer, Alessandro Flammini