DEiXTo

Powerful web data extraction tool, available as:

 Freeware GUI tool (built with Turbo Delphi, Windows-only)

 Free, cross-platform Command Line Executor (in Perl)

 DEiXToBot agent (implemented in Perl)

Based on the W3C Document Object Model (DOM)

 DOM-based extraction rules (wrappers)

 Extracted data can be exported to a wide variety of formats (tab-delimited, XML, RSS, etc.).
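
As a rough illustration of what exporting extracted records to tab-delimited and XML output involves, here is a minimal Python sketch. It is not DEiXTo code (the tools themselves are written in Delphi and Perl); the record fields, the `to_tab_delimited` and `to_xml` helpers and the XML element names are all assumptions made for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical extracted records; the field names are illustrative only.
records = [
    {"title": "First item", "url": "http://example.com/1"},
    {"title": "Second item", "url": "http://example.com/2"},
]

def to_tab_delimited(rows, fields):
    """Serialize rows as tab-delimited text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def to_xml(rows, fields):
    """Serialize rows as a simple XML document, one <record> per row."""
    root = ET.Element("records")
    for row in rows:
        record = ET.SubElement(root, "record")
        for field in fields:
            ET.SubElement(record, field).text = row[field]
    return ET.tostring(root, encoding="unicode")

tsv_text = to_tab_delimited(records, ["title", "url"])
xml_text = to_xml(records, ["title", "url"])
```

Both helpers take the same list of records, so supporting another output format only means adding another serializer.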

 Command Line Executor:

 has database support via the Database independent interface for Perl

 supports additional formats: Excel, CSV, OpenDocument Spreadsheet (.ods), HTML
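
The database support mentioned above goes through Perl's DBI. As a loose analogue only, the Python sketch below stores extracted records through Python's own database API using the standard `sqlite3` module. The table name, column names and sample records are assumptions; nothing here reflects the actual schema or options used by the Command Line Executor.

```python
import sqlite3

# Hypothetical extracted records: (title, url) pairs.
records = [
    ("First item", "http://example.com/1"),
    ("Second item", "http://example.com/2"),
]

# An in-memory database keeps the sketch self-contained.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE items (title TEXT, url TEXT)")

# Parameterized inserts avoid quoting problems in scraped text.
connection.executemany("INSERT INTO items (title, url) VALUES (?, ?)", records)
connection.commit()

stored = connection.execute(
    "SELECT title, url FROM items ORDER BY title"
).fetchall()
connection.close()
```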

GUI DEiXTo

 user-friendly graphical interface

 enhanced, tree-based extraction rules

 HTML tag filtering

 fast, flexible, high-performance tree pattern matching algorithm

 regular expression support

 can follow "Next Page" links and submit simple forms

 can export results to XML and tab-delimited formats and create RSS feeds

 XML-encoded wrapper project files (.wpf) that can be executed at will

 last but not least, it's freeware!
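
To give a feel for tree-based extraction rules combined with regular expressions, here is a small Python sketch. It is not DEiXTo's algorithm or its .wpf rule format: the tuple-based `RULE` layout, the `match` and `extract` functions and the sample page are hypothetical, and the page is a well-formed snippet so that the standard XML parser can stand in for a real HTML DOM parser.

```python
import re
import xml.etree.ElementTree as ET

# A well-formed snippet standing in for a parsed page (assumption: real pages
# would need an HTML parser that tolerates broken markup).
PAGE = """
<ul>
  <li><a href="/p/1">Laptop</a><span>Price: 499 EUR</span></li>
  <li><a href="/p/2">Phone</a><span>Out of stock</span></li>
  <li><a href="/p/3">Tablet</a><span>Price: 259 EUR</span></li>
</ul>
"""

# Hypothetical rule format: (tag, capture name or None, regex or None, children).
RULE = ("li", None, None, [
    ("a", "name", None, []),
    ("span", "price", r"Price: (\d+) EUR", []),
])

def match(node, rule, captured):
    """Return True if `node` matches `rule`, filling `captured` on the way."""
    tag, name, regex, child_rules = rule
    if node.tag != tag:
        return False
    text = (node.text or "").strip()
    if regex is not None:
        found = re.search(regex, text)
        if found is None:
            return False
        text = found.group(1)  # keep only the captured group
    if name is not None:
        captured[name] = text
    # Child rules must match child nodes in document order: the shared
    # iterator makes each rule continue where the previous one stopped.
    children = iter(node)
    return all(any(match(child, child_rule, captured) for child in children)
               for child_rule in child_rules)

def extract(root, rule):
    """Try `rule` at every element of the tree and collect the captures."""
    results = []
    for node in root.iter():
        captured = {}
        if match(node, rule, captured):
            results.append(captured)
    return results

rows = extract(ET.fromstring(PAGE.strip()), RULE)
```

The second list item is skipped because its `span` text fails the regular expression, which is the kind of filtering a regex constraint on a rule node provides.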

DEiXTo Command Line Executor (CLE)

 portable, efficient and fast command line executor of wrappers generated with GUI DEiXTo

 provides options and flexibility that GUI DEiXTo does not offer

 supports additional output formats such as CSV, Excel and OpenDocument Spreadsheet

 provides database support via DBI (the Database independent interface for Perl)

 supports HTML output using an HTML template processor and an editable template file

 overwrite, append and prepend output modes for all supported formats

 can be scheduled to execute wrappers automatically (e.g. using cron in GNU/Linux)

 free and open source, distributed under the GNU General Public License (GPL) Version 3!
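
The overwrite, append and prepend output modes can be pictured with the following Python sketch for CSV output. It is only an illustration of the idea, not the CLE implementation or its command line options: the `write_output` function, its `mode` parameter and the file name are assumptions.

```python
import csv
import io
import os
import tempfile

def rows_to_csv(rows):
    """Serialize rows as CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()

def write_output(path, rows, mode="overwrite"):
    """Write rows to a CSV file in overwrite, append or prepend mode."""
    new_text = rows_to_csv(rows)
    old_text = ""
    # Existing content only matters when it has to be preserved.
    if mode != "overwrite" and os.path.exists(path):
        with open(path, encoding="utf-8", newline="") as handle:
            old_text = handle.read()
    if mode == "overwrite":
        text = new_text
    elif mode == "append":
        text = old_text + new_text
    elif mode == "prepend":
        text = new_text + old_text
    else:
        raise ValueError(f"unknown mode: {mode}")
    with open(path, "w", encoding="utf-8", newline="") as handle:
        handle.write(text)

# Three simulated wrapper runs writing to the same file.
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "results.csv")
    write_output(target, [["run1", "a"]], mode="overwrite")
    write_output(target, [["run2", "b"]], mode="append")
    write_output(target, [["run0", "c"]], mode="prepend")
    with open(target, encoding="utf-8", newline="") as handle:
        final_lines = handle.read().splitlines()
```

Prepend mode is handy for scheduled runs where the newest results should appear at the top of the file.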

DEiXToBot

A Mechanize agent (essentially a browser emulator) capable of extracting data of interest.

 Flexible and efficient.

 Allows extensive customization.

 Supports multiple patterns on a single page and combination of their results.

 Allows post-processing of the extracted data and enables you to transform it to any format you wish.

 Programming skills are required to use it, though.
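
As a rough picture of applying multiple patterns to one page, combining their results and post-processing them, consider the Python sketch below. DEiXToBot itself is a Perl module and its API is not shown here: the sample page, the two patterns and the output field names are hypothetical, and plain regular expressions stand in for DOM-based patterns to keep the example short.

```python
import json
import re

# Hypothetical page fragment with two kinds of data of interest.
PAGE = """
<h2 class="title">Laptop</h2><p class="price">499 EUR</p>
<h2 class="title">Tablet</h2><p class="price">259 EUR</p>
"""

# Two independent patterns applied to the same page.
TITLE_PATTERN = re.compile(r'<h2 class="title">(.*?)</h2>')
PRICE_PATTERN = re.compile(r'<p class="price">(\d+) EUR</p>')

titles = TITLE_PATTERN.findall(PAGE)
prices = PRICE_PATTERN.findall(PAGE)

# Combine the two result lists by position, converting prices to integers.
combined = [
    {"title": title, "price_eur": int(price)}
    for title, price in zip(titles, prices)
]

# Post-process into any target format; JSON is used here as an example.
as_json = json.dumps(combined, indent=2)
```

Pairing results by position assumes both patterns match the same number of items in the same order; a real agent would need to check that.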

Corgialenios Library use case

From unstructured HTML data

To ESE format!

DEiXTo Services

We can definitely help you to:

 transform the contents of your digital library into OAI-PMH or another suitable format

 quickly populate product catalogues with full specifications

 search various web resources in real time and extract the results returned

 prepare large, focused datasets for scientific tasks (e.g. data mining)

 monitor competitors' prices

Happy DEiXTo users!

For further information, please visit http://deixto.com