
Web Data Extraction

The Data You Need is Out There

In today's information economy, it's commonplace for companies to rely on data from the Web for critical business applications — from monitoring competitive strategies, pricing, and market trends to staying abreast of regulatory changes and compliance requirements.

Collecting this information manually, however, can be time-intensive and expensive. A recent study by IDC suggested that information workers spend as much as half of their time searching for and gathering information, at an annual cost of $26,700 per worker.

QL2 Delivers Your Data

As the market leader in web data extraction, QL2 offers a practical, cost-effective alternative to copying, pasting and reformatting information by hand. QL2 fully automates the process of extracting information from any website, even if the information you need is behind a subscription login or an advanced search form.

Using QL2's flagship product, WebQL®, or any other QL2 solution, you can deploy intelligent agents (highly trained robotic software) to automatically fetch the information you need from the Web. These intelligent agents navigate complex websites, log in to subscription and password-protected sites, fill out forms, and enter specific criteria to generate dynamic web pages. In essence, intelligent agents can reach any content that an employee with a web browser can, but in an entirely automated fashion.

Supports Any Input Source or Output Format

Once the content of interest has been pinpointed, QL2 Software extracts the data regardless of format — Word documents, emails, spreadsheets, databases, PowerPoint files, HTML, images and PDFs. QL2 can also add structure to the data it collects and output the information in an actionable format such as a spreadsheet, database, or XML feed so you can sort, filter and query it with ease.
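To make the idea of "adding structure" concrete, here is a minimal Python sketch that turns loosely collected values into a CSV file and an XML feed. It is not WebQL syntax or QL2's implementation; the record fields, the sample values, and the helper names to_csv and to_xml are all hypothetical and chosen only for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical records, standing in for values pulled from web pages.
records = [
    {"product": "Widget", "price": "19.99", "source": "example.com"},
    {"product": "Gadget", "price": "24.50", "source": "example.org"},
]

def to_csv(rows, fieldnames):
    """Serialize a list of dicts as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

def to_xml(rows):
    """Serialize a list of dicts as a simple <records> XML document."""
    root = ET.Element("records")
    for row in rows:
        item = ET.SubElement(root, "record")
        for key, value in row.items():
            ET.SubElement(item, key).text = value
    return ET.tostring(root, encoding="unicode")

csv_text = to_csv(records, ["product", "price", "source"])
xml_text = to_xml(records)
print(csv_text)
print(xml_text)
```

Once the data is in one of these shapes, it can be opened in a spreadsheet, loaded into a database, or consumed by another application as a feed.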

Smarter Than the Competition

Competing technologies often claim to have similar capabilities, but many fall short. Most of them are either screen scrapers or macro recorders.

Screen scrapers rely on absolute positioning to locate specific content and can break with the slightest change to a website. While these tools might seem quick and inexpensive to deploy, they can end up costing a considerable amount to maintain and re-tool over time. WebQL's data extraction techniques are more sophisticated and adaptive, making them less likely to break.
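The fragility of absolute positioning can be illustrated with a short Python sketch. It does not show how WebQL works internally; it only contrasts a position-based lookup with a lookup anchored on a nearby label, using the standard library's HTML parser. The two sample pages and the function names are hypothetical.

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect the text of every <td> cell, in document order."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1] += data

def cells_of(html):
    """Return the stripped text of all table cells in the page."""
    parser = CellCollector()
    parser.feed(html)
    parser.close()
    return [cell.strip() for cell in parser.cells]

def price_by_position(html):
    # Brittle: assumes the price is always the second cell on the page.
    return cells_of(html)[1]

def price_by_label(html):
    # More tolerant: find the cell labelled "Price" and take the one after it.
    cells = cells_of(html)
    return cells[cells.index("Price") + 1]

# Hypothetical page before and after the site adds a promotional row.
OLD_PAGE = "<table><tr><td>Price</td><td>$19.99</td></tr></table>"
NEW_PAGE = ("<table><tr><td>Sale ends</td><td>Friday</td></tr>"
            "<tr><td>Price</td><td>$19.99</td></tr></table>")

print(price_by_position(OLD_PAGE), price_by_label(OLD_PAGE))  # $19.99 $19.99
print(price_by_position(NEW_PAGE), price_by_label(NEW_PAGE))  # Friday $19.99
```

After the layout change, the position-based lookup silently returns the wrong value, while the label-anchored lookup still finds the price. A label-anchored lookup can still fail if the label itself is renamed, so it reduces maintenance rather than eliminating it.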

Other extraction tools work like a macro recorder, where the program observes a user's actions during a web session and creates a script to carry out the same steps automatically. This is fine for simple tasks, but it can be difficult or impossible to generalize or extend a macro to accomplish more complicated tasks. Needless to say, WebQL can perform those complex tasks. With its user-friendly graphical interface and SQL-like syntax, WebQL makes it easy to program sophisticated intelligent agents that can successfully complete even the most advanced web extraction missions.

Scalable

Some competitor products can support only a handful of agents per CPU and process each fetch sequentially. This can place an inordinate load on your computing resources and require a hardware investment. QL2's technology and solutions can handle high volumes, with 20 to 50 intelligent agents per CPU and an unlimited number of simultaneous fetches.

Not only does QL2 conserve internal resources, it can also regulate its hit rate to avoid overtaxing the websites it visits. This unobtrusive approach reduces the load on your network and on the sites themselves.
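One common way to regulate a hit rate is to enforce a minimum delay between consecutive requests. The Python sketch below shows that general technique under stated assumptions; it is not QL2's mechanism, and the Throttle class, its parameters, and the example URLs are hypothetical.

```python
import time

class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # seconds between requests
        self._clock = clock               # injectable for testing
        self._sleep = sleep               # injectable for testing
        self._last = None                 # time of the previous request

    def wait(self):
        """Block until min_interval seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

# Usage: at most one request every 0.05 seconds (interval chosen arbitrarily).
throttle = Throttle(min_interval=0.05)
for url in ["https://example.com/a", "https://example.com/b"]:
    throttle.wait()
    print("would fetch", url)  # a real agent would issue the request here
```

The first call returns immediately and each later call sleeps only for whatever part of the interval has not already elapsed, so slow responses do not add unnecessary extra delay.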

The Ability to Remain Anonymous

QL2 recognizes that remaining unnoticed to protect your identity can be especially important when conducting surveillance, auditing compliance, or monitoring competitors. That's why QL2 offers anonymization, so your page requests can't be traced back to the source, and no one but you knows what data you're collecting. To learn more about anonymization, visit the Hosted Services section.


Recent research shows that knowledge workers spend more time re-creating existing information than they do turning out information that does not already exist. Some studies suggest that 90% of the time that knowledge workers spend in creating new reports or other products is spent in re-creating information that already exists.

— Special IDC Report:
"Enterprise Search Technology"