Extract Data


This activity helps the user extract data from web applications that have a similar layout and structure across multiple pages.

Pre-requisites

This feature is available from designer version 25.5.0.36 and activity version 2.3.8 onward.

Limitations

1. When extracting data from multiple pages, the activity prompts you to spy on the ‘Next’ button element on the webpage. This button must have a static XPath; dynamic XPath is currently not supported.
2. Surrounding elements are not supported: extraction is limited to the specified elements within the defined hierarchy and pattern.
3. This activity does not support lazy loading, so it does not handle content that loads automatically as the user scrolls down a webpage.
4. Where a table is not structured properly, this activity functions similarly to “HTMLTabletoDatatable”.

How does the extraction process happen?

Recognizing Patterns: The process involves identifying specific patterns or structures in the HTML code of a webpage. These patterns are consistent across similar pages, making data extraction efficient.

Example: Extracting Product Data: Consider extracting product details from an e-commerce site. The activity uses Robility Spy to spot patterns such as:

  • Product Titles: Found in heading tags like <h1> or <h2>.
  • Prices: Located in designated price elements such as <span> or <div>.
  • Descriptions: Often within <p> tags or other identifiable elements.

Setting Up Data Extraction: Configure the “Extract Data” activity to target these elements. For example, create a pattern to capture <h1> tags for product titles, matching the HTML structure.

Data Retrieval: Once patterns are set, the activity retrieves data and stores it in a DataTable variable for further use.
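The idea of matching a repeated HTML pattern and collecting the results into a table can be sketched in plain Python. This is only an illustration, not how the activity itself is implemented: the sample HTML, the `product` class name, the tag-to-column mapping, and the `PatternExtractor` class are all hypothetical, and a list of dictionaries stands in for the DataTable.

```python
from html.parser import HTMLParser

# Hypothetical sample page: each product repeats the same <h2>/<span>/<p> pattern.
SAMPLE_HTML = """
<div class="product"><h2>Desk Lamp</h2><span class="price">$19.99</span><p>Warm LED light.</p></div>
<div class="product"><h2>Notebook</h2><span class="price">$4.50</span><p>Ruled, 200 pages.</p></div>
"""

# Assumed mapping from tag name to output column; a real page may need other tags.
TAG_TO_COLUMN = {"h2": "Title", "span": "Price", "p": "Description"}


class PatternExtractor(HTMLParser):
    """Collects the text of repeated tags into one row per product block."""

    def __init__(self):
        super().__init__()
        self.rows = []        # list of dicts, standing in for a DataTable
        self._current = None  # column being filled while inside a matching tag

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "product") in attrs:
            self.rows.append({})  # a new product block starts a new row
        elif tag in TAG_TO_COLUMN and self.rows:
            self._current = TAG_TO_COLUMN[tag]

    def handle_endtag(self, tag):
        if tag in TAG_TO_COLUMN:
            self._current = None

    def handle_data(self, data):
        # Only keep text that appears inside one of the mapped tags.
        if self._current and data.strip():
            self.rows[-1][self._current] = data.strip()


def extract_rows(html):
    parser = PatternExtractor()
    parser.feed(html)
    return parser.rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

Because every product block shares the same structure, one pattern definition is enough to produce one row per product.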

Properties

INPUT

BrowserType: *Auto-filled once the element is indicated on the web page using Robility Spy. Displays the browser type.

BrowserVersion: *Auto-filled once the element is indicated on the web page using Robility Spy. Indicates the version of the browser in use.

DelayAfter: Adds a delay after this activity runs, before the next activity starts. The delay is in milliseconds and defaults to 300. When left blank, no delay is applied.

DelayBefore: Adds a delay before this activity runs. The delay is in milliseconds and defaults to 200. When left blank, no delay is applied.

ExecuteBy: *Auto-filled once the element is indicated on the web page using Robility Spy. Contains the set of attributes for the spied element.

FramePath: *Auto-filled once the element is indicated on the web page using Robility Spy. The frame path is filled only if the selected element has a frame ID in the webpage.

URL: Auto-filled once the element is indicated on the web page using Robility Spy. Indicates the URL on which this activity is performed.

WaitForReady: Runs the activity once the loading state of the webpage matches the selected wait-for-ready state.
None: Performs the activity without checking the state of the browser.
Interactive: Performs the activity once the web element is found, even if the webpage is still loading.
Complete: Proceeds with the activity only if the webpage is completely loaded.

WaitTime: Adds a wait before execution proceeds to the next activity. The wait is in milliseconds and defaults to 30000. When left blank, no wait is applied.

InputNextElement

DelayBetweenPages: Adds a delay between pages while the data is being extracted. The delay is in milliseconds and defaults to 1000. When left blank, no delay is applied.

NextLinkExecuteBy: Indicates the set of attributes of the “Next” button element detected on the webpage. It is auto-filled once the element is saved in the Robility Spy window.

When extracting data from multiple pages, the activity prompts you to spy on the ‘Next’ button element on the webpage so that it can move from one page to the next. Extracting data from subsequent pages is optional (refer to the use case).
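The multi-page behaviour can be pictured as a loop that extracts the current page, follows the ‘Next’ link, and pauses between pages. The sketch below is a simplified model under stated assumptions, not the activity's actual logic: the in-memory `PAGES` dictionary replaces a real browser, and the `extract_all_pages` function and its `max_pages` cap are hypothetical names introduced for illustration.

```python
import time

# Hypothetical in-memory "site": each page has rows and the key of the next page
# (None when there is no 'Next' button). A real run would drive a browser instead.
PAGES = {
    "page1": {"rows": ["A", "B"], "next": "page2"},
    "page2": {"rows": ["C", "D"], "next": "page3"},
    "page3": {"rows": ["E"], "next": None},
}


def extract_all_pages(pages, start, delay_between_pages_ms=1000, max_pages=None):
    """Collect rows page by page, following 'next' until it is missing."""
    collected = []
    current = start
    visited = 0
    while current is not None:
        collected.extend(pages[current]["rows"])
        visited += 1
        if max_pages is not None and visited >= max_pages:
            break  # assumed page cap reached
        current = pages[current]["next"]
        # A blank (None) or zero delay means no pause, mirroring DelayBetweenPages.
        if current is not None and delay_between_pages_ms:
            time.sleep(delay_between_pages_ms / 1000)
    return collected


print(extract_all_pages(PAGES, "page1", delay_between_pages_ms=0))
print(extract_all_pages(PAGES, "page1", delay_between_pages_ms=0, max_pages=2))
```

The loop ends as soon as a page has no ‘Next’ target, which is also what happens when the ‘Next’ element is not configured and only the first page is extracted.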

MISC

Display Name: Displays the name of the activity. The name can be customized, which helps in troubleshooting.

IsTable: Enabled when the detected element is a table element. It is auto-filled from the extraction wizard with the value True or False.

True: Indicates that the extracted element includes a table element.
False: Indicates that the extracted element does not include a table element.

By default, the value is set to “False”.

SkipOnError: Specifies a Boolean value, “True” or “False”.
True: Continues executing the workflow regardless of any error thrown.
False: Stops the workflow if an error is thrown.
None: If the option is left blank, the activity performs the “False” action by default.
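The three SkipOnError cases can be modelled with ordinary exception handling. The following is a rough sketch, assuming a workflow is a list of named steps and that one flag applies to all of them; `run_workflow` and `failing_step` are hypothetical names, and in the product the setting belongs to each individual activity.

```python
def run_workflow(steps, skip_on_error=None):
    """Run steps in order; a blank (None) skip_on_error behaves like False."""
    skip = bool(skip_on_error)
    completed = []
    for name, step in steps:
        try:
            step()
            completed.append(name)
        except Exception as exc:
            if not skip:
                raise  # False or blank: stop the workflow at the failing step
            print(f"{name} failed ({exc}); continuing")
    return completed


def failing_step():
    # Stand-in for an activity that cannot find its element.
    raise RuntimeError("element not found")


steps = [("open", lambda: None), ("extract", failing_step), ("save", lambda: None)]
print(run_workflow(steps, skip_on_error=True))
```

With `skip_on_error=True` the failing step is reported and the remaining steps still run; with `False` or no value, the error propagates and the workflow stops.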

Version: It specifies the version of the web automation feature in use.

OPTIONS

Limit to extract: Specifies the limit applied to the extraction on the webpage. Choose one of the options from the dropdown.
Max Rows: Extracts the maximum number of rows available on the webpage.
Max Pages: Extracts the data from all pages, up to the maximum.

Number of Items: Specifies the maximum number of items to extract from the page. It accepts values of the “Int32” datatype.

TableStructure: *Specifies the format of the table resulting from data extraction on the web application. It comprises two columns, one for the extracted text and another for the corresponding URL. By default, the “Extract Text” option is selected and mandatory, while the “Extract URL” option is optional.
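The two-column structure can be illustrated by extracting link text and, optionally, the link target from a small HTML fragment. This is a sketch under assumptions rather than the activity's own output format: the column names `Text` and `URL`, the `LinkExtractor` class, and the `extract_url` flag are hypothetical stand-ins for the “Extract Text” and “Extract URL” options.

```python
from html.parser import HTMLParser

# Hypothetical fragment with two links.
SAMPLE_HTML = (
    '<ul><li><a href="/p/1">Desk Lamp</a></li>'
    '<li><a href="/p/2">Notebook</a></li></ul>'
)


class LinkExtractor(HTMLParser):
    """Builds one row per <a> element: text always, URL only when requested."""

    def __init__(self, extract_url=False):
        super().__init__()
        self.extract_url = extract_url
        self.rows = []
        self._in_link = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            row = {"Text": ""}  # the text column is always present
            if self.extract_url:
                row["URL"] = dict(attrs).get("href")
            self.rows.append(row)

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False

    def handle_data(self, data):
        if self._in_link:
            self.rows[-1]["Text"] += data.strip()


def extract_links(html, extract_url=False):
    parser = LinkExtractor(extract_url)
    parser.feed(html)
    return parser.rows


print(extract_links(SAMPLE_HTML))
print(extract_links(SAMPLE_HTML, extract_url=True))
```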

OUTPUT

Datatable: *Holds the data extracted from the website in table format. You can customize the structure of this table either in the “Extraction Wizard” or through the “TableStructure” property. It returns a value of the “Datatable” datatype.

Result: Shows the execution state of the activity. It returns a Boolean value.
True: Indicates that the activity executed successfully without any error.
False: Indicates that the activity failed because of an unexpected error.

* Represents a mandatory field for executing the workflow.
