Sheets Workflows with rtrvr.ai
rtrvr.ai's "Sheets Workflow" feature provides a powerful way to automate complex tasks directly from a Google Sheet. By feeding it a sheet with a column of URLs or search queries, you can instruct rtrvr.ai to perform a series of actions on each row, extract data, generate content, and update the sheet with the results. This capability enables you to create sophisticated, multi-step workflows that can be visualized as a Directed Acyclic Graph (DAG).
Core Concepts
Here's a breakdown of the key concepts behind Sheets Workflows:
- →Input Column: A designated column in your Google Sheet containing either URLs or search queries that rtrvr.ai will process row by row.
- →Predefined Workflow Types: Choose from built-in templates such as 'Profiles', 'Ecommerce', 'Credit Card', and 'Property Listing' for common data extraction tasks.
- →Custom Workflow: Define your own data extraction schema using natural language to fit your specific needs.
- →Search Workflow: Instead of URLs, provide Google search queries and define actions to be performed on the search results (e.g., "Search the given name and click on the first search result and extract name, education, experience").
- →Multi-Step Workflows: Create complex workflows by adding multiple steps, each potentially operating on a different sheet or leveraging output from prior steps.
- →Context Columns: Optionally add context from other columns in the same sheet to inform rtrvr.ai's actions for each row.
- →Context Recording: Ground the workflow with prior action recordings to guide rtrvr.ai's behavior on each opened page.
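In a Custom Workflow, the natural-language schema effectively defines the output columns. The Python sketch below shows one way extracted records could be laid out against such a schema. It includes an optional header row like the one controlled by the "First row has column headers" option. The schema, the records, and the `to_sheet_rows` helper are all hypothetical, and rtrvr.ai's actual output layout may differ.

```python
# Hypothetical schema inferred from a request like
# "extract name, job title, and company".
schema = ["name", "job_title", "company"]

# Hypothetical extraction results, one dict per input row.
# A field that was not found on the page is simply absent.
records = [
    {"name": "Alex Kim", "job_title": "Analyst", "company": "Acme"},
    {"name": "Sam Lee", "company": "Globex"},
]


def to_sheet_rows(schema, records, include_header=True):
    """Lay records out as a grid of cell values in schema column order."""
    rows = [list(schema)] if include_header else []
    for record in records:
        # Missing fields become empty cells so every row has the same width.
        rows.append([record.get(field, "") for field in schema])
    return rows


for row in to_sheet_rows(schema, records):
    print(row)
# ['name', 'job_title', 'company']
# ['Alex Kim', 'Analyst', 'Acme']
# ['Sam Lee', '', 'Globex']
```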
Configuration Options
Customize your Sheets Workflows with these powerful configuration options:
- →First row has column headers: Choose whether to write the generated schema headers to the first row of the output sheet.
- →Delete tabs after execution: Automatically close the tabs opened by rtrvr.ai after the workflow completes.
- →Max # of Tabs to Open in Parallel: In the Settings Panel, set the maximum number of tabs rtrvr.ai will open simultaneously. URLs from the input column are processed in batches of this size. For example, with a limit of 20 parallel tabs and 100 URLs, rtrvr.ai opens 5 batches of 20 tabs each.
- →RegEx URL Substitution: Modify the URLs in the input column using regular expressions (e.g., prefix all URLs with "archive.is/").
- →Context Columns: Select additional columns from the sheet to provide context for each row's processing.
- →Context Recording: Use a prior recording to guide rtrvr.ai's actions within the workflow.
- →Prior Step Output (Context): In multi-step workflows, use the output of a previous step as context for subsequent steps (both steps must operate on the same sheet).
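The URL substitution and the batching arithmetic can be illustrated with a short Python sketch. It assumes two things: the substitution behaves like a Python `re.sub` find-and-replace, and batches are consecutive runs of rows. Both helpers and the example.com URLs are hypothetical. In rtrvr.ai, these options are set in its interface, not in code.

```python
import re


def substitute_urls(urls, pattern, replacement):
    """Apply one regex substitution to every URL (hypothetical helper)."""
    return [re.sub(pattern, replacement, url) for url in urls]


def batch(items, max_parallel_tabs):
    """Split items into consecutive batches of at most max_parallel_tabs."""
    return [items[i:i + max_parallel_tabs]
            for i in range(0, len(items), max_parallel_tabs)]


urls = [f"https://example.com/item/{n}" for n in range(100)]

# "^" matches the start of each URL, so this prefixes every URL with "archive.is/".
archived = substitute_urls(urls, r"^", "archive.is/")
print(archived[0])  # archive.is/https://example.com/item/0

# 100 URLs with a limit of 20 parallel tabs -> 5 batches of 20.
batches = batch(archived, 20)
print(len(batches), [len(b) for b in batches])  # 5 [20, 20, 20, 20, 20]
```

With 105 URLs, the same helper would produce a sixth, partial batch of 5.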
Example Use Cases
- →LinkedIn Profile Scraping:
Input a column of LinkedIn profile URLs. Select the 'Profiles' workflow type. rtrvr.ai will open each profile, extract relevant data (name, job title, experience, etc.), and populate the sheet with the results.
- →Automated Email Outreach:
Combine profile scraping with content generation. After extracting data from LinkedIn profiles, add a step to generate personalized intro emails based on the extracted information and write them back into the sheet.
- →E-commerce Product Research:
Use a column of Amazon product URLs and the 'Ecommerce' workflow type to extract product details, prices, and reviews for market analysis.
- →Lead Generation from Search:
Provide a column of search queries like "Marketing Managers in San Francisco". Use the 'Search' workflow type and define a prompt like "Click on the first LinkedIn result and extract name, company, and email".
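In the 'Search' workflow, input cells hold plain queries rather than URLs, and rtrvr.ai runs the search for you. To illustrate how a query cell relates to a results page, this Python sketch builds the standard Google results URL for a query. The helper name is hypothetical, and the sketch does not describe how rtrvr.ai performs the search internally.

```python
from urllib.parse import urlencode


def google_search_url(query: str) -> str:
    """Return a Google results URL for a query (hypothetical helper)."""
    # urlencode escapes special characters and turns spaces into "+".
    return "https://www.google.com/search?" + urlencode({"q": query})


queries = ["Marketing Managers in San Francisco"]
for query in queries:
    print(google_search_url(query))
    # https://www.google.com/search?q=Marketing+Managers+in+San+Francisco
```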
Advanced Use Cases: Explore, Research, and Infer
- →Explore Sheets Workflow:
Given a column of paginated URLs in your sheet, the Explore Sheets Workflow allows you to efficiently extract data across multiple pages. rtrvr.ai will open a new sheet tab and systematically process each URL in the input column.
For each URL, it extracts the requested information. When deeper data collection is needed, it can also deep-crawl by automatically opening listings or linked pages in new tabs to gather more granular details. This is ideal for scraping product listings from e-commerce sites or extracting articles from paginated search results.
- →Research Sheets Workflow:
The Research Sheets Workflow is designed for data enrichment and comparative research tasks. For each row in your sheet, you provide a primary URL (e.g., a company page) and a "Research Starting URL" (e.g., LinkedIn or Crunchbase).
rtrvr.ai opens both URLs for each row and conducts research across this corpus of pages based on your defined prompt. This is perfect for gathering comprehensive information about entities, such as competitive analysis, lead enrichment (finding employee counts, funding rounds, etc.), or in-depth profile research by combining multiple sources.
- →Infer Sheets Workflow:
The Infer Sheets Workflow runs large language model (LLM) inference directly on the data in your sheet. You select input columns and define a prompt, which is applied either to each row individually or to the entire dataset at once, depending on your needs.
This workflow is versatile for tasks like content generation, sentiment analysis, data classification, or summarization. It lets you apply AI directly within your spreadsheet for quick insights and data manipulation without extensive web interaction.
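Applying a prompt to each row versus the entire dataset differs in how many model calls are made and what each call sees. The Python sketch below illustrates this with `fake_llm`, a placeholder that stands in for a real model and only records its prompts. The function names, prompt layout, and column-joining format are assumptions, not rtrvr.ai's actual prompt format.

```python
from typing import Callable

Row = dict[str, str]
calls: list[str] = []


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call: records the prompt, returns a placeholder."""
    calls.append(prompt)
    return f"response #{len(calls)}"


def infer_per_row(rows: list[Row], columns: list[str], prompt: str,
                  llm: Callable[[str], str]) -> list[str]:
    """One call per row; each call sees only that row's selected columns."""
    results = []
    for row in rows:
        context = "\n".join(f"{c}: {row[c]}" for c in columns)
        results.append(llm(f"{prompt}\n\n{context}"))
    return results


def infer_whole_dataset(rows: list[Row], columns: list[str], prompt: str,
                        llm: Callable[[str], str]) -> str:
    """A single call that sees every row at once, e.g. for an overall summary."""
    header = " | ".join(columns)
    table = "\n".join(" | ".join(row[c] for c in columns) for row in rows)
    return llm(f"{prompt}\n\n{header}\n{table}")


reviews = [
    {"product": "Mug", "review": "Great, keeps coffee hot."},
    {"product": "Lamp", "review": "Broke after a week."},
]
per_row = infer_per_row(reviews, ["review"], "Classify the sentiment as positive or negative.", fake_llm)
print(per_row)  # ['response #1', 'response #2']
summary = infer_whole_dataset(reviews, ["product", "review"], "Summarize all reviews.", fake_llm)
print(summary, len(calls))  # response #3 3
```

Per-row mode scales the number of calls with the number of rows. Whole-dataset mode makes one call but must fit every row into a single prompt.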
Example Sheets Workflows Output Table
Here is an example Google Sheet with one tab demonstrating the output of each possible workflow:
Tips for Using Sheets Workflows
- →Start Small: Begin with simple workflows and gradually add complexity as you become more comfortable.
- →Use Context Wisely: Leverage context columns and prior step outputs to create more intelligent and adaptive workflows.
- →Test and Refine: Run your workflows on a small sample of data first to ensure they are working as expected before processing large datasets.
- →Understand the DAG: Visualize your multi-step workflows as a Directed Acyclic Graph to better understand the flow of data and dependencies between steps.
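The DAG framing can be made concrete with a minimal Python sketch built on the standard library's `graphlib`. The workflow is hypothetical: its three step names are invented and loosely based on the LinkedIn examples above. The sketch does not reflect how rtrvr.ai schedules steps internally. It only shows what "dependencies" and "acyclic" mean for workflow steps.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical three-step workflow. Each key is a step; its value is the
# set of steps whose output it consumes (its predecessors in the DAG).
workflow = {
    "extract_profiles": set(),                     # reads the input URL column
    "research_companies": {"extract_profiles"},    # uses extracted company names
    "generate_emails": {"extract_profiles", "research_companies"},
}

# static_order() yields each step only after all of its dependencies.
order = list(TopologicalSorter(workflow).static_order())
print(order)  # ['extract_profiles', 'research_companies', 'generate_emails']

# A step that feeds back into an earlier one breaks the "acyclic" rule.
cyclic = {"step_a": {"step_b"}, "step_b": {"step_a"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    is_dag = True
except CycleError:
    is_dag = False
print(is_dag)  # False
```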