This guide covers the fundamentals of querying data with NQL: selecting columns, filtering rows with WHERE clauses, and transforming data using expressions and functions.
Prerequisites
- A Narrative I/O account with at least one dataset
- Basic familiarity with SQL syntax
What you’ll learn
- How to select specific columns and use aliases
- How to filter data with comparison and logical operators
- How to transform data using expressions and functions
- How to work with complex types (arrays, structs, maps)
Selecting columns
Basic column selection
Select specific columns from a dataset:
SELECT
user_id,
email,
created_at
FROM company_data."123"
Using column aliases
Rename columns in your output with AS:
SELECT
user_id AS id,
email AS contact_email,
created_at AS signup_date
FROM company_data."123"
Fully qualified column names
When joining datasets or for clarity, use fully qualified names:
SELECT
company_data."123".user_id,
company_data."123".email
FROM company_data."123"
Always use fully qualified column names in joins to avoid ambiguity.
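For example, a join between two datasets might qualify every column reference like this (a sketch; the second dataset ID "456" and its order_total column are hypothetical):

```sql
SELECT
  company_data."123".user_id,
  company_data."123".email,
  company_data."456".order_total
FROM company_data."123"
JOIN company_data."456"
  ON company_data."123".user_id = company_data."456".user_id
```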
Select all columns
Use * to select all columns (use sparingly):
SELECT *
FROM company_data."123"
LIMIT 100 ROWS
Select distinct values
Remove duplicates with DISTINCT:
SELECT DISTINCT category
FROM company_data."123"
Filtering with WHERE
Comparison filters
Filter rows using comparison operators:
SELECT *
FROM company_data."123"
WHERE
age >= 18
AND age < 65
AND status = 'active'
| Operator | Meaning |
|---|---|
| = | Equal to |
| <> | Not equal to |
| < | Less than |
| > | Greater than |
| <= | Less than or equal |
| >= | Greater than or equal |
Combining conditions
Use AND, OR, and NOT to combine conditions:
SELECT *
FROM company_data."123"
WHERE
(status = 'active' OR status = 'pending')
AND NOT is_test_account
AND region = 'US'
Use parentheses to control evaluation order. AND is evaluated before OR.
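To see why the parentheses matter, compare the two forms (a sketch using the columns above):

```sql
-- Without parentheses, AND binds first, so this is read as:
--   status = 'active' OR (status = 'pending' AND region = 'US')
-- and matches active users in any region.
SELECT *
FROM company_data."123"
WHERE status = 'active' OR status = 'pending' AND region = 'US'

-- With parentheses, only US users are matched:
SELECT *
FROM company_data."123"
WHERE (status = 'active' OR status = 'pending') AND region = 'US'
```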
IN and NOT IN
Check membership in a list:
SELECT *
FROM company_data."123"
WHERE
category IN ('electronics', 'clothing', 'home')
AND source NOT IN ('test', 'internal')
BETWEEN
Filter within a range (inclusive):
SELECT *
FROM company_data."123"
WHERE
age BETWEEN 25 AND 45
AND score BETWEEN 80 AND 100
NULL checks
Check for missing values:
SELECT *
FROM company_data."123"
WHERE
email IS NOT NULL
AND phone IS NOT NULL
AND secondary_email IS NULL
Don’t use = NULL or <> NULL. Comparisons with NULL always evaluate to NULL, never true or false, so these conditions match no rows. Use IS NULL and IS NOT NULL instead.
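A quick illustration of the difference:

```sql
-- Wrong: secondary_email = NULL evaluates to NULL, so no rows match
SELECT * FROM company_data."123" WHERE secondary_email = NULL

-- Right: matches rows where secondary_email is missing
SELECT * FROM company_data."123" WHERE secondary_email IS NULL
```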
Pattern matching with LIKE
Match string patterns using wildcards:
SELECT *
FROM company_data."123"
WHERE
email LIKE '%@gmail.com' -- Ends with @gmail.com
AND name LIKE 'John%' -- Starts with John
AND code LIKE 'A__' -- A followed by exactly 2 characters
| Wildcard | Matches |
|---|---|
| % | Zero or more characters |
| _ | Exactly one character |
For case-insensitive matching, normalize the case first:
WHERE LOWER(email) LIKE '%@gmail.com'
Date and time filtering
Current date comparisons
Filter relative to the current date:
SELECT *
FROM company_data."123"
WHERE
-- Last 30 days
event_date >= CURRENT_DATE - INTERVAL '30' DAY
-- Last 3 months
AND created_at >= CURRENT_DATE - INTERVAL '3' MONTH
Date ranges
Filter within specific date ranges:
SELECT *
FROM company_data."123"
WHERE
event_date BETWEEN '2024-01-01' AND '2024-12-31'
Filter by specific date components:
SELECT *
FROM company_data."123"
WHERE
EXTRACT(YEAR FROM event_date) = 2024
AND EXTRACT(MONTH FROM event_date) IN (1, 2, 3)
AND EXTRACT(DAYOFWEEK FROM event_date) = 1 -- Sunday
Price filtering
Control costs by filtering on the special _price_cpm_usd column:
SELECT *
FROM company_data."123"
WHERE
category = 'premium'
AND _price_cpm_usd <= 1.00 -- Max $1 per 1000 rows
Always include price filtering to control data acquisition costs, especially when creating materialized views.
Expressions in SELECT
Compute values using expressions:
SELECT
quantity * unit_price AS line_total,
(quantity * unit_price) * 1.0825 AS total_with_tax,
first_name || ' ' || last_name AS full_name
FROM company_data."123"
CASE expressions
Apply conditional logic:
SELECT
user_id,
CASE
WHEN lifetime_value > 1000 THEN 'platinum'
WHEN lifetime_value > 500 THEN 'gold'
WHEN lifetime_value > 100 THEN 'silver'
ELSE 'bronze'
END AS tier
FROM company_data."123"
String functions
Clean and format text:
SELECT
UPPER(name) AS uppercase_name,
LOWER(email) AS lowercase_email,
TRIM(user_input) AS cleaned_input,
CONCAT(city, ', ', state) AS location,
SUBSTRING(phone, 1, 3) AS area_code
FROM company_data."123"
Type casting
Convert between types:
SELECT
CAST(string_number AS LONG) AS numeric_value,
CAST(timestamp_value AS DATE) AS date_only,
CAST(amount AS DECIMAL) AS precise_amount
FROM company_data."123"
Working with complex types
Accessing array elements
Use bracket notation with zero-based indexing:
SELECT
identifiers[0] AS first_id,
identifiers[1] AS second_id,
tags[0] AS primary_tag
FROM company_data."123"
Accessing struct fields
Use dot notation:
SELECT
person.name,
person.age,
address.city,
address.postal_code
FROM company_data."123"
Nested access
Combine notation for deeply nested data:
SELECT
-- Array of structs
identifiers[0].type AS first_id_type,
identifiers[0].value AS first_id_value,
-- Struct with nested struct
user.preferences.language,
user.preferences.timezone
FROM company_data."123"
Accessing map values
Use ELEMENT_AT for maps:
SELECT
ELEMENT_AT(properties, 'color') AS color,
ELEMENT_AT(metadata, 'source') AS source
FROM company_data."123"
Expanding arrays with UNNEST
Convert array elements to rows:
SELECT
t.user_id,
tag
FROM company_data."123" t, UNNEST(t.tags) AS tag
This creates one row per array element. If a user has tags ['a', 'b', 'c'], the result includes three rows for that user.
Aggregating data
Basic aggregations
Compute summary statistics:
SELECT
COUNT(*) AS total_rows,
COUNT(DISTINCT user_id) AS unique_users,
SUM(amount) AS total_amount,
AVG(score) AS average_score,
MIN(created_at) AS earliest,
MAX(created_at) AS latest
FROM company_data."123"
Grouping
Aggregate by categories:
SELECT
category,
COUNT(*) AS count,
SUM(amount) AS total,
AVG(amount) AS average
FROM company_data."123"
GROUP BY category
Filtering aggregated results
Use HAVING to filter after aggregation:
SELECT
category,
COUNT(*) AS count
FROM company_data."123"
GROUP BY category
HAVING COUNT(*) > 100
Deduplication with QUALIFY
Remove duplicates based on a window function:
SELECT
user_id,
email,
last_seen
FROM company_data."123"
QUALIFY ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY last_seen DESC
) = 1
This keeps only the most recent record for each user.
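The same pattern generalizes to top-N per group: changing the comparison keeps up to N rows per partition. For example, to keep the two most recent records per user:

```sql
SELECT
  user_id,
  email,
  last_seen
FROM company_data."123"
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY user_id
  ORDER BY last_seen DESC
) <= 2
```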
Putting it together
Here’s a complete example combining multiple techniques:
SELECT
user_id,
LOWER(email) AS normalized_email,
CASE
WHEN age < 25 THEN '18-24'
WHEN age < 35 THEN '25-34'
WHEN age < 45 THEN '35-44'
ELSE '45+'
END AS age_bracket,
identifiers[0].type AS primary_id_type,
created_at
FROM company_data."123"
WHERE
email IS NOT NULL
AND status = 'active'
AND created_at >= CURRENT_DATE - INTERVAL '90' DAY
AND _price_cpm_usd <= 0.50
QUALIFY ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY created_at DESC
) = 1
ORDER BY created_at DESC
LIMIT 1000 ROWS