This guide covers the fundamentals of querying data with NQL: selecting columns, filtering rows with WHERE clauses, and transforming data using expressions and functions.

Prerequisites

  • A Narrative I/O account with at least one dataset
  • Basic familiarity with SQL syntax

What you’ll learn

  • How to select specific columns and use aliases
  • How to filter data with comparison and logical operators
  • How to transform data using expressions and functions
  • How to work with complex types (arrays, structs, maps)

Selecting columns

Basic column selection

Select specific columns from a dataset:
SELECT
  user_id,
  email,
  created_at
FROM company_data."123"

Using column aliases

Rename columns in your output with AS:
SELECT
  user_id AS id,
  email AS contact_email,
  created_at AS signup_date
FROM company_data."123"

Fully qualified column names

When joining datasets or for clarity, use fully qualified names:
SELECT
  company_data."123".user_id,
  company_data."123".email
FROM company_data."123"
Always use fully qualified column names in joins to avoid ambiguity.
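In a join, table aliases give you short qualifiers without repeating the full dataset name. A minimal sketch, assuming a second hypothetical dataset `"456"` with a matching `user_id` column and an `order_total` column:

```sql
-- "456", its columns, and the join key are placeholders for illustration.
SELECT
  a.user_id,
  a.email,
  b.order_total
FROM company_data."123" a
JOIN company_data."456" b
  ON a.user_id = b.user_id
```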

Select all columns

Use * to select all columns (use sparingly):
SELECT *
FROM company_data."123"
LIMIT 100 ROWS

Select distinct values

Remove duplicates with DISTINCT:
SELECT DISTINCT category
FROM company_data."123"

Filtering with WHERE

Comparison filters

Filter rows using comparison operators:
SELECT *
FROM company_data."123"
WHERE
  age >= 18
  AND age < 65
  AND status = 'active'
Operator    Meaning
=           Equal to
<>          Not equal to
<           Less than
>           Greater than
<=          Less than or equal
>=          Greater than or equal

Combining conditions

Use AND, OR, and NOT to combine conditions:
SELECT *
FROM company_data."123"
WHERE
  (status = 'active' OR status = 'pending')
  AND NOT is_test_account
  AND region = 'US'
Use parentheses to control evaluation order. AND is evaluated before OR.
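To see why the parentheses matter, here is the same filter without them. Because AND binds more tightly than OR, the conditions regroup and the query matches different rows:

```sql
-- Without parentheses this is parsed as:
--   status = 'active'
--   OR (status = 'pending' AND NOT is_test_account AND region = 'US')
-- so every 'active' row passes, including test accounts and rows
-- outside the US.
SELECT *
FROM company_data."123"
WHERE
  status = 'active' OR status = 'pending'
  AND NOT is_test_account
  AND region = 'US'
```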

IN and NOT IN

Check membership in a list:
SELECT *
FROM company_data."123"
WHERE
  category IN ('electronics', 'clothing', 'home')
  AND source NOT IN ('test', 'internal')

BETWEEN

Filter within a range (inclusive):
SELECT *
FROM company_data."123"
WHERE
  age BETWEEN 25 AND 45
  AND score BETWEEN 80 AND 100

NULL checks

Check for missing values:
SELECT *
FROM company_data."123"
WHERE
  email IS NOT NULL
  AND phone IS NOT NULL
  AND secondary_email IS NULL
Don’t use = NULL or <> NULL. Any comparison with NULL evaluates to NULL rather than true or false, so the row is silently filtered out. Use IS NULL and IS NOT NULL instead.
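The same rule trips up negative filters: `status <> 'inactive'` drops rows where `status` is NULL, because the comparison evaluates to NULL, not true. If NULLs should pass the filter, say so explicitly:

```sql
-- Keep rows whose status is not 'inactive', including rows
-- where status is NULL (a plain <> would drop those).
SELECT *
FROM company_data."123"
WHERE
  status <> 'inactive'
  OR status IS NULL
```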

Pattern matching with LIKE

Match string patterns using wildcards:
SELECT *
FROM company_data."123"
WHERE
  email LIKE '%@gmail.com'      -- Ends with @gmail.com
  AND name LIKE 'John%'          -- Starts with John
  AND code LIKE 'A__'            -- A followed by exactly 2 characters
Wildcard    Matches
%           Zero or more characters
_           Exactly one character
For case-insensitive matching, normalize the case first:
WHERE LOWER(email) LIKE '%@gmail.com'
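To match a literal % or _, standard SQL provides an ESCAPE clause; this sketch assumes your NQL dialect supports it:

```sql
-- Match codes containing a literal underscore; '\' is declared
-- as the escape character (dialect support assumed).
SELECT *
FROM company_data."123"
WHERE code LIKE '%\_%' ESCAPE '\'
```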

Date and time filtering

Current date comparisons

Filter relative to the current date:
SELECT *
FROM company_data."123"
WHERE
  -- Last 30 days
  event_date >= CURRENT_DATE - INTERVAL '30' DAY

  -- Last 3 months
  AND created_at >= CURRENT_DATE - INTERVAL '3' MONTH
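Calendar-aligned windows such as month-to-date are often clearer with DATE_TRUNC, assuming that function is available in your dialect:

```sql
-- Rows from the start of the current month onward
-- (assumes DATE_TRUNC is supported).
SELECT *
FROM company_data."123"
WHERE event_date >= DATE_TRUNC('month', CURRENT_DATE)
```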

Date ranges

Filter within specific date ranges:
SELECT *
FROM company_data."123"
WHERE
  event_date BETWEEN '2024-01-01' AND '2024-12-31'

Extracting date parts

Filter by specific date components:
SELECT *
FROM company_data."123"
WHERE
  EXTRACT(YEAR FROM event_date) = 2024
  AND EXTRACT(MONTH FROM event_date) IN (1, 2, 3)
  AND EXTRACT(DAYOFWEEK FROM event_date) = 1  -- Sunday

Price filtering

Control costs by filtering on the special _price_cpm_usd column:
SELECT *
FROM company_data."123"
WHERE
  category = 'premium'
  AND _price_cpm_usd <= 1.00  -- Max $1 per 1000 rows
Always include price filtering to control data acquisition costs, especially when creating materialized views.

Transforming data

Expressions in SELECT

Compute values using expressions:
SELECT
  quantity * unit_price AS line_total,
  (quantity * unit_price) * 1.0825 AS total_with_tax,
  first_name || ' ' || last_name AS full_name
FROM company_data."123"
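Note that || propagates NULL: if `last_name` is NULL, `full_name` comes back NULL as well. COALESCE supplies a fallback:

```sql
-- Substitute empty strings so a missing name part doesn't
-- NULL out the whole concatenation.
SELECT
  COALESCE(first_name, '') || ' ' || COALESCE(last_name, '') AS full_name
FROM company_data."123"
```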

CASE expressions

Apply conditional logic:
SELECT
  user_id,
  CASE
    WHEN lifetime_value > 1000 THEN 'platinum'
    WHEN lifetime_value > 500 THEN 'gold'
    WHEN lifetime_value > 100 THEN 'silver'
    ELSE 'bronze'
  END AS tier
FROM company_data."123"
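WHEN branches are checked top to bottom and the first match wins, which is why the thresholds above run from highest to lowest. CASE also has a simple form that compares one expression against a list of values:

```sql
-- Simple CASE form: one expression, compared against each value in turn
SELECT
  CASE status
    WHEN 'active' THEN 1
    WHEN 'pending' THEN 0
    ELSE -1
  END AS status_code
FROM company_data."123"
```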

String transformations

Clean and format text:
SELECT
  UPPER(name) AS uppercase_name,
  LOWER(email) AS lowercase_email,
  TRIM(user_input) AS cleaned_input,
  CONCAT(city, ', ', state) AS location,
  SUBSTRING(phone, 1, 3) AS area_code
FROM company_data."123"

Type casting

Convert between types:
SELECT
  CAST(string_number AS LONG) AS numeric_value,
  CAST(timestamp_value AS DATE) AS date_only,
  CAST(amount AS DECIMAL) AS precise_amount
FROM company_data."123"
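A plain CAST fails the whole query when a value can’t be converted. Some engines (Trino and Spark SQL, for example) offer TRY_CAST, which yields NULL for invalid values instead; this sketch assumes your NQL dialect supports it:

```sql
-- Invalid strings become NULL rather than failing the query
-- (assumes TRY_CAST is available).
SELECT
  TRY_CAST(string_number AS LONG) AS numeric_value
FROM company_data."123"
```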

Working with complex types

Accessing array elements

Use bracket notation with zero-based indexing:
SELECT
  identifiers[0] AS first_id,
  identifiers[1] AS second_id,
  tags[0] AS primary_tag
FROM company_data."123"

Accessing struct fields

Use dot notation:
SELECT
  person.name,
  person.age,
  address.city,
  address.postal_code
FROM company_data."123"

Nested access

Combine notation for deeply nested data:
SELECT
  -- Array of structs
  identifiers[0].type AS first_id_type,
  identifiers[0].value AS first_id_value,

  -- Struct with nested struct
  user.preferences.language,
  user.preferences.timezone
FROM company_data."123"

Accessing map values

Use ELEMENT_AT for maps:
SELECT
  ELEMENT_AT(properties, 'color') AS color,
  ELEMENT_AT(metadata, 'source') AS source
FROM company_data."123"

Expanding arrays with UNNEST

Convert array elements to rows:
SELECT
  t.user_id,
  tag
FROM company_data."123" t, UNNEST(t.tags) AS tag
This creates one row per array element. If a user has tags ['a', 'b', 'c'], the result includes three rows for that user.
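UNNEST pairs naturally with aggregation. For example, to count how often each tag appears across all rows:

```sql
-- One row per distinct tag with its frequency
SELECT
  tag,
  COUNT(*) AS tag_count
FROM company_data."123" t, UNNEST(t.tags) AS tag
GROUP BY tag
ORDER BY tag_count DESC
```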

Aggregating data

Basic aggregations

Compute summary statistics:
SELECT
  COUNT(*) AS total_rows,
  COUNT(DISTINCT user_id) AS unique_users,
  SUM(amount) AS total_amount,
  AVG(score) AS average_score,
  MIN(created_at) AS earliest,
  MAX(created_at) AS latest
FROM company_data."123"

Grouping

Aggregate by categories:
SELECT
  category,
  COUNT(*) AS count,
  SUM(amount) AS total,
  AVG(amount) AS average
FROM company_data."123"
GROUP BY category

Filtering aggregated results

Use HAVING to filter after aggregation:
SELECT
  category,
  COUNT(*) AS count
FROM company_data."123"
GROUP BY category
HAVING COUNT(*) > 100
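WHERE and HAVING can appear in the same query: WHERE filters rows before grouping, HAVING filters groups after aggregation:

```sql
-- Only 'US' rows are grouped; only categories with more than
-- 100 such rows survive the HAVING filter.
SELECT
  category,
  COUNT(*) AS count
FROM company_data."123"
WHERE region = 'US'
GROUP BY category
HAVING COUNT(*) > 100
```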

Deduplication with QUALIFY

Remove duplicates based on a window function:
SELECT
  user_id,
  email,
  last_seen
FROM company_data."123"
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY user_id
  ORDER BY last_seen DESC
) = 1
This keeps only the most recent record for each user.

Putting it together

Here’s a complete example combining multiple techniques:
SELECT
  user_id,
  LOWER(email) AS normalized_email,
  CASE
    WHEN age < 25 THEN '18-24'
    WHEN age < 35 THEN '25-34'
    WHEN age < 45 THEN '35-44'
    ELSE '45+'
  END AS age_bracket,
  identifiers[0].type AS primary_id_type,
  created_at
FROM company_data."123"
WHERE
  email IS NOT NULL
  AND status = 'active'
  AND created_at >= CURRENT_DATE - INTERVAL '90' DAY
  AND _price_cpm_usd <= 0.50
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY user_id
  ORDER BY created_at DESC
) = 1
ORDER BY created_at DESC
LIMIT 1000 ROWS