This guide covers the fundamentals of querying data with NQL: selecting columns, filtering rows with WHERE clauses, and transforming data using expressions and functions.
Prerequisites
- A Narrative I/O account with at least one dataset
- Basic familiarity with SQL syntax
What you’ll learn
- How to select specific columns and use aliases
- How to filter data with comparison and logical operators
- How to transform data using expressions and functions
- How to work with complex types (arrays, structs, maps)
Selecting columns
Basic column selection
Select specific columns from a dataset:
SELECT
user_id,
email,
created_at
FROM company_data."123"
Using column aliases
Rename columns in your output with AS:
SELECT
user_id AS id,
email AS contact_email,
created_at AS signup_date
FROM company_data."123"
Fully qualified column names
When joining datasets or for clarity, use fully qualified names:
SELECT
company_data."123".user_id,
company_data."123".email
FROM company_data."123"
Always use fully qualified column names in joins to avoid ambiguity.
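For example, a join between two datasets might qualify every column reference like this (a sketch; the second dataset ID "456" and its order_total column are hypothetical):

```sql
SELECT
  company_data."123".user_id,
  company_data."123".email,
  company_data."456".order_total
FROM company_data."123"
JOIN company_data."456"
  ON company_data."123".user_id = company_data."456".user_id
```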
Select all columns
Use * to select all columns (use sparingly):
SELECT *
FROM company_data."123"
LIMIT 100 ROWS
Select distinct values
Remove duplicates with DISTINCT:
SELECT DISTINCT category
FROM company_data."123"
Filtering with WHERE
Comparison filters
Filter rows using comparison operators:
SELECT *
FROM company_data."123"
WHERE
age >= 18
AND age < 65
AND status = 'active'
| Operator | Meaning |
|---|---|
| = | Equal to |
| <> | Not equal to |
| < | Less than |
| > | Greater than |
| <= | Less than or equal |
| >= | Greater than or equal |
Combining conditions
Use AND, OR, and NOT to combine conditions:
SELECT *
FROM company_data."123"
WHERE
(status = 'active' OR status = 'pending')
AND NOT is_test_account
AND region = 'US'
Use parentheses to control evaluation order. AND is evaluated before OR.
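To see why the parentheses matter, compare the two forms (a sketch using the columns above):

```sql
-- Without parentheses, AND binds first, so this is read as:
--   status = 'active' OR (status = 'pending' AND region = 'US')
-- and matches active users in any region.
SELECT *
FROM company_data."123"
WHERE status = 'active' OR status = 'pending' AND region = 'US'

-- With parentheses, only US users are matched:
SELECT *
FROM company_data."123"
WHERE (status = 'active' OR status = 'pending') AND region = 'US'
```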
IN and NOT IN
Check membership in a list:
SELECT *
FROM company_data."123"
WHERE
category IN ('electronics', 'clothing', 'home')
AND source NOT IN ('test', 'internal')
BETWEEN
Filter within a range (inclusive):
SELECT *
FROM company_data."123"
WHERE
age BETWEEN 25 AND 45
AND score BETWEEN 80 AND 100
NULL checks
Check for missing values:
SELECT *
FROM company_data."123"
WHERE
email IS NOT NULL
AND phone IS NOT NULL
AND secondary_email IS NULL
Don’t use = NULL or <> NULL. Comparisons with NULL always evaluate to NULL, never true or false, so these conditions match no rows. Use IS NULL and IS NOT NULL instead.
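A quick illustration of the difference:

```sql
-- Wrong: secondary_email = NULL evaluates to NULL, so no rows match
SELECT * FROM company_data."123" WHERE secondary_email = NULL

-- Right: matches rows where secondary_email is missing
SELECT * FROM company_data."123" WHERE secondary_email IS NULL
```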
Pattern matching with LIKE
Match string patterns using wildcards:
SELECT *
FROM company_data."123"
WHERE
email LIKE '%@gmail.com' -- Ends with @gmail.com
AND name LIKE 'John%' -- Starts with John
AND code LIKE 'A__' -- A followed by exactly 2 characters
| Wildcard | Matches |
|---|---|
| % | Zero or more characters |
| _ | Exactly one character |
For case-insensitive matching, normalize the case first:
WHERE LOWER(email) LIKE '%@gmail.com'
Date and time filtering
Current date comparisons
Filter relative to the current date:
SELECT *
FROM company_data."123"
WHERE
-- Last 30 days
event_date >= CURRENT_DATE - INTERVAL '30' DAY
-- Last 3 months
AND created_at >= CURRENT_DATE - INTERVAL '3' MONTH
Date ranges
Filter within specific date ranges:
SELECT *
FROM company_data."123"
WHERE
event_date BETWEEN '2024-01-01' AND '2024-12-31'
Filter by specific date components:
SELECT *
FROM company_data."123"
WHERE
EXTRACT(YEAR FROM event_date) = 2024
AND EXTRACT(MONTH FROM event_date) IN (1, 2, 3)
AND EXTRACT(DAYOFWEEK FROM event_date) = 1 -- Sunday
Price filtering
Control costs by filtering on the special _price_cpm_usd column:
SELECT *
FROM company_data."123"
WHERE
category = 'premium'
AND _price_cpm_usd <= 1.00 -- Max $1 per 1000 rows
Always include price filtering to control data acquisition costs, especially when creating materialized views.
Expressions in SELECT
Compute values using expressions:
SELECT
quantity * unit_price AS line_total,
(quantity * unit_price) * 1.0825 AS total_with_tax,
first_name || ' ' || last_name AS full_name
FROM company_data."123"
CASE expressions
Apply conditional logic:
SELECT
user_id,
CASE
WHEN lifetime_value > 1000 THEN 'platinum'
WHEN lifetime_value > 500 THEN 'gold'
WHEN lifetime_value > 100 THEN 'silver'
ELSE 'bronze'
END AS tier
FROM company_data."123"
String functions
Clean and format text:
SELECT
UPPER(name) AS uppercase_name,
LOWER(email) AS lowercase_email,
TRIM(user_input) AS cleaned_input,
CONCAT(city, ', ', state) AS location,
SUBSTRING(phone, 1, 3) AS area_code
FROM company_data."123"
Type casting
Convert between types:
SELECT
CAST(string_number AS LONG) AS numeric_value,
CAST(timestamp_value AS DATE) AS date_only,
CAST(amount AS DECIMAL) AS precise_amount
FROM company_data."123"
Working with complex types
Accessing array elements
Use bracket notation with zero-based indexing:
SELECT
identifiers[0] AS first_id,
identifiers[1] AS second_id,
tags[0] AS primary_tag
FROM company_data."123"
Accessing struct fields
Use dot notation:
SELECT
person.name,
person.age,
address.city,
address.postal_code
FROM company_data."123"
Nested access
Combine notation for deeply nested data:
SELECT
-- Array of structs
identifiers[0].type AS first_id_type,
identifiers[0].value AS first_id_value,
-- Struct with nested struct
user.preferences.language,
user.preferences.timezone
FROM company_data."123"
Accessing map values
Use ELEMENT_AT for maps:
SELECT
ELEMENT_AT(properties, 'color') AS color,
ELEMENT_AT(metadata, 'source') AS source
FROM company_data."123"
Expanding arrays with UNNEST
Convert array elements to rows:
SELECT
t.user_id,
tag
FROM company_data."123" t, UNNEST(t.tags) AS tag
This creates one row per array element. If a user has tags ['a', 'b', 'c'], the result includes three rows for that user.
Aggregating data
Basic aggregations
Compute summary statistics:
SELECT
COUNT(*) AS total_rows,
COUNT(DISTINCT user_id) AS unique_users,
SUM(amount) AS total_amount,
AVG(score) AS average_score,
MIN(created_at) AS earliest,
MAX(created_at) AS latest
FROM company_data."123"
Grouping
Aggregate by categories:
SELECT
category,
COUNT(*) AS count,
SUM(amount) AS total,
AVG(amount) AS average
FROM company_data."123"
GROUP BY category
Filtering aggregated results
Use HAVING to filter after aggregation:
SELECT
category,
COUNT(*) AS count
FROM company_data."123"
GROUP BY category
HAVING COUNT(*) > 100
Deduplication with QUALIFY
Remove duplicates based on a window function:
SELECT
user_id,
email,
last_seen
FROM company_data."123"
QUALIFY ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY last_seen DESC
) = 1
This keeps only the most recent record for each user.
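The same pattern generalizes to top-N per group: changing the comparison keeps up to N rows per partition. For example, to keep the two most recent records per user:

```sql
SELECT
  user_id,
  email,
  last_seen
FROM company_data."123"
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY user_id
  ORDER BY last_seen DESC
) <= 2
```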
Putting it together
Here’s a complete example combining multiple techniques:
SELECT
user_id,
LOWER(email) AS normalized_email,
CASE
WHEN age < 25 THEN '18-24'
WHEN age < 35 THEN '25-34'
WHEN age < 45 THEN '35-44'
ELSE '45+'
END AS age_bracket,
identifiers[0].type AS primary_id_type,
created_at
FROM company_data."123"
WHERE
email IS NOT NULL
AND status = 'active'
AND created_at >= CURRENT_DATE - INTERVAL '90' DAY
AND _price_cpm_usd <= 0.50
QUALIFY ROW_NUMBER() OVER (
PARTITION BY user_id
ORDER BY created_at DESC
) = 1
ORDER BY created_at DESC
LIMIT 1000 ROWS