Looping Over Data with Iterators
Anytime we want to do anything with multiple columns, We're going to use something called an iterator. An iterator basically is like a for loop in a programming language or a cursor in SQL. It processes and expression and it does it row by row. And this runs counter to how most things work with the DAX engine because those things run column by column.
Iterators
Iterators are one of the more advanced features of the DAX engine. They allow you to use multiple columns in an aggregation. They also allow for a certain row by row operations.
For example, let's say you wanted to list all of the vendors that are available based on the sales detail information. Well you would concatenate x to join together all those vendor names and you would have to go row by row. So you're going to need an iterator.
What is an Iterator?
We talked before about how DAX engine is optimized for working with columns, not rows. However, there are many situations where that just isn't going to be enough. For those situations we need something called an iterator. So what makes an iterator unique?
First, there's times when you just have to process things in a row by row manner. And as a result of that, performance for iterators is worse than functions that take a columnar approach. SUMX, which is an iterator, is going to be less performant operation than just SUM.
That being said, don't be afraid to use iterators because they're impossible to avoid when you're working with the DAX engine. You should only be cautious if you're forced to start nesting them or you have to use them all over the place.
One of the biggest benefits they provide is they allow for the use of multiple columns expression. If you want to avoid cluttering your data model with calculated columns, you're going to need to use an iterator.
- Work row by row
- Worse performance
- Allow for multi-column expressions
How Do We Know If We're Dealing With An Iterator?
While iterators aren't all the same, many of them do follow the same pattern. The first and most important thing is that the first parameter is a table. An iterator needs to know what it's iterating through, what it is looping through? You can't make any assumptions with that. You have to tell it explicitly, here is the table you're going to loop through. You can sometimes create a virtual table to your specification.
The second parameter is an expression, so it takes in a table, it says here's what we're going to loop through, and then it takes in the expression and says for each row evaluate this expression with a row context on just this row, nothing else, just this row.
Iterator Example
The FILTER Function is an iterator. The FILTER function takes in a table, it takes in an expression that it evaluates on each row, and then the FILTER function returns a table that's been filtered out.
There's another way to identify an iterator, many of them end in the letter X. Now you may be thinking, well there's no FILTERX, and that's true. Not all iterators follow this pattern, but a number of them do. SO instead of SUM you have SUMX, instead of MAX you have MAXX. And instead of CONCATENATE you have CONCATENATEX.
- Takes in a table parameter
- Takes in an expression
- Many end with "X"
Iterator Syntax Example
So let's say we want to take the average of gross sales, but we didn't want to create a calculated column because that'll take up more RAM and it might clutter up our data model.
Well here's an example of average gross sales using an iterator.
Average Gross Sales = AVERAGEX( Category, [Quantity] * [Price] )
So on the left I've got AVERAGEX
and that's going to do practically the same thing as AVERAGE
, but is is going to iterate. So the next piece is the first parameter, it's Category, it's a table. I'm passing a table and saying for every row, calculate this thing and then average it together. And so the second parameter is Quantity times Price. And so what's going to do is it's going to go row by row through Category. For each row it's going to multiple Quantity times Price and then it's going to take the average of those calculated values.
So here's a perfect example of how we could produce a similar result without having to make a calculated column in between.