Examples
Most simple pattern matching query
I believe the following query (or a variation thereof) is the most simple possible query using the match_recognize
clause.
The query defines one pattern variable (ptrn
) which matches all records (because 1
is always equal to 1
).
The pattern
clause stipulates to match all records that are mapped to ptrn
, i. e. all records in this query.
select
*
from
dual
match_recognize (
measures dummy as col
pattern(ptrn)
define
ptrn as 1=1
);
Find rows 'between' a starting and ending row
The following example selects the value of txt
that appears between rows whose txt
value is BEGIN SECTION
and END SECTION
(ordered by nr
).
The define
clause uses first_line as prev(first_line.txt) = 'BEGIN SECTION'
to define a pattern variable named first_line
which maps to all records whose txt
value of the previous (prev(…)
) record is BEGIN SECTION
.
Similarly, a pattern variable named last_line
is defined.
The pattern
specifies to start with a mandatory first_line
, have any number of any_line
and end with a mandatory last_line
.
Note that any_line
is not defined in the define
clause and thus any record is mapped to any_line
.
Because we want all txt
values of the matched records, we specify all rows per match
.
Finally, we need to distinguish the matched sections (i. e. matched patterns) and therefore assign the value of match_number()
to the alias mnr
.
with d as (
select 1 nr, 'some text' txt from dual union all
select 2 nr, 'more text' txt from dual union all
select 3 nr, 'BEGIN SECTION' txt from dual union all
select 4 nr, 'hello' txt from dual union all
select 5 nr, 'world.' txt from dual union all
select 6 nr, 'END SECTION' txt from dual union all
select 7 nr, 'uninteresing text' txt from dual union all
select 8 nr, 'Really - this is' txt from dual union all
select 9 nr, 'still uninteresting' txt from dual union all
select 10 nr, null txt from dual union all
select 11 nr, 'still uninteresting' txt from dual union all
select 12 nr, 'BEGIN SECTION' txt from dual union all
select 13 nr, 'the number' txt from dual union all
select 14 nr, 'associated with' txt from dual union all
select 15 nr, 'THE answer is...' txt from dual union all
select 16 nr, '42' txt from dual union all
select 17 nr, 'END SECTION' txt from dual union all
select 18 nr, 'That, my friend, is' txt from dual union all
select 19 nr, 'THE END' txt from dual
)
select
m.mnr,
m.txt,
m.nr
from
d
match_recognize (
order by nr
measures match_number() as mnr
all rows per match
pattern (first_line any_line*? last_line)
define
first_line as prev(first_line.txt) = 'BEGIN SECTION',
last_line as next( last_line.txt) = 'END SECTION'
) m;
--
-- MNR TXT NR
-- ---------- ------------------- ----------
-- 1 hello 4
-- 1 world. 5
-- 2 the number 13
-- 2 associated with 14
-- 2 THE answer is... 15
-- 2 42 16
Stock prices
This is a variation of the «stock price» example that is typically used to explain the match_recognize
clause. The select statement is suppposed to find prices at their (local) highs and bottoms.
First, we need a table and some data. The column dt
is supposed to be date, but to make things a bit easier, it's just a number:
create table tq84_stock_price (
company varchar2(5),
dt number,
price number
);
insert into tq84_stock_price values ('ABC', 1, 7);
insert into tq84_stock_price values ('ABC', 2, 9);
insert into tq84_stock_price values ('ABC', 3, 10);
insert into tq84_stock_price values ('ABC', 4, 10);
insert into tq84_stock_price values ('ABC', 5, 11);
insert into tq84_stock_price values ('ABC', 6, 8);
insert into tq84_stock_price values ('ABC', 7, 7);
insert into tq84_stock_price values ('ABC', 8, 6);
insert into tq84_stock_price values ('ABC', 9, 6);
insert into tq84_stock_price values ('ABC', 10, 8);
insert into tq84_stock_price values ('ABC', 11, 9);
insert into tq84_stock_price values ('DEF', 1, 6);
insert into tq84_stock_price values ('DEF', 2, 5);
insert into tq84_stock_price values ('DEF', 3, 7);
insert into tq84_stock_price values ('DEF', 4, 8);
insert into tq84_stock_price values ('DEF', 5, 10);
insert into tq84_stock_price values ('DEF', 6, 9);
insert into tq84_stock_price values ('DEF', 7, 10);
insert into tq84_stock_price values ('DEF', 8, 12);
insert into tq84_stock_price values ('DEF', 9, 7);
insert into tq84_stock_price values ('DEF', 10, 8);
insert into tq84_stock_price values ('DEF', 11, 9);
insert into tq84_stock_price values ('GHI', 1, 9);
insert into tq84_stock_price values ('GHI', 2, 8);
insert into tq84_stock_price values ('GHI', 3, 7);
insert into tq84_stock_price values ('GHI', 4, 6);
insert into tq84_stock_price values ('GHI', 5, 7);
insert into tq84_stock_price values ('GHI', 6, 8);
insert into tq84_stock_price values ('GHI', 7, 7);
insert into tq84_stock_price values ('GHI', 8, 6);
insert into tq84_stock_price values ('GHI', 9, 7);
insert into tq84_stock_price values ('GHI', 10, 8);
insert into tq84_stock_price values ('GHI', 11, 7);
The following query is a poor man's attempt to visualize each company's stock price. It can be seen, for example, that DEF has a bottom prices at days 2, 6 and 9 and high prices at days 5, 8 and 11:
column abc format a20
column def format a20
column ghi format a20
select
dt,
max(lpad('o', case when company = 'ABC' then price end)) abc,
max(lpad('o', case when company = 'DEF' then price end)) def,
max(lpad('o', case when company = 'GHI' then price end)) ghi
from
tq84_stock_price
group by
dt
order by
dt
--
-- DT ABC DEF GHI
-- --- -------------------- -------------------- --------------------
-- 1 o o o
-- 2 o o o
-- 3 o o o
-- 4 o o o
-- 5 o o o
-- 6 o o o
-- 7 o o o
-- 8 o o o
-- 9 o o o
-- 10 o o o
-- 11 o o o
;
Now for the select statement.
NOTE: ABC is not reported at all because the price at dt=1
is not recognized as bottom date, nor is the price at dt=11
recognized as high date
select *
from
tq84_stock_price
match_recognize (
partition by company
order by dt
measures
strt.dt as start_dt,
last (down.dt) as bottom_dt,
last (up .dt) as top_dt
one row per match
after match skip to last up
pattern (strt down+ up+)
define
down as down.price < prev(down.price),
up as up .price > prev(up .price)
) mr
order by
mr.company,
mr.start_dt
;
--
-- COMPA START_DT BOTTOM_DT TOP_DT
-- ----- ---------- ---------- ----------
-- DEF 1 2 5
-- DEF 5 6 8
-- DEF 8 9 11
-- GHI 1 4 6
-- GHI 6 8 10
Find foo bar baz
Find records with foo
, followed by records with bar
, followed by records with baz
.
create table tq84_t (
id number,
word varchar2(10)
);
insert into tq84_t values ( 1, 'abc');
insert into tq84_t values ( 2, 'foo');
insert into tq84_t values ( 3, 'def');
insert into tq84_t values ( 4, 'bar');
insert into tq84_t values ( 5, 'baz');
insert into tq84_t values ( 6, 'ghi');
insert into tq84_t values ( 7, 'bar');
insert into tq84_t values ( 8, 'jkl');
insert into tq84_t values ( 9, 'foo');
insert into tq84_t values (10, 'bar');
insert into tq84_t values (11, 'baz');
insert into tq84_t values (12, 'baz');
insert into tq84_t values (13, 'mno');
insert into tq84_t values (14, 'pqr');
insert into tq84_t values (15, 'foo');
insert into tq84_t values (16, 'bar');
insert into tq84_t values (17, 'baz');
insert into tq84_t values (18, 'stu');
insert into tq84_t values (19, 'foo');
insert into tq84_t values (20, 'bar');
insert into tq84_t values (21, 'vwx');
insert into tq84_t values (22, 'baz');
select *
from
tq84_t
match_recognize (
order by
id
measures
foo.id as start_id,
match_number() as match_counter
pattern (
foo bar baz
)
define
foo as foo.word = 'foo',
bar as bar.word = 'bar',
baz as baz.word = 'baz'
)
order by
start_id;
--
-- START_ID MATCH_COUNTER
-- ---------- -------------
-- 9 1
-- 15 2
drop table tq84_t purge;
Find a sequence of the same
Find the first id of sequence of the same words:
create table tq84_t (
id number,
word varchar2(10)
);
insert into tq84_t values ( 1, 'ab' );
insert into tq84_t values ( 2, 'ab' );
insert into tq84_t values ( 3, 'cde' );
insert into tq84_t values ( 4, 'cde' );
insert into tq84_t values ( 5, 'cde' );
insert into tq84_t values ( 6, 'cde' );
insert into tq84_t values ( 7, 'ab' );
insert into tq84_t values ( 8, 'cde' );
insert into tq84_t values ( 9, 'fghij' );
insert into tq84_t values (10, 'fghij' );
insert into tq84_t values (11, 'fghij' );
insert into tq84_t values (12, 'fghij' );
insert into tq84_t values (13, 'fghij' );
insert into tq84_t values (14, 'fghij' );
insert into tq84_t values (15, 'fghij' );
insert into tq84_t values (16, 'ab' );
insert into tq84_t values (17, 'ab' );
insert into tq84_t values (18, 'klmnopq');
insert into tq84_t values (19, 'cde' );
insert into tq84_t values (20, 'cde' );
select
count_same_words cnt,
id_start,
word
from
tq84_t
match_recognize (
order by
id
measures
a_word.word as word,
first(a_word.id) as id_start,
-- match_number() as match_counter,
count(*) as count_same_words
one row per match
pattern (
a_word* another_word
)
define
a_word as a_word.word = first(a_word.word),
another_word as another_word.word != a_word.word
)
order by
count_same_words desc;
--
-- CNT ID_START WORD
-- ---------- ---------- ----------
-- 7 10 fghij
-- 4 4 cde
-- 3 1 ab
-- 2 17 ab
-- 2 8 cde
drop table tq84_t purge;
Find three in a row
create table tq84_t (
id number,
word varchar2(10)
);
insert into tq84_t values ( 1, 'abc');
insert into tq84_t values ( 2, 'def');
insert into tq84_t values ( 3, 'def');
insert into tq84_t values ( 4, 'ghi');
insert into tq84_t values ( 5, 'jkl');
insert into tq84_t values ( 6, 'jkl');
insert into tq84_t values ( 7, 'jkl');
insert into tq84_t values ( 8, 'mno');
insert into tq84_t values ( 9, 'pqr');
insert into tq84_t values (10, 'pqr');
insert into tq84_t values (11, 'pqr');
insert into tq84_t values (12, 'pqr');
insert into tq84_t values (13, 'stu');
insert into tq84_t values (14, 'vwx');
insert into tq84_t values (15, 'vwx');
insert into tq84_t values (16, 'vwx');
insert into tq84_t values (17, 'yz.');
select * from tq84_t
match_recognize (
order by
id
measures
first(rec.id) as start_id,
rec.id as end_id,
match_number() as match_counter
pattern (
rec{3}
)
define
rec as rec.word = first(rec.word)
)
order by
start_id;
--
-- START_ID END_ID MATCH_COUNTER
-- ---------- ---------- -------------
-- 5 7 1
-- 9 11 2
-- 14 16 3
drop table tq84_t purge;
TODO
Running semantics is used while the rows are being matched into a pattern. When the complete match is established, final semantics becomes possible.
This explains, why final semantics is only possible in the measures
clause.
Row pattern navigation operations:
-
prev(val[, phsical-offset])
, next(val [, physical-offset])
. (prev
and next
always have running semantics)
-
first(val[, logical-offset])
, last(val [, logical-offset])
. (first
and last
support running and final semantics)
match_number()
assigns a sequential number, starting with 0
, to each match within a partition.
classifier()
returns a text that corresponds to the pattern variable to which a row is mapped.
AFTER MATCH SKIP
-
TO NEXT ROW
-
PAST LAST ROW
-
TO [FIRST|LAST] pattern_variable
(TO pattern_variable
is equal to TO LAST pattern_variable
)
The universal row pattern variable references the entire set of all rows in a match.
Expressions in the measures
and define
clauses have the same syntax and semantics except that the measures
claus also supports «final semantics»
mrvar