Oracle SQL: MATCH_RECOGNIZE clause

Examples

Most simple pattern matching query

I believe the following query (or a variation thereof) is the most simple possible query using the match_recognize clause.

The query defines one pattern variable (ptrn) which matches all records (because 1 is always equal to 1).

The pattern clause stipulates to match all records that are mapped to ptrn, i. e. all records in this query.

Finally, it adds a measures clause to prevent an ORA-30732: table contains no user-visible columns error:

select
   *
from
   dual
match_recognize (
   measures dummy as col
   pattern(ptrn)
   define
      ptrn as 1=1
);

See also the related example abusing match_recognize to select the first row which uses the pattern ptrn+ to match the entire result set and then applies first(…) to the selected columns to return a table's first record.

Find rows 'between' a starting and ending row

The following example selects the value of txt that appears between rows whose txt value is BEGIN SECTION and END SECTION (ordered by nr).

The define clause uses first_line as prev(first_line.txt) = 'BEGIN SECTION' to define a pattern variable named first_line which maps to all records whose txt value of the previous (prev(…)) record is BEGIN SECTION.

Similarly, a pattern variable named last_line is defined.

The pattern specifies to start with a mandatory first_line, have any number of any_line and end with a mandatory last_line.

Note that any_line is not defined in the define clause and thus any record is mapped to any_line.

Because we want all txt values of the matched records, we specify all rows per match.

Finally, we need to distinguish the matched sections (i. e. matched patterns) and therefore assign the value of match_number() to the alias mnr.

with d as (
   select  1 nr, 'some text'              txt from dual union all
   select  2 nr, 'more text'              txt from dual union all
   select  3 nr, 'BEGIN SECTION'          txt from dual union all
   select  4 nr, 'hello'                  txt from dual union all
   select  5 nr, 'world.'                 txt from dual union all
   select  6 nr, 'END SECTION'            txt from dual union all
   select  7 nr, 'uninteresing text'      txt from dual union all
   select  8 nr, 'Really - this is'       txt from dual union all
   select  9 nr, 'still uninteresting'    txt from dual union all
   select 10 nr,  null                    txt from dual union all 
   select 11 nr, 'still uninteresting'    txt from dual union all  
   select 12 nr, 'BEGIN SECTION'          txt from dual union all  
   select 13 nr, 'the number'             txt from dual union all  
   select 14 nr, 'associated with'        txt from dual union all  
   select 15 nr, 'THE answer is...'       txt from dual union all  
   select 16 nr, '42'                     txt from dual union all     
   select 17 nr, 'END SECTION'            txt from dual union all     
   select 18 nr, 'That, my friend, is'    txt from dual union all     
   select 19 nr, 'THE END'                txt from dual
)
select
   m.mnr,
   m.txt,
   m.nr
from
   d
match_recognize (
      order by nr
      measures match_number() as mnr     
      all rows per match
      pattern (first_line any_line*? last_line)
      define
         first_line as prev(first_line.txt) = 'BEGIN SECTION',
         last_line  as next( last_line.txt) = 'END SECTION'
) m;
--
--        MNR TXT                         NR
-- ---------- ------------------- ----------
--          1 hello                        4
--          1 world.                       5
--          2 the number                  13
--          2 associated with             14
--          2 THE answer is...            15
--          2 42                          16

Stock prices

This is a variation of the «stock price» example that is typically used to explain the match_recognize clause. The select statement is suppposed to find prices at their (local) highs and bottoms.

First, we need a table and some data. The column dt is supposed to be date, but to make things a bit easier, it's just a number:

create table tq84_stock_price (
  company varchar2(5),
  dt      number,
  price   number
);

insert into tq84_stock_price values ('ABC',  1,   7);
insert into tq84_stock_price values ('ABC',  2,   9);
insert into tq84_stock_price values ('ABC',  3,  10);
insert into tq84_stock_price values ('ABC',  4,  10);
insert into tq84_stock_price values ('ABC',  5,  11);
insert into tq84_stock_price values ('ABC',  6,   8);
insert into tq84_stock_price values ('ABC',  7,   7);
insert into tq84_stock_price values ('ABC',  8,   6);
insert into tq84_stock_price values ('ABC',  9,   6);
insert into tq84_stock_price values ('ABC', 10,   8);
insert into tq84_stock_price values ('ABC', 11,   9);

insert into tq84_stock_price values ('DEF',  1,   6);
insert into tq84_stock_price values ('DEF',  2,   5);
insert into tq84_stock_price values ('DEF',  3,   7);
insert into tq84_stock_price values ('DEF',  4,   8);
insert into tq84_stock_price values ('DEF',  5,  10);
insert into tq84_stock_price values ('DEF',  6,   9);
insert into tq84_stock_price values ('DEF',  7,  10);
insert into tq84_stock_price values ('DEF',  8,  12);
insert into tq84_stock_price values ('DEF',  9,   7);
insert into tq84_stock_price values ('DEF', 10,   8);
insert into tq84_stock_price values ('DEF', 11,   9);

insert into tq84_stock_price values ('GHI',  1,   9);
insert into tq84_stock_price values ('GHI',  2,   8);
insert into tq84_stock_price values ('GHI',  3,   7);
insert into tq84_stock_price values ('GHI',  4,   6);
insert into tq84_stock_price values ('GHI',  5,   7);
insert into tq84_stock_price values ('GHI',  6,   8);
insert into tq84_stock_price values ('GHI',  7,   7);
insert into tq84_stock_price values ('GHI',  8,   6);
insert into tq84_stock_price values ('GHI',  9,   7);
insert into tq84_stock_price values ('GHI', 10,   8);
insert into tq84_stock_price values ('GHI', 11,   7);

Github repository Oracle-Patterns, path: /SQL/select/match_recognize/stock-price/data.sql

The following query is a poor man's attempt to visualize each company's stock price. It can be seen, for example, that DEF has a bottom prices at days 2, 6 and 9 and high prices at days 5, 8 and 11:

column abc format a20
column def format a20
column ghi format a20

select
  dt,
  max(lpad('o', case when company = 'ABC' then price end))  abc,
  max(lpad('o', case when company = 'DEF' then price end))  def,
  max(lpad('o', case when company = 'GHI' then price end))  ghi
from
  tq84_stock_price
group by
  dt
order by
  dt
--
--  DT ABC                  DEF                  GHI
-- --- -------------------- -------------------- --------------------
--   1       o                   o                       o
--   2         o                o                       o
--   3          o                 o                    o
--   4          o                  o                  o
--   5           o                   o                 o
--   6        o                     o                   o
--   7       o                       o                 o
--   8      o                          o              o
--   9      o                     o                    o
--  10        o                    o                    o
--  11         o                    o                  o
;

Github repository Oracle-Patterns, path: /SQL/select/match_recognize/stock-price/visualization.sql

Now for the select statement.

NOTE: ABC is not reported at all because the price at dt=1 is not recognized as bottom date, nor is the price at dt=11 recognized as high date

select *
from
  tq84_stock_price
  match_recognize (
    partition by company
    order     by dt
    measures
             strt.dt  as start_dt,
      last  (down.dt) as bottom_dt,
      last  (up  .dt) as top_dt
    one row per match
    after match skip to last up
    pattern (strt down+ up+)
    define
      down as down.price < prev(down.price),
      up   as up  .price > prev(up  .price)
  ) mr
order by
  mr.company,
  mr.start_dt
;
--
-- COMPA   START_DT  BOTTOM_DT     TOP_DT
-- ----- ---------- ---------- ----------
-- DEF            1          2          5
-- DEF            5          6          8
-- DEF            8          9         11
-- GHI            1          4          6
-- GHI            6          8         10

Github repository Oracle-Patterns, path: /SQL/select/match_recognize/stock-price/select.sql

Find foo bar baz

Find records with foo, followed by records with bar, followed by records with baz.

create table tq84_t (
  id      number,
  word    varchar2(10)
);


insert into tq84_t values ( 1, 'abc');
insert into tq84_t values ( 2, 'foo');
insert into tq84_t values ( 3, 'def');
insert into tq84_t values ( 4, 'bar');
insert into tq84_t values ( 5, 'baz');
insert into tq84_t values ( 6, 'ghi');
insert into tq84_t values ( 7, 'bar');
insert into tq84_t values ( 8, 'jkl');
insert into tq84_t values ( 9, 'foo');
insert into tq84_t values (10, 'bar');
insert into tq84_t values (11, 'baz');
insert into tq84_t values (12, 'baz');
insert into tq84_t values (13, 'mno');
insert into tq84_t values (14, 'pqr');
insert into tq84_t values (15, 'foo');
insert into tq84_t values (16, 'bar');
insert into tq84_t values (17, 'baz');
insert into tq84_t values (18, 'stu');
insert into tq84_t values (19, 'foo');
insert into tq84_t values (20, 'bar');
insert into tq84_t values (21, 'vwx');
insert into tq84_t values (22, 'baz');


select *
from
  tq84_t
match_recognize (
  order by
    id
  measures
    foo.id         as start_id,
    match_number() as match_counter
  pattern (
    foo bar baz
  )
  define   
    foo as foo.word = 'foo',
    bar as bar.word = 'bar',
    baz as baz.word = 'baz'
)
order by
  start_id;
--
--   START_ID MATCH_COUNTER
-- ---------- -------------
--          9             1
--         15             2

drop table tq84_t purge;

Github repository Oracle-Patterns, path: /SQL/select/match_recognize/find-foo-bar-baz.sql

Find a sequence of the same

Find the first id of sequence of the same words:

create table tq84_t (
   id      number,
   word    varchar2(10)
);

insert into tq84_t values ( 1, 'ab'     );
insert into tq84_t values ( 2, 'ab'     );
insert into tq84_t values ( 3, 'cde'    );
insert into tq84_t values ( 4, 'cde'    );
insert into tq84_t values ( 5, 'cde'    );
insert into tq84_t values ( 6, 'cde'    );
insert into tq84_t values ( 7, 'ab'     );
insert into tq84_t values ( 8, 'cde'    );
insert into tq84_t values ( 9, 'fghij'  );
insert into tq84_t values (10, 'fghij'  );
insert into tq84_t values (11, 'fghij'  );
insert into tq84_t values (12, 'fghij'  );
insert into tq84_t values (13, 'fghij'  );
insert into tq84_t values (14, 'fghij'  );
insert into tq84_t values (15, 'fghij'  );
insert into tq84_t values (16, 'ab'     );
insert into tq84_t values (17, 'ab'     );
insert into tq84_t values (18, 'klmnopq');
insert into tq84_t values (19, 'cde'    );
insert into tq84_t values (20, 'cde'    );

select
   count_same_words  cnt,
   id_start,
   word
from
   tq84_t
match_recognize (
   order by
     id 
   measures
     a_word.word      as word,
     first(a_word.id) as id_start,
--   match_number()   as match_counter,
     count(*)         as count_same_words
  one row per match
  pattern (
     a_word* another_word
  )
  define
    a_word           as a_word.word        = first(a_word.word),
    another_word     as another_word.word != a_word.word
)
order by
  count_same_words desc;
--
--        CNT   ID_START WORD      
-- ---------- ---------- ----------
--          7         10 fghij     
--          4          4 cde       
--          3          1 ab        
--          2         17 ab        
--          2          8 cde     

drop table tq84_t purge;

Github repository Oracle-Patterns, path: /SQL/select/match_recognize/find-sequence-of-same.sql

Find three in a row

create table tq84_t (
   id      number,
   word    varchar2(10)
);


insert into tq84_t values ( 1, 'abc');

insert into tq84_t values ( 2, 'def');
insert into tq84_t values ( 3, 'def');

insert into tq84_t values ( 4, 'ghi');

insert into tq84_t values ( 5, 'jkl');
insert into tq84_t values ( 6, 'jkl');
insert into tq84_t values ( 7, 'jkl');

insert into tq84_t values ( 8, 'mno');

insert into tq84_t values ( 9, 'pqr');
insert into tq84_t values (10, 'pqr');
insert into tq84_t values (11, 'pqr');
insert into tq84_t values (12, 'pqr');

insert into tq84_t values (13, 'stu');

insert into tq84_t values (14, 'vwx');
insert into tq84_t values (15, 'vwx');
insert into tq84_t values (16, 'vwx');

insert into tq84_t values (17, 'yz.');


select * from tq84_t
match_recognize (

  order by
     id

  measures
     first(rec.id)  as start_id,
     rec.id         as end_id,
     match_number() as match_counter

  pattern (
     rec{3}

  )

  define   
     rec as rec.word = first(rec.word)
)
order by
  start_id;
-- 
--   START_ID     END_ID MATCH_COUNTER
-- ---------- ---------- -------------
--          5          7             1
--          9         11             2
--         14         16             3

drop table tq84_t purge;

Github repository Oracle-Patterns, path: /SQL/select/match_recognize/find-three-in-a-row.sql

TODO

Running semantics is used while the rows are being matched into a pattern. When the complete match is established, final semantics becomes possible.

This explains, why final semantics is only possible in the measures clause.

Row pattern navigation operations:

prev(val[, phsical-offset]), next(val [, physical-offset]). (prev and next always have running semantics)
first(val[, logical-offset]), last(val [, logical-offset]). (first and last support running and final semantics)

match_number() assigns a sequential number, starting with 0, to each match within a partition.

classifier() returns a text that corresponds to the pattern variable to which a row is mapped.

AFTER MATCH SKIP

TO NEXT ROW
PAST LAST ROW
TO [FIRST|LAST] pattern_variable (TO pattern_variable is equal to TO LAST pattern_variable)

The universal row pattern variable references the entire set of all rows in a match.

Expressions in the measures and define clauses have the same syntax and semantics except that the measures claus also supports «final semantics»

mrvar

`partition by`	Used to partition the input data. Each partition is treated separately
`order by`	How records in each partition are ordered.
`define`	Definition of pattern variables. All rows that meet the specified conditions are mapped to the defined row pattern variable. All pattern variables must be used in the `pattern` clause, otherwise, Oracle throws ORA-62503: illegal variable definition.
`pattern`	The pattern, expressed with pattern variables and regular expression syntax. All pattern variables must be used in the `pattern` clause (but no variables defined in `subset` must be used)
`measures`	Expressions which can be used in other portions of the SQL statement. Special functions like `last()` and `first()` can be used to refer to a value of the first or last record of matched sequence of records.