R function: cut

cut assigns each element of a numeric vector to a «bin». The boundaries (start and end values) of the bins are specified with the second argument.
In the following example, the vector v contains elements between 2 and 19.
Each of these elements is assigned to a bin, the first of which spans 0…5, the second 5…10 etc. (according to the second argument).
The returned value (c) is a factor that contains the bin for each element of v. c has the same length as v. Because a factor stores its values as integer codes, typeof(c) reports "integer".
By default, each bin excludes its lower boundary and includes its upper boundary: the first element (8) falls into (5,10], the second element (13) falls into (10,15] etc.
v <- c( 8, 13, 19, 3, 14, 7, 6, 12, 18, 9, 7, 14, 2, 3, 8, 11, 17)

c <- cut(v, c(0, 5, 10, 15, 20))

c
#
#  [1] (5,10]  (10,15] (15,20] (0,5]   (10,15] (5,10]  (5,10]  (10,15] (15,20]
# [10] (5,10]  (5,10]  (10,15] (0,5]   (0,5]   (5,10]  (10,15] (15,20]
# Levels: (0,5] (5,10] (10,15] (15,20]

typeof(c)
# [1] "integer"
Github repository about-r, path: /functions/cut/basic.R
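
The treatment of values that lie exactly on a boundary or outside all bins can be illustrated with a small sketch. It is not part of the about-r repository; the vector x is made up for illustration. It relies on the right and include.lowest arguments of cut:

```r
x      <- c(0, 5, 10, 20, 25)
breaks <- c(0, 5, 10, 15, 20)

# Default: bins are (a,b]. 0 lies on the excluded lower boundary of the
# first bin and 25 is out of range, so both become NA.
default_bins <- as.character(cut(x, breaks))
# -> NA  "(0,5]"  "(5,10]"  "(15,20]"  NA

# include.lowest = TRUE also closes the first bin on the left: [0,5]
lowest_bins <- as.character(cut(x, breaks, include.lowest = TRUE))
# -> "[0,5]"  "[0,5]"  "(5,10]"  "(15,20]"  NA

# right = FALSE turns the bins into [a,b), so 20 now falls outside
left_closed <- as.character(cut(x, breaks, right = FALSE))
# -> "[0,5)"  "[5,10)"  "[10,15)"  NA  NA
```

Elements that cannot be assigned to any bin are returned as NA rather than raising an error, so it can be worth checking the result with anyNA.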

The returned factor and its levels

cut returns a factor whose (default) levels look like (start,end]. Although this notation looks mathematical, the levels are in fact character strings:
f <- cut(1:4, 2*0:2)
class(f)
#
# [1] "factor"

levels(f)
#
# [1] "(0,2]" "(2,4]"

typeof(attr(f, 'levels'))
#
# [1] "character"
Github repository about-r, path: /functions/cut/factor.R
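
Because the result is an ordinary factor, it can be handed to functions that work on factors. As a minimal sketch (the variable names bins and counts are chosen here for illustration), table counts how many elements fall into each bin and uses the level strings as names:

```r
v    <- c( 8, 13, 19, 3, 14, 7, 6, 12, 18, 9, 7, 14, 2, 3, 8, 11, 17)
bins <- cut(v, c(0, 5, 10, 15, 20))

# Count the elements per bin
counts <- table(bins)
counts
# (0,5]: 3, (5,10]: 6, (10,15]: 5, (15,20]: 3

# The names of the table are the (character) levels of the factor
names(counts)
```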

Using non-default labels

The levels of the returned factor can be named with the labels argument. The following example assigns a (textual) code (low, medium, high) to a (numerical) rating in the range between 0 and 10.
The data frame is used to print the rating and its code side by side:
rating <- c( 7, 3, 2, 6, 9, 8, 5)
code   <- cut(rating, c(0, 3.5,  6.5, 10), labels = c('low', 'medium', 'high'))

data.frame (
  rating,
  code
)
#
#   rating   code
# 1      7   high
# 2      3    low
# 3      2    low
# 4      6 medium
# 5      9   high
# 6      8   high
# 7      5 medium
Github repository about-r, path: /functions/cut/labels.R
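
If the codes are meant to be compared with each other, cut can return an ordered factor. The following sketch assumes the same rating data as above and uses the ordered_result argument; the variable at_least_medium is introduced here for illustration only:

```r
rating <- c( 7, 3, 2, 6, 9, 8, 5)
code   <- cut(rating, c(0, 3.5, 6.5, 10),
              labels         = c('low', 'medium', 'high'),
              ordered_result = TRUE)

is.ordered(code)
# TRUE

# With an ordered factor, levels can be compared: low < medium < high
at_least_medium <- code >= 'medium'
rating[at_least_medium]
# 7 6 9 8 5
```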

Using no labels at all

If the argument labels is set to FALSE, cut does not return a factor but an integer vector that contains, for each element, the number of the bin (or bucket) it was assigned to:
n      <- c (876, 54, 3, 10, 100, 99, 999)
digits <- cut(n, c(1, 9, 99, 999), labels=FALSE)

digits
#
# [1] 3 2 1 2 3 2 3

class(digits)
#
# [1] "integer"
Github repository about-r, path: /functions/cut/labels-FALSE.r
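
One possible use of such integer codes is as an index into another vector. The following is only a sketch; the lookup vector description and the variable described are hypothetical names that do not appear in the original example:

```r
n      <- c(876, 54, 3, 10, 100, 99, 999)
digits <- cut(n, c(1, 9, 99, 999), labels = FALSE)

# Hypothetical lookup vector: element i describes bin i
description <- c('one digit', 'two digits', 'three digits')

# Use the bin numbers as indices into the lookup vector
described <- description[digits]

data.frame(n, digits, described)
```

With these breaks, a value of 1 would result in NA because the lowest boundary is excluded by default.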

Getting the bucket number from a factor

The bucket number can also be obtained by coercing the returned factor with as.integer (see also Applying as.integer on a factor).
bucket <- cut( c( 4, 7 ), c(0, 5, 10) )

bucket
#
# [1] (0,5]  (5,10]
# Levels: (0,5] (5,10]

as.integer(bucket)
#
# [1] 1 2
Github repository about-r, path: /functions/cut/as.integer.R
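
As a rough check, the two ways of obtaining bucket numbers can be compared with each other. The sketch below adds a value (12) that lies outside the breaks to show that it becomes NA in both cases; the variable names from_factor and direct are made up for this illustration:

```r
x      <- c(4, 7, 12)      # 12 lies outside the breaks
breaks <- c(0, 5, 10)

bucket      <- cut(x, breaks)
from_factor <- as.integer(bucket)                 # coerce the factor
direct      <- cut(x, breaks, labels = FALSE)     # ask for integer codes directly

from_factor
# 1 2 NA

# Both approaches are expected to yield the same bucket numbers
isTRUE(all.equal(from_factor, direct))

# The bucket numbers index into the levels, which recovers the labels
levels(bucket)[from_factor]
# "(0,5]" "(5,10]" NA
```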

Distributing the vector's elements into n equally sized intervals

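A special case arises when the second argument is a single number (a vector of length 1). In this case, cut divides the range of the input vector into the specified number of intervals of equal width and assigns each element to one of them. The labels show rounded boundaries (three significant digits by default), which is why 251 ends up in the second interval although the labels read (0.001,251] and (251,500]: the actual boundary is 250.75.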
n <- c ( 1, 250, 251, 500, 501, 750, 750, 999, 1000)
interval <- cut(n, 4)

levels(interval)
#
# [1] "(0.001,251]" "(251,500]"   "(500,750]"   "(750,1e+03]"

as.integer(interval)
#
# [1] 1 1 2 2 3 3 3 4 4
Github repository about-r, path: /functions/cut/n-intervals.R
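
The boundaries that cut chooses in this case can be reconstructed by hand. The sketch below follows the description in the documentation of cut (the range is split into equal pieces and the two outer limits are moved outwards by 0.1% of the range); the names k, rng, width and manual_breaks are assumptions made for this illustration:

```r
n <- c(1, 250, 251, 500, 501, 750, 750, 999, 1000)
k <- 4                                  # number of intervals

rng   <- range(n)                       # 1 and 1000
width <- diff(rng)                      # 999

# k equally wide pieces need k + 1 boundaries
manual_breaks <- seq(rng[1], rng[2], length.out = k + 1)

# Move the outer limits outwards so that the extreme values fall into a bin
manual_breaks[c(1, k + 1)] <- c(rng[1] - width / 1000, rng[2] + width / 1000)
manual_breaks
# approximately 0.001, 250.75, 500.5, 750.25, 1000.999

# Both calls are expected to assign the elements to the same intervals
manual <- cut(n, manual_breaks, labels = FALSE)
auto   <- cut(n, k,             labels = FALSE)

# dig.lab controls how many significant digits the labels show
levels(cut(n, k, dig.lab = 6))
```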

See also

Index to (some) R functions
