Java Regular Expressions – 3

Character Classes

Browse any Java regular expression reference and one of the first things you find is a table summarizing the regular expression constructs. That table is what we are going to look at here.

In the table below, the left-hand column lists the regular expression constructs, while the right-hand column describes what each construct matches.

Construct        Description
[abc]            a, b, or c (simple class)
[^abc]           Any character except a, b, or c (negation)
[a-zA-Z]         a through z, or A through Z, inclusive (range)
[a-d[m-p]]       a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]]     d, e, or f (intersection)
[a-z&&[^bc]]     a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    a through z, and not m through p: [a-lq-z] (subtraction)

Note: For the sake of simplicity, I am showing only the console output for each example, not the program that produced it. You can refer to the previous part for the sample code.

Simple classes – [abc]

This expression matches any single character listed inside the square brackets. Here we use [abcde], so every a, b, c, d, or e in the input is replaced with *.

Input String: Tech.Bruiser
Regular Expression: [abcde]
Replacement String: *
Output String: T**h.Bruis*r
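
For readers who want to reproduce this output, here is a minimal sketch using String.replaceAll. The class name SimpleClassDemo and the helper mask are hypothetical names chosen for this illustration; they are not the sample code from the previous part.

```java
// Hypothetical demo class, not the program from the previous part.
class SimpleClassDemo {
    // Replaces every character matched by the simple class [abcde] with '*'.
    static String mask(String input) {
        return input.replaceAll("[abcde]", "*");
    }

    public static void main(String[] args) {
        System.out.println(mask("Tech.Bruiser")); // prints T**h.Bruis*r
    }
}
```

Note that the match is case-sensitive: the uppercase B in "Bruiser" is left alone because only lowercase b is listed.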

Negation – [^abc]

To match all characters except those listed, insert the “^” metacharacter at the beginning of the character class.

Input String: Tech.Bruiser
Regular Expression: [^abcde]
Replacement String: *
Output String: *ec*******e*
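
The same result can be sketched with a precompiled Pattern, which is one reasonable choice when the expression is reused. The names NegationDemo and maskOthers are assumptions made for this example. The second print statement illustrates why the position of "^" matters: anywhere other than the first position it is treated as a literal caret.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not the program from the previous part.
class NegationDemo {
    // Matches any single character that is NOT a, b, c, d, or e.
    private static final Pattern NOT_A_TO_E = Pattern.compile("[^abcde]");

    static String maskOthers(String input) {
        return NOT_A_TO_E.matcher(input).replaceAll("*");
    }

    public static void main(String[] args) {
        System.out.println(maskOthers("Tech.Bruiser")); // prints *ec*******e*

        // Here '^' is not first, so [b^] simply means "b or a literal caret".
        System.out.println("a^b".replaceAll("[b^]", "*")); // prints a**
    }
}
```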

Ranges – [a-zA-Z]

To match a range of characters, we can use [a-zA-Z] for letters and [0-9] for digits. A range can also be negated, as in [^a-e].

Input String: Tech.Bruiser
Regular Expression: [a-e]
Replacement String: *
Output String: T**h.Bruis*r

Input String: Tech.Bruiser
Regular Expression: [^a-e]
Replacement String: *
Output String: *ec*******e*

Input String: Tech.Bruiser673
Regular Expression: [5-7]
Replacement String: *
Output String: Tech.Bruiser**3

Input String: Tech.Bruiser673
Regular Expression: [^5-7]
Replacement String: *
Output String: ************67*
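
To see exactly which characters each range picks out, rather than what is left after replacement, the matches can be collected with Matcher.find. This is only a sketch; RangeDemo and matched are hypothetical names introduced for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not the program from the previous part.
class RangeDemo {
    // Collects, in order, every character of input that the regex matches.
    static String matched(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(matched("[a-e]", "Tech.Bruiser"));     // prints ece
        System.out.println(matched("[^a-e]", "Tech.Bruiser"));    // prints Th.Bruisr
        System.out.println(matched("[5-7]", "Tech.Bruiser673"));  // prints 67
        System.out.println(matched("[^5-7]", "Tech.Bruiser673")); // prints Tech.Bruiser3
    }
}
```

Each character reported here is one that appears as * in the corresponding console output above.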

Unions – [a-d[m-p]]

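What if we want to match characters from two or more ranges? A union is the right option for that: nest one character class inside another, and a character matches if it belongs to either of them.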

Input String: Tech.Bruiser673
Regular Expression: [a-c[S-U]]
Replacement String: *
Output String: *e*h.Bruiser673

Input String: Tech.Bruiser6738
Regular Expression: [1-3[5-7]]
Replacement String: *
Output String: Tech.Bruiser***8

Input String: Tech.Bruiser6738
Regular Expression: [1-3[A-C]]
Replacement String: *
Output String: Tech.*ruiser67*8
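
As the table notes, a union such as [a-d[m-p]] can also be written flat as [a-dm-p]. The sketch below checks that equivalence on the first union example. UnionDemo and mask are hypothetical names used only for this illustration.

```java
// Hypothetical demo class, not the program from the previous part.
class UnionDemo {
    // Replaces every character matched by regex with '*'.
    static String mask(String regex, String input) {
        return input.replaceAll(regex, "*");
    }

    public static void main(String[] args) {
        // Nested (union) form and flat form select the same characters.
        System.out.println(mask("[a-c[S-U]]", "Tech.Bruiser673"));  // prints *e*h.Bruiser673
        System.out.println(mask("[a-cS-U]", "Tech.Bruiser673"));    // prints *e*h.Bruiser673

        // A union can mix digits and letters.
        System.out.println(mask("[1-3[A-C]]", "Tech.Bruiser6738")); // prints Tech.*ruiser67*8
    }
}
```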

Intersections – [a-z&&[def]]

An intersection matches only the characters that are common to both classes. For [0-5&&[3-9]], the common characters are 3, 4, and 5.

Input String: Tech.Bruiser6738
Regular Expression: [0-9&&[345]]
Replacement String: *
Output String: Tech.Bruiser67*8

Input String: Tech.Bruiser6738
Regular Expression: [a-z&&[c-e]]
Replacement String: *
Output String: T**h.Bruis*r6738

Input String: Tech.Bruiser6738
Regular Expression: [0-9&&[3-5]]
Replacement String: *
Output String: Tech.Bruiser67*8
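
The claim that [0-5&&[3-9]] accepts only 3, 4, and 5 can be checked by testing each digit on its own. The following is a sketch under that idea; IntersectionDemo and accepted are hypothetical names, not code from the previous part.

```java
// Hypothetical demo class, not the program from the previous part.
class IntersectionDemo {
    // Returns the characters of candidates that the character class accepts.
    static String accepted(String charClass, String candidates) {
        StringBuilder sb = new StringBuilder();
        for (char c : candidates.toCharArray()) {
            // String.matches requires the whole one-character string to match.
            if (String.valueOf(c).matches(charClass)) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(accepted("[0-5&&[3-9]]", "0123456789")); // prints 345
        System.out.println("Tech.Bruiser6738".replaceAll("[a-z&&[c-e]]", "*")); // prints T**h.Bruis*r6738
    }
}
```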

Subtraction – [a-z&&[^m-p]]

Subtraction matches the characters of one class that do not belong to another. It is written as an intersection with a negated class. For [0-9&&[^345]], the matching characters are 0, 1, 2, 6, 7, 8, and 9.

Input String: Tech.Bruiser6738
Regular Expression: [0-9&&[^345]]
Replacement String: *
Output String: Tech.Bruiser**3*
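
The table states that [a-z&&[^m-p]] is the same as [a-lq-z]. A minimal sketch to check this: delete every matching character from the alphabet and see what survives. SubtractionDemo and strip are hypothetical names introduced for this example.

```java
// Hypothetical demo class, not the program from the previous part.
class SubtractionDemo {
    // Removes every character matched by regex, leaving the ones it rejects.
    static String strip(String regex, String input) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        System.out.println("Tech.Bruiser6738".replaceAll("[0-9&&[^345]]", "*")); // prints Tech.Bruiser**3*

        String alphabet = "abcdefghijklmnopqrstuvwxyz";
        // Both forms reject exactly m, n, o, and p.
        System.out.println(strip("[a-z&&[^m-p]]", alphabet)); // prints mnop
        System.out.println(strip("[a-lq-z]", alphabet));      // prints mnop
    }
}
```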