Java Regular Expressions – 2

String Literals

The most basic form of pattern matching is matching a string literal. Here the pattern is simply the text we want to find in the given input string.

To see this in action, let’s take the String replaceAll method, which accepts a regular expression and a replacement string as arguments.

public static void main(String[] args) {
    String inputString = "Tech Bruiser";
    String regExp = "Tech";
    String replacementString = "Hello";
    // Replace every match of regExp in the input with the replacement string.
    String outputString = inputString.replaceAll(regExp, replacementString);
    // displayInConsole (not shown here) prints the four values listed in the output.
    displayInConsole(inputString, regExp, replacementString, outputString);
}

Input String: Tech Bruiser
Regular Expression: Tech
Replacement String: Hello
Output String: Hello Bruiser
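
For readers who want to run the example as-is, here is a minimal self-contained sketch. The article does not show displayInConsole, so the version below is an assumption that simply prints the four values in the same layout as the output above. The class name LiteralReplaceDemo and the replace helper are also made up for illustration.

```java
class LiteralReplaceDemo {

    // Hypothetical stand-in for the article's displayInConsole helper (an assumption).
    static void displayInConsole(String input, String regExp,
                                 String replacement, String output) {
        System.out.println("Input String: " + input);
        System.out.println("Regular Expression: " + regExp);
        System.out.println("Replacement String: " + replacement);
        System.out.println("Output String: " + output);
    }

    // Replaces every match of regExp in input with replacement.
    static String replace(String input, String regExp, String replacement) {
        return input.replaceAll(regExp, replacement);
    }

    public static void main(String[] args) {
        String inputString = "Tech Bruiser";
        String regExp = "Tech";
        String replacementString = "Hello";
        String outputString = replace(inputString, regExp, replacementString);
        displayInConsole(inputString, regExp, replacementString, outputString);
    }
}
```

Because the pattern "Tech" contains no special characters, it matches only the exact text "Tech", and every occurrence of it is replaced.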

Metacharacters

These are special characters that affect the way a pattern is matched. Starting from the previous example, let’s change the regular expression to “Tech.” and look at the result:

public static void main(String[] args) {
    String inputString = "Tech Bruiser";
    String regExp = "Tech.";
    String replacementString = "Hello";
    String outputString = inputString.replaceAll(regExp, replacementString);
    displayInConsole(inputString, regExp, replacementString, outputString);
}

Input String: Tech Bruiser
Regular Expression: Tech.
Replacement String: Hello
Output String: HelloBruiser

Did you notice? The match still succeeds even though the input doesn’t contain any dot “.”, and this time the space after “Tech” is replaced as well. That is because the dot is a metacharacter, a character with a special meaning interpreted by the matcher. Here the metacharacter “.” means “any character” (by default, any character except a line terminator), so it matches the space and the match succeeds.
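
To make the “any character” behaviour concrete, the following sketch runs the same pattern against a few inputs. The class name DotMetacharacterDemo, the helper replaceTechDot, and the extra input strings are invented for illustration and are not part of the original example.

```java
class DotMetacharacterDemo {

    // Replaces "Tech" followed by exactly one character with "Hello".
    static String replaceTechDot(String input) {
        return input.replaceAll("Tech.", "Hello");
    }

    public static void main(String[] args) {
        // The dot consumes whatever single character follows "Tech".
        System.out.println(replaceTechDot("Tech Bruiser")); // HelloBruiser
        System.out.println(replaceTechDot("Tech-Bruiser")); // HelloBruiser
        System.out.println(replaceTechDot("TechXBruiser")); // HelloBruiser

        // Nothing follows "Tech", so the dot has nothing to match.
        System.out.println(replaceTechDot("Tech"));         // Tech

        // By default the dot does not match a line terminator,
        // so this input is returned unchanged.
        System.out.println(replaceTechDot("Tech\nBruiser").equals("Tech\nBruiser")); // true
    }
}
```

Note that the dot must match exactly one character: it is not optional, which is why the bare input "Tech" is left untouched.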

The supported metacharacters are <([{\^-=$!|]})?*+.>

What if I want to search for a character that is in the metacharacter list?

In this case,

  • precede the metacharacter with a backslash, or
  • enclose it within \Q (which starts the quote) and \E (which ends it).

public static void main(String[] args) {
    String inputString = "Tech.Bruiser";
    String regExp = "\\."; // or "\\Q.\\E"
    String replacementString = " ";
    String outputString = inputString.replaceAll(regExp, replacementString);
    displayInConsole(inputString, regExp, replacementString, outputString);
}

Input String: Tech.Bruiser
Regular Expression: \.
Replacement String:  
Output String: Tech Bruiser
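
The two escaping styles can be compared side by side with a short sketch. The class name EscapeDotDemo and the helper replaceWithSpace are hypothetical. Pattern.quote is not mentioned in the article; it is a standard java.util.regex method that returns a literal pattern string for its argument, and it is included here only as a third way to get the same result.

```java
import java.util.regex.Pattern;

class EscapeDotDemo {

    // Replaces every match of regExp in input with a single space.
    static String replaceWithSpace(String input, String regExp) {
        return input.replaceAll(regExp, " ");
    }

    public static void main(String[] args) {
        String input = "Tech.Bruiser";

        // Backslash escape: the dot is treated as a literal dot.
        System.out.println(replaceWithSpace(input, "\\."));              // Tech Bruiser

        // \Q ... \E quoting: everything in between is literal.
        System.out.println(replaceWithSpace(input, "\\Q.\\E"));          // Tech Bruiser

        // Pattern.quote builds a literal pattern for us.
        System.out.println(replaceWithSpace(input, Pattern.quote("."))); // Tech Bruiser

        // Unescaped dot: every character matches, so all 12 become spaces.
        System.out.println("[" + replaceWithSpace(input, ".") + "]");
    }
}
```

The last call shows why escaping matters: without it, the dot matches every character of the input instead of only the literal dot.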

Wait, the rule says to precede the metacharacter with a backslash, but I used two. Why?

Because in Java’s string literals, \ is itself the escape character used to denote special characters (for example tabs and new lines), so if a string needs to contain a \ then it must itself be escaped by putting another \ in front of it. Hence the regular expressions \. and \Q.\E are written in Java source as “\\.” and “\\Q.\\E”.
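
A quick way to convince yourself is to inspect the string that the regex engine actually receives. This is a small sketch with made-up names (BackslashEscapeDemo, regexSeenByEngine), not code from the article.

```java
class BackslashEscapeDemo {

    // The Java literal "\\." is compiled into a two-character string: \ and .
    static String regexSeenByEngine() {
        return "\\.";
    }

    public static void main(String[] args) {
        String regExp = regexSeenByEngine();

        System.out.println(regExp);                   // \.
        System.out.println(regExp.length());          // 2
        System.out.println(regExp.charAt(0) == '\\'); // true
        System.out.println(regExp.charAt(1) == '.');  // true
    }
}
```

So the doubling happens only in the source code. The compiler turns “\\” into a single backslash, and the regex engine then sees \. exactly as the rule describes.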