Java Regular Expressions

From NovaOrdis Knowledge Base

Revision as of 22:56, 29 July 2017

External

Internal

Overview

Regular expressions can be used in Java via the String API or the java.util.regex API.

java.util.regex API

The usual sequence for using regular expressions is to build a Pattern instance, which can then be matched against multiple strings via Matcher instances. The Pattern instance contains a compiled representation of the regular expression. A Matcher uses the Pattern but encapsulates all the state required to perform matching against a String, so one Pattern can be shared by multiple Matchers. Matcher instances are not thread safe; see Concurrent Usage Considerations below.

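import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Example {

  // Compiled once; can be shared by any number of Matcher instances
  public static final Pattern PATTERN = Pattern.compile("red");

  ...

  public void useRegex(String argument) {

      // Each invocation gets its own Matcher, which holds the matching state
      Matcher m = PATTERN.matcher(argument);

      ...

  }
}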

Once built, a Matcher instance can be used to match or find.

Matcher.matches()

The Matcher.matches() method attempts to match the entire input sequence against the pattern. The result of the invocation is binary: the entire input sequence either matches the regular expression or it does not. In the context of the above example,

String argument = "red";
Matcher m = PATTERN.matcher(argument);
m.matches();

returns true, while

String argument = "credential";
Matcher m = PATTERN.matcher(argument);
m.matches();

returns false.

Matcher.find()

Matcher.find() can be used to repeatedly scan the input sequence; each call looks for the next subsequence that matches the pattern. The whole input sequence does not need to match the pattern for find() to return true; it is sufficient that a subsequence of it does. The typical way find() is used is shown below:

Matcher m = PATTERN.matcher(argument);

int i = 1;

while(m.find()) {

    int s = m.start();
    int e = m.end();

    System.out.println("matching subsequence " + i + " starts at " + s + " and ends at " + e);

    i ++;
}

Note that a newly created Matcher instance holds no match state, so an attempt to use state accessors like start() or end() before a successful match operation will throw an IllegalStateException ("No match available").
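The following is a minimal, self-contained sketch of the find() loop and of the IllegalStateException behavior described above. The class name FindExample and the helper methods findAll() and startThrowsBeforeFind() are hypothetical names introduced only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindExample {

    private static final Pattern PATTERN = Pattern.compile("red");

    // Collects "start-end" for every matching subsequence found in the input.
    static List<String> findAll(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = PATTERN.matcher(input);
        while (m.find()) {
            result.add(m.start() + "-" + m.end());
        }
        return result;
    }

    // Returns true if start() throws IllegalStateException on a fresh Matcher.
    static boolean startThrowsBeforeFind(String input) {
        Matcher m = PATTERN.matcher(input);
        try {
            m.start(); // no match operation has been performed yet
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(findAll("credential redo"));   // [1-4, 11-14]
        System.out.println(startThrowsBeforeFind("red")); // true
    }
}
```

Note that "credential redo" would fail matches(), yet find() locates two subsequences in it.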

Using Groups

The regular expression may define capturing groups, which can be retrieved via the group(), groupCount(), group(int index) and group(String name) state accessors.

Group 0 denotes the entire pattern, so m.group(0) is equivalent to m.group(). If the match was successful but the group specified failed to match any part of the input sequence, then null is returned.

The example below attempts to match colon-terminated words that may or may not include a sequence of "a"s. For each match, we display the state of the matcher, including the capturing groups.

Pattern PATTERN = Pattern.compile("[b-z]+(a*)[b-z]+:");

String argument="blah:blaaaaaah:blh:";
Matcher m = PATTERN.matcher(argument);

int i = 1;

while(m.find()) {

  System.out.println("match " + (i ++) + ":");
  System.out.println("       match starts at: " + m.start());
  System.out.println("         match ends at: " + m.end());
  System.out.println(" group count for match: " + m.groupCount());
  System.out.println("    group(0) for match: " + m.group(0));
  System.out.println("    group(1) for match: " + m.group(1));
}
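As a complement, here is a minimal sketch of group(String name) and of a group that returns null. The KEY_VALUE pattern, the class name GroupExample and the helpers describe() and firstAs() are hypothetical, chosen only for illustration. The sketch also shows that, for the pattern used above, (a*) yields an empty string rather than null on "blh:", because the group does participate in the match with zero length.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupExample {

    // Hypothetical pattern: a lowercase key, optionally followed by "=<digits>".
    private static final Pattern KEY_VALUE =
        Pattern.compile("(?<key>[a-z]+)(?:=(?<value>[0-9]+))?");

    // Pattern from the example above.
    private static final Pattern WORDS = Pattern.compile("[b-z]+(a*)[b-z]+:");

    // Returns "key -> value"; value is null when the optional group did not participate.
    static String describe(String input) {
        Matcher m = KEY_VALUE.matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        return m.group("key") + " -> " + m.group("value");
    }

    // Returns group(1) of the first match of WORDS, or null if nothing matches.
    static String firstAs(String input) {
        Matcher m = WORDS.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(describe("port=8080"));     // port -> 8080
        System.out.println(describe("port"));          // port -> null
        System.out.println(firstAs("blh:").isEmpty()); // true
    }
}
```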

Replacing Matched Sequences

The Matcher class exposes an API for replacing matched subsequences with new strings whose contents can be computed from the match result. Those methods are Matcher.replaceAll(), Matcher.appendReplacement() and Matcher.appendTail().
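A minimal sketch of the replacement methods follows. The DIGITS pattern, the class name ReplaceExample and the helpers maskDigits() and doubleNumbers() are hypothetical, introduced for illustration only. The StringBuffer overloads are used because they are available in all Java versions that have these methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceExample {

    // Hypothetical pattern: one or more decimal digits.
    private static final Pattern DIGITS = Pattern.compile("[0-9]+");

    // Replaces every run of digits with a fixed string.
    static String maskDigits(String input) {
        return DIGITS.matcher(input).replaceAll("#");
    }

    // Replaces every number with its double, computed from the match result.
    static String doubleNumbers(String input) {
        Matcher m = DIGITS.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int doubled = Integer.parseInt(m.group()) * 2;
            // Appends the text since the previous match, then the replacement.
            m.appendReplacement(sb, Integer.toString(doubled));
        }
        // Appends whatever follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskDigits("a1b22c"));    // a#b#c
        System.out.println(doubleNumbers("a1b22c")); // a2b44c
    }
}
```

In a replacement string, "$" and "\" have special meaning; when the replacement is arbitrary text, wrapping it with Matcher.quoteReplacement() avoids surprises.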

Matcher.lookingAt()

java.util.regex Examples

Working code is available here:

https://github.com/NovaOrdis/playground/tree/master/java/regex/simplest

java.lang.String API

String s = "...";
s.matches(...);

While convenient in some cases, the String API delegates to the java.util.regex API via the Pattern.matches() call. This method is not efficient when used repeatedly, because it internally compiles a new Pattern instance on each invocation. If matching against the same regular expression is to be done repeatedly, the java.util.regex API is preferred.
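The two approaches can be contrasted with a short sketch. The WORD pattern, the class name StringApiExample and both helper methods are hypothetical, used only for illustration. Both helpers return the same count; the difference is that the first compiles the expression on every call while the second reuses one compiled Pattern.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class StringApiExample {

    // Compiled once and reused for every input.
    private static final Pattern WORD = Pattern.compile("[a-z]+");

    // Convenient, but the regular expression is compiled on each matches() call.
    static int countWithStringApi(List<String> inputs) {
        int count = 0;
        for (String s : inputs) {
            if (s.matches("[a-z]+")) {
                count++;
            }
        }
        return count;
    }

    // Same result, reusing the precompiled Pattern.
    static int countWithPattern(List<String> inputs) {
        int count = 0;
        for (String s : inputs) {
            if (WORD.matcher(s).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> inputs = Arrays.asList("red", "Red", "blue", "r3d");
        System.out.println(countWithStringApi(inputs)); // 2
        System.out.println(countWithPattern(inputs));   // 2
    }
}
```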

Concurrent Usage Considerations


Matcher instances are NOT thread safe; create a separate Matcher per thread. Pattern instances are immutable and can be shared safely between threads.
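One way to follow this rule is sketched below: a single Pattern is shared, and each thread creates its own Matcher. The SHARED pattern, the class name ConcurrencyExample and the countMatches() helper are hypothetical names used for illustration; plain threads are used only to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConcurrencyExample {

    // One compiled Pattern shared by all threads.
    private static final Pattern SHARED = Pattern.compile("red");

    // Counts matches across all inputs, processing each input on its own thread.
    static int countMatches(List<String> inputs) {
        AtomicInteger count = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (String input : inputs) {
            Thread t = new Thread(() -> {
                // The Matcher is created inside the thread and never shared.
                Matcher m = SHARED.matcher(input);
                while (m.find()) {
                    count.incrementAndGet();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return count.get();
    }

    public static void main(String[] args) {
        System.out.println(countMatches(Arrays.asList("red", "credential redo", "blue"))); // 3
    }
}
```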

Regular Expression Syntax

Greedy Matching

Quantifiers are greedy by default. To turn them into reluctant quantifiers, append a "?" to the quantifier.
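The difference can be seen in a small sketch. The class name GreedyExample, the firstMatch() helper and the sample input are hypothetical, chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyExample {

    // Returns the first subsequence matching the regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: ".+" consumes as much as possible, up to the last ">".
        System.out.println(firstMatch("<.+>", "<a><b>"));  // <a><b>
        // Reluctant: ".+?" consumes as little as possible, up to the first ">".
        System.out.println(firstMatch("<.+?>", "<a><b>")); // <a>
    }
}
```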