Reading Notes: Thinking in Java, Chapter 13 - Strings

Java's basic philosophy is that "badly formed code will not be run."

In a try-catch-finally block, the finally clause always executes, whether or not an exception is thrown.
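A minimal sketch of this behavior. The class FinallyDemo and its helper run() are hypothetical names chosen for illustration; the helper simply records which clauses execute with and without an exception:

```java
class FinallyDemo {
    // Records the order in which the try, catch and finally clauses run.
    static String run(boolean fail) {
        StringBuilder trace = new StringBuilder();
        try {
            trace.append("try ");
            if (fail) {
                throw new IllegalStateException("forced failure");
            }
        } catch (IllegalStateException e) {
            trace.append("catch ");
        } finally {
            trace.append("finally");
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // try finally
        System.out.println(run(true));  // try catch finally
    }
}
```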

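String objects are immutable. Every method in the String class that appears to modify a String's value actually creates and returns a brand-new String object containing the modification, while the original String is left untouched.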

 package com.exam.cn;

public class Immutable {
    public static String upcase(String s) {
        s.toUpperCase();       // result discarded on purpose: s itself cannot change
        System.out.println(s); // still prints the original text
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        String q = "howdy";
        System.out.println(q);
        String qq = upcase(q);
        System.out.println(qq);
        System.out.println(q);
    }
}
Output:
howdy
howdy
HOWDY
howdy

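When q is passed to upcase(), what is actually passed is a copy of the reference. Whenever a String object is passed as a method argument, the reference is copied, while the object it refers to stays in a single physical location and never moves. Inside upcase(), the passed-in reference has the name s, and this local reference exists only while upcase() is running; once upcase() finishes, s disappears. The return value of upcase() is just a reference to the final result. This shows that the reference returned by upcase() points to a new object, while the original q is left where it was.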

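StringBuilder was introduced in Java SE5; before that, Java used StringBuffer. StringBuffer is thread-safe (its methods are synchronized), so it carries somewhat more overhead.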
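As a rough illustration of typical single-threaded StringBuilder use, here is a small sketch. BuilderDemo and join() are hypothetical names, not taken from the book:

```java
class BuilderDemo {
    // Joins the items with ", " using one mutable StringBuilder
    // instead of creating a new String on every concatenation.
    static String join(String[] items) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append(items[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(new String[] { "a", "b", "c" })); // a, b, c
    }
}
```

StringBuffer offers the same append() and toString() calls, so swapping the type is enough when several threads share the buffer.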

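Regular expressions: in general, a regular expression is a way of describing strings. In Java, \\ means "I am inserting a regular-expression backslash, so the following character has a special meaning." For example, to indicate a single digit, the regular expression string is \\d. To insert a literal backslash, write \\\\. Newlines and tabs, however, need only a single backslash: \n\t.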
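The escaping rules can be checked with a short sketch. The class and method names (BackslashDemo, isSingleDigit, isSingleBackslash) are made up for this illustration:

```java
class BackslashDemo {
    // "\\d" in Java source is the two-character regex \d (one digit).
    static boolean isSingleDigit(String s) {
        return s.matches("\\d");
    }

    // "\\\\" in Java source is the regex \\, which matches one literal backslash.
    static boolean isSingleBackslash(String s) {
        return s.matches("\\\\");
    }

    public static void main(String[] args) {
        System.out.println(isSingleDigit("7"));       // true
        System.out.println(isSingleBackslash("\\"));  // true: "\\" is a one-character string
        System.out.println("a\tb".matches("a\tb"));   // true: \t needs only a single backslash
    }
}
```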

To indicate "one or more of the preceding expression," use +. So "possibly a minus sign, followed by one or more digits" is written: -?\\d+

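The simplest way to use regular expressions is through the functionality built into the String class. For example, you can check whether a String matches the regular expression above: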

 package com.exam.cn;

public class IntegerMatch {
    public static void main(String[] args) {
        System.out.println("-1234".matches("-?\\d+"));
        System.out.println("5678".matches("-?\\d+"));
        System.out.println("+911".matches("-?\\d+"));
        System.out.println("+911".matches("(-|\\+)?\\d+"));
    }
}
Output:
true
true
false
true

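The first two strings satisfy the regular expression, so they match. The third string begins with a +, which is also a legitimate integer, but it does not match the expression. So the regular expression needs to say "may begin with a plus or a minus sign." In regular expressions, parentheses group an expression, and the vertical bar | means OR. That gives (-|\\+)?, which says that the string may start with a - or a +, or neither (because of the trailing ?). Because the + character has a special meaning in regular expressions, it must be escaped with \\ to appear as an ordinary character in the expression.

String also comes with a very useful regular-expression tool, the split() method, which "splits the string around matches of the given regular expression."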

 package com.exam.cn;

public class Splitting {
    public static String knights = "Then, when you hava found the shrubery,you must cut "
            + "down the mightiest tree in the forest... whth... a herring";

    // Splits knights on the given regex and prints every piece with its index.
    public static void split(String regex) {
        String[] array = knights.split(regex);
        if (array != null && array.length > 0) {
            for (int i = 0; i < array.length; i++) {
                System.out.print("[" + i + "]" + "=" + array[i]);
            }
            System.out.println();
        }
    }

    public static void main(String[] args) {
        split(" ");      // split on single spaces
        split("\\W+");   // split on runs of non-word characters
        split("n\\W+");  // split on 'n' followed by non-word characters
    }
}
Output:
[0]=Then,[1]=when[2]=you[3]=hava[4]=found[5]=the[6]=shrubery,you[7]=must[8]=cut[9]=down[10]=the[11]=mightiest[12]=tree[13]=in[14]=the[15]=forest...[16]=whth...[17]=a[18]=herring
[0]=Then[1]=when[2]=you[3]=hava[4]=found[5]=the[6]=shrubery[7]=you[8]=must[9]=cut[10]=down[11]=the[12]=mightiest[13]=tree[14]=in[15]=the[16]=forest[17]=whth[18]=a[19]=herring
[0]=The[1]=whe[2]=you hava found the shrubery,you must cut dow[3]=the mightiest tree i[4]=the forest... whth... a herring

Look at the first call: it uses ordinary characters as the regular expression and splits the string on spaces.

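The second and third split() calls use \W, which means a non-word character (the lowercase version, \w, means a word character). The second example shows that the punctuation has been removed. The third split() says "the letter n followed by one or more non-word characters." You can see that the parts of the original string that matched the regular expression do not appear in the result.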

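Another regular-expression tool built into String is replacement. You can replace only the first substring that matches the regular expression, or replace every match.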

 package com.exam.cn;

public class Replacing {
    static String s = "Then, when you hava found the shrubbery,you must cut "
            + "down the mightiest tree in the forest... whth... a herring";

    public static void main(String[] args) {
        System.out.println(s);
        System.out.println(s.replaceFirst("f\\w+", "located"));
        System.out.println(s.replaceAll("shrubbery|tree|herring", "banana"));
    }
}
Output:
Then, when you hava found the shrubbery,you must cut down the mightiest tree in the forest... whth... a herring
Then, when you hava located the shrubbery,you must cut down the mightiest tree in the forest... whth... a herring
Then, when you hava found the banana,you must cut down the mightiest banana in the forest... whth... a banana

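The first expression matches the letter f followed by one or more word characters (note that the w is lowercase this time). Only the first match is replaced, so "found" becomes "located".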

The second expression matches any one of the three words, which are separated by the vertical bar meaning OR, and every match is replaced.

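Character classes
.	Any character
\W	A non-word character, equivalent to [^\w]
\w	A word character: [a-zA-Z_0-9]
\D	A non-digit: [^0-9]
\d	A digit: [0-9]
\S	A non-whitespace character: [^\s]
\s	A whitespace character (space, tab, newline, form feed, carriage return)
[abc[hij]]	Any of the characters a, b, c, h, i, or j (same as a|b|c|h|i|j) (union)
[^abc]	Any character except a, b, and c (negation)
[abc]	Any of the characters a, b, or c (same as a|b|c)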
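A few of the character classes above can be tried out with String.matches(). This is only a sketch; CharClassDemo and its matches() wrapper are hypothetical names:

```java
class CharClassDemo {
    // Thin wrapper so each check reads as (regex, input).
    static boolean matches(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(matches("[abc[hij]]", "h")); // true: union of two classes
        System.out.println(matches("[^abc]", "a"));     // false: a is excluded
        System.out.println(matches("\\w", "_"));        // true: underscore is a word character
        System.out.println(matches("\\W", "!"));        // true: punctuation is a non-word character
        System.out.println(matches("\\s", "\t"));       // true: tab is whitespace
    }
}
```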

As a demonstration, each of the following regular expressions successfully matches the character sequence "Rudolph":

 package com.exam.cn;

public class Rudolph {
    public static void main(String[] args) {
        for (String pattern : new String[] { "Rudolph", "[rR]udolph", "[rR][aeiou][a-z]ol.*", "R.*" })
            System.out.println("Rudolph".matches(pattern));
    }
}
Output:
true
true
true
true

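Quantifiers describe the way a pattern absorbs input text. Each one comes in a greedy, a reluctant, and a possessive form:
Greedy	Reluctant	Possessive	Matches
X?	X??	X?+	X, once or not at all
X*	X*?	X*+	X, zero or more times
X+	X+?	X++	X, one or more times
X{n}	X{n}?	X{n}+	X, exactly n times
X{n,}	X{n,}?	X{n,}+	X, at least n times
X{n,m}	X{n,m}?	X{n,m}+	X, at least n but not more than m times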
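Java quantifiers come in three forms: greedy (X*), reluctant (X*?), and possessive (X*+). The following sketch shows how the three differ on the same input; QuantifierDemo and firstMatch() are hypothetical names used only here:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        System.out.println(firstMatch(".*foo", input));   // greedy: xfooxxxxxxfoo
        System.out.println(firstMatch(".*?foo", input));  // reluctant: xfoo
        System.out.println(firstMatch(".*+foo", input));  // possessive: null
    }
}
```

The greedy form takes as much as it can and backs off until foo fits, the reluctant form takes as little as possible, and the possessive form never gives characters back, so nothing is left for foo to match.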

The expression X often needs to be enclosed in parentheses for it to work the way you intend. For example:

abc+

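This looks like it should match one or more abc sequences, and applying it to the input string abcabcabc does in fact produce three matches. However, the expression actually means: match ab, followed by one or more occurrences of c. To match one or more complete abc strings, you must write: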

(abc)+

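Most regular-expression operations accept arguments of type CharSequence. In general, rather than relying on the limited functionality of the String class, you will prefer to construct the more powerful regular-expression objects.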

 package com.exam.cn;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestRegularExpression {
    public static void main(String[] args) {
        // Hard-coded arguments: the first is the input, and every entry
        // (including the first) is applied to it as a regular expression.
        args = new String[] { "abcabcabcdefabc", "abc+", "(abc)+", "(abc){2,}" };
        for (String arg : args) {
            System.out.println("Regular expression: \"" + arg + "\"");
            Pattern pattern = Pattern.compile(arg);
            Matcher m = pattern.matcher(args[0]);
            while (m.find()) {
                System.out.println("Match \"" + m.group() + "\" at positions "
                        + m.start() + "-" + (m.end() - 1));
            }
        }
    }
}
Output:
Regular expression: "abcabcabcdefabc"
Match "abcabcabcdefabc" at positions 0-14
Regular expression: "abc+"
Match "abc" at positions 0-2
Match "abc" at positions 3-5
Match "abc" at positions 6-8
Match "abc" at positions 12-14
Regular expression: "(abc)+"
Match "abcabcabc" at positions 0-8
Match "abc" at positions 12-14
Regular expression: "(abc){2,}"
Match "abcabcabc" at positions 0-8

A Pattern object represents the compiled version of a regular expression. The Pattern class also provides a static method:

 static boolean matches(String regex,CharSequence input)

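This method checks whether regex matches the entire input, which is of type CharSequence. A compiled Pattern object also provides a split() method, which splits the input string around matches of the regex and returns the resulting String array. Calling Pattern.matcher() with a string argument produces a Matcher object, whose methods let you test for different kinds of matches: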

 package com.exam.cn;

import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternTest {
    public static void main(String[] args) {
        Pattern pattern = Pattern.compile("def");

        String[] a = pattern.split("abcabc2abc3defabcdefabc4abcdefabcawvaabc");
        System.out.println(Arrays.asList(a));
        Matcher m = pattern.matcher("abcabc2abc3defabcdefabc4abcdefabcawvaabc");
        boolean b = m.lookingAt();
        // For comparison only: the result of startsWith() is discarded.
        "abcabc2abc3defabcdefabc4abcdefabcawvaabc".startsWith("def");
        System.out.println(b);
    }
}
Output:
[abcabc2abc3, abc, abc4abc, abcawvaabc]
false
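The static Pattern.matches() method and Pattern's split() can also be sketched with CharSequence arguments other than String. The class PatternStaticDemo and its two helpers are hypothetical names for illustration:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PatternStaticDemo {
    // Pattern.matches() must match the whole input, like String.matches().
    static boolean isSignedInteger(CharSequence input) {
        return Pattern.matches("(-|\\+)?\\d+", input);
    }

    // Splits on commas, swallowing any whitespace around them.
    static String[] splitOnCommas(CharSequence input) {
        return Pattern.compile("\\s*,\\s*").split(input);
    }

    public static void main(String[] args) {
        // A StringBuilder is a CharSequence, so it can be passed directly.
        System.out.println(isSignedInteger(new StringBuilder("+911"))); // true
        System.out.println(isSignedInteger("12a"));                     // false
        System.out.println(Arrays.toString(splitOnCommas("a , b,c")));  // [a, b, c]
    }
}
```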

 boolean matches();
boolean lookingAt();
boolean find();
boolean find(int start);

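The matches() method succeeds only if the pattern matches the entire input string, while lookingAt() succeeds if the pattern matches the beginning of the input (not necessarily all of it), similar to startsWith.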
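The difference between matches(), lookingAt(), and find() can be seen side by side in a small sketch. MatchKindsDemo and describe() are hypothetical names; a fresh Matcher is created for each call so that no state carries over between them:

```java
import java.util.regex.Pattern;

class MatchKindsDemo {
    // Reports the result of matches(), lookingAt() and find() for one input.
    static String describe(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        return "matches=" + p.matcher(input).matches()
                + " lookingAt=" + p.matcher(input).lookingAt()
                + " find=" + p.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(describe("def", "def"));    // matches=true lookingAt=true find=true
        System.out.println(describe("def", "defabc")); // matches=false lookingAt=true find=true
        System.out.println(describe("def", "abcdef")); // matches=false lookingAt=false find=true
    }
}
```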

 package com.exam.cn;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Finding {
    public static void main(String[] args) {
        Matcher m = Pattern.compile("\\w").matcher("Evening is full of the linnet's wings");
        // Iterate forward through every match.
        while (m.find())
            System.out.print(m.group() + " ");
        System.out.println();
        // Restart the search from each character position in turn.
        int i = 0;
        while (m.find(i)) {
            System.out.print(m.group() + " ");
            i++;
        }
    }
}
Output:
E v e n i n g i s f u l l o f t h e l i n n e t s w i n g s 
E v e n i n g i i s f f u l l o o f t t h e l l i n n e t s s w w i n g s

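Here the pattern \\w matches one word character at a time, so each call to find() returns a single character; with \\w+ instead, each match would be a whole word. find() works like an iterator, moving forward through the input string. The second version of find() takes an integer argument that gives the character position at which the search starts; it resets the matcher first, which is why characters repeat in the second line of output.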

Pattern's split() method breaks an input string into an array of String objects, with the boundaries determined by the regular expression:

 package com.exam.cn;

import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitDemo {
    public static void main(String[] args) {
        String input = "This!!unusual use!!of exclamation!!points!!abc";
        System.out.println(Arrays.toString(Pattern.compile("!!").split(input)));
        // The second argument limits the result to at most three pieces.
        System.out.println(Arrays.toString(Pattern.compile("!!").split(input, 3)));
    }
}
Output:
[This, unusual use, of exclamation, points, abc]
[This, unusual use, of exclamation!!points!!abc]

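The second form, split(CharSequence input, int limit), limits the number of pieces the input is split into. Regular expressions are especially useful for replacing text, and Matcher provides several methods for it:

replaceFirst(String replacement) replaces the first match with replacement.

replaceAll(String replacement) replaces every match with replacement.

appendReplacement(StringBuffer sbuf, String replacement) performs step-by-step replacement into sbuf, rather than replacing only the first match or all matches as replaceFirst() and replaceAll() do. This is a very important method: it lets you call other methods to generate or process the replacement (replaceFirst() and replaceAll() can only put in a fixed string), so you can programmatically break the target into groups and build much more powerful replacements.

appendTail(StringBuffer sbuf) is called after one or more invocations of appendReplacement() to copy the remainder of the input string into sbuf.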

 package com.exam.cn;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import net.mindview.util.TextFile;
/*!
Here's a block of text to use as input to the regular expression matcher.Note that we'll first extract the 
 block of text by looking for  the special delimiters,then process the extracted block.
!*/public class TheReplacements {
    public static void main(String[] args) {
        String s = TextFile.read("src/com/exam/cn/TheReplacements.java");
        // Extract the text between the special delimiters.
        Matcher mInput = Pattern.compile("/\\*!(.*)!\\*/", Pattern.DOTALL).matcher(s);
        if (mInput.find())
            s = mInput.group(1);
        System.out.println("---------------------------------------------------------------------");
        System.out.println(s);
        System.out.println("---------------------------------------------------------------------");
        s = s.replaceAll(" {2,}", " ");  // collapse runs of spaces
        s = s.replaceAll("(?m)^ +", ""); // strip leading spaces on every line
        System.out.println(s);
        s = s.replaceFirst("[aeiou]", "(VOWEL1)");
        StringBuffer sBuffer = new StringBuffer();
        Pattern p = Pattern.compile("[aeiou]");
        Matcher m = p.matcher(s);
        // Replace each remaining lowercase vowel with its uppercase form.
        while (m.find())
            m.appendReplacement(sBuffer, m.group().toUpperCase());
        m.appendTail(sBuffer);
        System.out.println(sBuffer);
    }
}
Output:
---------------------------------------------------------------------

Here's a block of text to use as input to the regular expression matcher.Note that we'll first extract the 
 block of text by looking for  the special delimiters,then process the extracted block.

---------------------------------------------------------------------

Here's a block of text to use as input to the regular expression matcher.Note that we'll first extract the 
block of text by looking for the special delimiters,then process the extracted block.


H(VOWEL1)rE's A blOck Of tExt tO UsE As InpUt tO thE rEgUlAr ExprEssIOn mAtchEr.NOtE thAt wE'll fIrst ExtrAct thE 
blOck Of tExt by lOOkIng fOr thE spEcIAl dElImItErs,thEn prOcEss thE ExtrActEd blOck.

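The file is opened and read with the TextFile class from the net.mindview.util package. The static read() method reads the whole file and returns it as a String. mInput matches all the text between /*! and !*/ (note the grouping parentheses). Then every run of two or more spaces is reduced to a single space, and the spaces at the beginning of each line are removed (to do this on every line, not just at the start of the input, multiline mode is enabled with (?m)). Both replacements use replaceAll(), which is built into String and is more convenient here. Because each replacement is used only once, there is no extra cost in calling String's replaceAll() directly rather than precompiling a Pattern.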

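appendReplacement() lets you manipulate the replacement string while the replacement is being performed. In this example, sBuffer is created first to hold the final result; then group() selects each match, which is processed by converting the vowel found by the regular expression to uppercase. Normally you step through and perform all the replacements and then call appendTail(). But if you want to simulate replaceFirst() (or "replace n times"), you perform the replacement only once (or n times) and then call appendTail() to put the remaining unprocessed part into sBuffer.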
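The "replace n times" idea can be sketched as follows. ReplaceNDemo and upcaseFirstN() are hypothetical names, and Matcher.quoteReplacement() is added as a precaution so that any $ or \ in the replacement text is taken literally:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceNDemo {
    // Upper-cases at most `limit` matches of `regex`, then copies the rest unchanged.
    static String upcaseFirstN(String input, String regex, int limit) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer out = new StringBuffer();
        int count = 0;
        while (count < limit && m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement(m.group().toUpperCase()));
            count++;
        }
        m.appendTail(out); // copy everything after the last replaced match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(upcaseFirstN("fix the rug with bags", "[aeiou]", 2));
        // fIx thE rug with bags
    }
}
```

With a limit of 1 this behaves like replaceFirst(), and with a very large limit it behaves like replaceAll(), except that the replacement is computed per match.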

 package com.exam.cn;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Resetting {
    public static void main(String[] args) {
        Matcher m = Pattern.compile("[frb][aiu][gx]").matcher("fix the rug with bags");
        while (m.find())
            System.out.println(m.group() + " ");
        System.out.println();
        // Point the same Matcher at a new character sequence.
        m.reset("fix the rig with rags");
        while (m.find())
            System.out.println(m.group() + " ");
    }
}
Output:
fix 
rug 
bag 
fix 
rig 
rag
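The listing above re-targets the matcher with reset(CharSequence). Matcher also has a no-argument reset() that rewinds to the start of the current input, which the following sketch exercises; ResetDemo and findTwice() are hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResetDemo {
    // Collects every match, rewinds the matcher with reset(), and collects again.
    static String findTwice(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append(m.group()).append(' ');
        }
        m.reset(); // back to the beginning of the same input
        while (m.find()) {
            sb.append(m.group()).append(' ');
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(findTwice("[frb][aiu][gx]", "fix the rug with bags"));
        // fix rug bag fix rug bag
    }
}
```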

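The no-argument reset() method sets the Matcher back to the beginning of the current character sequence, while reset(CharSequence), used above, applies an existing Matcher to a new sequence. So far, the examples have applied regular expressions to static strings. The following example shows how to use regular expressions to search for matches in a file.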

 package com.exam.cn;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

import net.mindview.util.TextFile;

public class JGrep {
    public static void main(String[] args) {
        String[] args1 = new String[] {
            "src/com/exam/cn/JGrep.java", "\\b[Ssct]\\w+"
        };
        Pattern p = Pattern.compile(args1[1]);
        int index = 0;
        Matcher m = p.matcher("");
        for (String line : new TextFile(args1[0])) {
            m.reset(line);
            while (m.find())
                System.out.println(index++ + ": " + m.group() + ": " + m.start());
        }
    }
}
Output:
0: com: 8
1: cn: 17
2: class: 7
3: static: 8
4: String: 25
5: String: 2
6: String: 21
7: src: 5
8: com: 9
9: cn: 18
10: Ssct: 38
11: compile: 20
12: String: 6
13: System: 4
14: start: 58

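The file is opened as a net.mindview.util.TextFile object, which reads all the lines and stores them in an ArrayList, so a for loop can iterate over the lines of the TextFile. Although you could create a new Matcher inside the for loop, it is slightly more efficient to create an empty Matcher outside the loop and use reset() to load each line of input into it. find() then scans each line for matches. The test arguments here read the JGrep.java file itself and search for words that start with one of the characters in [Ssct]. The reported offsets depend on the indentation of the source file.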

 package com.exam.cn;

import java.util.Scanner;

public class ScannerDelimiter {
    public static void main(String[] args) {
        Scanner scanner = new Scanner("12,42,78,99,42");
        // A comma with optional whitespace on either side separates tokens.
        scanner.useDelimiter("\\s*,\\s*");
        while (scanner.hasNextInt()) {
            System.out.println(scanner.nextInt());
        }
    }
}
Output:
12
42
78
99
42

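This example uses a comma (surrounded by arbitrary amounts of whitespace) as the delimiter; the same technique can be used to read comma-delimited files. In addition to useDelimiter(), which sets the delimiter, there is also delimiter(), which returns the Pattern currently being used as the delimiter.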
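A small sketch of useDelimiter() together with delimiter(), using input that has whitespace around the commas. DelimiterDemo, readInts(), and currentDelimiter() are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class DelimiterDemo {
    // Reads comma-separated integers, ignoring whitespace around the commas.
    static List<Integer> readInts(String csv) {
        List<Integer> values = new ArrayList<>();
        try (Scanner scanner = new Scanner(csv)) {
            scanner.useDelimiter("\\s*,\\s*");
            while (scanner.hasNextInt()) {
                values.add(scanner.nextInt());
            }
        }
        return values;
    }

    // Returns the regex of the delimiter Pattern currently set on a Scanner.
    static String currentDelimiter() {
        try (Scanner scanner = new Scanner("")) {
            scanner.useDelimiter("\\s*,\\s*");
            return scanner.delimiter().pattern();
        }
    }

    public static void main(String[] args) {
        System.out.println(readInts("12, 42 ,78")); // [12, 42, 78]
        System.out.println(currentDelimiter());     // \s*,\s*
    }
}
```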

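Besides scanning for primitive types, you can also scan with your own regular expressions, which is useful for more complex data. The following example scans threat data from a firewall log: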

 package com.exam.cn;

import java.util.Scanner;
import java.util.regex.MatchResult;

public class ThreatAnalyzer {
    static String threatData = "58.27.82.161@02/10/2005\n204.45.234.40@02/11/2005\n58.27.82.161@02/11/2005\n58.27.82.161@02/11/2005\n58.27.82.161@02/11/2005\n"
            + "[Next log section with different data format]";

    public static void main(String[] args) {
        Scanner scanner = new Scanner(threatData);
        // Group 1 captures the IP address, group 2 the date.
        String pattern = "(\\d+[.]\\d+[.]\\d+[.]\\d+)@(\\d{2}/\\d{2}/\\d{4})";
        while (scanner.hasNext(pattern)) {
            scanner.next(pattern);
            MatchResult match = scanner.match();
            String ip = match.group(1);
            String date = match.group(2);
            System.out.format("Threat on %s from %s\n", date, ip);
        }
    }
}
Output:
Threat on 02/10/2005 from 58.27.82.161
Threat on 02/11/2005 from 204.45.234.40
Threat on 02/11/2005 from 58.27.82.161
Threat on 02/11/2005 from 58.27.82.161
Threat on 02/11/2005 from 58.27.82.161

About the author: 智云科技
