Java: splitting a string on spaces, but not on the spaces inside quoted substrings. What regular expression does this?

Sample strings

String text1 = "127.0.0.1 - - [05/Nov/2015:15:06:34 +0800] \"GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 HTTP/1.1\" 200 2426 \"-\" \"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36\" 0.012 0.012 \"192.168.222.251\"";
String text2 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /accounts/54fd0571e4b055a0030461fb HTTP/1.1\" 200 814 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.005 0.005 \"192.168.222.35\"";
String text3 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" 404 992 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.040 0.040 \"192.168.222.35\"";
String text4 = "127.0.0.1 - - [05/Nov/2015:23:55:11 +0800] \"POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 HTTP/1.1\" 200 298 \"-\" \"Mozilla/4.0\" 0.019 0.019 \"101.226.62.82\"";

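This is an access log. Four of its records are shown above, and every other record falls into one of these four kinds. The result I want after splitting looks like this: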

127.0.0.1
-
[05/Nov/2015:15:07:18 +0800]
"GET /accounts/*** HTTP/1.1"
200
2426
"-"
"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36"
0.031
0.031
"192.168.222.251"
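
For comparison, a token list of this shape can also be produced in a single pass by matching tokens instead of splitting on separators. This is only a minimal sketch and not the approach described in the rest of this post. `LogTokenizer` and `tokenize` are hypothetical names, the sample line is a shortened version of the third record, and the pattern assumes that quotes and brackets are never nested or escaped inside a field. Note that the sample records contain two `-` fields, so each one yields twelve tokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogTokenizer {
    // A token is a [bracketed] section, a "quoted" section, or a run of non-space characters.
    // Assumption: quotes and brackets are balanced and never escaped inside a field.
    private static final Pattern TOKEN = Pattern.compile("\\[[^\\]]*\\]|\"[^\"]*\"|\\S+");

    public static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Shortened sample line (the user agent is abbreviated for readability).
        String line = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" "
                + "404 992 \"-\" \"Mozilla/5.0 (Windows NT 5.1)\" 0.040 0.040 \"192.168.222.35\"";
        for (String token : tokenize(line)) {
            System.out.println(token);
        }
    }
}
```

The order of the alternatives matters: the bracketed and quoted forms are tried first, so a space inside them never ends a token, and `\S+` only picks up the plain fields.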

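My initial implementation
Because the quoted substrings inside each line are complex and unpredictable, I decided to extract the contents of the double quotes first. The code is as follows: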

    // Needs: java.util.ArrayList, java.util.List, java.util.regex.Matcher, java.util.regex.Pattern
    public static void main(String[] args) {
        String text1 = "127.0.0.1 - - [05/Nov/2015:15:06:34 +0800] \"GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 HTTP/1.1\" 200 2426 \"-\" \"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36\" 0.012 0.012 \"192.168.222.251\"";
        String text2 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /accounts/54fd0571e4b055a0030461fb HTTP/1.1\" 200 814 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.005 0.005 \"192.168.222.35\"";
        String text3 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" 404 992 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.040 0.040 \"192.168.222.35\"";
        String text4 = "127.0.0.1 - - [05/Nov/2015:23:55:11 +0800] \"POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 HTTP/1.1\" 200 298 \"-\" \"Mozilla/4.0\" 0.019 0.019 \"101.226.62.82\"";
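        // A quoted field: '"', then any run of word, whitespace, or ASCII punctuation characters other than '"', then '"'.
        Pattern p = Pattern.compile("\"[\\w\\s\\p{Punct}&&[^\"]]*\"");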
        List<String> lines = new ArrayList<String>();
        lines.add(text1);
        lines.add(text2);
        lines.add(text3);
        lines.add(text4);
        for (String str : lines) {
            System.out.println("****************************************");
            Matcher matcher = p.matcher(str);
            while (matcher.find()) {
                System.out.println(matcher.group());
            }
        }
    }

The output is:

****************************************
"GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 HTTP/1.1"
"-"
"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36"
"192.168.222.251"
****************************************
"GET /accounts/54fd0571e4b055a0030461fb HTTP/1.1"
"-"
"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30"
"192.168.222.35"
****************************************
"GET /favicon.ico HTTP/1.1"
"-"
"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30"
"192.168.222.35"
****************************************
"POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 HTTP/1.1"
"-"
"Mozilla/4.0"
"101.226.62.82"

That gave me the quoted strings; the remaining parts I then cut out with substring().
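
A note on the character class in the pattern above: `[\w\s\p{Punct}&&[^"]]` uses Java's class intersection (`&&`), so it means "word, whitespace, or ASCII punctuation, but not a double quote". The sketch below compares it with the simpler negated class `[^"]`, which is my own variant and not part of the original code. The class name, the method name, and both sample strings are hypothetical. On plain ASCII input the two patterns agree. Once a quoted field contains a character outside those three classes (here `é`), they give different results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedFieldDemo {
    // Pattern from the post: (word | whitespace | ASCII punctuation) intersected with "not a quote".
    public static final Pattern INTERSECTION = Pattern.compile("\"[\\w\\s\\p{Punct}&&[^\"]]*\"");
    // Hypothetical simpler variant: any character except a quote.
    public static final Pattern NEGATED = Pattern.compile("\"[^\"]*\"");

    // Collects every match of the given pattern in the line, in order.
    public static List<String> quoted(Pattern pattern, String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = pattern.matcher(line);
        while (m.find()) {
            fields.add(m.group());
        }
        return fields;
    }

    public static void main(String[] args) {
        String ascii = "404 \"GET /favicon.ico HTTP/1.1\" \"-\"";
        String nonAscii = "404 \"GET /caf\u00e9 HTTP/1.1\" \"-\"";
        // Same result for both patterns on ASCII-only input.
        System.out.println(quoted(INTERSECTION, ascii));
        System.out.println(quoted(NEGATED, ascii));
        // The results differ when a quoted field contains a non-ASCII letter.
        System.out.println(quoted(INTERSECTION, nonAscii));
        System.out.println(quoted(NEGATED, nonAscii));
    }
}
```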

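Clearly this is not best practice. Later my team lead looked at it and said it did not need to be this complicated. Sure enough, a few minutes later he had written me a new regular expression.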

Improved implementation

    // Needs: java.util.ArrayList, java.util.List, java.util.regex.Matcher, java.util.regex.Pattern
    public static void main(String[] args) {
        String text1 = "127.0.0.1 - - [05/Nov/2015:15:06:34 +0800] \"GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 HTTP/1.1\" 200 2426 \"-\" \"Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.63 Safari/537.36\" 0.012 0.012 \"192.168.222.251\"";
        String text2 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /accounts/54fd0571e4b055a0030461fb HTTP/1.1\" 200 814 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.005 0.005 \"192.168.222.35\"";
        String text3 = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" 404 992 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2; zh-CN; HUAWEI P7-L07 Build/HuaweiP7-L07) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 UCBrowser/10.8.0.654 U3/0.8.0 Mobile Safari/534.30\" 0.040 0.040 \"192.168.222.35\"";
        String text4 = "127.0.0.1 - - [05/Nov/2015:23:55:11 +0800] \"POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 HTTP/1.1\" 200 298 \"-\" \"Mozilla/4.0\" 0.019 0.019 \"101.226.62.82\"";
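        // One capturing group per field; groups 4, 5, 6, 8 and 14 are the timestamp, method, URI, status and trailing quoted IP.
        Pattern p = Pattern.compile(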
                "^([\\d.]+) (\\S+) (\\S+) \\[(.+)\\] \"(GET|POST|DELETE|PUT|HEAD) (\\S+) (\\S+)\" (\\d+) (\\d+) \"(\\S+)\" \"(.+)\" (\\S+) (\\S+) \"([\\d.]+)\"");
        List<String> lines = new ArrayList<String>();
        lines.add(text1);
        lines.add(text2);
        lines.add(text3);
        lines.add(text4);
        for (String line : lines) {
            System.out.println("****************************************");
            Matcher matcher = p.matcher(line);
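            // The pattern starts with ^, so find() only matches from the start of the line; non-matching lines are skipped.
            if (matcher.find()) {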
                System.out.print(matcher.group(4) + " ");
                System.out.print(matcher.group(5) + " ");
                System.out.print(matcher.group(6) + " ");
                System.out.print(matcher.group(8) + " ");
                System.out.println(matcher.group(14));
            }
        }
    }

Output:

****************************************
05/Nov/2015:15:06:34 +0800 GET /accounts/accountIds/54d9c155e4b0abe717853ee1,55bb3f44e4b059498d77ae86,54dab42de4b07ae8cd725287,561ca2a6e4b08acc10be9a71 200 192.168.222.251
****************************************
05/Nov/2015:15:24:40 +0800 GET /accounts/54fd0571e4b055a0030461fb 200 192.168.222.35
****************************************
05/Nov/2015:15:24:40 +0800 GET /favicon.ico 404 192.168.222.35
****************************************
05/Nov/2015:23:55:11 +0800 POST /wechat/wx6559dc399869bc69?signature=52205b5eb43b04d686ab6722f819e6e051d2c7b0&timestamp=1446738911&nonce=1501185542&encrypt_type=aes&msg_signature=9247d3d7cd562f862f9a7111f413f37cdca5c872 200 101.226.62.82

Now I can pick out whichever part I want simply by asking for its capture group.
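
Numeric group indices are easy to miscount in a pattern with fourteen groups, so here is a sketch of the same idea with named groups. It is a variation of mine, not the pattern my lead wrote: all group names are hypothetical (I do not know what the two numeric fields or the trailing IP mean in this log format), `.+` is tightened to `[^\]]+` and `[^"]+`, a `$` anchor is added, and `matches()` replaces `find()`. `AccessLogParser` and `summarize` are made-up names, and the sample line is a shortened version of the third record.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AccessLogParser {
    // Same field layout as the numbered pattern, with a name on every group.
    private static final Pattern LINE = Pattern.compile(
            "^(?<client>[\\d.]+) (?<ident>\\S+) (?<user>\\S+) \\[(?<time>[^\\]]+)\\] "
            + "\"(?<method>GET|POST|DELETE|PUT|HEAD) (?<uri>\\S+) (?<protocol>\\S+)\" "
            + "(?<status>\\d+) (?<bytes>\\d+) \"(?<referer>\\S+)\" \"(?<agent>[^\"]+)\" "
            + "(?<num1>\\S+) (?<num2>\\S+) \"(?<quotedIp>[\\d.]+)\"$");

    // Returns "time method uri status quotedIp", or empty if the line does not fit the pattern.
    public static Optional<String> summarize(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(String.join(" ",
                m.group("time"), m.group("method"), m.group("uri"),
                m.group("status"), m.group("quotedIp")));
    }

    public static void main(String[] args) {
        // Shortened sample line (the user agent is abbreviated for readability).
        String line = "127.0.0.1 - - [05/Nov/2015:15:24:40 +0800] \"GET /favicon.ico HTTP/1.1\" "
                + "404 992 \"-\" \"Mozilla/5.0 (Linux; U; Android 4.4.2)\" 0.040 0.040 \"192.168.222.35\"";
        // Report unparsed lines instead of dropping them silently.
        System.out.println(summarize(line).orElse("unparsed: " + line));
    }
}
```

Returning an `Optional` makes the non-matching case explicit. For example, a request with a method outside the `GET|POST|DELETE|PUT|HEAD` list does not match and can be logged or counted by the caller.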
