使用awk统计log信息

假设我们在应用中会将执行时间超过设定阀值的函数都打印出来,内容如下
2006-01-11 07:52:00,222 FATAL – Call class com.cyber.ldap.LdapMemberDAO.getMemberById take 312 ms.
2006-01-11 07:52:00,404 ERROR – Call class com.cyber.dao.ArticleSearchImpl.getArticle take 180 ms.
2006-01-11 07:52:00,536 ERROR – Call class com.cyber.member.ProfileManager.getMemberById take 632 ms.
2006-01-11 07:52:13,193 FATAL – Call class com.cyber.dao.IbatisArticleDAO.getArticleWithDetail take 644 ms.
2006-01-11 07:52:13,339 FATAL – Call class com.cyber.dao.IbatisCommentDAO.getComments take 123 ms.
2006-01-11 07:52:00,222 FATAL – Call class com.cyber.ldap.LdapMemberDAO.getMemberById take 434 ms.
2006-01-11 07:52:00,404 ERROR – Call class com.cyber.dao.ArticleSearchImpl.getArticle take 642 ms.
2006-01-11 07:52:00,536 ERROR – Call class com.cyber.member.ProfileManager.getMemberById take 1234 ms.
2006-01-11 07:52:13,193 FATAL – Call class com.cyber.dao.IbatisArticleDAO.getArticleWithDetail take 44 ms.
2006-01-11 07:52:13,339 FATAL – Call class com.cyber.dao.IbatisCommentDAO.getComments take 1231 ms.

现在我们需要统计这些执行慢的方法的平均执行时间及执行次数,好找到问题来进行性能调优,所以写了一段简单的awk脚本
BEGIN{c=0;n=0;cnt=0;print “total\tcnt\tavg\tmethod”;}
{ if(n==0){
    n=1;na=$1;c=$2;cnt=1;
  }else if(n==1){
    if(na==$1){
       c+=$2;
       cnt++;
    }else {
       avg=c/cnt;
       print c”\t”cnt”\t”avg”\t”na;
       na=$1;c=$2;cnt=1;
    }
  }
}
将这段程序保存到文件stat.awk里(随便你了)

然后执行cat performance.log | awk ‘{print $7″ “$9}’|sort -k 1 |awk -f stat.awk |sort -nrk 3
就可以得到结果了
2708    2       677     com.cyber.dao.IbatisCommentDAO.getComments
1644    2       411     com.cyber.dao.ArticleSearchImpl.getArticle
3732    2       373     com.cyber.member.ProfileManager.getMemberById
1492    2       373     com.cyber.ldap.LdapMemberDAO.getMemberById
1376    2       344     com.cyber.dao.IbatisArticleDAO.getArticleWithDetail

很简单,但是很有用

还是稍微解释一下吧
1. cat performance.log不用说了吧,就是输出performance.log的内容
2. awk ‘{print $7″ “$9}’ 就是找出一行:
    2006-01-11 07:52:00,222 FATAL – Call class com.cyber.ldap.LdapMemberDAO.getMemberById take 312 ms.
   中方法名和时间这两个字段
3. sort -k 1 表示将结果按第一列排序
4. awk -f stat.awk 就是执行我们上面的程序,统计出每个方法的执行次数,总执行时间和平均执行时间,并打印出来
5. sort -nrk 3 表示按第三列以数字的方式倒序排列出来

清楚了吧,呵呵

Advertisements

3 Comments »

  1. hq00e said

    将stat.awk改成这样试试:

    BEGIN{ OFS=”\t”; print “total\tcnt\tavg\tmethod” }
    { a[$1]++; b[$1]+=$2}
    END{ for (i in a) print b[i], a[i], b[i]/a[i], i }

  2. argan said

    恩,是的,改了一下,简洁多了,呵呵,我的脚本都是好不容易折腾出来的

    只要 cat performance.log | awk -f pstat.awk |sort -nrk3 就可以了
    前面的一些排序什么的处理都不需要了

  3. hq00e said

    来到这里就觉得似曾相识,看到评论才知道原来我有来过: )

RSS feed for comments on this post · TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: