Today I was given a task: from logs that follow a fixed format, extract the a field and deduplicate its values. The work mainly relies on awk, so I am recording it here. Two sample lines from the log:
"112.65.201.58" - "-" - "[28/Feb/2017:00:08:21 +0800]" - "GET /track_proxy?tid=dc-811&cid=148820998091312764&dr=https%3A%2F%2Funitradeprod.alipay.com%2Facq%2FcashierReturn.htm%3Fsign%3DK1iSL1gljThca54X9aqL9TtzAbX82IDE0IXFEUvH7LSmdw06OpwU9sKt74VQ8Q%25253D%25253D%26outTradeNo%3DOBS000036105%26pid%3D2088121814027143%26type%3D1&sr=1920*1080&vp=1730*863&de=UTF-8&sd=24-bit&ul=zh-cn&je=0&fl=24.0%20r0&t=pulse&ni=1&dl=https%3A%2F%2Fwww.ikea-sh.cn%2Fcheckout%2Fmultipage%2Fsuccess%2F&dt=%E7%BB%93%E7%AE%97%E6%88%90%E5%8A%9F&ub=0-0-0-0-0-0-0-0&z=621680108 HTTP/1.1" - "-" - "200" - "43" - "zh-CN,zh;q=0.8" - "https://www.ikea-sh.cn/checkout/multipage/success/" - "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36" - "-" -"a=cNuYc0APJ027; tsc=3_5891d9c8_5891d9c8_0_1"
"180.161.162.72" - "-" - "[28/Feb/2017:00:08:24 +0800]" - "GET /track_proxy?tid=dc-811&cid=148801148530168382&dr=https%3A%2F%2Funitradeprod.alipay.com%2Facq%2FcashierReturn.htm%3Fsign%3DK1iSL1gljThca54X9aqL9TtzAbX82IDE0IXFEuBQR0W2GmKy97vlJebyYape0w%25253D%25253D%26outTradeNo%3DOBS000036106%26pid%3D2088121814027143%26type%3D1&sr=1440*900&vp=1307*760&de=UTF-8&sd=24-bit&ul=zh-cn&je=1&fl=24.0%20r0&t=pageview&ni=0&dl=https%3A%2F%2Fwww.ikea-sh.cn%2Fcheckout%2Fmultipage%2Fsuccess%2F&dt=%E7%BB%93%E7%AE%97%E6%88%90%E5%8A%9F&ub=0-0-0-0-0-0-0-0&z=1103637723 HTTP/1.1" - "-" - "200" - "43" - "zh-cn" - "https://www.ikea-sh.cn/checkout/multipage/success/" - "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/602.4.8 (KHTML, like Gecko) Version/10.0.3 Safari/602.4.8" - "-" - "__ipdx=180.161.176.176; exptime=1488949463; geocode=1156310000;a=7huFc0QM6Ox5;sm=ts:1488210369,dm:www.ikea.com,ca:2037034,sp:74iZL; tsc=3_584cc619_58b449c1_28_31; syn=1_aa5c6481_58b29351_58b29351_1"
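Before writing the extraction, it helps to confirm which quoted field the cookie string (the one carrying a=...) lands in. A minimal check, assuming the two lines above are saved in a hypothetical file called sample.log and that the fields are separated by exactly " - ":

# print the field count and the last '" - "'-separated field of each line
awk -F '" - "' '{ print NF, $NF }' sample.log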
Below are the commands I wrote; they mainly use awk together with sort fileName | uniq.
First, extract the lines that carry the a field, processing the daily files in a batch:
#!/bin/bash
# Filter the February 2017 logs: keep only dc-811 hits on the checkout success page,
# writing one output file per day under /data/mission.
for (( i=1; i<=28; i++ )); do
    if (( $i < 10 )); then
        zcat collect.cn.ms.com_2017020$i.log.gz | grep "id=dc-811" | grep "https://www.ikea-sh.cn/checkout/multipage/success/" >> /data/mission/site_811_2017020$i
        echo "done grep 2017020$i"
    else
        zcat collect.cn.ms.com_201702$i.log.gz | grep "id=dc-811" | grep "https://www.ikea-sh.cn/checkout/multipage/success/" >> /data/mission/site_811_201702$i
        echo "done grep 201702$i"
    fi
done
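As a side note, the same filtering can be written without the if/else on the day number: seq -w zero-pads the counter, and zgrep reads the gzipped logs directly. A sketch under the same file-naming assumptions:

#!/bin/bash
# Alternative to the loop above: zero-padded days via seq -w, zgrep instead of zcat | grep
for i in $(seq -w 1 28); do
    zgrep "id=dc-811" collect.cn.ms.com_201702$i.log.gz | grep "https://www.ikea-sh.cn/checkout/multipage/success/" >> /data/mission/site_811_201702$i
    echo "done grep 201702$i"
done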
Then combine the per-day files and deduplicate the extracted values:
#!/bin/bash
# Pull the a value out of the cookie field of every filtered file and collect it in "all".
sourceFile=(`ls site*`)
for (( i=0; i<${#sourceFile[@]}; i++ )); do
    cat ${sourceFile[$i]} | awk -F '" - "' '{ print $12 }' | awk -F " a=" '{print $2}' | awk -F "; " '{print $1}' | awk -F '"' '{print $1}' >> all
done
sort all | uniq >> uniq_id
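To sanity-check the result, counting the distinct IDs is enough; sort -u also collapses the sort | uniq pair into one step and produces the same output. A small sketch, using > instead of >> so a rerun does not append duplicates:

sort -u all > uniq_id
wc -l uniq_id    # number of distinct a values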