前几天饭总在采集某网站数据时候被它的验证码拦住了,在范总自己折腾了一个晚上未果后问我有没做过类似事情。之前自己也很少去研究验证码东西,都是找一个顶着用就是,从来没去搞过破解什么的。Google一番,原来BMP图像直接可以读取,一个像素是对应着一个数据的。因此思路出来了。BMP图像的来源及介绍看
这里
,而用C同学想研究它的数据存储结构的看
这里
。
BMP图像包括这几类数据:文件头、位图信息头、颜色信息和位图数据四部分组成。既然位图数据即图形数据能读取出来,那么任何一张验证码图片里面的某一个Code也就能读出了,因为它本身也是由程序生成,那么我们就一定也能从图片反过来分割图片后找出图片里各个字符以及他们在BMP图像里面的一些参数出来。比如我找到的如下:
上面这些数据除了BMP图片本身的高度和宽度可以程序读出来以外,其他的都必须自己去分析需要破解的验证图片得出,而且所有参数单位为像素(px)。比如我做的图片分析如下:
有上面这张图片的分析我想大家应该大体都知道怎么去做验证了。我的思路步骤为下:
1.收集某验证码BMP图片若干,然后仔细用PS或者其他图片处理工具打开,做我上面这样分析,得到该图像的。
2.有了以上分析得到的图片特征数据,我们就可以很快的写出以下功能函数。我是写了一个类,包含方法接口如下:
3.我们现在能得到一个BMP图片里的所有某个元素图像的数据了,我们得到的位图数据是这样的:每个像素的数据都是存储着他的RGB颜色值。比如255255255白色。有这些数据,我们经过处理就可以做匹配认证了。首先这一步我们需得到验证码图片的所有元素的数据模板。比如某套验证码图片的有5个字符图像,包括0-9 A-Z这么多,那么我们可以先用getIndex(n)读某一张上某一个index的元素图像数据,因此我们还得先收集满若干原始图片,并保证这些图片里元素图片全部都出现过,这样我们就能采集到一个完整的元素图像数据库。这里提醒下,一般验证码图片都有些干扰像素啊什么的,我们可以这样处理,根据不同的BMP图像里面防干扰的条文来做,设置一个中间点,低于的全设置为0,而高于的都设置为1.那么我们最终一个像素的数据将是(111、110、001、101…)。采集元素位图数据库是一个体力活,要用到眼睛、N张图片、手、以及不断去修改读取图片的路劲、参数等O(∩_∩)O~
4.有的元素图像数据库,我们现在就可以高调的接受破解某张验证图片了,读取图片,然后一个循环读取它的元素图像,并与元素图像数据库里面的去对比,判断是否相同或者高深点算法判断下有多大相似度,并果断认定它是哪一个Code,一次循环下来就出结果啦,呵呵!
上完整代码
:
readerFile.class.php
class readBmp{
/**
* 图片文件
* @var string
*/
var $_bmpfile = ”;
var $bWidth = 0;
var $bHeight = 0;
/**
* 图片宽度
* @var int
*/
var $iWidth = 0;
/**
* 图片高度
* @var int
*/
var $bmpHeight = 0;
/**
* 图片BMP数据起始像素位置
* @var int
*/
var $bmpOffSet = 0;
/**
* 图片数据,bmp格式文件读取
* @var int array
*/
var $bmpData = array();
/**
* 数据元素左边距
*
* @var int
*/
var $toLeftBorder = 0;
/**
* 元素右边距
* @var int
*/
var $toRightBorder = 0;
/**
* 元素上边距
* @var int
*/
var $toTopBorder = 0;
/**
* 元素下边距
* @var int
*/
var $toBottomBorder = 0;
/**
* 元素宽度
* @var int
*/
var $itemWidth = 0;
/**
* 元素高度
* @var int
*/
var $itemHeight = 0;
/**
* 元素间距
* @var int
*/
var $itemSpace = 0;
var $keyLib = array(
0 => ‘0000001111000000000011111111000000111100011111000011000000011100001100000000110000110000000011000011000000001100001100000000110000110000000011000011000000001100001100000000110000110000000011000011100000011100001111100111100000001111111000000000001110000000’,
0 => ‘0000001111000000000011111111000000111100011111000011000000011100001100000000110000110000000011000011000000001100001100000000110000110000000011000011000000001100001100000000110000110000000011000011100000011100001111100111100000001111111000000000001110000000’,
1 => ‘0000111111110000000011111110000000000001100000000000000110000000000000011000000000000001100000000000000110000000000000011000000000000001100000000000000110000000000000011000000000001101100000000000111110000000000001111000000000000011100000000000000100000000’,
2 => “0111111111111000011111111111000000111100000000000000111000000000000001110000000000000011110000000000000111100000000000000111100000000000001111000000000000001110001100000000011000111000000011000001111000011000000011110111100000000011111100000000000110000000”,
3 => “0000001110000000000011111111000000111100111111000111000000011100010000000000110000000000000011100000000000000110000000000111111000000111111111000000011110000000000000111100000000000000111000000000000001110000000000000011110000111111111111100011111111111100”,
4 => “0000000011000000000000001100000000000000110000000000000011000000001111111111110000111111111110000011100011000000000110001100000000011100110000000000110011000000000010001100000000000000110000000000000011000000000000001100000000000000110000000000000010000000”,
5 => “0011111111110000001111111111100000000000000111000000000000001110000000000000011000000000000001100000000000000110000000000011110001111111111110000111111111000000011000000000000001100000000000000110000000000000011000000000000001111111111110000111111111110000”,
6 => “0000001111000000000011111111000000111100011111000011000000011100001100000000110000110000000011000011000000001100001111100011110000111111111110000011001111000000001100000000000000110000000000000011100000011100001111100111100000001111111000000000001110000000”,
7 => “0000110000000000000011100000000000000110000000000000011100000000000000111000000000000001110000000000000011000000000000001110000000000000011100000000000000110000110000000011100011000000000111001100000000001110110000000000011011111111111111111111111111111110”,
8 => “0000000110000000000011111111000000011110111110000001100000011000000110000001100000011000001110000001111001110000000011111100000000000111111000000001110011110000000110000011000000011000001110000001100000011000000111111111000000001111111000000000000100000000”,
9 => “0000001111000000000011111111000000111100011111000011000000011100000000000000110000000000000011000000011111001100001111111111110000111000011111000011000000001100001100000000110000110000000011000011100000011100001111100111100000001111111000000000001110000000”
);
public function config($data){
$this->bWidth = $data[‘bWidth’];
$this->bHeight = $data[‘bHeight’];
$this->itemHeight = $data[‘itemHeight’];
$this->itemWidth = $data[‘itemWidth’];
$this->toLeftBorder = $data[‘toLeftBorder’];
$this->toRightBorder = $data[‘toRightBorder’];
$this->toTopBorder = $data[‘toTopBorder’];
$this->toBottomBorder = $data[‘toBottomBorder’];
$this->itemSpace = $data[‘itemSpace’];
$this->bmpOffSet = $data[‘bmpOffSet’];
}
public function setFile($file){
$this->_bmpfile = $file;
}
public function readData(){
$data = readBmpFile($this->_bmpfile);
$temp = array();
foreach ($data as $v){
if($v == ‘255255255’){
$temp[] = 0;
} else {
$temp[] = 1;
}
}
$this->bmpData = $temp;
}
function readIndex($n){
if(count($this->bmpData) == 0) $this->readData();
//共多少个像素数据
$cnt = count($this->bmpData);
$w = $this->itemWidth;
$h = $this->itemHeight;
$result = ”;
for($i = 0; $i < $h; $i++){
$_x = (($i + $this->toTopBorder) * $this->bWidth) + $this->toLeftBorder + ($n – 1) * ($w + $this->itemSpace);
$line = ”;
for($j = 0; $j < $w; $j++){
$result .= $this->bmpData[$_x + $j];
$line .= $this->bmpData[$_x + $j];
}
//echo $line;
}
return $result;
}
function display(){
$i = 0;
foreach ($this->bmpData as $v){
echo $v;
$i ++ ;
if($i % $this->bWidth == 0){
echo “<br/>”;
}
}
}
function getCode(){
$result = ”;
for($i = 1; $i <= 5; $i ++){
$numdata = ”;
$numdata = $this->readIndex($i);
for($j = 0; $j <= 9; $j ++){
if(substr($this->keyLib[$j], 10) === substr($numdata,10)){
$result .= $j;
break;
}
}
}
return $result;
}
}
/**
* @exception 函数修改自网上某位前辈的 imagecreatefrombmp($file) 我去掉了其中图像处理的以及加入适当代码得到我们想要的数据格式,因为我们只需要图像的数据.
* @author Larro
* @param File $file
* @return array
*/
function readBmpFile($file){
global $CurrentBit,$echoMode;
$f = fopen($file,”r”);
$Header = fread($f,2);
if($Header == “BM”)
{
$Size = freaddword($f);
$Reserved1 = freadword($f);
$Reserved2 = freadword($f);
$FirstByteOfImage = freaddword($f);
$SizeBITMAPINFOHEADER = freaddword($f);
$Width = freaddword($f);
$Height = freaddword($f);
$biPlanes = freadword($f);
$biBitCount = freadword($f);
$RLECompression = freaddword($f);
$WidthxHeight = freaddword($f);
$biXPelsPerMeter = freaddword($f);
$biYPelsPerMeter = freaddword($f);
$NumberOfPalettesUsed = freaddword($f);
$NumberOfImportantColors = freaddword($f);
if($biBitCount < 24)
{
$Colors=pow(2, $biBitCount);
for($p=0; $p<$Colors; $p++)
{
$B = freadbyte($f);
$G = freadbyte($f);
$R = freadbyte($f);
$Reserved=freadbyte($f);
$imageData[] = sprintf(“%03d”,$R).sprintf(“%03d”,$G).sprintf(“%03d”,$B);
}
if($RLECompression==0)
{
$Zbytek=(4-ceil(($Width/(8/$biBitCount)))%4)%4;
for($y = $Height-1;$y>=0;$y–)
{
$CurrentBit=0;
for($x=0; $x<$Width; $x++)
{
$C=freadbits($f,$biBitCount);
}
if($CurrentBit!=0) {
freadbyte($f);
}
for($g=0;$g<$Zbytek;$g++)
freadbyte($f);
}
}
}
if($RLECompression==1) //$BI_RLE8,图像为256色模式, 则采用RLE8压缩方式
{
$y=$Height;
$pocetb=0;
while(true)
{
$y–;
$prefix=freadbyte($f);
$suffix=freadbyte($f);
$pocetb+=2;
$echoit=false;
if($echoit)echo “Prefix: $prefix Suffix: $suffix<BR>”;
if(($prefix==0)and($suffix==1)) break;
if(feof($f)) break;
while(!(($prefix==0)and($suffix==0)))
{
if($prefix==0)
{
$pocet=$suffix;
$Data.=fread($f,$pocet);
$pocetb+=$pocet;
if($pocetb%2==1) {freadbyte($f); $pocetb++;}
}
if($prefix>0)
{
$pocet=$prefix;
for($r=0;$r<$pocet;$r++)
$Data.=chr($suffix);
}
$prefix=freadbyte($f);
$suffix=freadbyte($f);
$pocetb+=2;
if($echoit) echo “Prefix: $prefix Suffix: $suffix<BR>”;
}
$Data=””;
}
}
if($RLECompression==2) //$BI_RLE4,压缩方式,如果图像为16色模式,则采用RLE4压缩方式
{
$y=$Height;
$pocetb=0;
while(true)
{
//break;
$y–;
$prefix=freadbyte($f);
$suffix=freadbyte($f);
$pocetb+=2;
$echoit=false;
if($echoit)echo “Prefix: $prefix Suffix: $suffix<BR>”;
if(($prefix==0)and($suffix==1)) break;
if(feof($f)) break;
while(!(($prefix==0)and($suffix==0)))
{
if($prefix == 0)
{
$pocet = $suffix;
$CurrentBit=0;
for($h=0;$h<$pocet;$h++)
$Data .= chr(freadbits($f,4));
if($CurrentBit!=0) freadbits($f,4);
$pocetb += ceil(($pocet/2));
if($pocetb%2 == 1) {
freadbyte($f); $pocetb++;
}
}
if($prefix>0)
{
$pocet=$prefix;
$i = 0;
for($r = 0;$r<$pocet;$r++)
{
if($i%2==0)
{
$Data.=chr($suffix%16);
}
else
{
$Data.=chr(floor($suffix/16));
}
$i++;
}
}
$prefix=freadbyte($f);
$suffix=freadbyte($f);
$pocetb+=2;
if($echoit) echo “Prefix: $prefix Suffix: $suffix<BR>”;
}
$Data=””;
}
}
if($biBitCount==24)
{
$Zbytek=$Width%4;
for($y=$Height-1;$y>=0;$y–)
{
for($x=0;$x<$Width;$x++)
{
$B=freadbyte($f);
$G=freadbyte($f);
$R=freadbyte($f);
$imageData[] = sprintf(“%03d”,$R).sprintf(“%03d”,$G).sprintf(“%03d”,$B);
}
for($z=0;$z<$Zbytek;$z++)
freadbyte($f);
}
}
return $imageData;
}
fclose($f);
}
function freadbyte($f){
return ord(fread($f,1));
}
function freadword($f){
$b1=freadbyte($f);
$b2=freadbyte($f);
return $b2*256+$b1;
}
function freaddword($f){
$b1=freadword($f);
$b2=freadword($f);
return $b2*65536+$b1;
}
其中readFile.class.php里面的$keyLib属性为上面提到的元素图像数据库。为自己动手收集的数据!这里是我要验证的图片的数据库。可以另外做成配置文件形式。
do.php
// 包含处理类
include_once( ‘readBmp.class.php’ );
$bmp = new readBmp;
//设置文件
$bmp->setFile(‘1.bmp’);
//配置 读取BMP数据
$config = array(
‘bWidth’ => 104,
‘bHeight’ => 20,
‘itemHeight’ => 16,
‘itemWidth’ => 16,
‘toLeftBorder’ => 5,
‘toRightBorder’ => 3,
‘toTopBorder’ => 2,
‘toBottomBorder’=>2,
‘itemSpace’ => 4,
‘bmpOffSet’ => 213
);
$bmp->config($config);
//输出查看
$bmp->display();
$code = $bmp->getCode();
echo $code;
晕。。!CSDN不能上传BMP格式的,于是我把后缀改下,你们懂的!