RANSAC简介
RANSAC(RAndom SAmple Consensus,随机采样一致)算法是从一组含有“外点”(outliers)的数据中正确估计数学模型参数的迭代算法。“外点”一般指的的数据中的噪声,比如说匹配中的误匹配和估计曲线中的离群点。所以,RANSAC也是一种“外点”检测算法。RANSAC算法是一种不确定算法,它只能在一种概率下产生结果,并且这个概率会随着迭代次数的增加而加大(之后会解释为什么这个算法是这样的)。RANSAC算最早是由Fischler和Bolles在SRI上提出用来解决LDP(Location Determination Proble)问题的。
对于RANSAC算法来说一个基本的假设就是数据是由“内点”和“外点”组成的。“内点”就是组成模型参数的数据,“外点”就是不适合模型的数据。同时RANSAC假设:在给定一组含有少部分“内点”的数据,存在一个程序可以估计出符合“内点”的模型。
算法基本思想和流程
RANSAC是通过反复选择数据集去估计出模型,一直迭代到估计出认为比较好的模型。
具体的实现步骤可以分为以下几步:
- 选择出可以估计出模型的最小数据集;(对于直线拟合来说就是两个点,对于计算Homography矩阵就是4个点)
- 使用这个数据集来计算出数据模型;
- 将所有数据带入这个模型,计算出“内点”的数目;(累加在一定误差范围内的适合当前迭代推出模型的数据)
- 比较当前模型和之前推出的最好的模型的“内点“的数量,记录最大“内点”数的模型参数和“内点”数;
- 重复1-4步,直到迭代结束或者当前模型已经足够好了(“内点数目大于一定数量”)。
# -*- coding: utf-8 -*-
"""
Created on Mon Jul 30 20:07:19 2018
@author: Yuki
"""
import random
import numpy as np
from matplotlib import pyplot as plt
# Magic Numbers
# Controls the inlier range
THRESHOLD = 0.1
# Finds random potential fit lines
def RANSAC(data):
n = len(data)
# Another magic number
NUM_TRIALS = n // 2
best_in_count = 0
for i in range(0, NUM_TRIALS):
r = random.sample(data, 2)
r = np.array(r)
# linear regression on two points will just give the line through both points
m, b = lin_reg(r)
# finds the line with the most inliers
in_count = 0
for j in data:
# if the distance between the line and point is less than or equal to THRESHOLD it is an inlier
if abs(j[1] - ((m * j[0]) + b)) <= THRESHOLD:
in_count = in_count + 1
# Tracks the best fit line so far
if in_count > best_in_count:
best_in_count = in_count
best_m = m
best_b = b
# record both inliers and outliers to make end graph pretty
in_line = []
out_line = []
for j in data:
if abs(j[1] - ((best_m * j[0]) + best_b)) <= THRESHOLD:
in_line.append(j)
else:
out_line.append(j)
# returns two lists, inliers and outliers
return in_line, out_line
# performs the linear regression as described on the assignment sheet
def lin_reg(data):
n = float(len(data))
x_sum = 0.0
y_sum = 0.0
# averages the x and y values
for i in data:
x_sum = x_sum + i[0]
y_sum = y_sum + i[1]
x_average = x_sum / n
y_average = y_sum / n
# initializes slope numerator and denominator
# note denominator should not be zero with data
m_numerator = 0.0
m_denominator = 0.0
# calculates the slope
for i in data:
m_numerator = m_numerator + ((i[0] - x_average)*(i[1] - y_average))
m_denominator = m_denominator + ((i[0] - x_average)*(i[0] - x_average))
m = m_numerator / m_denominator
# finds the intercept
b = y_average - (m * x_average)
# returns slope and intercept
return m, b
def plot_best_fit(data):
# Get our inlier and outlier points
in_line, out_line = RANSAC(data)
# find the best fit line for inliers
m, b = lin_reg(in_line)
# This was the hardest part
# Could not find a function that would make a non line segment so I just covered our domain
# Admittedly with potential error on giant domains
x_min = 100000.0
x_max = -100000.0
for i in data:
if i[0] > x_max:
x_max = i[0]
if i[0] < x_min:
x_min = i[0]
domain = [x_min, x_max]
line_points = [m * i + b for i in domain]
line_points_top= [m * i + 0.5 * b for i in domain]
line_points_bottom = [m * i + 1.2 * b for i in domain]
# Plot the inliers as blue dots
in_line = np.array(in_line)
x, y = in_line.T
plt.scatter(x, y)
# plot the outliers as red x's
# if statement for if outliers is empty, which it is for the easy case
if out_line != []:
out_line = np.array(out_line)
x, y = out_line.T
plt.scatter(x, y, s=30, c='r', marker='x')
# plot our best fit line
plt.plot(domain, line_points, '-')
plt.plot(domain, line_points_bottom, '-')
plt.plot(domain, line_points_top, '-')
plt.gca().invert_yaxis()
# show the plot
plt.title("Road-Line-Estimation")
plt.xlabel('1/X')
plt.ylabel('Laser')
plt.show()
# return slope and intercept for answers
return m, b
# ----------------------------------------------------------------------------------------------------
#测试栗子
'''
data = []
with open('noisy_data_medium.txt') as file:
# Creates 2D array to hold the data
for l in file:
data.append(l.split())
# removes comma from first entry
for i in data:
i[0] = float(i[0][:-1])
i[1] = float(i[1])
# function also returns slop and intercept should you want them
m, b = plot_best_fit(data)
print(m, b)
'''
版权声明:本文为u014694994原创文章,遵循 CC 4.0 BY-SA 版权协议,转载请附上原文出处链接和本声明。