This post is about the popular outlier rejection algorithm RANSAC. It stands for RANdom SAmple Consensus. It is widely used in computer vision, with one of the application being in rejection of false feature matches in a pair of images from a stereo camera set.
Suppose you have been given a dataset and you want to fit a mathematical model on it. We now assume that this data has certain inliers and some outliers. Inliers refer to the data points whose presence can be explained with the help of a mathematical model, while outliers are data points whose presence can never be explained via any reasonable mathematical model. Usually their presence in the dataset deteriorates the quality of the mathematical model that we can fit to the data. For best results, we should ignore these outliers while estimating the parameters of our mathematical model. RANSAC helps us in identifying these points so that we can obtain a better fir for the inliers.
Note that even the inliers do not exactly fit the mathematical model as they might have some noise, but the outliers either have an extremely large amount of noise or they are obtained due to faults in measurement, or because of problems in the sensor from which we are obtaining the data.
The Algorithm
The Input
- Data points
- Some parametrized model (we need to estimate the parameters for this model)
- Some confidence parameters
Algo
- A set points from the original dataset are randomly selected, and are assumed to be the inliers.
- Parameters are estimated to fit to this hypothetical inlier set.
- Every point that was not a part of this hypothetical inlier set is tested against the mathematical model that we just fit.
- The points that fit the model become a part of the consensus set. The model is good if a particular number of points have been classified as part of the consensus set.
- This model is then re-estimated using all the members of a consensus set.
- The above process is repeated a fixed number of times, and the model with the largest consensus set is kept.
How many times do we repeat?
It is possible to theoretically determine the fixed number of iterations ‘k’ which are needed, if we have an estimate of the percentage of outliers present in the data.