Hull[1] noted that many problems arise when trying to empirically test Black-Scholes and other option pricing models. First, option pricing models assume that markets are perfectly efficient, when in practice they often are not. Second, stock price volatility, a key input to nearly all option pricing models, is very difficult to measure. Most formulas simply assume that the stock's volatility equals the implied volatility; in truth, the volatility the stock actually exhibits is not always the same. Third, it is difficult to ensure that stock and option prices are synchronous, that is, that the last option trade corresponds to the last stock trade. For example, the last option trade of the day for MSFT 50 calls may occur at 1:00 PM, when MSFT is trading at $50, while the last stock trade may occur at 4:00 PM, when the stock is higher or lower. Pairing the closing option price with the closing stock price would then feed the model the wrong stock price, which affects the theoretical value it assigns to the option.
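To make the synchronization and volatility problems concrete, the sketch below prices the same hypothetical MSFT 50 call with the standard Black-Scholes formula, treating it as a European call on a non-dividend-paying stock (a simplifying assumption). It pairs the option first with the $50 stock price at the time of the 1:00 PM option trade and then with an assumed $51 close, and finally shows how the model price moves when the volatility estimate changes from 20% to 30%. The 30-day maturity, 5% interest rate, 25% base volatility and $51 closing price are illustrative assumptions, not market data.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call(S: float, K: float, T: float, r: float, sigma: float) -> float:
    """Black-Scholes price of a European call on a non-dividend-paying stock.

    S: stock price, K: strike, T: time to expiry in years,
    r: continuously compounded risk-free rate, sigma: annualized volatility.
    """
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


# Illustrative (assumed) inputs: 50 strike, 30 days to expiry, 5% rate, 25% volatility.
K, T, r, sigma = 50.0, 30 / 365, 0.05, 0.25

# Model value using the stock price at the time of the last option trade (1:00 PM)...
price_at_option_trade = bs_call(50.0, K, T, r, sigma)
# ...versus using the 4:00 PM closing stock price, assumed here to be $51.
price_at_close = bs_call(51.0, K, T, r, sigma)

# Volatility is not directly observable: two plausible estimates give different values.
price_low_vol = bs_call(50.0, K, T, r, 0.20)
price_high_vol = bs_call(50.0, K, T, r, 0.30)

print(f"S=50 (1:00 PM): {price_at_option_trade:.2f}")
print(f"S=51 (4:00 PM): {price_at_close:.2f}")
print(f"sigma=20%: {price_low_vol:.2f}   sigma=30%: {price_high_vol:.2f}")
```

Because a call's delta lies between 0 and 1, a $1 mismatch in the stock price shifts the model value by less than $1, but for a near-the-money option that can still be a large fraction of the option's price.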
Black and Scholes (1972) themselves tested whether their model would work in practice. Their test consisted of buying options the model identified as undervalued and selling those it identified as overvalued. Over the long term, this strategy did make money, but once transaction costs were taken into account, they concluded that only market makers would be able to profit from it; the market seems to be efficient enough to eliminate such arbitrage opportunities.
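The logic of this test can be sketched as a simple trading rule: compare each option's market price with its model value, and trade only when the gap exceeds the round-trip transaction cost. The `Quote` class, the `signal` function and the cost figures below are hypothetical illustrations, not Black and Scholes's actual procedure. The point they show is that the same 10-cent mispricing can be worth trading for a low-cost market maker but not for a trader who pays higher costs.

```python
from dataclasses import dataclass


@dataclass
class Quote:
    """Hypothetical snapshot of one option: observed market price and model value."""
    market_price: float
    model_price: float


def signal(q: Quote, cost: float) -> str:
    """Return 'buy', 'sell' or 'hold' for one option.

    Buys options the model says are undervalued and sells those it says are
    overvalued, but only when the mispricing exceeds the assumed round-trip
    transaction cost; otherwise costs would absorb the apparent profit.
    """
    edge = q.model_price - q.market_price
    if edge > cost:
        return "buy"   # market price is below model value by more than the cost
    if -edge > cost:
        return "sell"  # market price is above model value by more than the cost
    return "hold"


# Hypothetical numbers: the model values the option 10 cents above its market price.
q = Quote(market_price=2.00, model_price=2.10)
print(signal(q, cost=0.02))  # low cost, e.g. a market maker -> buy
print(signal(q, cost=0.25))  # higher cost, e.g. an outside trader -> hold
```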