We evaluate the performance of recent empirical magnetic field models (Tsyganenko, 1996, 2002a, 2002b; Tsyganenko and Sitnov, 2005; hereafter T96, T02, and TS05, respectively) during magnetic storms, including both the pre- and post-storm intervals. The model outputs are compared with GOES observations of the magnetic field at geosynchronous orbit. For a major magnetic storm, the T96 and T02 models predict anomalously strong negative Bz on the nightside at geostationary orbit because the input parameters exceed the model limits, whereas a comprehensive survey of GOES magnetic field data does not support that prediction. On the basis of additional comparisons using 52 storm events, we discuss the strengths and limitations of each model. Furthermore, we quantify how well each model predicts the geostationary magnetic field as a function of local time, Dst, and storm phase. Compared with the earlier models (T96 and T02), the most recent storm-time model (TS05) has the best overall performance at geostationary orbit across all local times, storm levels, and storm phases. The field residuals between TS05 and GOES are small (≤3 nT) compared with the intrinsic short-timescale magnetic variability of the geostationary environment, which is ∼24 nT even during non-storm conditions. Finally, we demonstrate how field model errors may affect radiation belt studies that estimate electron phase space density.
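As an illustration of how model performance can be quantified as a function of local time, the sketch below groups model-minus-observation residuals of one field component into local-time bins and reports the mean and RMS residual per bin. It is a minimal sketch under stated assumptions, not the procedure used in this study: the function name `bin_residuals_by_local_time`, the 3-hour default bin width, and the choice of mean and RMS as summary statistics are all hypothetical, and the input values in the usage example are toy numbers rather than GOES or model data.

```python
import math
from collections import defaultdict


def bin_residuals_by_local_time(local_times_h, b_model, b_obs, bin_width_h=3.0):
    """Summarize (model - observed) field residuals in local-time bins.

    Hypothetical helper for illustration only.
    local_times_h: local time of each sample in hours (wrapped into [0, 24)).
    b_model, b_obs: one field component (e.g. Bz in nT) per sample.
    Returns {bin_start_hour: (mean_residual, rms_residual, sample_count)}.
    """
    if not (len(local_times_h) == len(b_model) == len(b_obs)):
        raise ValueError("input sequences must have equal length")

    # Assign each residual to the local-time bin containing its sample.
    groups = defaultdict(list)
    for lt, bm, bo in zip(local_times_h, b_model, b_obs):
        start = math.floor((lt % 24.0) / bin_width_h) * bin_width_h
        groups[start].append(bm - bo)

    # Mean residual shows systematic bias; RMS residual shows overall error size.
    stats = {}
    for start, residuals in sorted(groups.items()):
        n = len(residuals)
        mean = sum(residuals) / n
        rms = math.sqrt(sum(r * r for r in residuals) / n)
        stats[start] = (mean, rms, n)
    return stats
```

For example, `bin_residuals_by_local_time([0.5, 1.0, 12.0, 23.9], [80, 90, 100, 70], [78, 93, 100, 70])` places the first two samples in the 0 to 3 h bin, giving a mean residual of -0.5 nT there. The same grouping could in principle be keyed on Dst ranges or storm phase instead of local time.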