The speech translation (ST) problem can be formulated as a log-linear model with multiple features that capture different levels of dependency between the input speech observation and the output translation. However, while the log-linear model itself is discriminative in nature, many of its feature functions are derived from generative models, which are usually estimated by conventional maximum likelihood estimation. In this paper, we first formulate the ST problem as a log-linear model with multiple feature functions. We then describe a general discriminative learning framework, based on a technique called growth transformation (GT), for training these generative features. The proposed approach is evaluated on an IWSLT spoken language translation benchmark. Our experimental results show that the proposed method significantly improves translation quality and that it converges quickly and stably.
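As a rough illustration of the log-linear formulation described above, the following minimal Python sketch scores a small set of candidate translations by a weighted sum of log feature values and normalizes the result into a posterior. The function name `log_linear_posterior`, the candidate hypotheses, the feature values, and the weights are all hypothetical and chosen only for illustration; they are not taken from the paper, and the sketch does not implement GT-based training.

```python
import math


def log_linear_posterior(feature_scores, weights):
    """Return a normalized posterior over hypotheses under a log-linear model.

    feature_scores maps each hypothesis to a list of log feature values
    (for example, log-probabilities from generative component models).
    weights holds one scalar weight per feature.
    """
    # Weighted sum of log features for each hypothesis.
    totals = {
        hyp: sum(w * f for w, f in zip(weights, feats))
        for hyp, feats in feature_scores.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    max_total = max(totals.values())
    normalizer = sum(math.exp(t - max_total) for t in totals.values())
    return {hyp: math.exp(t - max_total) / normalizer for hyp, t in totals.items()}


# Illustrative (made-up) candidates, each with two generative feature scores.
candidates = {
    "hyp_a": [math.log(0.6), math.log(0.2)],
    "hyp_b": [math.log(0.3), math.log(0.5)],
}
weights = [1.0, 0.5]  # assumed feature weights

posterior = log_linear_posterior(candidates, weights)
best = max(posterior, key=posterior.get)
```

In this sketch the decision rule simply picks the hypothesis with the highest posterior. Discriminative training would adjust the parameters of the underlying feature models, rather than only the weights, to improve that decision.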