Maximum entropy (MaxEnt) models have been used in many spoken language tasks. The training of a MaxEnt model often involves an iterative procedure that starts from an initial parameterization and gradually updates it towards the optimum. Due to the convexity of its objective function (hence a global optimum on a training set), little attention has been paid to model initialization in MaxEnt training. However, MaxEnt model training often ends early before convergence to the global optimum, and prior distributions with hyper-parameters are often added to the objective function to prevent over-fitting. This paper shows that the initialization and regularization hyper-parameter setting may significantly affect the test set accuracy. It investigates the MaxEnt initialization/regularization based on an n-gram classifier and a TF*IDF weighted vector space model. The theoretically motivated TF*IDF initialization/regularization has achieved significant improvements over the baseline flat initialization/regularization. In contrast, the n-gram based initialization/regularization often does not exhibit significant improvements.