The traditional echo state network (ESN) is a special type of recurrent neural network (RNN), a temporally deep model, in which the recurrent matrix is carefully designed and both the recurrent and input matrices are kept fixed. The ESN also adopts linear output (or readout) units to simplify the learning of the output matrix, the only matrix in the network that is trained. In this paper, we devise a special technique that takes advantage of the linearity of the output units in the ESN to learn the input and recurrent matrices as well, which was not carried out in earlier ESNs because of the well-known difficulty of learning them. Compared with backpropagation through time (BPTT) for learning general RNNs, our proposed technique uses the linearity of the output units to provide constraints among the various matrices in the RNN, enabling the gradients that serve as the learning signal to be computed in analytical form rather than by recursion as in BPTT. Experimental results on phone state classification show that learning either or both of the input and recurrent matrices in the ESN is superior to the traditional ESN, which does not learn them, especially when longer time steps are used in analytically computing the gradients.
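To make the baseline concrete, the following is a minimal sketch of a traditional ESN as described above: the input and recurrent matrices are drawn once and then frozen, and only the linear readout is fitted in closed form. It does not implement the proposed technique for learning the input and recurrent matrices. The function names (`make_esn`, `run_reservoir`, `fit_readout`), the tanh hidden units, the spectral radius of 0.9, the small ridge term, and the toy delay task are all illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def make_esn(n_in, n_hidden, spectral_radius=0.9, seed=0):
    """Draw fixed input and recurrent matrices (never trained in a traditional ESN)."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, size=(n_hidden, n_in))
    W_rec = rng.standard_normal((n_hidden, n_hidden))
    # Assumption: rescale so the largest eigenvalue magnitude equals spectral_radius,
    # a common heuristic for obtaining fading memory in the hidden state.
    W_rec *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W_rec)))
    return W_in, W_rec

def run_reservoir(W_in, W_rec, X):
    """Map an input sequence X of shape (T, n_in) to hidden states of shape (T, n_hidden)."""
    n_hidden = W_rec.shape[0]
    h = np.zeros(n_hidden)
    H = np.empty((X.shape[0], n_hidden))
    for t, x in enumerate(X):
        # Assumption: tanh hidden units; the nonlinearity is a free design choice here.
        h = np.tanh(W_in @ x + W_rec @ h)
        H[t] = h
    return H

def fit_readout(H, Y, ridge=1e-6):
    """Closed-form linear readout: minimize ||H U - Y||^2 + ridge * ||U||^2 over U."""
    A = H.T @ H + ridge * np.eye(H.shape[1])
    return np.linalg.solve(A, H.T @ Y)  # U has shape (n_hidden, n_out)

# Toy usage (hypothetical task): reproduce the previous input sample.
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(300, 1))
Y = np.vstack([np.zeros((1, 1)), X[:-1]])   # target is the input delayed by one step
W_in, W_rec = make_esn(n_in=1, n_hidden=50)
H = run_reservoir(W_in, W_rec, X)
U = fit_readout(H, Y)
print("training MSE:", float(np.mean((H @ U - Y) ** 2)))
```

Because the readout is linear in the hidden states, fitting it reduces to a regularized least-squares problem with a closed-form solution; this same linearity is what the proposed technique exploits when computing gradients for the other matrices.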