Determining the optimal implementation of a quantum gate is critical for designing a quantum computer. We consider the crucial task of efficiently decomposing a general single-qubit quantum gate into a sequence of fault-tolerant quantum operations. For a given single-qubit circuit, we construct an optimal gate sequence consisting of fault-tolerant Hadamard (H) gates and π/8 rotations (T). Our scheme is based on a novel canonical form for single-qubit quantum circuits and the corresponding rules for exactly reducing a general single-qubit circuit to this canonical form. The resulting sequence is optimal in the number of T gates. We demonstrate that a precomputed ε-net of canonical circuits, in combination with our scheme, lowers the depth of approximation circuits by up to three orders of magnitude compared to previously reported results.
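
To make the notions of an H/T gate sequence and its T count concrete, the following minimal sketch represents a circuit as a word over the letters H and T, multiplies out its unitary, and shortens the word using only the trivial identities HH = I and T^8 = I. It is an illustration only and is not the canonical form or the reduction rules of this work; the helper names (`circuit_unitary`, `t_count`, `simplify`, `close`) are hypothetical, and the word is read as a matrix product from left to right.

```python
import cmath
import math

# 2x2 matrices stored as nested tuples of complex numbers.
I2 = ((1, 0), (0, 1))
S = 1 / math.sqrt(2)
H = ((S, S), (S, -S))                           # Hadamard gate
T = ((1, 0), (0, cmath.exp(1j * math.pi / 4)))  # pi/8 rotation (T gate)
GATES = {"H": H, "T": T}


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def circuit_unitary(word):
    """Multiply out a gate word such as "HTH" as a left-to-right matrix product."""
    u = I2
    for g in word:
        u = matmul(u, GATES[g])
    return u


def t_count(word):
    """Number of T gates in the word."""
    return word.count("T")


def simplify(word):
    """Apply only the trivial identities HH = I and T^8 = I until none applies.

    Assumption: this is a naive illustration, not the canonical-form reduction.
    """
    changed = True
    while changed:
        changed = False
        for pattern in ("HH", "T" * 8):
            if pattern in word:
                word = word.replace(pattern, "", 1)
                changed = True
    return word


def close(a, b, tol=1e-9):
    """Entrywise comparison of two 2x2 matrices up to a tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


word = "HTTTTHHTTTTTH"
short = simplify(word)
print(short, t_count(word), t_count(short))  # HTH 9 1
print(close(circuit_unitary(word), circuit_unitary(short)))  # True
```

Both words implement the same unitary, but the shortened one uses one T gate instead of nine. These two identities alone do not guarantee a T-optimal sequence in general, which is why a canonical form with exact reduction rules is needed.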