We explore the use of neural networks to predict wavelet coefficients for image compression. We show that, by reducing the variance of the residual coefficients, the nonlinear prediction can shorten the compressed bitstream. We report results for several network architectures and training methodologies, and examine and explain some pitfalls of the approach. A two-layer fully connected network, trained offline and applied to a test set of seven 512-by-768 images from the Kodak database, yields an overall bit-rate improvement of between 4% and 7%.
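The core mechanism, predicting each coefficient from its neighbours and entropy-coding the lower-variance residuals, can be sketched as follows. This is an illustrative toy only: the paper's predictor is a two-layer fully connected network trained offline on wavelet coefficients, whereas here a simple linear least-squares predictor on a synthetic correlated signal stands in, purely to show why a smaller residual variance implies a shorter coded bitstream (for a Gaussian source, rate grows with 0.5·log2 of the variance).

```python
import numpy as np

# Illustrative stand-in, not the paper's method: a linear predictor on a
# synthetic AR(1) signal demonstrates the variance-reduction argument.
rng = np.random.default_rng(0)

# Synthetic "wavelet coefficients" with local correlation (AR(1) process).
n = 10_000
rho = 0.9
x = np.zeros(n)
for i in range(1, n):
    x[i] = rho * x[i - 1] + rng.normal()

# Predict each coefficient from its two causal neighbours.
features = np.column_stack([x[1:-1], x[:-2]])  # x[i-1], x[i-2]
target = x[2:]                                 # x[i]
w, *_ = np.linalg.lstsq(features, target, rcond=None)
residual = target - features @ w

def gaussian_rate(var):
    """Differential entropy of a Gaussian, in bits per sample."""
    return 0.5 * np.log2(2 * np.pi * np.e * var)

var_raw, var_res = target.var(), residual.var()
print(f"variance: raw {var_raw:.2f} -> residual {var_res:.2f}")
print(f"rate:     raw {gaussian_rate(var_raw):.2f} -> "
      f"residual {gaussian_rate(var_res):.2f} bits/sample")
```

Because the residuals have markedly lower variance than the raw coefficients, a subsequent entropy coder spends fewer bits per sample; the paper's nonlinear network plays the role of the predictor here.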