A Gauss-Seidel variant is developed which maintains data in the L2 cache memory longer than and runs approximately twice as fast as standard implementations. This variant depends on a decomposition of grid nodes into blocks which fit into cache. We discuss two O(n) algorithms which perform a one-time reordering of the grid nodes and associated operators. Numerical tests demonstrate the speedups possible. A performance analysis tool confirms that our version makes significantly better use of L2 cache than standard versions.