As bayerj points out, PCA is a method that assumes a linear system, whereas autoencoders (AEs) do not. If the AE uses no non-linear activation function and the hidden layer has fewer neurons than the input, then the AE and PCA recover the same subspace (though the AE's weights need not equal the principal components themselves). Otherwise the AE may find a different subspace.
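This equivalence can be checked numerically. Below is a minimal sketch (assuming only numpy; the data, network sizes, and learning rate are illustrative choices, not from any reference implementation): a linear AE with a bottleneck is trained by plain gradient descent on squared reconstruction error, and its final error is compared with the rank-$k$ PCA reconstruction error, which it matches at the global optimum.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 5, 2

# Synthetic data with a dominant 2-D structure plus small noise, then centered
Z = rng.normal(size=(n, k))
A = rng.normal(size=(k, d))
X = Z @ A + 0.05 * rng.normal(size=(n, d))
X -= X.mean(axis=0)

# PCA via SVD: the best rank-k reconstruction of X (Eckart-Young)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Vk = Vt[:k].T                                  # d x k principal directions
pca_err = np.linalg.norm(X - X @ Vk @ Vk.T) ** 2

# Linear AE: encoder W1 (d x k), decoder W2 (k x d), no activation function.
# Minimize (1/n) * ||X - X W1 W2||_F^2 by gradient descent.
W1 = 0.1 * rng.normal(size=(d, k))
W2 = 0.1 * rng.normal(size=(k, d))
lr = 0.01
for _ in range(50000):
    E = X - X @ W1 @ W2                        # reconstruction residual
    gW1 = -(2.0 / n) * X.T @ E @ W2.T
    gW2 = -(2.0 / n) * (X @ W1).T @ E
    W1 -= lr * gW1
    W2 -= lr * gW2
ae_err = np.linalg.norm(X - X @ W1 @ W2) ** 2

print(pca_err, ae_err)                         # the two errors should be very close
```

The AE's error can never fall below the PCA error (PCA gives the optimal rank-$k$ reconstruction), so the check is simply that the trained AE approaches it from above. Note that the learned weights span the principal subspace without necessarily being the principal components, since any invertible mixing of the hidden units leaves the reconstruction unchanged.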

One thing to note is that the hidden layer of an AE can have greater dimensionality than the input. In that case the AE is not doing dimensionality reduction; instead, it learns a transformation from one feature space to another in which the factors of variation in the data are disentangled.
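As a shape-level sketch of such an overcomplete AE (the sizes and random weights here are purely illustrative, not a trained model): the encoder maps a 4-dimensional input to a 16-dimensional non-linear code, and the decoder maps it back.

```python
import numpy as np

rng = np.random.default_rng(1)
d, D = 4, 16                               # hidden layer wider than the input

x = rng.normal(size=d)                     # one input sample
W_enc = 0.5 * rng.normal(size=(D, d))      # encoder weights (illustrative)
W_dec = 0.5 * rng.normal(size=(d, D))      # decoder weights (illustrative)

h = np.maximum(0.0, W_enc @ x)             # ReLU code: 16-D, overcomplete
x_hat = W_dec @ h                          # reconstruction back in 4-D input space
```

In practice such overcomplete AEs need a regularizer (sparsity, denoising, contraction) to avoid learning a trivial identity mapping; the point here is only that the code dimension exceeds the input dimension, so nothing is being compressed.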

Regarding the question in your response to bayerj about whether multiple layers mean a very complex non-linearity: depending on what you mean by "very complex non-linear", this can be true, but the real benefit of depth is better generalization. Many methods require a number of samples equal to the number of regions they must distinguish, yet "a very large number of regions, e.g., $O(2^N)$, can be defined with $O(N)$ examples" according to Bengio et al. This is a result of the representational complexity that arises from composing the simpler features learned in the lower layers of the network.
