Recently, as I have been describing the ParElastic architecture, I have been asked how sharding differs from a parallel database. The two concepts are similar, their block diagrams look alike, and the confusion is understandable.
It occurs to me that the best answer to the question is this:
A ‘parallel database’ is a database architecture; sharding is an application architecture.
Put slightly differently, parallelism is a database architecture choice (another choice being Symmetric Multiprocessing, or SMP). From the perspective of the database client, what you see is a single database. The fact that the data is partitioned, and that a collection of servers works collaboratively to process each query, is an internal detail of how the parallel database operates. A query submitted to a parallel database targets all of the data, and the single result stream it returns is the “answer” to the query.
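To make the client's view concrete, here is a toy Python sketch. It is not how ParElastic or any real parallel database is implemented: the class `ParallelDatabase`, its hash-on-key placement rule, and the use of threads to stand in for separate servers are all hypothetical assumptions for illustration. The point is the interface: the client inserts rows into, and submits a query to, one object, while the partitioning and the per-node scans stay hidden behind it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Row = Dict[str, int]


class ParallelDatabase:
    """Hypothetical toy model: one logical database spread over several nodes."""

    def __init__(self, num_nodes: int, partition_key: str) -> None:
        self.partition_key = partition_key
        # Each inner list plays the role of one node's local partition.
        self.nodes: List[List[Row]] = [[] for _ in range(num_nodes)]

    def insert(self, row: Row) -> None:
        # The database, not the client, decides which node stores the row.
        node_index = hash(row[self.partition_key]) % len(self.nodes)
        self.nodes[node_index].append(row)

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        def scan(partition: List[Row]) -> List[Row]:
            # Each node evaluates the query against only its own data.
            return [row for row in partition if predicate(row)]

        # Run the per-node scans concurrently, then combine them into one answer.
        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            partial_results = list(pool.map(scan, self.nodes))
        return [row for part in partial_results for row in part]


# Usage: the client sees one database and gets one result stream back.
db = ParallelDatabase(num_nodes=4, partition_key="customer_id")
for cid in range(10):
    db.insert({"customer_id": cid, "balance": cid * 100})
answer = db.query(lambda row: row["balance"] >= 500)
print(sorted(row["customer_id"] for row in answer))  # prints [5, 6, 7, 8, 9]
```

Nothing in the usage code mentions nodes or partitions; adding a fifth node would change where rows live but not a single line of client code.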
Sharding, on the other hand, is an application choice (another choice would be to buy a bigger server). From the perspective of the application, there is a collection of discrete database servers, and the application itself contains the logic that determines where to place data, where to direct queries, and, in some cases, how to integrate the independent result streams from each of the database servers.
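For contrast, here is a similarly hypothetical Python sketch of application-level sharding. The names `ShardServer` and `ShardedApp`, and the modulo placement rule, are assumptions made up for this example. Notice that all three responsibilities described above (placing data, directing queries, and merging results) live in application code, and each `ShardServer` knows nothing about the others.

```python
from typing import Dict, List, Optional

Row = Dict[str, int]


class ShardServer:
    """Hypothetical stand-in for one independent database server."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[int, Row] = {}

    def put(self, key: int, row: Row) -> None:
        self.rows[key] = row

    def get(self, key: int) -> Optional[Row]:
        return self.rows.get(key)

    def scan(self, min_balance: int) -> List[Row]:
        # Each server only ever sees its own rows.
        return [row for row in self.rows.values() if row["balance"] >= min_balance]


class ShardedApp:
    """The application owns placement, query routing and result merging."""

    def __init__(self, servers: List[ShardServer]) -> None:
        self.servers = servers

    def shard_for(self, customer_id: int) -> ShardServer:
        # Placement logic lives in the application: a simple modulo rule.
        return self.servers[customer_id % len(self.servers)]

    def save_customer(self, row: Row) -> None:
        customer_id = row["customer_id"]
        self.shard_for(customer_id).put(customer_id, row)

    def load_customer(self, customer_id: int) -> Optional[Row]:
        # Query routing: a single-key lookup goes to exactly one server.
        return self.shard_for(customer_id).get(customer_id)

    def customers_with_balance_at_least(self, min_balance: int) -> List[Row]:
        # Result integration: query every server separately, then merge
        # the independent result streams in application code.
        merged: List[Row] = []
        for server in self.servers:
            merged.extend(server.scan(min_balance))
        merged.sort(key=lambda row: row["customer_id"])
        return merged


# Usage: the application talks to three separate "servers".
app = ShardedApp([ShardServer(f"db{i}") for i in range(3)])
for cid in range(10):
    app.save_customer({"customer_id": cid, "balance": cid * 100})
print(app.load_customer(7))  # prints {'customer_id': 7, 'balance': 700}
rich = app.customers_with_balance_at_least(500)
print([row["customer_id"] for row in rich])  # prints [5, 6, 7, 8, 9]
```

In this sketch, adding a fourth server changes the result of `customer_id % len(self.servers)` for most keys, so the application itself would have to move existing rows to their new servers.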
At ParElastic, we believe that a parallel database architecture allows you, the application developer, to focus on the application rather than worrying about work that the database should be doing!