In Java development, we often encounter complex set operations. Java alone is
not expressive enough to spare programmers the implementation details, which
is time-consuming and yields code that is hard to reuse. In view of this,
programmers usually resort to a dynamic calculation script for set computation.
SQL is surely the first kind of script that comes to most programmers' minds.
However, to their disappointment, SQL does not support explicit sets: it cannot
represent a set of sets, an ordered set, or a generic set, and only the result
set is recognized as a set. The result set therefore covers only a subset of
true set semantics, and many set operations are hard to implement in SQL.
Moreover, computation is not limited to databases; the data may come from
Excel, or there may be no database at all in the application environment. In
this case, the usage of SQL databa... (more)
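To make the contrast concrete, here is a minimal Java sketch of one operation the text says SQL result sets cannot express directly: an intersection that preserves the order of an ordered set. The class and method names (`SetOps`, `orderedIntersect`) are hypothetical, chosen only for illustration.

```java
import java.util.*;
import java.util.stream.*;

public class SetOps {
    // Intersection of two lists that preserves the order of the first list:
    // an "ordered set" operation, which an unordered SQL result set cannot
    // express without extra sorting machinery.
    public static List<String> orderedIntersect(List<String> a, List<String> b) {
        Set<String> inB = new HashSet<>(b);   // membership test for the second set
        return a.stream()
                .filter(inB::contains)        // keep elements of a that occur in b
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> gold = Arrays.asList("apple", "pear", "plum");
        List<String> sales = Arrays.asList("plum", "apple", "grape");
        System.out.println(orderedIntersect(gold, sales)); // [apple, plum]
    }
}
```

Even this simple case already requires hand-written plumbing in plain Java, which is the kind of detail work the text argues should be lifted into a script layer.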
The data computation layer sits between the data persistence layer and the
application layer; it computes on data from the persistence layer and returns
the results to the application layer. A data computation layer for Java aims
to reduce the coupling between these two layers and to shift the computational
workload away from them. A typical computation layer has the following
features:
Ability to compute on data from arbitrary data persistence layers: not only
databases, but also non-database sources such as Excel, Txt, or XML files. Of
all these computations, the... (more)
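The layering described above can be sketched in a few lines of plain Java. This is a hedged illustration, not the esProc API: the names `ComputeLayer`, `RowSource`, and `groupSum` are hypothetical. The point is the seam: the persistence layer is reduced to a row supplier, so the same computation runs over a database, an Excel export, or an in-memory list alike.

```java
import java.util.*;
import java.util.stream.*;

public class ComputeLayer {
    // The persistence layer abstracted to its minimum: something that yields rows.
    // A database adapter, an Excel reader, or a Txt/XML parser can all implement it.
    public interface RowSource {
        List<Map<String, Object>> rows();
    }

    // A computation-layer operation: group rows by one column, sum another.
    // The application layer calls this and never touches the storage details.
    public static Map<Object, Double> groupSum(RowSource src, String keyCol, String valCol) {
        return src.rows().stream().collect(Collectors.groupingBy(
                r -> r.get(keyCol),
                Collectors.summingDouble(r -> ((Number) r.get(valCol)).doubleValue())));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = new ArrayList<>();
        Map<String, Object> r1 = new HashMap<>();
        r1.put("dept", "A"); r1.put("amount", 10); rows.add(r1);
        Map<String, Object> r2 = new HashMap<>();
        r2.put("dept", "B"); r2.put("amount", 7); rows.add(r2);
        // The lambda stands in for any persistence source.
        System.out.println(groupSum(() -> rows, "dept", "amount"));
    }
}
```

Swapping the `RowSource` implementation is the only change needed to move from one persistence layer to another, which is exactly the decoupling the text describes.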
The Big Data Real-time Application is a scenario in which computation and
analysis results are returned in real time even over huge amounts of data.
This is a demand on database applications that has emerged in recent years.
In the past, data volumes were small, computations were simple, and there was
little parallelism, so the pressure on the database was not great. A high-end
or mid-range database server or cluster could allocate enough resources to
meet the demand. Moreover, in order to access the current business data and
the historical data rapidly and in parallel, users also tended t... (more)
The low efficiency of Hadoop computation is an undeniable fact. We believe one
of the major reasons is that MapReduce, Hadoop's underlying computational
framework, is essentially external-memory computation: it implements data
exchange through frequent reads and writes of external storage. Because file
I/O is about two orders of magnitude slower than memory access, the
computational performance of Hadoop is unlikely to be high.
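The external-memory pattern the text criticizes can be shown in miniature. The following is a toy Java sketch, not Hadoop itself: `sumExternal` round-trips every value through a temporary file before aggregating, the way MapReduce spills intermediate data between phases, while `sumInMemory` does the same work without leaving RAM. Both produce the same answer; the difference is purely the I/O traffic.

```java
import java.io.*;
import java.nio.file.*;
import java.util.*;

public class ExternalVsMemory {
    // In-memory aggregation: one pass over the data, no I/O at all.
    public static long sumInMemory(List<Long> data) {
        long s = 0;
        for (long v : data) s += v;
        return s;
    }

    // External-memory aggregation: every value is written to disk and read
    // back before it is combined, mimicking MapReduce's intermediate spills.
    public static long sumExternal(List<Long> data) {
        try {
            Path tmp = Files.createTempFile("spill", ".txt");
            try (BufferedWriter w = Files.newBufferedWriter(tmp)) {
                for (long v : data) { w.write(Long.toString(v)); w.newLine(); }
            }
            long s = 0;
            try (BufferedReader r = Files.newBufferedReader(tmp)) {
                String line;
                while ((line = r.readLine()) != null) s += Long.parseLong(line);
            }
            Files.delete(tmp);
            return s;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<Long> data = Arrays.asList(1L, 2L, 3L, 4L);
        System.out.println(sumInMemory(data));  // 10
        System.out.println(sumExternal(data));  // 10
    }
}
```

Timing the two paths on a large list makes the two-orders-of-magnitude gap between file I/O and memory access easy to observe firsthand.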
Ordinary users, meanwhile, usually have a small cluster of only tens or scores
of nodes. T... (more)
Hadoop's MapReduce is a widely used parallel computing framework. However, its
code reuse mechanism is inconvenient, and passing parameters is quite
cumbersome. Far from our usual experience of easily calling a library
function, I found that both the coder and the caller must keep a sizable
number of precautions in mind when writing even a short piece of program for
others to call.
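For contrast, here is what the "ordinary library function" calling experience looks like for a grouping-and-summarizing task in plain Java: parameters are passed directly as arguments, with no framework plumbing for the caller to worry about. This is a hedged sketch, not the Hadoop API; the name `ReusableMR` and its signature are invented for illustration.

```java
import java.util.*;
import java.util.function.*;

public class ReusableMR {
    // A reusable map/reduce helper. The "map" side extracts a key and a value
    // from each record; the "reduce" side combines values sharing a key.
    // All parameters are ordinary arguments: nothing to serialize, no
    // Configuration strings, no precautions for the caller to remember.
    public static <T, K, V> Map<K, V> mapReduce(
            Collection<T> input,
            Function<T, K> keyFn,
            Function<T, V> valFn,
            BinaryOperator<V> reducer) {
        Map<K, V> out = new HashMap<>();
        for (T t : input) {
            out.merge(keyFn.apply(t), valFn.apply(t), reducer);
        }
        return out;
    }

    public static void main(String[] args) {
        // Word count: group by the word itself, emit 1 per occurrence, sum.
        Map<String, Integer> counts = mapReduce(
                Arrays.asList("a", "b", "a"),
                w -> w,
                w -> 1,
                Integer::sum);
        System.out.println(counts);
    }
}
```

Writing the equivalent as a reusable Hadoop job requires Mapper and Reducer classes, Writable types, and parameters smuggled through the job Configuration, which is the gap in convenience the text is pointing at.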
However, we finally found that esProc can easily achieve code reuse in
Hadoop. Sticking with the simple, understandable example of grouping and
summarizing, let's check out a solution with... (more)