Tuesday, July 24, 2012
GraphChi, new software for analyzing massive amounts of data on the PC
Enterprise IT staff, scientists, and engineers working in computing have some good news. A team of researchers at the Select Lab at Carnegie Mellon University recently built new software called GraphChi that allows computations involving huge amounts of data (for example, analysis for search engines or social networks) to run on a personal computer or laptop. Until now, this kind of work was thought to require a large cluster or supercomputer, for example through a cloud service such as Amazon's EC2.
Graph Computation and GraphChi
First, it should be made clear that GraphChi is not a complete solution for every heavy computing task. The software is designed specifically for graph computation. Do not underestimate that niche, however: graph computations keep growing in number and play an extremely important role today.
So what is graph computation, and why is it important? An example makes it easy to picture. Take a Facebook feature, Top Post. Did you know that to come up with the top posts for you, Facebook has to analyze a huge pile of data and the complex relationships among different accounts, statuses, comments, and so on? That is graph computation: calculations over a large volume of data and the relationships within it. So is it important? Most likely you will agree with me that the answer is "Yes".
Not only that, GraphChi can also handle "streaming graphs," a branch of graph computation that models large networks of information more accurately by capturing how the connections between data change over time. Information is considered not only in terms of interactions at the present moment but also in terms of what happened in the past. A workload so large that one would expect only a supercomputer or cluster to handle it can now run on your PC, which is amazing.
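To make the idea concrete, here is a minimal Python sketch of a graph computation at toy scale. It is only an illustration: the accounts, posts, and the "count the interactions" ranking rule are all invented for this example, and nothing here reflects how Facebook actually ranks posts.

```python
from collections import Counter

# Hypothetical interaction edges (account -> post), invented for illustration.
interactions = [
    ("alice", "post_1"),
    ("bob", "post_1"),
    ("carol", "post_2"),
    ("alice", "post_2"),
    ("dave", "post_1"),
]

def top_posts(edges, k=1):
    """Rank posts by how many interaction edges point at them."""
    counts = Counter(post for _account, post in edges)
    return counts.most_common(k)

print(top_posts(interactions))  # [('post_1', 3)]
```

With five edges this is trivial. The difficulty described in this article comes from running the same kind of relationship analysis over billions of edges.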
The working principle of GraphChi
To compute quickly, a typical computer must hold the data it is working on in RAM. For graph computation, a PC clearly does not have enough RAM to hold all the necessary information. The hard drive is a different story: thanks to advances in storage technology, the hard drives in today's PCs have enough capacity. The drawback is that a hard drive reads and writes data far more slowly than RAM, which slows down graph computation and has forced users to move it onto clusters or supercomputers. To solve this problem, Aapo Kyrola, a member of the team, developed an algorithm that accesses the hard drive "less" randomly*, optimized to make graph computation faster and to overcome the hard drive's disadvantages in such computations. This algorithm is the soul of the new GraphChi software.
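The general idea of favoring sequential reads over random ones can be sketched in a few lines of Python. This is not GraphChi's actual algorithm, which the article does not describe in detail. It is a simplified illustration that assumes a hypothetical text file with one "source destination" edge per line, and it computes each vertex's out-degree in a single front-to-back pass.

```python
import os
import tempfile
from collections import defaultdict

def write_edge_file(path, edges):
    """Write one 'src dst' pair per line (a hypothetical on-disk format)."""
    with open(path, "w") as f:
        for src, dst in edges:
            f.write(f"{src} {dst}\n")

def out_degrees_streaming(path):
    """Count outgoing edges per vertex in one sequential pass over the file.

    Only the per-vertex counters are kept in memory; the edges are read
    front to back and never loaded all at once.
    """
    degrees = defaultdict(int)
    with open(path) as f:
        for line in f:  # sequential read, no seeking around the file
            src, _dst = line.split()
            degrees[int(src)] += 1
    return dict(degrees)

edges = [(0, 1), (0, 2), (1, 2), (2, 0)]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "edges.txt")
    write_edge_file(path, edges)
    result = out_degrees_streaming(path)

print(result)  # {0: 2, 1: 1, 2: 1}
```

Reading the file in order means the disk head rarely has to jump, which is the access pattern a hard drive handles best. A real system also has to deal with computations that need both the incoming and outgoing edges of a vertex, which this sketch leaves out.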
And the effectiveness of GraphChi ...
Carlos Guestrin, co-director of the Select Lab, said that using a Mac Mini running GraphChi, they could analyze the 2010 Twitter social graph (a graph computation on social network data), with about 40 million users and 1.2 billion connections, in 59 minutes. A previously published result reported that the same computation took 400 minutes on a cluster of 1,000 interconnected computers, which is equivalent to about 400,000 minutes = 6,666 hours = 278 days for a single computer working alone.
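The conversion above can be checked with a short calculation. The sketch assumes, as the comparison implicitly does, that the cluster's work divides perfectly evenly, so one machine would need exactly 1,000 times as long. Real clusters rarely scale that cleanly, so the single-machine figure is a rough equivalent rather than a measurement.

```python
# Figures quoted in the article.
cluster_minutes = 400     # wall-clock time on the cluster
machines = 1000           # computers in the cluster
graphchi_minutes = 59     # wall-clock time on one Mac Mini

# Assumption: perfect linear scaling across the cluster's machines.
single_machine_minutes = cluster_minutes * machines
hours = single_machine_minutes / 60
days = hours / 24

print(single_machine_minutes)  # 400000
print(round(hours))            # 6667
print(round(days))             # 278

# Wall-clock comparison of the two runs, with no scaling assumption.
print(round(cluster_minutes / graphchi_minutes, 2))  # 6.78
```

The exact value is about 6,666.7 hours, which the article truncates to 6,666. In wall-clock terms the single Mac Mini took roughly one seventh of the time the cluster did.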
GraphChi and some application areas
Clearly, advances in graph computation analysis will help build better web products, such as document search, online advertising, navigation tools, and even network security (analyzing relationships to track down criminals or organized crime). With GraphChi, programmers and developers only need a personal computer to process their data, rather than having to write specialized programs for a cluster and spend a lot of time waiting for their turn.
Other areas that involve a lot of graph computation are biomedicine, computational biology, and materials science. In biomedicine, studying how the brain works or linking patient records to identify a disease accurately both involve graph computation. In computational biology and materials science, simulations of DNA, proteins, cell systems, phase transitions of materials, and so on rely on complex calculations with large amounts of data, both during processing and in the final results.
Instead of a conclusion
"Large" is a relative concept that depends on the yardstick used. The need for cluster computing and supercomputing is undeniable, but in reality much data is "not too big" compared with what people imagine. Tools like GraphChi will give companies and individuals a more flexible, cheaper, faster, and more convenient way to tackle graph computation problems. Hopefully we will see much more useful research like this in the future.
Interested readers can visit the project site and the Select Lab to learn more.
Cluster computing is a term for a system of computers that are connected and work together. Such a system is usually controlled by a server; to use it, users must upload their jobs to the server and wait in a queue for their turn. Clusters are commonly used in scientific computing and in data centers to run heavy computing tasks in parallel.
* A regular hard drive uses random access, meaning it can read and write data at any point on the disk surface.
From: TechnologyReview