Big Data is a term used to describe data sets so vast and so complex that traditional data processing tools and applications cannot handle them.
However, Big Data contains a great deal of valuable information which, if extracted successfully, can greatly benefit business and scientific research, help predict impending epidemics, and even determine traffic conditions in real time. These data must therefore be collected, organized, stored, searched, and shared differently from ordinary data. In this article, we invite you to learn about Big Data, the methods used to exploit it, and how it helps our lives.
1. The definition of Big Data
As mentioned above, Big Data is a collection of data that exceeds the capabilities of traditional tools and applications. The size of Big Data sets keeps increasing; as of 2012, a single data set could range from a few dozen terabytes to petabytes (1 petabyte = 1024 terabytes).
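The unit relationship mentioned above (1 petabyte = 1024 terabytes) can be sketched as a small conversion helper. This is only an illustration: the function name `convert` and the unit table are assumptions made for this example, and it follows the binary convention (a factor of 1024 per step) used in this article.

```python
# Storage units in ascending order; each one is assumed to be
# 1024 times larger than the unit before it.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a data size between two units in UNITS (hypothetical helper)."""
    # Positive when converting to a smaller unit, negative otherwise.
    steps = UNITS.index(from_unit) - UNITS.index(to_unit)
    return value * (1024 ** steps)


if __name__ == "__main__":
    print(convert(1, "PB", "TB"))   # petabyte expressed in terabytes
    print(convert(40, "PB", "TB"))  # a 40-petabyte store in terabytes
```

Note that some sources use the decimal convention instead (1 petabyte = 1000 terabytes), so figures quoted elsewhere may differ slightly.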
In 2001, analyst Doug Laney of META Group (now part of the research company Gartner) said that the challenges and opportunities of data growth can be described along three dimensions: increasing volume, increasing velocity, and increasing variety. Today Gartner, along with many other organizations and companies in the information technology field, continues to use this "3V" model to define Big Data. In 2012, Gartner added that, in addition to the three properties above, Big Data also requires new forms of processing to support decision making, deeper insight into things and events, and the optimization of work processes.
We can take the experiments at the Large Hadron Collider (LHC) in Europe as an example of Big Data. When these experiments run, the results are recorded by 150 million sensors that deliver data about 40 million times per second. If the LHC recorded all of the results from all of the sensors, the data flow would become extremely large: it could reach 150 million petabytes per year, or about 500 exabytes per day, 200 times more than all the other data sources in the world combined.
Each second there are about 600 million collisions between particles, but after more than 99.999% of the data stream is filtered out, only about 100 collisions are of interest to scientists. This means that the body operating the LHC must find new ways to manage and process this enormous mass of data.
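To get a feel for how aggressive this filtering is, the approximate figures above can be plugged into a short calculation. This is a rough sketch: the function name `discarded_percent` is hypothetical, and the inputs are simply the rounded numbers quoted in the text, not measured values.

```python
def discarded_percent(total_events: float, kept_events: float) -> float:
    """Return the percentage of events that a filter throws away."""
    return (1 - kept_events / total_events) * 100


if __name__ == "__main__":
    collisions_per_second = 600_000_000  # approximate figure from the text
    kept_per_second = 100                # collisions of interest per second
    pct = discarded_percent(collisions_per_second, kept_per_second)
    print(f"Discarded: {pct:.5f}% of collisions")
```

With these inputs the discarded share comes out above the 99.999% mentioned in the text, which is consistent with only about 100 collisions per second being kept.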
Another example is the Sloan Digital Sky Survey (SDSS), a sky survey based at an observatory in New Mexico. When it began operating in 2000, within a few weeks it had collected more data than astronomy had gathered in its entire previous history. It collects about 200 GB each night, and the total has now reached more than 140 terabytes. The LSST observatory, the successor to SDSS, is expected to open in 2016 and will collect an equivalent amount of data every 5 days.
A further example is the decoding of the human genome. This work used to take up to 10 years; now it can be completed in a week. In addition, the NASA Center for Climate Simulation stores 32 petabytes of climate observations and simulations on its supercomputers. The storage of images, text, and other multimedia content on Wikipedia, along with the record of users' editing behavior, also constitutes a Big Data set.
2. Some current facts about Big Data
According to an Intel document from September 2013, the world is now creating 1 petabyte of data every 11 seconds, the equivalent of a 13-year-long HD video. Companies and businesses also own Big Data of their own. For example, the online marketplace eBay uses two data centers with a capacity of up to 40 petabytes to hold queries, searches, and customer recommendations, as well as information about its merchandise.
The online retailer Amazon.com has to handle millions of operations every day, as well as requests from about half a million sales partners. Amazon uses Linux-based systems, and in 2005 it owned the three largest Linux databases in the world, with capacities of 7.8 TB, 18.5 TB, and 24.7 TB.
Similarly, Facebook must manage 50 billion photos uploaded by its users, while YouTube and Google must store all of their users' queries and videos along with many other kinds of related information.
SAS also offers a few interesting statistics about Big Data:
RFID systems (a form of short-range wireless connection, similar to NFC but with a longer range, and also used in hotel key cards) generate more than 1,000 times as much data as traditional bar codes.
Within 4 hours on Black Friday 2012, Walmart stores handled more than 10 million cash register transactions, or about 5,000 items per second.
The UPS courier service receives approximately 39.5 million requests from its customers every day.
The VISA service handles more than 172.8 million card transactions in a single day.
Twitter sees 500 million new tweets every day, and Facebook's 1.15 billion members generate a huge mass of text, files, videos, and more.
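Daily totals like these are easier to compare once they are converted to a per-second rate. The following is a minimal sketch of that arithmetic; the function `per_second` and the dictionary of figures are illustrative names introduced here, and the inputs are simply the rounded numbers quoted in the list above.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
SECONDS_PER_HOUR = 60 * 60       # 3,600 seconds


def per_second(events: float, seconds: float) -> float:
    """Average number of events per second over the given time span."""
    return events / seconds


if __name__ == "__main__":
    # Daily figures as quoted in the statistics above.
    daily_totals = {
        "UPS requests": 39_500_000,
        "VISA card transactions": 172_800_000,
        "Twitter tweets": 500_000_000,
    }
    for name, total in daily_totals.items():
        rate = per_second(total, SECONDS_PER_DAY)
        print(f"{name}: about {rate:,.0f} per second")

    # Walmart: 10 million transactions within 4 hours.
    walmart = per_second(10_000_000, 4 * SECONDS_PER_HOUR)
    print(f"Walmart transactions: about {walmart:,.0f} per second")
```

The VISA figure works out to an average of 2,000 card transactions per second. For Walmart, 10 million transactions spread over 4 hours averages roughly 694 transactions per second, so the 5,000-per-second figure quoted above presumably counts something other than whole transactions.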
3. The technology used in Big Data
Demand for Big Data is growing so strongly that Software AG, Oracle, IBM, Microsoft, SAP, EMC, HP, and Dell have spent more than 15 billion dollars on companies specializing in data management and analysis. In 2010, the Big Data industry was worth more than 100 billion dollars and was growing at about 10% per year, twice as fast as the software industry as a whole.
As mentioned above, Big Data requires special technologies because of its huge and complex nature. In 2011, the McKinsey analyst group proposed that the technologies usable with Big Data include crowdsourcing (drawing on resources from many computing devices around the world to process data together), genetic algorithms, machine learning (systems that are able to learn from data, a branch of artificial intelligence), natural language processing (as in Siri or Google Voice Search, but more advanced), signal processing, simulation, time series analysis, modeling, and clusters of powerful servers working together. These techniques are very complex, so we will not go into detail about them here.