Types of Data Analysis

In statistics, data analysis is sometimes divided into descriptive statistical analysis, exploratory data analysis, and confirmatory data analysis. Of these, exploratory data analysis focuses on discovering new features in the data, while confirmatory data analysis focuses on verifying or falsifying existing hypotheses.

Exploratory data analysis

Exploratory data analysis is an approach to analyzing data in order to form hypotheses worth testing; it supplements traditional statistical hypothesis testing. The approach was named by the American statistician John Tukey.
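
As a rough illustration of what an exploratory pass can look like, the sketch below computes a five-number summary and flags unusual values with the 1.5 × IQR rule commonly associated with Tukey. The sample values and the function names are hypothetical and chosen only for this example; real exploratory work usually also involves plots and domain knowledge.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    data = sorted(values)
    # quantiles(n=4) returns the three cut points that split the data into quarters.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

def tukey_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample: response times in milliseconds (assumed data).
sample = [12, 15, 14, 10, 13, 15, 14, 95, 11, 13]
summary = five_number_summary(sample)
outliers = tukey_outliers(sample)
print("summary:", summary)
print("outliers:", outliers)
```

A value flagged this way is not a conclusion in itself. It is a feature of the data that suggests a hypothesis (for example, "one request hit a slow path") which confirmatory analysis could then test.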

Qualitative data analysis

Qualitative data analysis, also known as “qualitative research” or “qualitative research data analysis”, refers to the analysis of non-numerical data such as words, photos, and observations.

Offline data analysis

Offline data analysis is used for more complex and time-consuming analysis and processing. It is generally built on cloud computing platforms, using components such as the open source HDFS file system and the MapReduce computing framework. A large Hadoop cluster can consist of hundreds or even thousands of servers and store several petabytes or even dozens of petabytes of data. Thousands of offline analysis jobs may run on it every day, each handling anywhere from hundreds of megabytes to hundreds of terabytes or more, with running times ranging from minutes to hours, days, or even longer.
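
The MapReduce model mentioned above can be sketched in a few lines of plain Python. This is a single-process toy that only mirrors the map, shuffle, and reduce phases; it does not use Hadoop or any of its APIs, and the function names and input lines are hypothetical. On a real cluster the input would be file blocks stored in HDFS and each phase would run in parallel across many servers.

```python
from collections import defaultdict

def map_phase(record):
    """Emit (key, value) pairs for one input record: here, (word, 1)."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def run_job(records):
    """Run map, shuffle, and reduce over all records and return the totals."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

# Hypothetical input standing in for log files stored on the cluster.
log_lines = ["error disk full", "warning disk slow", "error network down"]
counts = run_job(log_lines)
print(counts)
```

Because each record is mapped independently and each key is reduced independently, the same job can be spread over many machines, which is what makes this style suitable for long-running batch work on very large inputs.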

Online data analysis

Online data analysis, also called online analytical processing (OLAP), is used to process users’ online requests. It has strict response-time requirements (usually no more than a few seconds). Unlike offline data analysis, it processes user requests in real time and lets users change the constraints and restrictions of an analysis at any time. The amount of data it can handle is much smaller than in offline analysis, but with the development of technology, current online analysis systems can process tens of millions or even hundreds of millions of records in real time. Traditional online data analysis systems are built on a data warehouse with a relational database at its core, while online big data analysis systems are built on the NoSQL systems of cloud computing platforms. Without online analysis and processing of big data, there would be no way to store and index the huge number of webpages on the Internet, and there would be no efficient search engines, microblogs, blogs, or social networks based on big data processing.
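
To make the idea of changing the constraints of an analysis at any time more concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for a relational data warehouse. The sales table, its columns, and the total_by helper are assumptions made for this example; a production system would use a dedicated warehouse or NoSQL store and far larger data.

```python
import sqlite3

# In-memory database standing in for a warehouse fact table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("north", "laptop", 2022, 1200.0),
        ("north", "phone", 2022, 800.0),
        ("south", "laptop", 2022, 950.0),
        ("south", "phone", 2023, 700.0),
        ("north", "laptop", 2023, 1300.0),
    ],
)

ALLOWED_DIMENSIONS = {"region", "product", "year"}

def total_by(dimension, **filters):
    """Sum sales grouped by one dimension, optionally filtered by others."""
    # Column names cannot be bound as SQL parameters, so check them against a whitelist.
    if dimension not in ALLOWED_DIMENSIONS or not set(filters) <= ALLOWED_DIMENSIONS:
        raise ValueError("unknown dimension")
    sql = f"SELECT {dimension}, SUM(amount) FROM sales"
    if filters:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in filters)
    sql += f" GROUP BY {dimension} ORDER BY {dimension}"
    return conn.execute(sql, tuple(filters.values())).fetchall()

# The user first looks at totals per region, then narrows the view to 2022 by product.
by_region = total_by("region")
by_product_2022 = total_by("product", year=2022)
print(by_region)
print(by_product_2022)
```

Each call is answered immediately from the stored data, and the user is free to regroup or refilter between calls. That interactive loop is the main contrast with an offline job, which is submitted once and may run for hours.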