A few years ago, no one would have imagined that the Internet would produce such an immense volume of data, or that analyzing and processing it would improve knowledge and decision making in companies.
But the truth is that, day after day, information in digital format multiplies at an astonishing speed, generating an enormous amount of data.
So much data, in fact, that it is necessary to resort to the process known as Big Data, also called data intelligence, in order to process and analyze it.
However, to go deeper into this discipline it is necessary to know and understand its most basic technical terms. Nowadays, Big Data is closely intertwined with artificial intelligence across the information technology industry.
That is why, in this article, we have put together a glossary of terms so that you become much more familiar with the fascinating world of Big Data:
Big Data Terminology Key Points
1. Data Science
Data Science covers the scientific methods, processes and systems used to extract knowledge from data, or to gain a much deeper understanding of it, in order to solve the analytical problems that arise.
2. Data analyst
A data analyst is someone who collects and analyzes data using statistical techniques, coming to understand the structure of the data in order to interpret it and establish value-creating strategies.
3. Algorithm
The algorithm is a fundamental pillar of the digital age: it lets us find and express whatever we are looking for. In the case of Big Data, algorithms search for patterns and relationships between variables.
Most of these algorithms are created for a single purpose: to automate the processing of the huge amount of data generated every day.
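To make this concrete, here is a minimal sketch of an algorithm that looks for a relationship between two variables by computing their Pearson correlation coefficient. The data (advertising spend versus sales) is invented for illustration.

```python
# A tiny pattern-finding algorithm: measure how strongly two variables
# move together using the Pearson correlation coefficient.
import math

def pearson(xs, ys):
    """Return the Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

ad_spend = [10, 20, 30, 40, 50]   # made-up historical figures
sales    = [12, 24, 33, 39, 52]
r = pearson(ad_spend, sales)
print(round(r, 3))  # close to 1.0: a strong linear relationship
```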
4. Predictive analytics
As its name indicates, it is the science of predicting a company's future from its historical data, with the aim of improving planning and optimizing results. It does so by applying techniques based on statistical algorithms, such as predictive modeling.
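A minimal sketch of predictive modeling, assuming invented monthly sales figures: fit a straight line to the historical data with ordinary least squares, then extrapolate one period ahead.

```python
# Fit a linear trend to historical data and forecast the next period.
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

months = [1, 2, 3, 4, 5, 6]
sales  = [100, 108, 119, 127, 141, 150]   # made-up historical sales
slope, intercept = fit_line(months, sales)
forecast = slope * 7 + intercept          # predict month 7
print(round(forecast, 1))                 # about 160
```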
5. Business analytics
It is a technique oriented toward exploring large volumes of data, with a focus on statistical analysis.
It is used to obtain consistent, current and structured information that helps shape business decisions and gain a competitive advantage over other companies.
6. Business Intelligence
It allows the data obtained to be transformed into structured information, through a set of methodologies that act as a strategic foundation for decision making in a business.
7. Machine Learning
It is one of the many branches of artificial intelligence. Its main focus is to "teach" machines to solve various tasks from data, automating the resolution of a problem without explicitly programming the steps needed to achieve it beforehand.
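As a minimal sketch of solving a task from data rather than from explicit rules, here is a one-nearest-neighbour classifier; the training points and labels are invented.

```python
# Classify a new point by copying the label of its closest training example.
def predict(train, point):
    """train: list of (feature_vector, label) pairs."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    _, label = min(train, key=lambda ex: sq_dist(ex[0], point))
    return label

# Made-up (feature vector, label) pairs.
train = [((2, 3), "cat"), ((3, 4), "cat"), ((9, 8), "dog"), ((10, 9), "dog")]
print(predict(train, (2, 4)))   # cat
print(predict(train, (9, 9)))   # dog
```

No rule for telling cats from dogs was ever written down; the behaviour comes entirely from the examples.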
8. Deep Learning
It belongs to the same family as Machine Learning but, in addition to learning to solve tasks from data, it is able to learn how to represent the data in order to reach the solution.
On the other hand, it generally requires very large volumes of data to function optimally.
The interesting thing is that this ability to learn representations allows Deep Learning to tackle problems that are extremely hard for a machine, such as artificial vision and speech recognition.
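A minimal sketch of why layered representations matter: XOR cannot be computed by a single linear threshold unit, but a two-layer network can. The weights below are fixed by hand purely for illustration; in real deep learning they are learned from data.

```python
# A hand-wired two-layer network computing XOR: the hidden layer
# re-represents the inputs (as OR and NAND) so the output layer can
# solve a problem no single linear unit could.
def step(x):
    return 1 if x > 0 else 0

def xor_net(a, b):
    h1 = step(a + b - 0.5)        # hidden unit 1: OR(a, b)
    h2 = step(-a - b + 1.5)       # hidden unit 2: NAND(a, b)
    return step(h1 + h2 - 1.5)    # output: AND(h1, h2) = XOR(a, b)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```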
9. C++
It is a hybrid programming language developed with the intention of extending the C programming language with mechanisms for manipulating objects.
Among its distinctive features, it allows operators to be redefined and new types to be created that behave like fundamental types.
10. Data Mining
The main objective of data mining is to extract information from a specific set of data and transform it into a comprehensible structure for later use.
It draws on methods from artificial intelligence, machine learning, statistics and database systems.
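As a minimal sketch of extracting a comprehensible pattern from raw records, here is a frequent-pair count over a made-up list of shopping baskets, a simplified form of market-basket analysis.

```python
# Find which pair of items appears together most often across baskets.
from collections import Counter
from itertools import combinations

baskets = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

top_pair, count = pair_counts.most_common(1)[0]
print(top_pair, count)   # ('bread', 'milk') appear together in 3 baskets
```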
11. SQL (Structured Query Language)
It is a standardized language used in programming whose main function is to define, manage, manipulate and retrieve data from a relational database.
Its main characteristics are its handling of relational algebra and relational calculus.
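The define / manipulate / retrieve roles can be sketched with the SQLite engine bundled in Python's standard library; the table and figures are invented.

```python
# SQL's three roles against an in-memory SQLite database.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")   # define
conn.executemany("INSERT INTO sales VALUES (?, ?)",              # manipulate
                 [("widget", 10.0), ("widget", 15.0), ("gadget", 7.5)])
total, = conn.execute(                                           # retrieve
    "SELECT SUM(amount) FROM sales WHERE product = 'widget'").fetchone()
print(total)  # 25.0
```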
12. NoSQL (Not Only SQL)
It designates a family of database management systems whose main objective is to solve Big Data performance problems, since relational databases were not designed to handle them.
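A minimal sketch of the document-store idea behind many NoSQL systems: schema-free records addressed by key and queried by matching fields. The in-memory dictionary here merely stands in for a real engine such as MongoDB.

```python
# A toy document store: records need not share the same fields.
store = {}

def put(key, document):
    store[key] = document

def find(**criteria):
    """Return every document whose fields match all the criteria."""
    return [doc for doc in store.values()
            if all(doc.get(k) == v for k, v in criteria.items())]

put("u1", {"name": "Ana", "city": "Madrid"})
put("u2", {"name": "Luis", "city": "Lima", "plan": "pro"})  # extra field is fine
print(find(city="Lima"))
```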
13. Weka (Waikato Environment for Knowledge Analysis)
It is a software platform for automatic learning and data mining.
Weka contains a collection of algorithms for data analysis and predictive modeling, together with visualization tools, all accessible through a graphical user interface that exposes its functions in a simple way.
14. Python
It is a multi-paradigm programming language, which allows programmers to work in several styles, including object-oriented, imperative and functional programming.
Among its most notable features is dynamic name resolution, also known as late binding, which links a method or variable name to its definition while the program is running.
In addition, new modules can easily be written in C or C++.
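Dynamic name resolution can be sketched in a few lines: the method that actually runs is looked up by name at run time, so the same call works on unrelated classes.

```python
# Late binding: resolve the method name "export" while the program runs.
class Csv:
    def export(self):
        return "a,b,c"

class Json:
    def export(self):
        return '["a", "b", "c"]'

for obj in (Csv(), Json()):
    method = getattr(obj, "export")   # name looked up at run time
    print(method())
```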
15. Internet of things (IoT)
It refers to the digital connection that everyday objects maintain with each other over the internet.
In other words, these are elements with unique identifiers that can transfer data across a network without requiring human interaction, collecting data that can later be used to study the client's usage patterns.
16. Perl (Practical Extraction and Report Language)
It is a scripting language that draws on C, the Bourne shell, AWK and, to a lesser extent, other programming languages.
Its main function is to extract information from text files and generate reports; it has also been used to clean and debug data.
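To keep this glossary's examples in one language, here is the same extract-and-report idea sketched in Python rather than Perl: pull fields out of log lines with a regular expression and summarize them. The log text is invented.

```python
# Extract the severity level from each log line and report the counts.
import re
from collections import Counter

log = """\
2021-03-01 ERROR disk full
2021-03-01 INFO backup done
2021-03-02 ERROR disk full
"""

levels = Counter(re.findall(r"^\S+ (\w+)", log, flags=re.MULTILINE))
print(levels["ERROR"], levels["INFO"])  # 2 1
```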
17. Data Warehouse
Like the Data Lake, the Data Warehouse is a store of data and information, but it holds only the data that is clearly needed to produce analyses and reports.
In addition to storing this curated information for later use, it also lets you keep queries you have run previously, as well as analyses that have already been created.
18. UIMA (Unstructured Information Management Applications)
It is a software architecture created for the development, discovery, composition and deployment of multimodal analytics, used to analyze sets of unstructured information and reveal data that is significant to the end user.
19. Computational Linguistics
It is a discipline of Artificial Intelligence responsible for describing how natural language works so that it can be transformed into programs executable on a computer.
Computational Linguistics is joint work between linguists and specialist engineers, who must transform existing voice and text data into a structured form that an artificial intelligence can understand and process in order to generate a response.
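A minimal sketch of turning raw text into a structure a program can process: tokenize a sentence and tag each token with a crude rule-based part-of-speech lookup. The tiny lexicon is invented; real systems use far richer models.

```python
# Convert free text into (token, part-of-speech) pairs via a lookup table.
lexicon = {"the": "DET", "cat": "NOUN", "sat": "VERB",
           "on": "ADP", "mat": "NOUN"}

def tag(sentence):
    """Return a list of (word, tag) pairs; unknown words get 'UNK'."""
    return [(w, lexicon.get(w, "UNK")) for w in sentence.lower().split()]

print(tag("The cat sat on the mat"))
```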
20. Random Forest
It is an ensemble of predictive trees in which each tree depends on the values of a random vector sampled independently for all trees.
The objective of the method is to achieve more accurate predictions than other learning algorithms.
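The idea can be sketched in miniature: train many tiny one-split trees ("stumps") on bootstrap samples of the data and let them vote. The dataset is invented and deliberately easy, with the label determined by the first feature.

```python
# A toy random forest: bagged decision stumps with majority voting.
import random

def train_stump(sample):
    """Pick the feature/threshold split with the fewest errors on sample."""
    best = None
    for f in range(len(sample[0][0])):
        for x, _ in sample:
            t = x[f]
            errors = sum((1 if xi[f] > t else 0) != yi for xi, yi in sample)
            if best is None or errors < best[0]:
                best = (errors, f, t)
    _, f, t = best
    return lambda x: 1 if x[f] > t else 0

random.seed(0)
# Label is 1 exactly when the first feature exceeds 0.5; the second is noise.
data = [((i / 10, random.random()), 1 if i / 10 > 0.5 else 0)
        for i in range(1, 11)]
forest = [train_stump(random.choices(data, k=len(data))) for _ in range(15)]

def predict(x):
    votes = sum(stump(x) for stump in forest)
    return 1 if votes > len(forest) / 2 else 0

print(predict((0.9, 0.0)), predict((0.1, 0.0)))
```

Each stump sees a different resample of the data, so their individual mistakes tend to cancel out in the vote, which is the core intuition behind the full algorithm.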
21. Cloud Computing
It is a set of principles and approaches that allow users to access computing infrastructure, services, platforms, data and applications over a network, from the cloud.
It should be mentioned that clouds are pools of resources managed through management and automation software, so that users can access them on demand.
Building a cloud requires specific operating systems, virtualization software, and automation and management tools.
22. Sentiment Analytics (Sentiment Analysis or Opinion Mining)
It is the combination of natural language processing, text analysis and computational linguistics used to determine the attitude of a speaker or user toward a specific topic, whether written or spoken.
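A minimal lexicon-based sketch: score a sentence by counting positive and negative words. The word lists are invented, and real systems use far richer language models, but the principle of mapping text to an attitude is the same.

```python
# Rule-based sentiment: positive words add to the score, negative subtract.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "poor"}

def sentiment(text):
    words = text.lower().split()
    score = (sum(w in POSITIVE for w in words)
             - sum(w in NEGATIVE for w in words))
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("I love this great product"))   # positive
print(sentiment("terrible service"))            # negative
```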
If you are passionate about Big Data and want to become an expert, be sure to learn, understand and master these basic concepts.