An algorithm is a process or set of rules followed in a definite sequence during calculations or problem-solving operations. In other words, it is an ordered set of instructions that determines how a task is accomplished to achieve the expected outcome.

Performing Numerical Analysis

What is Numerical Analysis
In modern data analytics, numerical analysis is the use of algorithms to solve continuous mathematical problems arising from real-world applications. These applications may be associated with fields like natural sciences, medicine, engineering, social sciences, business, and so on. The variables that are manipulated in numerical analysis vary continuously.
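As a classic example of a numerical method, the sketch below uses bisection to approximate a root of a continuous function by repeatedly halving an interval. It is a general Python illustration, not one of the rubiscape algorithms, and the tolerance and iteration limit are arbitrary choices.

```python
def bisection(f, lo, hi, tol=1e-10, max_iter=200):
    """Find a root of f in [lo, hi] by repeatedly halving the interval.

    Requires f(lo) and f(hi) to have opposite signs, so that a continuous f
    must cross zero somewhere in between (intermediate value theorem).
    """
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        # Keep the half-interval in which the sign change happens.
        if f_lo * f_mid < 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return (lo + hi) / 2


# Approximate sqrt(2) as the positive root of x^2 - 2 on [0, 2].
root = bisection(lambda x: x * x - 2, 0.0, 2.0)
print(round(root, 6))  # prints 1.414214
```

Each iteration halves the interval, so the error bound shrinks geometrically; this trade of exactness for a controllable approximation is typical of numerical analysis.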

Why is Numerical Analysis required
Since the 20th century, computational power, and with it the ability to solve complex mathematical problems, has risen exponentially. As a result, the use of mathematical algorithms to solve highly sophisticated and intricate mathematical models has increased manifold. The variety and volume of data generated in the world today are beyond human analytical capacity. Hence, numerical analysis is required to process this numerical data efficiently and precisely.

How is Numerical Analysis done in rubiscape
In rubiscape, a dedicated set of products caters to the analysis of numerical data. All algorithms for numerical analysis are placed under rubistudio.
For numerical analysis, rubistudio has the following sections:

  • Data Preparation section, where there are several algorithms for cleaning and transforming raw data into analyzable data.
  • Statistical Analysis section, where there are several algorithms to perform various statistical operations like correlation detection, hypothesis testing, and so on to draw inferences.
  • Code Fusion section, where users can build their own data analytical models in Python and R and integrate them in rubiscape.

rubistudio provides various algorithms to perform numerical analysis. These algorithms are categorized as follows:

  • Data Preparation
  • Statistical Analysis
  • Code Fusion

In the task pane, click rubistudio.


Figure: Numerical Analysis

For more information, refer to rubistudio.

Data Preparation

What is Data Preparation
Data preparation is the process of cleaning and transforming raw data into organized data so that it can be processed and further analyzed. In data preparation, data is reformatted, corrected, and combined to enrich it.
Why is Data Preparation required
Data preparation is complex yet essential for creating relevant, contextual data. It makes the analysis of such data efficient and produces reliable and insightful results. Without precise data preparation, the data may be biased, which can result in poor analysis and erroneous results.

How is Data Preparation done in rubiscape
In rubiscape, there is a comprehensive set of algorithms for performing data preparation. They can be used individually or in combination with other algorithms to remove anomalies from the dataset. Each algorithm has a specific function that can be used to enhance data quality. For example, users can find missing values, create additional data columns, merge and join data, and so on.
In rubiscape, the Data Preparation algorithms are:

  • Aggregation
  • Expression
  • Filtering
  • Data Joiner
  • Data Merge
  • Sorting
  • Descriptive Statistics
  • Factor Analysis
  • Missing Value Imputation
  • Outlier Detection
  • PCA
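To make a few of these operations concrete, the following Python sketch (standard library only) imputes a missing value with the column mean, filters rows, and aggregates by group. The records, column names, and function names are hypothetical, and mean imputation is only one common strategy; the source does not say which strategies the Missing Value Imputation algorithm offers.

```python
from statistics import mean

# Hypothetical sample records; None marks a missing value.
rows = [
    {"region": "North", "sales": 120.0},
    {"region": "South", "sales": None},
    {"region": "North", "sales": 80.0},
    {"region": "South", "sales": 100.0},
]


def impute_mean(records, column):
    """Replace missing values in `column` with the mean of the observed values."""
    observed = [r[column] for r in records if r[column] is not None]
    fill = mean(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in records]


def aggregate_sum(records, key, column):
    """Group records by `key` and sum `column` within each group."""
    totals = {}
    for r in records:
        totals[r[key]] = totals.get(r[key], 0.0) + r[column]
    return totals


clean = impute_mean(rows, "sales")              # South's missing value becomes 100.0
high = [r for r in clean if r["sales"] >= 100]  # filtering: keep rows with sales >= 100
print(aggregate_sum(clean, "region", "sales"))  # prints {'North': 200.0, 'South': 200.0}
```

Imputation runs before aggregation here because a missing value cannot be summed; in a workbook, this corresponds to chaining one preparation step into the next.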

In the task pane, click rubistudio, and then click Data Preparation.


Figure: Data Preparation Algorithms

For more information, refer to Data Preparation Algorithms.

Statistical Analysis

What is Statistical Analysis
Statistical analysis is a major component of data analysis that applies statistical tools to data, analyzes it, draws useful inferences from it, and identifies future trends.
Statistical analysis is of two types: descriptive and inferential.
Descriptive statistics summarizes the data in a meaningful way, in the form of charts and tables. It is an easy way to understand, interpret, and visualize data.
Inferential statistics goes beyond this: it helps the user test a hypothesis (such as a null hypothesis) and draw conclusions and insights from the data.
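The difference between the two types can be sketched in a few lines of Python. The descriptive part summarizes a sample; the inferential part computes a one-sample t statistic for the null hypothesis that the population mean equals a hypothesized value `mu0`. The sample values and `mu0` are made up for illustration. Converting t into a p-value needs the t distribution's CDF, which the standard library does not provide, so that step requires a statistics library or a t table.

```python
import math
from statistics import mean, median, stdev

# Hypothetical sample measurements.
sample = [12.1, 11.8, 12.6, 12.3, 11.9, 12.4]

# Descriptive statistics: summarize the sample itself.
summary = {
    "n": len(sample),
    "mean": mean(sample),
    "median": median(sample),
    "stdev": stdev(sample),  # sample standard deviation (n - 1 denominator)
}

# Inferential statistics: one-sample t statistic for H0: population mean == mu0.
mu0 = 12.0
t_stat = (summary["mean"] - mu0) / (summary["stdev"] / math.sqrt(summary["n"]))
degrees_of_freedom = summary["n"] - 1

print(summary)
print(round(t_stat, 3), degrees_of_freedom)
```

A large absolute t relative to the t distribution with n - 1 degrees of freedom is evidence against the null hypothesis; the descriptive summary alone cannot support that kind of conclusion.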

Why is Statistical Analysis required
Statistical analysis is at the core of data analysis. Most analytical algorithms and models are built to perform statistical operations on data and draw inferences and insights from it.

How is Statistical Analysis done in rubiscape
In rubiscape, there is a dedicated set of algorithms for the statistical analysis of numerical data, including algorithms for correlation and hypothesis testing. Together, they support both the descriptive and inferential sides of statistical analysis. Users can not only visualize the data, but also draw inferences from it that can be used for decision-making and predicting future trends.
In rubiscape, the Statistical Analysis algorithms are:

  • Correlation
  • Hypothesis Test

In the task pane, click rubistudio, and then click Statistical Analysis.


Figure: Statistical Analysis Algorithms

For more information, refer to Statistical Analysis.

Creating your Customized Algorithm

Code Fusion in rubiscape lets users build their models in programming languages such as Java, R, or Python and integrate them into rubiscape.
Code Fusion makes the rubiscape platform more customizable and user-friendly.
The two options available within Code Fusion are given below.
rubiPython
rubiPython is an option within Code Fusion on the rubiscape platform where users can write code in the Python programming language to build models. It is a custom code component that can be used to add Python code to a form event and customize the model through Python code.
rubiR
rubiR is an option within Code Fusion on the rubiscape platform where users can write code in the R programming language to build models.
In the task pane, click rubistudio, and then click Code Fusion.

Figure: Code Fusion

For more information, refer to Code Fusion.

Performing Textual Analysis

What is Textual Analysis
Textual analysis is an automated process to interpret textual content and derive meaningful data from it. It is a qualitative analysis performed using AI-powered natural language processing (NLP) tools. These tools help in understanding the relationship between a text and the context in which it is produced. Several machine learning (ML) algorithms can be trained to analyze textual data automatically.
Why is Textual Analysis required
Textual analysis is vital in industry and academia for analyzing unstructured business and academic data. With the advent of social media, businesses increasingly use it to promote their products and to receive feedback from customers, so the volume of such text grows every day.
Textual analysis is an effective way to gauge the popularity of a product and to understand the sentiment of potential customers toward it.
How is Textual Analysis done in rubiscape
Manually reading, analyzing, and tagging textual data is not feasible because of its enormous volume and dynamic nature. The results would be inconsistent, non-scalable, and extremely time-consuming.
In rubiscape, users can perform precise textual analysis and deliver accurate results in a short time. rubiscape contains an array of algorithms and techniques for related tasks such as pre-processing, clustering, classification, sentiment analysis, and text vectorization. Using dedicated dashboards, users can visualize the results to draw important inferences.
rubitext provides various algorithms to perform textual analysis. These algorithms are categorized as follows:

  • Classification
  • Clustering
  • Pre Processing
  • Sentiment
  • Text Vectorization

In the task pane, click rubitext.


Figure: Textual Analysis

For more information, refer to rubitext algorithms.

Classification

Data classification is the process of tagging and organizing data according to relevant categories. This makes the data secure, searchable, and easy to locate and retrieve when needed. In textual analysis, classification algorithms learn from labeled examples and assign new text, such as a document or a review, to one of the predefined categories.
Data classification can be content-based, context-based, or user-based.
In rubiscape, the Classification algorithms are:

  • Adaboost
  • Maximum Entropy
  • Naive Bayes
  • Support Vector Machine
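To show what a trained text classifier does, the sketch below implements a textbook multinomial Naive Bayes with add-one (Laplace) smoothing in plain Python. It is not rubiscape's implementation; the function names and the four training sentences are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs):
    """Train a multinomial Naive Bayes model from (text, label) pairs."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)  # label -> word frequencies
    vocab = set()
    for text, label in docs:
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log posterior score."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in label_counts.items():
        score = math.log(n_docs / total_docs)  # log prior P(label)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the probability.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("great product fast delivery", "positive"),
    ("love it works great", "positive"),
    ("terrible quality broke fast", "negative"),
    ("waste of money terrible", "negative"),
]
model = train_nb(training)
print(predict_nb(model, "great quality"))  # prints positive
```

Working in log space avoids numeric underflow when many small word probabilities are multiplied together.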

In the task pane, click rubitext, and then click Classification.



Figure: Classification Algorithms

For more information, refer to Classification Algorithms.

Clustering

Data clustering (or cluster analysis) is a method of dividing data points into several groups called clusters. Data points within a cluster are more similar to each other than to data points in other clusters. Thus, clusters are groups formed on the basis of shared traits.
Clustering can be hard or soft. In hard clustering, a data point either belongs to a cluster completely or not at all. In soft clustering, the probability of a data point belonging to each cluster is determined.
In rubiscape, the Clustering algorithms are:

  • Centroid Based Clustering
  • Connectivity Based Clustering
  • Density Based Clustering
  • Incremental Learning
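Centroid-based clustering is commonly performed with k-means. Assuming that approach (the source does not name the exact method behind Centroid Based Clustering), the sketch below implements Lloyd's algorithm, a form of hard clustering, on 2-D points. Using the first k points as initial centroids is a simplification for reproducibility; practical implementations typically use k-means++ or several random restarts. The points are hypothetical.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's k-means on 2-D points; returns (centroids, labels)."""
    centroids = [tuple(p) for p in points[:k]]  # naive initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step (hard clustering): each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return centroids, labels


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

A soft variant would replace the single nearest-centroid label with a membership probability for every cluster, as in Gaussian mixture models.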

In the task pane, click rubitext, and then click Clustering.


Figure: Clustering Algorithms

For more information, refer to Clustering.

Pre Processing

In its general sense, data preprocessing is a data mining technique that transforms raw data into a useful and analyzable form. It involves data cleaning, data transformation, and data reduction.
In textual analysis, pre-processing involves multiple algorithms dedicated to converting raw, imprecise data into clean, ready-to-analyze data. Each algorithm has its own specific objective, such as case conversion, lemmatization, counting word frequency, removing punctuation, or advanced entity extraction. These algorithms can be used individually or in combination with other algorithms.
In rubiscape, the Pre Processing algorithms are:

  • Case Convertor
  • Custom Words Remover
  • Frequent Words Remover
  • Lemmatizer
  • Punctuation Remover
  • Spelling Corrector
  • Stemmer
  • Advanced Entity Extraction
  • Word Correlation
  • Word Frequency
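Several of these steps chain naturally. The minimal sketch below, using only the Python standard library, performs case conversion, punctuation removal, custom word removal, and a word frequency count. The function name, sample sentence, and custom word list are illustrative assumptions; rubiscape's algorithms may tokenize and clean text differently.

```python
import string
from collections import Counter


def preprocess(text, custom_words=()):
    """Lowercase text, strip punctuation, and drop the given custom words."""
    text = text.lower()  # case conversion
    text = text.translate(str.maketrans("", "", string.punctuation))  # punctuation removal
    drop = {w.lower() for w in custom_words}
    return [w for w in text.split() if w not in drop]  # custom word removal


tokens = preprocess(
    "The app is GREAT, really great! The support, however, is slow.",
    custom_words=["the", "is"],
)
print(tokens)  # prints ['app', 'great', 'really', 'great', 'support', 'however', 'slow']
print(Counter(tokens)["great"])  # word frequency of "great": prints 2
```

Order matters: lowercasing before counting ensures that "GREAT" and "great" are treated as the same word.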

In the task pane, click rubitext, and then click Pre Processing.


Figure: Pre Processing Algorithms

For more information, refer to Pre-processing Algorithms.

Sentiment

A sentiment is a view, opinion, feeling, intention, or emotion that has a polarity. The polarity can be positive, negative, or neutral.
Basic sentiment analysis is the mining of textual data to extract subjective information from the source. It helps businesses understand the social sentiment about their product, service, or brand by monitoring online conversations.
In this analysis, the algorithm divides the text into sentences and assigns an index of reference to each of them. The algorithm treats texts as Bags of Words (BOW), where the order of words, and therefore the context, is ignored. The original text is filtered down to only those words that are thought to carry sentiment.
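A minimal sketch of this bag-of-words approach: split the text into sentences, give each an index, and sum the lexicon scores of the words that carry sentiment. The tiny lexicon is hypothetical, and reading "index of reference" as the sentence's position is an assumption. Because word order is ignored, a phrase such as "not good" still scores as positive, which is a known limitation of pure bag-of-words scoring.

```python
import re

# Hypothetical toy lexicon; real tools use much larger curated word lists.
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "poor": -2, "hate": -2}


def sentence_polarity(sentence):
    """Score one sentence as a bag of words: word order is ignored."""
    words = re.findall(r"[a-z']+", sentence.lower())
    score = sum(LEXICON.get(w, 0) for w in words)  # only sentiment-bearing words count
    if score > 0:
        return "positive", score
    if score < 0:
        return "negative", score
    return "neutral", score


def analyze(text):
    """Split text into sentences and return (index, sentence, polarity, score)."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    return [(i, s, *sentence_polarity(s)) for i, s in enumerate(sentences)]


for row in analyze("I love this phone. The battery is poor! Delivery was on time."):
    print(row)
```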
In rubiscape, the Sentiment algorithms are:

  • Basic Sentiment Analysis

In the task pane, click rubitext, and then click Sentiment.


Figure: Sentiment Algorithms

For more information, refer to Sentiment Analysis.

Text Vectorization

Natural Language Processing (NLP) requires transforming text into numbers so that machines can understand and analyze it. In NLP, text is converted into a set of real numbers, or vectors, to extract useful information from it. This process of converting strings or text into a meaningful array of real numbers (or vectors) is called vectorization.

Text vectorization maps words or phrases from a vocabulary to corresponding vectors of real numbers, which are then used to predict words and find similarities between them.

Text vectorization in NLP helps to perform the following textual analysis tasks:

  • Extract features for text classification.
  • Compute the occurrence of similar words.
  • Compute the probability of occurrence of similar words.
  • Compute the relevance of features in a text.
  • Predict the next words in a sequence of words.

In rubiscape, two Text Vectorization algorithms are available:

  • CountVectorizer
  • TF-IDF (Term Frequency-Inverse Document Frequency)
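Both techniques can be sketched with the Python standard library. Count vectors hold raw term counts per document; TF-IDF then down-weights words that appear in many documents. The sketch uses the smoothed IDF that scikit-learn applies by default, idf(t) = ln((1 + N) / (1 + df(t))) + 1, followed by L2 normalization. Whether rubiscape's TF-IDF uses this exact variant is not stated in the source, and the three documents are made up.

```python
import math
from collections import Counter

docs = ["the cat sat", "the dog sat", "the cat ran"]

# Tokenize by whitespace and build a sorted vocabulary of unique words.
tokenized = [d.split() for d in docs]
vocab = sorted({w for doc in tokenized for w in doc})

# Count vectors: one column per vocabulary word, value = raw term count.
counts = [[c[w] for w in vocab] for c in map(Counter, tokenized)]

# Smoothed IDF: idf(t) = ln((1 + N) / (1 + df(t))) + 1
N = len(docs)
doc_freq = [sum(1 for doc in tokenized if w in doc) for w in vocab]
idf = [math.log((1 + N) / (1 + d)) + 1 for d in doc_freq]


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are left as-is)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


# TF-IDF: raw count times IDF, then L2-normalized per document.
tfidf = [l2_normalize([c * i for c, i in zip(row, idf)]) for row in counts]

print(vocab)      # prints ['cat', 'dog', 'ran', 'sat', 'the']
print(counts[0])  # prints [1, 0, 0, 1, 1]
print([round(v, 3) for v in tfidf[0]])
```

Because "the" occurs in every document, its IDF is the minimum value of 1, so it receives a lower TF-IDF weight than "cat" or "sat" in the first document.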

In the task pane, click rubitext, and then click Text Vectorization.

Figure: Text Vectorization Algorithms

For more information, refer to Text Vectorization.

Understanding the Algorithm Properties

The Algorithm Properties define dataset and algorithm tasks along with the data fields associated with them. They are displayed to the right of the canvas when you select a reader or an algorithm.

Dataset Properties

Dataset properties are also called Reader Properties. They display the Task Name and Data Fields, as shown in the figure below. The data fields are the column names present in the dataset. Select the appropriate Data Fields that you want to use for your analysis.
The figure given below is an example of Dataset Properties. Here, COVID – 19 INDIA is the Task Name (which is editable), which is the name of the dataset (also called the Reader), and Sno, Date, Time, and a few other columns are selected as Data Fields. You can select single or multiple data fields as required.

Figure: Dataset Properties

Algorithm Properties

Algorithm Properties are also called Data Management Properties. They show the Task Name and the Features associated with it. The Features are the data fields that you can select, which depend on the dataset connected to the algorithm. There are also parameters that you need to provide before you run the algorithm. These parameters are algorithm-specific and vary from algorithm to algorithm. For more information on algorithms, refer to Algorithm Reference Guide.
The figure given below is an example of Algorithm Properties for a simple Sorting algorithm. Here, Sorting is the Task Name (which is editable), which is the name of the algorithm. Features are the data fields that are displayed depending on the connected dataset. Here, Date and Deaths are the data fields selected from the COVID – 19 INDIA dataset.

Figure: Data Management Properties

Using Features, you can select the required data fields that you want to sort. Hover over a feature and use the Gear icon to select the parameters, which in this case is ascending or descending order.


Figure: Changing Algorithm Parameters for Sorting
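The Sorting example can be mimicked in plain Python. The rows below reuse the Date and Deaths field names, but the values are made up, and treating Date as the primary key (ascending) with Deaths as a secondary key (descending) is an assumption about how multiple sort fields combine; the source does not specify this.

```python
from datetime import date

# Hypothetical rows using the two field names from the example; values are made up.
rows = [
    {"Date": date(2020, 4, 2), "Deaths": 5},
    {"Date": date(2020, 4, 1), "Deaths": 3},
    {"Date": date(2020, 4, 2), "Deaths": 9},
]

# Python's sort is stable, so sort by the secondary key first, then by the primary key.
by_deaths_desc = sorted(rows, key=lambda r: r["Deaths"], reverse=True)
result = sorted(by_deaths_desc, key=lambda r: r["Date"])  # Date ascending

for r in result:
    print(r["Date"].isoformat(), r["Deaths"])
```

Rows sharing the same Date keep the descending Deaths order from the first pass, because a stable sort never reorders equal keys.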

(info)Note:

The Data Management Properties depend on the algorithm selected. This is a simple example; a more complex list of properties can appear for other algorithms.

Hyper Parameter Optimisation

Hyper Parameter Optimisation helps you find the best parameter values for an algorithm on a given dataset. From the parameter values you provide, this feature identifies the combination that gives the best solution for your dataset. You can select more than one parameter from the list for Hyper Parameter Optimisation.

Hyper Parameter Optimisation can be applied to both Classification and Regression algorithms.
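The source does not say which search strategy Hyper Parameter Optimisation uses. Assuming an exhaustive grid search, which matches a workflow of fixed Default Parameters (one value each) plus comma-separated candidate values for Parameters to Tune, the idea can be sketched as follows. The scoring function, parameter names, and values are hypothetical stand-ins for training and evaluating a real model.

```python
from itertools import product


def parse_values(text, cast=float):
    """Turn a comma-separated entry such as '0.01, 0.1, 1' into a list of values."""
    return [cast(v.strip()) for v in text.split(",") if v.strip()]


def grid_search(score_fn, default_params, params_to_tune):
    """Try every combination of tuned values while default params stay fixed.

    score_fn(params) -> float, where higher is better (e.g. accuracy or R^2).
    """
    names = list(params_to_tune)
    best_params, best_score = None, float("-inf")
    for combo in product(*(params_to_tune[n] for n in names)):
        params = {**default_params, **dict(zip(names, combo))}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical scoring function standing in for "train the model, then evaluate it".
def fake_score(p):
    return -(p["learning_rate"] - 0.1) ** 2 - (p["max_depth"] - 4) ** 2


defaults = {"random_state": 42}  # one value per default parameter
tune = {
    "learning_rate": parse_values("0.01, 0.1, 1"),
    "max_depth": parse_values("2, 4, 8", cast=int),
}
best, best_score = grid_search(fake_score, defaults, tune)
print(best)
```

The number of model runs is the product of the number of values per tuned parameter (3 x 3 = 9 here), so adding values to Parameters to Tune increases run time multiplicatively.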

To perform Hyper Parameter Optimisation, follow the steps given below.

  1. Open a Workbook. Refer to Opening a Workbook.
  2. Build your algorithm flow on the workbook canvas.
  3. Select the algorithm node.

    Figure: Selecting Algorithm

    The Properties pane is displayed on the right side.

  4. Select the Dependent and Independent variables.
  5. Select the rest of the parameters based on the algorithm selected.
  6. Select the Hyper Parameter Optimisation check box.
    Figure: Selecting Hyper Parameter Optimisation
    The list of parameters is displayed.
    There are two categories of Parameters:
    • Default Parameters
    • Parameters to Tune
  7. Click the Default Parameters drop-down to select default parameters.
  8. Select the desired parameters from the list and click Done.
    Figure: Selecting Default Parameters
  9. After selecting the default parameters, enter their values.

    (info)

    Notes:

    • For categorical parameters, select the values from the respective drop-down. For numerical parameters, enter the value.
    • For default parameters, you can select only one value for each of the parameters.

    A sample parameter selection is displayed below.

    Figure: Sample values of Default Parameters
  10. Click the Parameters to Tune drop-down to select the parameters to tune.
  11. Select the desired parameters from the list, and then click Done.
    Figure: Selecting Parameters to Tune
  12. After selecting the parameters to tune, enter their values.

    (info)

    Notes:

    • For categorical parameters, select the values from the respective drop-down. For numerical parameters, enter the value.
    • For parameters under Parameters to Tune, you can select more than one value for each parameter. Enter multiple values separated by commas.


    A sample parameter selection is displayed below.

    Figure: Sample values of Parameters to Tune
  13. Save the Workbook.
  14. Run the Workbook. Refer to Running a Workbook.

    (info)Note:

    If you select the Hyper Parameter Optimisation check box but run the workbook without selecting the Default Parameters and Parameters to Tune, you will not get results.

    After the workbook execution is complete, a confirmation message is displayed.

  15. Select the algorithm node, click the ellipsis, and then click Explore.
    The result is displayed.

    Figure: Hyperparameter Result

You can change the parameters and metrics as required and view the results for different values of the selected parameters.
