Rubiscape provides a wide range of algorithms to take care of your data analysis requirements. It also provides you the flexibility to create your own customized algorithms using Code Fusion.

Code Fusion in Rubiscape lets you build your own models in programming languages such as R and Python and integrate them into Rubiscape. You can write your own code and combine it with the algorithms provided on the Rubiscape platform to create customized models.

You can compare the results of your custom code with the results of the built-in models, and refine your model to improve its precision.

Code Fusion also lets you deploy models into production in a few clicks.

Using Code Fusion, you can

  • Create columns that were not originally present in your dataset, such as the percentage, square, or exponential of existing columns.
  • Perform ETL tasks, such as cleaning the data, imputing missing values, and filtering data.
  • Verify the accuracy of existing models and prepare new models if required.
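As a rough illustration of the first two tasks above, the following sketch imputes, derives, and filters a single column held in a plain Python dictionary (row index → value). The column values and the denominator are made up, and the sketch calls no Rubiscape function; it only shows the kind of logic you might place in a RubiPython node.

```python
import math
from statistics import mean

# Hypothetical input column: row index -> value (None marks a missing value).
scores = {0: 40.0, 1: None, 2: 80.0, 3: 60.0}
total = 200.0  # hypothetical denominator for the percentage column

# ETL: impute missing values with the mean of the observed values.
observed = [v for v in scores.values() if v is not None]
fill_value = mean(observed)
clean = {i: (v if v is not None else fill_value) for i, v in scores.items()}

# Derived columns that were not present in the original data.
percentage = {i: v / total * 100 for i, v in clean.items()}
square = {i: v ** 2 for i, v in clean.items()}
exponential = {i: math.exp(v / 100) for i, v in clean.items()}

# ETL: keep only the rows whose cleaned value is at least 60.
filtered = {i: v for i, v in clean.items() if v >= 60}

print(clean)
print(filtered)
```

Mean imputation is only one choice; median or a constant fill value work the same way with a different `fill_value`.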

To use Code Fusion, in the task pane, click rubistudio, and then click Code Fusion.

Figure: Code Fusion in RubiStudio

The two options available in Code Fusion are explained in the following sections.

RubiPython

RubiPython is a feature of Code Fusion on the Rubiscape platform that lets you write code in the Python programming language to build models.

You can use the RubiPython node on its own, or connect it to a Reader node (dataset) or to other algorithm nodes. You can connect multiple predecessors (preceding tasks) to a single RubiPython node. Each predecessor task, along with all its columns, appears on the RubiPython configuration screen, where you can tell the tasks and their columns apart by name. This is useful when you have large data that you want to combine or split before performing different actions on it.
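To make the "combine" case concrete, here is a minimal sketch that stacks the rows of two predecessor tasks into one set of columns. It assumes that inputData maps each task name to its columns and each column to a {row index: value} dictionary; the task names, columns, values, and the helper `stack_tasks` are all hypothetical, and on the platform the columns may instead be array-like objects.

```python
# Hypothetical stand-in for the inputData dictionary that RubiPython provides:
# task name -> column name -> {row index: value}.
inputData = {
    "Store A": {"Product": {0: "pen", 1: "ink"}, "Units": {0: 10, 1: 4}},
    "Store B": {"Product": {0: "pen", 1: "pad"}, "Units": {0: 7, 1: 3}},
}

def stack_tasks(input_data, task_names, columns):
    """Append the rows of several predecessor tasks into one set of columns,
    renumbering the row index and recording which task each row came from."""
    combined = {col: {} for col in columns}
    combined["Source"] = {}
    next_index = 0
    for task in task_names:
        task_data = input_data[task]
        for old_index in task_data[columns[0]]:
            for col in columns:
                combined[col][next_index] = task_data[col][old_index]
            combined["Source"][next_index] = task
            next_index += 1
    return combined

output = stack_tasks(inputData, ["Store A", "Store B"], ["Product", "Units"])

# Summarize the combined rows: total units per product across both tasks.
totals = {}
for i, product in output["Product"].items():
    totals[product] = totals.get(product, 0) + output["Units"][i]
print(totals)
```

The extra Source column keeps track of which predecessor each row came from, which makes it easy to split the data again later.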

To use RubiPython, follow the steps given below.

  1. Create your algorithm flow. Refer to Building Algorithm Flow in a Workbook Canvas.
  2. Drag and drop RubiPython on your workbook canvas.
  3. Connect required nodes to RubiPython in your algorithm flow.

    (info)

    Notes:

    • You can connect multiple predecessors to RubiPython. The output of all the predecessors is available for use in the RubiPython node.
    • If you connect a dataset to RubiPython, the fields of the dataset are available as input to RubiPython.
    • If you connect an algorithm to RubiPython, the columns in its resultant data are available as input to RubiPython.

    Figure: Selecting RubiPython under Code Fusion
  4. Select RubiPython and in the Properties pane, click Configure.

    Figure: Configuring RubiPython

    The RubiPython configuration screen is displayed.

    Figure: RubiPython Configuration

    The fields/icons on the RubiPython configuration screen are described in the table below.

    Table: Description of Fields on RubiPython Configuration Screen

Icon/Field

Description

Available Input Variables

It displays the features (columns) present in the output of the preceding tasks. These features are stored in a variable named inputData, which is of the dictionary data type. A dictionary is a data type that stores name-value (key-value) pairs.

You can access the features in inputData and process them to create your custom variables. inputData holds all the preceding task names, the features (columns) present in each task, and the index of each row in each task:
  • inputData.keys() returns the names of all the preceding tasks.
  • inputData['dataset_name'] contains all the features (columns) of a particular preceding task, along with the index of each row.
  • inputData['dataset_name']['column_name'] contains a single feature (column) of a particular preceding task, along with the index of each row.

Custom Output Variables

It displays the list of output variables you have created. These variables are stored in a dictionary.

Add Custom Output Variable

It lets you create a custom output variable from the existing variables (features of the dataset). Refer to Creating Custom Output Variable.

Add Multiple Custom Output Variable

It lets you create multiple custom output variables at once from the existing variables (features of the dataset). Refer to Creating Multiple Custom Output Variables.

DEFINITE OPTION OUTPUT VARIABLE

If selected, only the custom output variables are passed on to the successor task.

When multiple predecessors are connected to RubiPython, this flag is cleared (false) by default. You can select it as required.

INPUT CARRY FORWARD FLAG

If selected, the input variables received from RubiPython's predecessor task are passed on to the successor task; if cleared, they are not.

Note: This is possible only when a single predecessor is connected to RubiPython. When multiple predecessors are connected, this flag is disabled, so the input variables do not appear on the Data page when you explore RubiPython. The Data page then shows only the custom output variables of RubiPython, if you have created any.

INPUT SAME AS OUTPUT

If selected, the input to the RubiPython node is passed on as is to the successor task; if cleared, it is not.

When multiple predecessors are connected to RubiPython, this flag is cleared (false) by default. You can select it as required.

Python Code Editor

It lets you add your customized Python code. Refer to Using Python Code Editor.

TRAINING REQUIRED

Functionality coming soon.

Close

It saves the changes and closes the Python Code Editor.

Creating Custom Output Variable

To create a custom variable in your RubiPython code, follow the steps given below.

  1. On the RubiPython screen, click Add Custom Output Variable.
    The Create Custom Variable screen is displayed.
  2. Enter a Name for your output variable.
  3. Select the type of your output variable from the Variable Type drop-down. The options are Categorical, Numerical, Interval, and Textual.

    (info)

    Note:

    Make sure the data type of the newly created output variable matches the data type of the corresponding input variable. If the types do not match, the application gives an error when you run the algorithm flow.


  4. Select the data type of the output variable from the Data Type drop-down. The options are Integer, Textual, Float, and Boolean.
  5. Click Create.
    Figure: Creating Custom Variable in RubiPython

    The output variable is created and is added to the Custom Output Variables list.
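Because a type mismatch only shows up as an error when the flow runs, you may want to check your values in code first. The sketch below is a hypothetical pre-flight check, not a Rubiscape function: the mapping from the Data Type options to Python types is an assumption, and the column uses the same assumed {row index: value} layout as the other sketches.

```python
# Assumed mapping from the Data Type drop-down options to Python types;
# the platform's own mapping may differ.
DATA_TYPES = {"Integer": int, "Float": float, "Textual": str, "Boolean": bool}

def mismatched_rows(column, data_type):
    """Return the row indexes whose value does not match the declared data type."""
    expected = DATA_TYPES[data_type]
    bad = []
    for index, value in column.items():
        # bool is a subclass of int in Python, so True/False must not pass as Integer.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if is_bool_as_int or not isinstance(value, expected):
            bad.append(index)
    return bad

# Hypothetical output column declared as Integer; row 2 holds a float.
ages = {0: 31, 1: 45, 2: 27.5}
print(mismatched_rows(ages, "Integer"))
```

The check is strict, so an int in a Float column is also reported; relax it if your data allows that.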

Creating Multiple Custom Output Variables

You can create multiple custom output variables by providing a JSON string.

To create multiple custom output variables in your RubiPython code, follow the steps given below.

  1. On the RubiPython screen, click Add Multiple Custom Output Variable.
    The Create Custom Variable screen is displayed.
  2. Enter a JSON string of the format <Variable1 Name>, <Variable Type>, <Data Type>; <Variable2 Name>, <Variable Type>, <Data Type>; … <VariableN Name>, <Variable Type>, <Data Type>.
    For example, to add variables Name and Age of type Textual and Integer respectively, the string would be: Name, Textual, Textual; Age, Numerical, Integer.
  3. Click Validate to validate the string.
    Figure: Validating Custom Variable Input String
    If the string is valid, you get a confirmation message. If the string is invalid, you get an error message; correct the errors and try again.
  4. After you make sure the string is valid, click Create.
    Figure: Creating Multiple Custom Variables in RubiPython

    The specified output variables are created and are added to the Custom Output Variables list.

    (info)

    Notes:

    • In the string, each variable must have three parameters, Variable Name, Variable Type, and Data Type, in string format.
    • If this format is not followed, validation fails with an error.
    • Variable names must not be repeated.
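The Validate button performs these checks for you; the sketch below only restates the format rules in plain Python so they are easier to see. The allowed values come from the Variable Type and Data Type drop-downs described in Creating Custom Output Variable. The function `parse_variable_spec` is hypothetical, and whether the platform trims spaces or tolerates a trailing semicolon is an assumption.

```python
VARIABLE_TYPES = {"Categorical", "Numerical", "Interval", "Textual"}
ALLOWED_DATA_TYPES = {"Integer", "Textual", "Float", "Boolean"}

def parse_variable_spec(spec):
    """Parse 'Name, VariableType, DataType; ...' into a list of tuples,
    raising ValueError when an entry breaks one of the format rules."""
    variables = []
    seen = set()
    for entry in spec.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # assumed: a trailing semicolon is tolerated
        parts = [p.strip() for p in entry.split(",")]
        if len(parts) != 3:
            raise ValueError(f"expected 3 parameters, got {len(parts)}: {entry!r}")
        name, var_type, data_type = parts
        if var_type not in VARIABLE_TYPES:
            raise ValueError(f"unknown variable type {var_type!r} for {name!r}")
        if data_type not in ALLOWED_DATA_TYPES:
            raise ValueError(f"unknown data type {data_type!r} for {name!r}")
        if name in seen:
            raise ValueError(f"variable name {name!r} is repeated")
        seen.add(name)
        variables.append((name, var_type, data_type))
    return variables

print(parse_variable_spec("Name, Textual, Textual; Age, Numerical, Integer"))
```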

Using Python Code Editor

In the Python code editor, you can add your Python code.

  • The input variables are stored in a dictionary named inputData.
  • Similarly, newly created variables are stored in another dictionary named output.
  • To use new variables in your code, you first need to create them. Refer to Creating Custom Output Variable.
  • To print to the console, use print2log().

A sample Python code is shown in the image below.

Figure: Sample Python Code – Single Predecessor

In the above code,

  • data = inputData makes the name data refer to the inputData dictionary. It does not create a copy, so changes made through data also change inputData.
  • print2log(data) prints the contents of inputData to the console log.
  • output = {} creates a new variable named ‘output’ of type dictionary.
  • output['newSepalLength'] = data['Iris CSV']['Sepal.Length'] * 2.54
    assigns the value of Sepal.Length in the Iris CSV dataset, multiplied by 2.54, to the column newSepalLength.

    (info)

    Notes:

    • Make sure the data types of your output and its corresponding input variable are the same. If they are not, RubiPython gives an error.
    • print2log() is a customized function created by Rubiscape, which internally uses Python print() function.
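Because the sample exists only as an image, here is a runnable text version of the four lines described above. On the platform, inputData and print2log() are provided for you; the stand-ins defined here are hypothetical and exist only so the sketch runs on its own. The sketch also assumes each column is a {row index: value} dictionary, and remember that newSepalLength must first be created as a custom output variable.

```python
# Hypothetical stand-ins so the sketch runs outside Rubiscape. On the platform,
# inputData and print2log() already exist and should not be redefined.
def print2log(*args):
    print(*args)

inputData = {
    "Iris CSV": {
        "Sepal.Length": {0: 5.1, 1: 4.9, 2: 4.7},
        "Species": {0: "setosa", 1: "setosa", 2: "setosa"},
    }
}

data = inputData  # a second name for the same dictionary, not a copy
print2log(data)

output = {}
# The original sample writes data['Iris CSV']['Sepal.Length'] * 2.54, which works
# when the column is an array-like object such as a pandas Series. With the plain
# dictionary assumed here, each value is scaled individually instead.
output["newSepalLength"] = {
    index: value * 2.54
    for index, value in data["Iris CSV"]["Sepal.Length"].items()
}
print2log(output)
```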

A sample Python code to access predecessor tasks is shown in the image below.

Figure: Sample Python Code – Multiple Predecessors

In the above code,

  • print2log(inputData.keys()) prints the names of all the predecessor tasks in inputData to the console log or custom component log.
  • print2log(inputData) prints the contents (predecessor task names, column names, and index) of inputData to the console log or custom component log.
  • print2log(inputData['taskname']) prints the contents (column names and index) of the predecessor task to the console log or custom component log.
  • print2log(inputData['taskname']['columnname']) prints the values of a particular column of the predecessor task, along with the row index, to the console log or custom component log.
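The same three levels of access can be tried outside the platform with a small stand-in. This is a sketch under assumptions: inputData is mocked with two hypothetical tasks, each column is assumed to be a {row index: value} dictionary, and print2log is replaced by Python's print().

```python
# Hypothetical stand-ins so the sketch runs outside Rubiscape.
print2log = print
inputData = {
    "Iris CSV": {"Sepal.Length": {0: 5.1, 1: 4.9}},
    "Cars CSV": {"mpg": {0: 21.0, 1: 22.8}, "cyl": {0: 6, 1: 4}},
}

print2log(list(inputData.keys()))       # names of all predecessor tasks
print2log(inputData["Cars CSV"])         # every column of one task, with row indexes
print2log(inputData["Cars CSV"]["mpg"])  # a single column, with row indexes

# Walk every task and column, for example to log how many rows each one holds.
row_counts = {}
for task_name, columns in inputData.items():
    for column_name, values in columns.items():
        row_counts[(task_name, column_name)] = len(values)
print2log(row_counts)
```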

Writing Custom Functions to Access Data using RubiPython

You can use custom functions in RubiPython to

  • Read a dataset without connecting to the Reader node.
  • Write data to a file or to an RDBMS table without using the Writer nodes for writing to a template file or a template table. Refer to Writing to Template File and Writing to Template Table.

The following sections explain these custom functions.

Reading Data from File

A sample Python code to read data from file is shown in the image below.

Figure: Sample Python Code to Read Data from a File

In the above code,

  • The getReaderData("datasetName", "subdatasetName") custom function checks the type of the dataset (Excel, CSV, or text), reads the dataset accordingly, and returns it as a dictionary.
  • print2log(data) prints the Reader output data as dictionary to the console log or custom component log.

    (info)

    Note:

    In the getReaderData custom function, the dataset and sub-dataset names are the same for all datasets except RDBMS datasets. For an RDBMS dataset, the sub-dataset name is the name of the table added to the RDBMS dataset.

Writing Data to File

A sample Python code to write data to file is shown in the image below.

Figure: Sample Python Code to Write Data to a File

In the above code, the writeDataToFile custom function is used to append a row to the selected Carbo Fitness file (dataset).

  • dataToWrite stores the output data, of type dictionary, to be written to the file.
  • The writeDataToFile(dataToWrite, "action", "delimiter", "datasetName") custom function appends the output data to, or overwrites, the selected file (Excel, CSV, or text dataset).

The table below describes the writeDataToFile function and parameters.

Table: Parameters of Functions to Write to File

Function: writeDataToFile(dataToWrite, "action", "delimiter", "datasetName")

Parameters and remarks:
  • dataToWrite – Output data of type dictionary to be written to the file.
  • action – The action to perform on the file. You can append or overwrite more than one row in the selected dataset. It is of type String; the values can be overwrite and append.
  • delimiter – The character that separates the columns in the dataset. It is of type String; the values can be "," (comma), "|" (pipe), "\t" (tab), and " " (space).
  • datasetName – Name of the dataset file to which you want to write. It is of type String.
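To see what the action and delimiter parameters mean in practice, the following local stand-in writes delimited text with Python's standard csv module. It is not writeDataToFile and does not touch Rubiscape datasets: the function `write_rows_locally`, the temporary file path, and the sample values are hypothetical, and the layout of dataToWrite (column name → list of values) is an assumption.

```python
import csv
import os
import tempfile

def write_rows_locally(data_to_write, action, delimiter, path):
    """Local stand-in for writeDataToFile: write a column -> list-of-values
    dictionary to a delimited text file, either appending or overwriting."""
    if action not in ("append", "overwrite"):
        raise ValueError("action must be 'append' or 'overwrite'")
    columns = list(data_to_write)
    rows = zip(*(data_to_write[col] for col in columns))
    file_exists = os.path.exists(path) and os.path.getsize(path) > 0
    mode = "a" if action == "append" else "w"
    with open(path, mode, newline="") as handle:
        writer = csv.writer(handle, delimiter=delimiter)
        # Write the header only when the file is new or being overwritten.
        if mode == "w" or not file_exists:
            writer.writerow(columns)
        writer.writerows(rows)

path = os.path.join(tempfile.mkdtemp(), "fitness.csv")
write_rows_locally({"Product": ["P1"], "Age": [18]}, "overwrite", ",", path)
write_rows_locally({"Product": ["P2"], "Age": [24]}, "append", ",", path)
with open(path, newline="") as handle:
    print(handle.read())
```

Passing "|", "\t", or " " as the delimiter changes only the separator; csv.writer accepts any single character.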

Another example of reading and writing to a file is shown below.

Figure: Sample Python Code to Read and Write Data to a File

Reading Data from Table

You can use getReaderData to read data from an RDBMS table.

Figure: Sample Python Code to Read Data from Table

In the above code,

  • The getReaderData("datasetName", "subdatasetName") custom function checks the type of the dataset, reads the dataset from the Reader accordingly, and returns it as a dictionary.
  • print2log(dataToWrite) prints the output data as a dictionary to the console log or custom component log.

    (info)

    Note:

    In the getReaderData custom function, the dataset and sub-dataset names are the same for all datasets except RDBMS datasets. For an RDBMS dataset, the sub-dataset name is the name of the table added to the RDBMS dataset.

Writing Data to Table

Similarly, you can use writeDataToTable to write data to an RDBMS table.

Figure: Sample Python Code to Write Data to Table

The table below describes the writeDataToTable function and parameters.

Table: Parameters of Functions to Write to Table

Function: writeDataToTable(dataToWrite, dropAndCreate, "datasetName", "tableName", "strategy", keyColumns)

Parameters and remarks:
  • dataToWrite – Output data of type dictionary to be written to the RDBMS table.
  • dropAndCreate – Flag that decides whether to delete the existing table and create a new one, or to overwrite the existing table. It is of type Boolean; the value can be True or False. The default value is True, which deletes the existing rows in the table.
  • datasetName – Name of the RDBMS dataset that contains the table you want to edit. It is of type String.
  • tableName – Name of the existing or new table in the RDBMS dataset that you want to edit. It is of type String.
  • strategy – The database operation to perform on the specified table. It is of type String; possible values are insert, update, and delete. You can insert, update, or delete more than one row or column in the selected table of the RDBMS dataset.
  • keyColumns – A list of column names that need to be defined as the primary key, or that are already defined as the primary key.
      • For insert, keep the keyColumns parameter empty.
      • You cannot update columns that are defined as the primary key.
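The interplay of strategy and keyColumns is easier to follow with a concrete database. The sketch below reproduces the described rules against an in-memory SQLite database using Python's standard sqlite3 module; it is a local analogue, not writeDataToTable itself. The function `write_table_locally`, the table, and the values are hypothetical, dataToWrite is assumed to map column names to lists of values, and dropAndCreate is modeled as "drop and recreate the table".

```python
import sqlite3

STRATEGIES = ("insert", "update", "delete")

def write_table_locally(conn, data, drop_and_create, table, strategy, key_columns):
    """Local sqlite3 analogue of the writeDataToTable parameters.
    data maps column name -> list of values (an assumption about dataToWrite)."""
    if strategy not in STRATEGIES:
        raise ValueError("strategy must be insert, update, or delete")
    if strategy == "insert" and key_columns:
        raise ValueError("keyColumns should be empty for insert")
    if strategy != "insert" and not key_columns:
        raise ValueError("keyColumns are required for update and delete")

    columns = list(data)
    rows = list(zip(*(data[c] for c in columns)))
    quoted = ", ".join(f'"{c}"' for c in columns)
    cur = conn.cursor()

    if drop_and_create:
        cur.execute(f'DROP TABLE IF EXISTS "{table}"')
        cur.execute(f'CREATE TABLE "{table}" ({quoted})')

    if strategy == "insert":
        marks = ", ".join("?" for _ in columns)
        cur.executemany(f'INSERT INTO "{table}" ({quoted}) VALUES ({marks})', rows)
    else:
        where = " AND ".join(f'"{k}" = ?' for k in key_columns)
        key_values = [[row[columns.index(k)] for k in key_columns] for row in rows]
        if strategy == "update":
            # Key columns identify the rows to change; they are never updated themselves.
            value_cols = [c for c in columns if c not in key_columns]
            sets = ", ".join(f'"{c}" = ?' for c in value_cols)
            params = [
                [row[columns.index(c)] for c in value_cols] + keys
                for row, keys in zip(rows, key_values)
            ]
            cur.executemany(f'UPDATE "{table}" SET {sets} WHERE {where}', params)
        else:
            cur.executemany(f'DELETE FROM "{table}" WHERE {where}', key_values)
    conn.commit()

conn = sqlite3.connect(":memory:")
write_table_locally(conn, {"Id": [1, 2], "City": ["Pune", "Mumbai"]}, True, "customers", "insert", [])
write_table_locally(conn, {"Id": [2], "City": ["Nagpur"]}, False, "customers", "update", ["Id"])
write_table_locally(conn, {"Id": [1]}, False, "customers", "delete", ["Id"])
print(conn.execute('SELECT * FROM "customers"').fetchall())
```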

(info)

Notes:

  • The dataset to which you are writing must already exist. To write to a different dataset, first create that dataset.
  • You can add an existing or a new table in the dataset while creating an RDBMS dataset. You can add more than one table.
  • You can apply these custom functions to all types of datasets in Rubiscape.
  • Make sure the data types, the number of columns, and the column names of your output match those of the corresponding input dataset; otherwise, RubiPython gives an error.

Accessing User Activity Log using Code Fusion

This feature lets you retrieve user activity logs using the RubiPython custom component. Because the User Activity Log can be captured in a template, you can view and download it using custom functions provided by Rubiscape.

Viewing User Activity Log

This feature allows the administrator to view the activity log for a particular date.

To view the user activity log, follow the steps given below.

  1. Open the desired workbook.
  2. From the left panel, under rubistudio, drag and drop RubiPython on the workbook canvas.
  3. Select RubiPython.
    Properties are displayed in the right pane.
  4. Click Configure.
    Figure: Configuring RubiPython Node

    Python Code Editor window is displayed.

  5. In the Python Code Editor, enter the Python code.
    A sample code is shown in the figure below.
    Figure: Sample Python Code

    The table given below describes the functions used in the above sample Python code.

    Table: Functions in Python Code

    Commands

    Description

    getUserLogList()

    It is a custom function defined by Rubiscape that gets the user log list.

    print2log()

    It is a custom function defined by Rubiscape that prints information to the log console.

    retrieveUserLogs()

    It is a custom function defined by Rubiscape that retrieves the user logs and stores them in a user-defined variable.

  6. After completing the code, click Close.
  7. To run the RubiPython node, click the ellipsis on RubiPython and click Run.

    (info)

    Note:

    You can also run the whole workbook. To do so, select the Run option on the Function pane at the top of the page.

  8. To view the output, click Log.
    Figure: Viewing RubiPython Log

    The Workbook Log is displayed.

  9. To view the user log, click CustomComponentLog.

    The user activity log is displayed as shown in the figure below.

    Figure: RubiPython Log – User Activity Log

    As shown in the above figure, there are various fields in the User Activity Log.
    The table given below describes the fields present on the User Activity Log page.

    Table: Fields of User Activity Log

    Field

    Description

    User Name

    It displays the name of the user who performed the activity.

    User Id

    It displays the id of the user who performed the activity.

    User Tenant Id

    It displays the id of the tenant under which the user is created.

    Timestamp

    It displays the timestamp in the UTC time zone.

    Entity Type

    It displays the type of entity on which the activity was carried out in the application.

    Activity Type

    It displays the type of activity that the user carried out.

    Entity Name

    It displays the name of the entity on which the activity was carried out.

    Entity Key

    It displays the entity key that is unique to every entity.

Writing User Activity Log to Template File

This feature lets you write the user activity log to a template file, which you can then download as a CSV file to your local drive.

To write the user activity log to a template file, follow the steps given below.

  1. To open the Python Code Editor, follow steps 1 to 4 of Viewing User Activity Log.
  2. Write the code in the Python Code Editor.

    (info)

    Notes:

    • In the code, declare a custom output variable for the filename and assign it the name you want for the template file.
    • The user activity log is in dictionary format, for example {"User Name": list of user names, "User Id": list of user IDs}, and so on. You can use these fields to extract particular information while creating your template file. For a list of fields of the user activity log, refer to Table: Fields of User Activity Log.
    • Create custom variables of the required data types for the fields you want to extract from the log.
  3. After finishing the code, click Close.
    Figure: Python Code to Write User Activity Log to Template File
  4. From the left pane, under rubistudio, under Writer, drag and drop TemplateFile on the Workbook canvas.

    Figure: Selecting TemplateFile
  5. Connect RubiPython to TemplateFile.
  6. Click TemplateFile.
    Properties are displayed in the right panel.
  7. Enter the Dataset Name for the TemplateFile.

    Figure: Template File Name
  8. To run the RubiPython node, hover over the node, click the ellipsis, and click Run.

    The selected user log fields are extracted to the template file, and a dataset with the given filename is created. The newly created dataset is displayed under Datasets.

    In the above example, we have created a file with the name 23-Dec-2020-Log, and it appears in the Datasets as shown in the figure below.

    Figure: Newly Created Dataset Containing User Activity Log
    This newly created dataset contains the fields specified in the Python code and can be used in other workbooks.
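The field-extraction step can be sketched in plain Python, assuming the log has the dictionary shape described in the notes (field name → list of values). The log contents below are made up, the helper `select_fields` is hypothetical, and how retrieveUserLogs() actually returns its data is not shown here. Writing to an in-memory buffer stands in for the template file.

```python
import csv
import io

# Made-up log in the documented shape: field name -> list of values.
user_log = {
    "User Name": ["asha", "ravi", "asha"],
    "User Id": [101, 102, 101],
    "Timestamp": ["2020-12-23 09:15:00", "2020-12-23 10:02:00", "2020-12-23 11:40:00"],
    "Activity Type": ["LOGIN", "RUN", "LOGOUT"],
}

def select_fields(log, fields, user_name=None):
    """Keep only the requested fields, optionally restricted to one user."""
    keep = [
        i for i, name in enumerate(log["User Name"])
        if user_name is None or name == user_name
    ]
    return {field: [log[field][i] for i in keep] for field in fields}

extract = select_fields(user_log, ["User Name", "Timestamp", "Activity Type"], "asha")

# Render the extract as CSV text, the kind of table a template file would hold.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(extract)  # iterating a dict yields its keys: the header row
writer.writerows(zip(*extract.values()))
print(buffer.getvalue())
```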

RubiR

RubiR is a feature of Code Fusion on the Rubiscape platform that lets you write code in the R programming language to build models.

You can use the RubiR node on its own, or connect it to a Reader node (dataset) or to another algorithm node. You can connect only a single predecessor (preceding task) to a RubiR node.

To use RubiR, follow the steps given below.

  1. Create your algorithm flow. Refer to Building Algorithm Flow in a Workbook Canvas.
  2. Drag and drop RubiR on your workbook canvas.
  3. Connect RubiR to the required node in your algorithm flow.
    Figure: Selecting RubiR under Code Fusion
  4. Select RubiR and in the Properties pane, click Configure.
    Figure: Configuring RubiR
    The RubiR configuration screen is displayed.


Figure: RubiR Configuration

In the R Code Editor, you can add your R code.

(info)

Notes:

  • Currently, in RubiR, you cannot access input variables or define custom output variables.
  • You can connect a reader to RubiR, but cannot access any of the features of the reader. You can use RubiR as a single node.

A sample R code is shown in the image below.

Figure: Sample R Code

In the above code,

  • data = inputData creates a copy of inputData with the name data.
  • print(data) prints the contents of inputData to the console log.
  • output = {} creates a new variable named 'output' of type dictionary.
  • output['newSepalLength'] = data['Iris CSV']['Sepal.Length'] * 2.54
    Assigns value of Sepal.Length of Iris CSV dataset multiplied by 2.54 to column newSepalLength.

(info)

Note:

Make sure the data types of your output and its corresponding input variable are the same. If they are not, RubiR gives an error.
