Hot questions for using neural networks in RapidMiner

Question:

I have created a neural net model in RapidMiner, but the results are not what I expected: they are only intermediate results. To get the final results I need to run a custom query on the result set generated by the neural net model. My questions are:

1. How can I query the result set?
2. Or how can I import the neural net's result set into a database and then use the Read Database operator to query it?
3. Or how can I export the neural net model's result set to a CSV file so that I can import it into a database for further processing?

Answer:

When you train a neural net, you first create a model object. You then need to apply that model to your testing data, which should not be the same data that was used for training. Take a look at the sample process below [1]; you can also copy and paste the XML into your RapidMiner process window.

To write the results to a CSV file or to a database, there are dedicated operators called Write CSV and Write Database. For the latter, you also have to define a connection first under the menu entry Connections -> Manage Database Connections.
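Once the result set has been written to a CSV file, it can be loaded into a database and queried with SQL. The following is a minimal Python sketch of that step, not part of the original answer. It uses only the standard library and an in-memory SQLite database. The column names (`id`, `label`, `prediction`), the `;` separator, and the helper `load_csv_into_sqlite` are assumptions for illustration; check them against your own export.

```python
import csv
import io
import sqlite3

# Stand-in for a file written by Write CSV. The column names and the ";"
# separator are assumptions; adjust them to match your own export.
EXPORTED_CSV = """id;label;prediction
1;yes;yes
2;no;yes
3;no;no
4;yes;yes
"""


def load_csv_into_sqlite(csv_text, table="predictions", delimiter=";"):
    """Load CSV text into an in-memory SQLite table and return the connection."""
    rows = list(csv.reader(io.StringIO(csv_text), delimiter=delimiter))
    header, data = rows[0], rows[1:]
    conn = sqlite3.connect(":memory:")
    # Quote column names so headers with parentheses or spaces still work.
    columns = ", ".join('"{}" TEXT'.format(name.replace('"', '""')) for name in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    placeholders = ", ".join("?" for _ in header)
    conn.executemany('INSERT INTO "{}" VALUES ({})'.format(table, placeholders), data)
    conn.commit()
    return conn


conn = load_csv_into_sqlite(EXPORTED_CSV)
# Example custom query: count the rows where the prediction matches the label.
correct = conn.execute(
    'SELECT COUNT(*) FROM predictions WHERE "label" = "prediction"'
).fetchone()[0]
total = conn.execute("SELECT COUNT(*) FROM predictions").fetchone()[0]
print(correct, "of", total, "predictions match the label")
```

To read a real export, replace `io.StringIO(csv_text)` with an opened file and `":memory:"` with a database file path.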

You can also take a look at the training section of the RapidMiner community, where there are a lot of training videos and related material: Free training material

1:

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
<process expanded="true">
  <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Weighting" width="90" x="45" y="136">
    <parameter key="repository_entry" value="//Samples/data/Weighting"/>
  </operator>
  <operator activated="true" class="split_data" compatibility="8.2.000" expanded="true" height="103" name="Split Data" width="90" x="246" y="136">
    <enumeration key="partitions">
      <parameter key="ratio" value="0.7"/>
      <parameter key="ratio" value="0.3"/>
    </enumeration>
    <description align="center" color="yellow" colored="true" width="126">Split the data into a training and a testing set (ratio 70% and 30%)</description>
  </operator>
  <operator activated="true" class="neural_net" compatibility="8.2.000" expanded="true" height="82" name="Neural Net" width="90" x="447" y="34">
    <list key="hidden_layers"/>
    <description align="center" color="green" colored="true" width="126">Train the neural net here</description>
  </operator>
  <operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="648" y="136">
    <list key="application_parameters"/>
    <description align="center" color="green" colored="true" width="126">Apply the trained net on the test data</description>
  </operator>
  <operator activated="true" class="performance_classification" compatibility="8.2.000" expanded="true" height="82" name="Performance" width="90" x="841" y="136">
    <list key="class_weights"/>
    <description align="center" color="orange" colored="true" width="126">Check how well the network worked on the data and see the output of the classification</description>
  </operator>
  <connect from_op="Retrieve Weighting" from_port="output" to_op="Split Data" to_port="example set"/>
  <connect from_op="Split Data" from_port="partition 1" to_op="Neural Net" to_port="training set"/>
  <connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
  <connect from_op="Neural Net" from_port="model" to_op="Apply Model" to_port="model"/>
  <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
  <connect from_op="Performance" from_port="performance" to_port="result 1"/>
  <connect from_op="Performance" from_port="example set" to_port="result 2"/>
  <portSpacing port="source_input 1" spacing="0"/>
  <portSpacing port="sink_result 1" spacing="0"/>
  <portSpacing port="sink_result 2" spacing="0"/>
  <portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>

Question:

I successfully applied a neural net operator in RapidMiner to a data set that has 3 columns plus a 4th one, which is the label:

column1|column2|column3|column4(labelled)
data   |data   |data   |data  

Now I have testing data, and I want to predict the value of the label column based on column1, column2 and column3. The testing data looks like:

column1|column2|column3
data   |data   |data   

Question: is this correct?

Using this approach, I created a model so that the process can predict the value of the unlabelled column:

Then, using the solution in the reference below:

Split data solution

I created a model again using Split Data. For this I combined my training and testing data sets, so the combined data now has values in the label column for some rows and none for others, because those rows come from the testing data.

But I am still getting this error.


Answer:

From what I can see, the problem is that you don't apply the Nominal to Numerical operator to your test set. With the default settings, this operator creates a dummy encoding for each nominal value found in the specified attribute. In your case you will get a column/attribute named "Course1=A" that contains a 1 for each example where the original column was "A", and so on for the other values.

What you need to do is apply the same encoding to your test data as to your training data. As you can see, the Nominal to Numerical operator has an additional output port called pre (short for preprocessing model). It can be used to apply the same preprocessing steps (like normalization or encoding) to multiple data sets.
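The idea behind such a preprocessing model can be sketched in a few lines of Python. This only illustrates the concept and is not RapidMiner's implementation: the encoding is learned once from the training data and then applied unchanged to the test data. The function names and the example data are hypothetical; only the attribute name "Course1" comes from the question.

```python
def fit_dummy_encoding(train_rows, attribute):
    """Remember which values of a nominal attribute occur in the training data."""
    return {"attribute": attribute,
            "values": sorted({row[attribute] for row in train_rows})}


def apply_dummy_encoding(rows, encoding):
    """Replace the nominal attribute with one 0/1 column per remembered value."""
    attribute = encoding["attribute"]
    encoded_rows = []
    for row in rows:
        encoded = {key: value for key, value in row.items() if key != attribute}
        for value in encoding["values"]:
            column = "{}={}".format(attribute, value)
            encoded[column] = 1 if row[attribute] == value else 0
        encoded_rows.append(encoded)
    return encoded_rows


# Hypothetical data: the test set never contains the value "C".
train = [{"Course1": "A", "grade": 1.0},
         {"Course1": "B", "grade": 2.0},
         {"Course1": "C", "grade": 3.0}]
test = [{"Course1": "A", "grade": 1.5},
        {"Course1": "B", "grade": 2.5}]

encoding = fit_dummy_encoding(train, "Course1")
train_encoded = apply_dummy_encoding(train, encoding)
test_encoded = apply_dummy_encoding(test, encoding)

# Both sets expose the same columns, including "Course1=C" in the test set.
print(sorted(train_encoded[0]) == sorted(test_encoded[0]))
```

Because the test rows are encoded with the values learned from the training rows, the model sees the same columns at training time and at application time, even when a value is missing from the test set.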

For convenience, you can also group several models into one by using the Group Models operator.

See the process XML below for an example (just copy and paste it into the process view of RapidMiner).

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<process expanded="true">
  <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Golf" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Samples/data/Golf"/>
  </operator>
  <operator activated="true" class="nominal_to_numerical" compatibility="8.2.000" expanded="true" height="103" name="Nominal to Numerical" width="90" x="179" y="34">
    <list key="comparison_groups"/>
    <description align="center" color="purple" colored="true" width="126">Transform the nominal attributes into a dummy encoding with 0/1 for each expression.&lt;br&gt;This encoding is then also delivered via &amp;quot;pre&amp;quot; output port.</description>
  </operator>
  <operator activated="true" class="neural_net" compatibility="8.2.000" expanded="true" height="82" name="Neural Net" width="90" x="447" y="34">
    <list key="hidden_layers"/>
  </operator>
  <operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Golf-Testset" width="90" x="45" y="340">
    <parameter key="repository_entry" value="//Samples/data/Golf-Testset"/>
  </operator>
  <operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="447" y="340">
    <list key="application_parameters"/>
  </operator>
  <operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="648" y="340">
    <list key="application_parameters"/>
  </operator>
  <connect from_op="Retrieve Golf" from_port="output" to_op="Nominal to Numerical" to_port="example set input"/>
  <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Neural Net" to_port="training set"/>
  <connect from_op="Nominal to Numerical" from_port="preprocessing model" to_op="Apply Model (2)" to_port="model"/>
  <connect from_op="Neural Net" from_port="model" to_op="Apply Model" to_port="model"/>
  <connect from_op="Retrieve Golf-Testset" from_port="output" to_op="Apply Model (2)" to_port="unlabelled data"/>
  <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Apply Model" to_port="unlabelled data"/>
  <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
  <portSpacing port="source_input 1" spacing="0"/>
  <portSpacing port="sink_result 1" spacing="0"/>
  <portSpacing port="sink_result 2" spacing="0"/>
  <description align="center" color="green" colored="true" height="103" resized="true" width="315" x="433" y="433">First apply the &amp;quot;preprocessing&amp;quot; model so the test data have the same structure&lt;br/&gt;&lt;br/&gt;Then apply the trained neural net</description>
</process>
</operator>
</process>

Also feel free to ask further questions, or to re-post this one, in the RapidMiner community forum.

Question:

I'm trying to use a neural network by training it on trainData and then testing on testData, as anyone would do. However, the data requires dummy coding of some nominal features to numerical. When I do that, it trains the neural network but fails when applying it to the test data (on which I apply the exact same transformations/blocks) because of a mismatch in the dummy coding*.

*The error message is along the lines of: v47=H does not exist in testData

I checked and it is true that testData does not have the value 'H' at all in v47, while trainData has it. Therefore, I'd like to ignore this 'H' in v47, or replace it.

Any way I could do this easily? Keeping in mind this might happen with other features as well and going through all the features, one by one, to fix this kind of issue, would be very time consuming.

Perhaps there's another way to tackle this?

Thanks!


Answer:

This is similar to a previous post.

The answer there suggests combining the test and training data so that all possible values of each nominal attribute are present, and then splitting the result to recover the test and training sets. The additional nominal values are retained in both splits.
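The combine-then-split idea can be sketched in plain Python as follows. This is an illustration under assumptions, not a RapidMiner process: the helper `dummy_encode_together` and the values "L" and "M" are hypothetical, while "v47" and "H" come from the question.

```python
def dummy_encode_together(train_rows, test_rows, attribute):
    """Dummy-encode one nominal attribute over both sets, then split them again."""
    # Tag each row with its origin so the two sets can be recovered afterwards.
    combined = ([("train", row) for row in train_rows]
                + [("test", row) for row in test_rows])
    # Every value seen in either set gets its own 0/1 column.
    values = sorted({row[attribute] for _, row in combined})
    result = {"train": [], "test": []}
    for origin, row in combined:
        encoded = {key: value for key, value in row.items() if key != attribute}
        for value in values:
            column = "{}={}".format(attribute, value)
            encoded[column] = 1 if row[attribute] == value else 0
        result[origin].append(encoded)
    return result["train"], result["test"]


# "H" occurs only in the training data, "M" only in the test data (hypothetical).
train = [{"v47": "H", "label": "yes"}, {"v47": "L", "label": "no"}]
test = [{"v47": "L"}, {"v47": "M"}]

train_encoded, test_encoded = dummy_encode_together(train, test, "v47")
print(sorted(test_encoded[0]))
```

After the split, the test rows carry a "v47=H" column (always 0) and the training rows carry a "v47=M" column, so both sets share the same dummy columns.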

If combining the data sets does not suit, another possibility is to use the Data to Weights operator on the training example set. The resulting weights can then be used with the Select by Weights operator to keep only the attributes of interest in the test example set.