Morten Nielsen, CBS, BioCentrum, DTU CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS TECHNICAL UNIVERSITY OF DENMARK DTU Neural Network training.


Morten Nielsen,
CBS, BioCentrum,
DTU
CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS TECHNICAL UNIVERSITY OF DENMARK DTU
Neural Network training
Neural network programs
• how
– Classification neural network
• howlin
– Real-valued neural network
• nnlinplayer
– Neural network player, i.e. prediction only, no training
How2doit
• how and howlin are clumsy but very fast and efficient Fortran programs
• Three important files
– Parameter file: howlin.dat
– Data file
– Synaps (weight) file
Output
• Format of output file
• Plotting training and test performance
– howlinplot fileout
Neural networks
• Neural networks can learn higher-order correlations!
– What does this mean?
0 0 => 0
0 1 => 1
1 0 => 1
1 1 => 0
No linear function can learn this pattern (the XOR function): the output depends on the combination of the two inputs, not on either input alone.
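The claim about the pattern above can be checked with a small sketch. It assumes a step (threshold) activation, and the weights of the two-hidden-neuron network are hand-picked for illustration; they are not the weights shown on the slides and not what howlin would learn. The grid search over linear units is an illustration, not a proof.

```python
import itertools

# The pattern from the slide: XOR of two binary inputs.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def step(x):
    """Threshold activation (an assumption; howlin may use a sigmoid)."""
    return 1 if x > 0 else 0

def linear_unit(x1, x2, w1, w2, b):
    """A single neuron without hidden layer: thresholded linear function."""
    return step(w1 * x1 + w2 * x2 + b)

def two_layer(x1, x2):
    """Two hidden neurons with hand-picked (hypothetical) weights."""
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)

# Try every linear unit on a coarse weight grid from -4.0 to 4.0.
grid = [i / 2 for i in range(-8, 9)]
linear_solutions = [
    (w1, w2, b)
    for w1, w2, b in itertools.product(grid, repeat=3)
    if all(linear_unit(x1, x2, w1, w2, b) == y for (x1, x2), y in XOR.items())
]

network_ok = all(two_layer(x1, x2) == y for (x1, x2), y in XOR.items())
print(len(linear_solutions), network_ok)  # prints: 0 True
```

The hidden layer works because each hidden neuron is itself linearly separable (OR and AND), and XOR is "OR but not AND".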
Neural networks
[Figure: network diagram with weights w11, w12, w21, w22 and v1, v2]
w11 = 1, w12 = -1, w21 = 1, w22 = -1
v1 = 0.5, v2 = -0.5
nnlinplayer
• Use weight file(s) to generate neural network predictions
• Format
– nnlinplayer synapsfilelist inputfile
• Makes consensus prediction over N neural networks
• Input file must be generated separately
– seq2inp data
• Using pipes
– seq2inp data | nnlinplayer synlist --
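As a rough illustration of a consensus prediction over N networks, the sketch below averages the outputs of several networks for each input example. That "consensus" means the arithmetic mean is an assumption; the slides do not say how nnlinplayer combines the networks, and the function name and numbers are made up for the example.

```python
def consensus(predictions_per_network):
    """Average the predictions of N networks, example by example.

    predictions_per_network: list of N lists, one list of outputs per network,
    all of the same length (one value per input example).
    """
    n_networks = len(predictions_per_network)
    n_examples = len(predictions_per_network[0])
    return [
        sum(net[i] for net in predictions_per_network) / n_networks
        for i in range(n_examples)
    ]

# Hypothetical outputs from two networks on three input examples.
preds = [
    [0.25, 1.0, 0.5],
    [0.75, 0.5, 0.5],
]
averaged = consensus(preds)
print(averaged)  # prints: [0.5, 0.75, 0.5]
```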
how
• Classification network
• Generates input data directly from sequence
– RIISSIEQKEENKGGEDKLKMIREYRQMVE
• Input is how files
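One common way to turn a sequence into network input is sparse (one-hot) encoding, sketched below. Whether how uses this encoding, and the 9-residue window, are assumptions made for illustration; the slides only state that the input is generated directly from the sequence.

```python
# The 20 standard amino acids in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def sparse_encode(peptide):
    """One-hot encode a peptide: 20 inputs per residue, exactly one set to 1."""
    vec = []
    for residue in peptide:
        row = [0] * len(AMINO_ACIDS)
        row[AMINO_ACIDS.index(residue)] = 1
        vec.extend(row)
    return vec

seq = "RIISSIEQKEENKGGEDKLKMIREYRQMVE"  # sequence from the slide
window = seq[:9]                        # hypothetical 9-residue window
encoded = sparse_encode(window)
print(len(encoded), sum(encoded))  # prints: 180 9
```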
Useful programs
• fasta2pep
• seq2inp
• ranlines
• splitfile
• balanceset
• xycorr
• Examples
fasta2pep ex.fsa | grep -v '#' | seq2inp -- | grep -v '#' | ranlines -- | grep -v '#' | splitfile -nc 4 --
seq2inp data | nnlinplayer synlist -- | grep -v '#' | args 1,3 | xycorr
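The last step of the second example pipes two columns (selected with args 1,3) into xycorr. Assuming that xycorr reports the Pearson correlation between the two columns (the slides do not say so), the computation can be sketched as follows; the input lines are made-up values.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical two-column text: prediction and target value per line.
lines = ["0.1 0.2", "0.4 0.5", "0.9 0.7", "0.3 0.1"]
columns = [tuple(float(v) for v in line.split()) for line in lines]
xs = [x for x, _ in columns]
ys = [y for _, y in columns]
print(round(pearson(xs, ys), 3))
```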
Exercises
• Copy all files from
– /usr/opt/www/pub/CBS/researchgroups/immunology/intro/NeuralNetworks/exercise/* to some directory
• Open the file doit
– What does the program do?
– Run the program and save the output to a file named datafile
– Train a howlin neural network
• Set the number of hidden neurons in the howlin2002.dat file to 0
• Run the training by typing
– howlin2002 < howlin2002.dat > output
• Plot the training/test performance using the howlinplot program
• Redo the training using 2 hidden neurons
– Check the synaps file. What are the weight values?
• Do the prediction of T cell epitopes exercise
– www.cbs.dtu.dk/courses/27485.imm/exercise5/index.php