Code for String Kernels
This page provides download information for the string
kernel code: the spectrum/mismatch kernel
[1,2] and the profile kernel [3].
Code for other variants [4] of the string
kernels will be made available at a later date. The code for the
spectrum/mismatch kernel and the profile kernel is packaged together
with sample data files and the motif extraction software
(which applies only to the profile kernel). The PSI-BLAST profiles
for 7329 sequences (generated with 5 iterations) are included, as
well as the 54 experimental setups used in the profile kernel
experiments. You can also design your own experiments and create
your own set of profiles. License files and several README files
are included to help you use the
software.
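For readers new to the spectrum kernel of [1], the following is a minimal
Python sketch of the idea: each sequence is mapped to its counts of
contiguous k-mers, and the kernel value is the inner product of two such
count vectors. This is an illustration only and is not the code shipped in
the distribution; the function names and the default k=3 are hypothetical
choices made for this example, and mismatches and profiles are not handled.

```python
from collections import Counter


def spectrum_features(seq, k):
    """Count every contiguous k-mer in seq (hypothetical helper).

    Sequences shorter than k yield an empty count vector.
    """
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3):
    """Inner product of the k-mer count vectors of x and y.

    Illustrative sketch only; it does not use the trie-based traversal
    mentioned in the release notes.
    """
    fx = spectrum_features(x, k)
    fy = spectrum_features(y, k)
    # Iterate over the smaller count vector; a Counter returns 0 for
    # k-mers it has not seen, so absent k-mers contribute nothing.
    if len(fx) > len(fy):
        fx, fy = fy, fx
    return sum(count * fy[kmer] for kmer, count in fx.items())
```

For example, spectrum_kernel("ABAB", "BABA", k=2) compares the 2-mer
counts {AB: 2, BA: 1} and {BA: 2, AB: 1}.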
Note: A version of SPIDER is included in the
distribution. SVM training and testing are done through SPIDER,
which requires MATLAB. For more information about SPIDER, please see
http://www.kyb.tuebingen.mpg.de/bs/people/spider/.
Release Notes:
- Version 1.2 - September 26, 2004, fixed bug in profile
kernel code for trie data structure traversal.
Also, package now uses SPIDER for SVM training and testing.
This requires MATLAB.
- Version 1.1 - July 30, 2004, fixed bug in run_scripts/normalize_matrix.pl.
- Version 1.0 - March 30, 2004, Original release.
Please fill in the following form. Your information is used
solely to gather statistics about the usage of the string
kernel code. It will not be shared with anyone or used for
any other purpose.
References
[1] C. Leslie, E. Eskin,
and W. Noble.
Spectrum kernel: A string kernel for SVM protein classification.
Proceedings of the Pacific Symposium on Biocomputing,
January 2-7, 2002. pp. 474-485.
[2] C. Leslie, E. Eskin, A. Cohen, J. Weston,
and W. Noble.
Mismatch String Kernels for Discriminative Protein Classification.
Bioinformatics, 20:4, pp. 467-476, 2004.
[3] R. Kuang, E. Ie, K. Wang, K. Wang,
M. Siddiqi, Y. Freund and C. Leslie.
Profile-based string kernels for remote homology detection and motif
extraction.
Accepted, Proceedings of the IEEE Computational Systems
Bioinformatics 2004, Stanford, August 2004.
[4] C. Leslie and R. Kuang.
Fast Kernels for Inexact String Matching.
Proceedings of the Conference on Learning Theory and Kernel
Workshop, 2003.
Tze Way Eugene Ie
Last modified: Sat Sep 25 16:41:20 EDT 2004