C4.5 to FOIL Format

It turns out FOIL provides a C program, c4tofoil.c, that converts a dataset from C4.5 format to FOIL format.

C4.5 Format
A dataset in C4.5 format consists of two parts: a .data file and a .names file. This is the format used by Quinlan's C4.5 decision tree learner, which I guess is why it is called the C4.5 format, and why Quinlan provides a program that converts it to FOIL format automatically.

A more detailed introduction to the C4.5 format can be found at http://www.cs.washington.edu/dm/vfml/appendixes/c45.htm
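To make the two-file layout concrete, here is a minimal sketch that writes a tiny dataset in this format. Everything in it is made up for illustration: the stem "toy", the attribute names, the values, and the rows are all hypothetical. The "| type:" comments follow the convention that the c4tofoil.c header comment asks for. I have not run this particular dataset through c4tofoil, so treat it as a starting point only.

```shell
# Write a tiny, made-up C4.5 dataset (stem "toy") into the given directory.
# All attribute names, values, and rows are hypothetical illustration data.
make_toy_dataset() {
    dir="${1:-.}"

    # toy.names: the class names first, then one line per attribute.
    # Text after "|" is a C4.5 comment; c4tofoil reads the type name from it.
    cat > "$dir/toy.names" <<'EOF'
yes, no.
outlook: sunny, rainy. | type: Outlook
windy: true, false. | type: Boolean
EOF

    # toy.data: one example per line, attribute values followed by the class.
    cat > "$dir/toy.data" <<'EOF'
sunny, false, yes
rainy, true, no
sunny, true, yes
EOF
}

# Usage: make_toy_dataset .    (creates ./toy.names and ./toy.data)
```

The type names Outlook and Boolean were chosen so that neither is a prefix of the other, which is one of the restrictions stated in the c4tofoil.c header.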

Compiling the code
    cd FOIL
    gcc c4tofoil.c -o cf
    ./cf -f crx
 * 1) FOIL is the directory that holds all the code extracted from the shell archive.
 * 2) Compile the code and name the executable file cf.
 * 3) Specify the file stem you want to convert. crx.data and crx.names must exist in the current folder; crx.test is optional.
 * 4) The output will be stored in crx.d.
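
Since the converter needs both input files to be present, a small check before running it can save some confusion. The following is only a sketch: the function name check_c45_stem is my own and is not part of FOIL.

```shell
# Check that the inputs c4tofoil expects for a file stem are present.
# Prints a message on stderr and returns non-zero if a required file is missing.
check_c45_stem() {
    stem="$1"
    for ext in names data; do
        if [ ! -f "$stem.$ext" ]; then
            echo "missing required file: $stem.$ext" >&2
            return 1
        fi
    done
    # The .test file is optional, so only mention it when it is absent.
    [ -f "$stem.test" ] || echo "note: no $stem.test (optional)" >&2
    return 0
}

# Usage (inside the FOIL directory, after compiling cf):
#   check_c45_stem crx && ./cf -f crx && ls -l crx.d
```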

Notes:
The header of c4tofoil.c contains comments on how the *.data and *.names files should be formatted. Just make sure you follow them. For convenience, I have pasted the header here:

/*****************************************************************************/
/*                                                                           */
/*  Program to convert files from the standard C4.5 input format to a form   */
/*  that can be used by FOIL                                                 */
/*                                                                           */
/*  The relation to be found by FOIL will be of the form -                   */
/*  is first class named in the .names file                                  */
/*                                                                           */
/*  Hence changing the order of the class names will cause FOIL to find other*/
/*  relations from the same data                                             */
/*                                                                           */
/*  Compilation and use:                                                     */
/*  cc -o cf c4tofoil.c (produce executable cf)                              */
/*  cf -f filestem (take filestem.names and filestem.data (and filestem.test */
/*                  if present) and produce filestem.d for FOIL)             */
/*  option -v produces some extra output on the standard output              */
/*                                                                           */
/*  (Any error messages are currently printed on the standard output stream) */
/*                                                                           */
/*  Modification required to filestem.names:                                 */
/*  Each line containing attribute information should have information       */
/*  specifying the type - this is added as a C4.5 comment thus...            */
/*                                                                           */
/*  (attribute info for C4.5) | type: typename                               */
/*                                                                           */
/*  where typename is the name of the type of this attribute. Each typename  */
/*  shall start with a letter (upper or lower case) and no typename shall be */
/*  the prefix of another typename.                                          */
/*  (The latter restriction is required as the output for FOIL distinguishes */
/*  between constants of different types by prefixing them with their        */
/*  typename).                                                               */
/*                                                                           */
/*  For example:                                                             */
/*  aardvarkish: true, false. | type: Boolean                                */
/*                                                                           */
/*  Note that values of discrete attributes which occur in the data file     */
/*  become theory constants for FOIL. (However those that occur in the test  */
/*  file but not data file, are just constants, not theory constants).       */
/*                                                                           */
/*****************************************************************************/

BTW, in my experiments the line endings needed extra attention. Make sure the files use Unix line endings ('\n'); deleting empty lines sometimes also makes the conversion work.
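
The line-ending cleanup can be sketched as a small shell function. This is just one way to do it with standard tools. The function name normalize_c45_file is hypothetical, and it overwrites the file in place, so keep a copy of the original if you need it.

```shell
# Rewrite a file in place with Unix line endings and no empty lines.
normalize_c45_file() {
    file="$1"
    tmp="$file.tmp.$$"
    # tr strips carriage returns (CRLF becomes LF); sed drops blank lines.
    tr -d '\r' < "$file" | sed '/^[[:space:]]*$/d' > "$tmp" && mv "$tmp" "$file"
}

# Usage:
#   for f in crx.names crx.data; do normalize_c45_file "$f"; done
```

Note that this only handles Windows-style CRLF endings. A file that uses bare carriage returns as line separators would need a different conversion.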