Light DataAgent

7-30-2007: Bug Update. v2.1 Executable | v2.1 Source

3-30-2007: Adds the option of converting class labels of 0 to -1 (SVMlight requires {-1,1}).
3-16-07: Light DataAgent v1.3; Expanded usage for floating point numbers.
3-15-07: Light DataAgent v1.2; Expanded usage in other environments.
3-4-07: Light DataAgent v1.1; Bug fix: in some environments, SVMlight had errors reading the last data row.

Links:
Learn about SVMlight here
Learn about SVMdark here

The Light DataAgent application takes space, tab or comma delimited data files and converts them into the format required by SVMlight (and SVMdark). It is a simple application that requires no previous knowledge of any programming language. It can also convert response values of 0 to -1 (SVMlight classification requires {-1,1} rather than {0,1}). Instructions and examples are provided below. Please cite use of the Light DataAgent application as follows:

Light DataAgent, Ophir Gottlieb, 2007

Space (or Tab) Delimited Example:
Space (or tab) delimited files are very common. Take as an example a data file, original_data.txt. To use the Light DataAgent application, the original data file must have the following format (with NO DATA LABELS), and you must know the number of attributes:

Response Attribute_1 Attribute_2 ..... Attribute_N

Here, "Response" is either 1 or -1 for SVM classification, or a number for SVM regression. For example, original_data.txt is a classification dataset and looks like this:

1 93 49 90 15 95 19 65 68 62 75 9 82 64
-1 90 56 88 38 50 25 60 67 63 49 59 80 60
1 10 58 5 50 50 53 27 47 16 20 50 50 28
-1 12 59 7 50 50 53 23 48 14 20 50 50 24
1 11 58 52 50 77 67 18 92 13 24 50 50 8

In this case, there are 13 attributes excluding the classification (N = 13). In order to get this original data into proper SVMlight (or SVMdark) format, we will run the Light DataAgent application. We will call the new, properly formatted file, "formatted_data.txt."


And the new data file (formatted_data.txt) looks like this:

1 1:93 2:49 3:90 4:15 5:95 6:19 7:65 8:68 9:62 10:75 11:9 12:82 13:64
-1 1:90 2:56 3:88 4:38 5:50 6:25 7:60 8:67 9:63 10:49 11:59 12:80 13:60
1 1:10 2:58 3:5 4:50 5:50 6:53 7:27 8:47 9:16 10:20 11:50 12:50 13:28
-1 1:12 2:59 3:7 4:50 5:50 6:53 7:23 8:48 9:14 10:20 11:50 12:50 13:24
1 1:11 2:58 3:52 4:50 5:77 6:67 7:18 8:92 9:13 10:24 11:50 12:50 13:8

That's it! For larger files, the Light DataAgent application will provide notification in 100 row increments that it is still processing.
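For readers who want to see the transformation spelled out, the following Python sketch reproduces the mapping shown in the example above. It is not the Light DataAgent source code; the function names `to_svmlight_line` and `convert_file` and the `zero_to_minus_one` parameter are hypothetical, chosen only for illustration. It assumes each row starts with the response and that fields are separated by spaces, tabs or commas.

```python
import re


def to_svmlight_line(line, zero_to_minus_one=False):
    """Convert one delimited row into 'response 1:v1 2:v2 ...' form.

    Returns None for a blank line. This is an illustrative sketch,
    not the Light DataAgent implementation.
    """
    # Split on any run of spaces, tabs or commas; drop empty fields.
    fields = [f for f in re.split(r"[,\s]+", line.strip()) if f]
    if not fields:
        return None
    response, values = fields[0], fields[1:]
    # Optionally map a response of 0 to -1 for classification data.
    if zero_to_minus_one and float(response) == 0:
        response = "-1"
    # Attributes are numbered from 1, in the order they appear.
    pairs = [f"{i}:{v}" for i, v in enumerate(values, start=1)]
    return " ".join([response] + pairs)


def convert_file(src_path, dst_path, zero_to_minus_one=False):
    """Convert every non-blank row of src_path and write it to dst_path."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            out = to_svmlight_line(line, zero_to_minus_one)
            if out is not None:
                dst.write(out + "\n")
```

Under these assumptions, calling `convert_file("original_data.txt", "formatted_data.txt")` on the five-row example would write rows of the same shape as the formatted file shown above. Attribute values are copied through as text, so floating point values are left unchanged.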

Contact: Ophir Gottlieb; ophirg @ stanfordalumni "dot" org.

Disclaimer: Absolutely no guarantees about the Light DataAgent application are made. The application has been minimally tested. No one makes any representation about the suitability, reliability, availability, timeliness or accuracy of the information, software, products, services or related graphics contained on this site for any purpose. All such information, software, products, services and related graphics are provided “as is” without warranty of any kind. I hereby disclaim all warranties and conditions with regard to this information, software, products, services, and related graphics, including all implied warranties of merchantability, fitness for a particular purpose, title and non-infringement. IN NO EVENT SHALL I BE LIABLE FOR ANY DIRECT, INDIRECT, PUNITIVE, INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER INCLUDING, WITHOUT LIMITATION, DAMAGES FOR LOSS OF USE, DATA OR PROFITS, ARISING OUT OF OR IN ANY WAY CONNECTED WITH THE USE OR PERFORMANCE OF THE Light DataAgent APPLICATION, WITH THE DELAY OR INABILITY TO USE THE APPLICATION OR RELATED SERVICES, THE PROVISION OF OR FAILURE TO PROVIDE SERVICES, OR FOR ANY INFORMATION, SOFTWARE, PRODUCTS, SERVICES AND RELATED GRAPHICS OBTAINED THROUGH THE APPLICATION, OR OTHERWISE ARISING OUT OF THE USE OF THE APPLICATION, WHETHER BASED ON CONTRACT, TORT, NEGLIGENCE, STRICT LIABILITY OR OTHERWISE, EVEN IF I HAVE BEEN ADVISED OF THE POSSIBILITY OF DAMAGES. Because some states/jurisdictions do not allow the exclusion or limitation of liability for consequential or incidental damages, the above limitation may not apply to you. If you are dissatisfied with any aspect of the application, your sole and exclusive remedy is to discontinue using the application.