Greg's Blog

helping me remember what I figure out

Working With CSV Files

| Comments

As part of an Intranet project we needed to import a CSV (comma separated values) file from a software package to our Intranet database (sometimes there just isn’t an ODBC driver around). In the following I just wanted to show you how I got Cold Fusion to parse through the text file, grab the values required, store them in a structure and then inserted these into a table.

Let’s start off with the code:

<cffile action=”READ” file=”d:\inetpub\wwwroot\test_file.csv” variable=”file_text”><!— -Load the file to be parsed- —>
<cfset stRecords = StructNew()><!— -Create new structure- —>
<cfloop list=”#file_text#” index=”i” delimiters=”#chr(13)#”><!— -Loop over the text file using the return character as the delimiter, i.e. looping over every row in the text file- —>
<cfset counter = 1><!— -Initialise counter- —>
<cfloop list=”#i#” index=”j” delimiters=”,”><!— -Now loop over the individual row using comma as the delimiter- —>
<cfif counter eq 1><!— -The column identifier we know occupies position 1 (always use trim to remove white space)- —>
<cfset item_id = #trim(j)#>
<cfelseif counter eq 2><!— -The unique user_id occupies position 2 (always use trim to remove white space)- —>
<cfset user_id = #trim(j)#>
<cfelseif counter eq 3><!— -The columns value is stored in position 3 (always use trim to remove white space)- —>
<cfset item = #trim(j)#>
</cfif>
<cfset counter = counter + 1><!— -Increment counter- —>
</cfloop>
<cfset stRecords[“#trim(item_id)#”] = ‘#trim(item)#’><!— -Add the column id and it’s value to the structure- —>
<cfif item_id eq 54><!— -We know that this is the last column to be parsed so know we can safely insert our new record- —>
<!– Your insert statement goes here and will look like this –>
INSERT INTO your_table<br />
(user_id, column_1, column_2, column_3, column_4, column_5, column_6, column_7, column_8, column_9, column_10, column_11, column_12, column_13)<br />
VALUES<br />
(<cfoutput>#user_id#, #stRecords[“7”]#, #stRecords[11]#, #stRecords[12]#, #stRecords[13]#, #stRecords[14]#, #stRecords[41]#, #stRecords[43]#, #stRecords[44]#, #stRecords[51]#, #stRecords[52]#, #stRecords[53]#, #stRecords[54]#</cfoutput>)
<p></p>
<cfscript>
/*Now clear the structure before starting the next chunk to be processed */
structClear(stRecords);
</cfscript>
</cfif>
</cfloop>

Step one involved telling Cold Fusion where to find the file to be parsed and storing the contents of that file in a variable called file_text using the CFFILE tag. Now the text file we were working with had the added ambiguity in that the records for weren’t stored in single rows but spanned multiple rows. However each row had a column identifier. Listed below I have included a sample of the data to be imported for clarities sake

1,”EMUPDV41”,”Employee Master Table”,09072001,1640
7,”00010”,”02 1234-5678”
11,”00010”,”22 Parkland Street”
12,”00010”,”North Sydney”
13,”00010”,”2000”
14,”00010”,”NSW”
41,”00010”,”Y”
43,”00010”,”N”
44,”00010”,”H”
45,”00010”,”Y”
51,”00010”,”062192”
52,”00010”,”193194195”
53,”00010”,”Commonwealth Bank of Australia”
54,”00010”,”Veronica Macgerkinshaw”
99

The rows that are prefixed with the numbers 1 and 9 were header and footer rows respectively. The values we were interested in were between the row numbers 7 and 54, where for example the number 7 was the row identifier (in this case named item_id) for phone numbers. Furthermore we new that the user_id was the second value and existed in each row. The third and final value was the actual value (stored later on in item) of the row identifier for that user. So in our example a user with the user_id of 00010 had the phone number: 02 1234-5678. Armed with this information I decided that I would need to store these values in a structure before I could insert them into the database. So I started out by declaring my structure: stRecords.

Next it was time to do some actual work. I started of by looping over the the contents of the text file which I had previously stored in the variable file_text. Using the chr(13) (carriage return) as my delimiter I was able to break the file into rows. The following lines sets a counter variable. This variable will be used shortly to determine the position of the loop in the actual row. Then it was time to loop over the current row of my first loop. Using the index value i I set about grabbing the individual values of the first row in my imported text file.

Knowing that each row was comprised of three values (and subsequently the loop would loop three times over every line in the text file), I used the counter variable (which increments by one for every loop) to determine first the column name (counter = 1), then the user_id (counter = 2) and finally the value stored in that column (counter = 3). At each step of the loop I stored the values in the aforementioned variables: item_id, user_id and item.

Once the row had been parsed the loop terminated and I was able to add the resulting values to my structure stRecords. First I used the value stored in item_id to determine the structure key and then set it’s value corresponding to the one stored in the variable item. Then it was on to the next row. At this point you may have noticed the use of the trim() function. Simply used to remove any whitespace from the values.

Now of course there were going to multiple recordsets for many many users, so we needed to figure out where the final row for a user was and where the next one began. Remember I mentioned that the values I was interested in were between 7 and 54, well 7 marks the first row in a user set and 54 the last. So by checking the value of item_id I was able to ascertain when I had reached the end of a set and then commit those changes to the database. Having stored the user details in a structure, I then proceeded to access that the structure keys to populate the table columns. So in our example the user’s phone number had an ID of 7 and a value 02 1234-5678. In my table I stored the user’s phone number in column_2, so all that was now needed was so to specify the position in the structure, which is stRecords[“7”] (remember I set the structure key equal to row identifier?). And so on and so forth for the remaining columns.

All that was left to do was to clear the structure before the loop moved on to the next set of user details and repeated the whole process until the footer (99) was reached. I know this is quite specific to a problem we had, but I do hope that it shows you how you can access text files, parse them, store the values you need (in a structure) and then insert these into a database table. ‘Til the next time.