----------------------------------------------------------------------------------------------------------------------------

Defrag the data identifiers

General

If you pick a new random GUID each time you add an item to the FileStorage, the FileStorage's index file will grow rapidly. For a file storage holding around 140,000 items, it is not unlikely that the index file has grown to around 4 GB (yes, that's right, four gigabytes; about one DVD filled with data just for indexing). The index file may well be larger than the data contents you store in the .data file.

Rather than using unstructured, truly random GUIDs, in some cases it is OK to use incremental GUIDs. If this is acceptable, you might want to defrag your data identifiers using the defrag command in the CLI.
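
To make "incremental GUID" concrete, here is a minimal Python sketch (not NFileStorage code) that turns a counter into a GUID of the form 00000000-0000-0000-0000-000000000001. The function name incremental_guid is hypothetical, and the sketch assumes the counter is stored as the GUID's 128-bit integer value, so counter 10 is rendered with a trailing hex "a"; whether the FileStorage itself numbers identifiers this way beyond the first few is not confirmed here.

```python
import uuid


def incremental_guid(counter: int) -> uuid.UUID:
    """Return the GUID whose 128-bit integer value equals `counter`.

    Hypothetical helper for illustration; not part of NFileStorage.
    """
    if counter < 0 or counter >= 1 << 128:
        raise ValueError("counter does not fit in a 128-bit GUID")
    return uuid.UUID(int=counter)


if __name__ == "__main__":
    # Print the first four incremental identifiers.
    for n in range(4):
        print(incremental_guid(n))
```

Because consecutive identifiers differ only in their last digits, they are far more regular than random GUIDs, which is the property the defrag command relies on.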

Defragging using the CLI

To defrag a FileStorage yourself, use the CLI and pass the old file storage name, the new file storage name, a SQL table name, and a SQL column name:

H:\CLI>FileStorageCmd.exe defrag youroldfile DefraggedFileStorage mytable mycolumn
. 4 (4 files/sec, 00:00:00 mins) Reading indexes....................................................
.
/ 4 (4 files/sec, 00:00:00 mins) Writing............................................................
File storage optimization finished
This operation took 611 msecs

What happens when defragging

If we compare the verbose 'dir' output of the original and the defragged FileStorage, we can see what has happened:

Original

H:\Proj\CodePlex\NFileStorage\FileStorageCmd\bin\Debug>FileStorageCmd.exe dir opt_v1.3 verbose
1
. 4 (4 files/sec, 00:00:00 mins) Dir................................................................
.
Data identifier                      | Text identifier  | Creation date     | Size
-------------------------------------+------------------+-------------------+-----------------
2f97b016-bcba-4d1e-a5e8-1371d1fc4a21 | **************** | 20090412 07:38:22 | 11.264
9f0e3f19-eb0f-4381-a530-c8c1e5eae4d8 | **************** | 20090412 07:38:23 | 11.264
9ff92cbf-1529-44cf-bdfa-5f5cc70bb6d8 | **************** | 20090412 07:38:23 | 11.264
a337a5fa-ca4d-4166-a10e-eaafc7acf679 | **************** | 20090412 07:38:23 | 11.264
4 files found (45.056 bytes)
This operation took 469 msecs

Defragged

H:\Proj\CodePlex\NFileStorage\FileStorageCmd\bin\Debug>FileStorageCmd.exe dir DefraggedFileStorage verbose
1
. 4 (4 files/sec, 00:00:00 mins) Dir................................................................
.
Data identifier                      | Text identifier  | Creation date     | Size
-------------------------------------+------------------+-------------------+-----------------
00000000-0000-0000-0000-000000000000 | **************** | 20090412 07:43:52 | 11.264
00000000-0000-0000-0000-000000000001 | **************** | 20090412 07:43:52 | 11.264
00000000-0000-0000-0000-000000000002 | **************** | 20090412 07:43:52 | 11.264
00000000-0000-0000-0000-000000000003 | **************** | 20090412 07:43:52 | 11.264
4 files found (45.056 bytes)
This operation took 384 msecs

So each (random) GUID from the original is mapped to a new incremental GUID. Note that altering only the data identifiers would cause trouble for any program that uses them. You will likely have a database that contains a pointer to a specific item (for example, a Person record pointing to the data identifier that holds information about that person). If we alter the data identifier in the FileStorage, we also have to alter the GUID in the database. This is why the CLI defrag command has two additional parameters that let you specify a table name and a column name. Besides an index file and a data file, the defrag command produces a third file: a SQL file. The SQL file assists you in upgrading your DB contents.

Below you can see an example of the contents of this SQL file:

H:\CLI>type DefraggedFileStorage.FileStorage.index.fc.sql
-- SQL Patch script to adjust the dataidentifiers
update mytable set mycolumn='2f97b016-bcba-4d1e-a5e8-1371d1fc4a21' where mycolumn='00000000-0000-0000-0000-000000000000'
update mytable set mycolumn='9f0e3f19-eb0f-4381-a530-c8c1e5eae4d8' where mycolumn='00000000-0000-0000-0000-000000000001'
update mytable set mycolumn='9ff92cbf-1529-44cf-bdfa-5f5cc70bb6d8' where mycolumn='00000000-0000-0000-0000-000000000002'
update mytable set mycolumn='a337a5fa-ca4d-4166-a10e-eaafc7acf679' where mycolumn='00000000-0000-0000-0000-000000000003'
-- EOF
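
The idea of mapping identifiers and emitting a patch script can be sketched in Python as follows. This is an illustration, not the CLI's implementation: build_mapping and sql_patch are hypothetical names, and the sketch assumes the goal is to replace each old identifier in the database with its new incremental one, so the old value goes in the where clause and the new value in the set clause. Compare that direction with the file the CLI actually generates before running anything against your database.

```python
import uuid


def build_mapping(old_ids):
    """Map each old identifier to a sequential one, in the order given."""
    return {old: str(uuid.UUID(int=i)) for i, old in enumerate(old_ids)}


def sql_patch(mapping, table, column):
    """Build one update statement per identifier (old value -> new value).

    Table and column names are inserted as-is, so only pass trusted values.
    """
    lines = ["-- SQL patch script (illustrative sketch)"]
    for old, new in mapping.items():
        lines.append(
            f"update {table} set {column}='{new}' where {column}='{old}'"
        )
    lines.append("-- EOF")
    return "\n".join(lines)


if __name__ == "__main__":
    old_ids = [
        "2f97b016-bcba-4d1e-a5e8-1371d1fc4a21",
        "9f0e3f19-eb0f-4381-a530-c8c1e5eae4d8",
    ]
    print(sql_patch(build_mapping(old_ids), "mytable", "mycolumn"))
```

Run the resulting statements inside a transaction, so that the database and the FileStorage switch to the new identifiers together or not at all.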

Cons of defragging

Pros of defragging

01-04-2009  11:36     3.992.971.364 example.FileStorage.index.fc
11-04-2009  16:57         1.200.228 example_optimized.FileStorage.index.fc
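
The two directory entries above appear to show an index file before and after defragging. As a quick check of how large the difference is, this small Python sketch parses the two sizes (written with dots as thousands separators) and computes the ratio; parse_size is a hypothetical helper introduced only for this calculation.

```python
def parse_size(text: str) -> int:
    """Parse a byte count written with dots as thousands separators."""
    return int(text.replace(".", ""))


before = parse_size("3.992.971.364")  # example.FileStorage.index.fc
after = parse_size("1.200.228")       # example_optimized.FileStorage.index.fc
ratio = before / after

if __name__ == "__main__":
    print(f"index file is about {ratio:.0f} times smaller after defragging")
```

The defragged index is roughly 3,300 times smaller, shrinking from about 4 GB to about 1.2 MB.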

----------------------------------------------------------------------------------------------------------------------------