Friday, January 22, 2010

Counting letters in Microsoft Word files using C# and Office Interop

Today we will learn how to work with Office Interop. I had never had a chance to try it until someone asked me for a batch script that would count characters in a bunch of Word files. I chose to use C# instead, hoping I could cook something up in under 5 minutes. It took me a bit more, mainly because of deployment pain: .NET 3.5 SP1 had to be installed on the other machine (with a reboot), and I had to include the interop DLL.

First, I created a simple command-line project for C#. You will need to add a reference to Microsoft.Office.Interop.Word to your project, and I recommend setting Copy Local to True in the reference's properties: this way you only have to ship the EXE and the DLL when you distribute, and the automatic publish option in VS 2008 will pack it up for you. Your application will not run if this DLL is not present on the target system.


Now, working with Word is easy. All you need is

    Word.ApplicationClass wordApp = new ApplicationClass();

and do not forget to call

    wordApp.Quit(ref falseObj, ref nullobj, ref nullobj);

when you are finished (after closing any open documents), or you will end up with many "ghost" WINWORD.EXE processes in memory.
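One way to make sure Word is always shut down, even when something throws halfway through, is to wrap the work in try/finally. This is only a minimal sketch: the class name CleanupSketch is made up for illustration, and the Marshal.ReleaseComObject call is an extra precaution that the original utility does not use.

```csharp
using System;
using System.Runtime.InteropServices;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical class name, for illustration only.
class CleanupSketch
{
    static void Main()
    {
        object nullobj = System.Reflection.Missing.Value;
        object falseObj = false;

        Word.ApplicationClass wordApp = new Word.ApplicationClass();
        try
        {
            // ... open, inspect and close documents here ...
        }
        finally
        {
            // Runs even if the code above threw, so Word is always asked to exit
            // (without saving changes).
            wordApp.Quit(ref falseObj, ref nullobj, ref nullobj);

            // Extra precaution (not in the original utility): release the COM wrapper.
            Marshal.ReleaseComObject(wordApp);
        }
    }
}
```

If WINWORD.EXE still lingers after this, check Task Manager for instances left over from earlier crashed runs.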

Now, to get your hands on the Word file DOM, you just need to call this function with 16 parameters! :-) If you want to edit your file or do other fancy things, read the MSDN help.

Word.Document doc = wordApp.Documents.Open(ref fileObj, ref falseObj, ref trueObj,
                         ref falseObj, ref nullobj, ref nullobj,
                         ref nullobj, ref nullobj, ref nullobj,
                         ref nullobj, ref nullobj, ref falseObj,
                         ref nullobj, ref nullobj, ref nullobj, ref nullobj);


I open the doc file in read-only mode, with no visible window. If you would like different options, check MSDN, and good luck counting these arguments!
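To save you some counting, here is the same call with each argument labeled by its position and its parameter name from the Documents.Open documentation. Treat it as a sketch: the helper name OpenReadOnlyHidden and the local variable names are mine, not part of the interop API.

```csharp
using Word = Microsoft.Office.Interop.Word;

static class WordOpenSketch
{
    // Hypothetical helper: opens a document read-only, without showing a window.
    public static Word.Document OpenReadOnlyHidden(Word.ApplicationClass wordApp, string path)
    {
        object missing = System.Reflection.Missing.Value;
        object fileName = path;
        object confirmConversions = false;
        object readOnly = true;
        object addToRecentFiles = false;
        object visible = false;

        return wordApp.Documents.Open(
            ref fileName,            //  1 FileName
            ref confirmConversions,  //  2 ConfirmConversions
            ref readOnly,            //  3 ReadOnly
            ref addToRecentFiles,    //  4 AddToRecentFiles
            ref missing,             //  5 PasswordDocument
            ref missing,             //  6 PasswordTemplate
            ref missing,             //  7 Revert
            ref missing,             //  8 WritePasswordDocument
            ref missing,             //  9 WritePasswordTemplate
            ref missing,             // 10 Format
            ref missing,             // 11 Encoding
            ref visible,             // 12 Visible
            ref missing,             // 13 OpenAndRepair
            ref missing,             // 14 DocumentDirection
            ref missing,             // 15 NoEncodingDialog
            ref missing);            // 16 XMLTransform
    }
}
```

To edit a file, the arguments to change first would presumably be ReadOnly (3) and possibly Visible (12).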

All I need for my task is

long count = doc.Characters.Count;
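Keep in mind that, as far as I know, Characters.Count counts every character in the document, including spaces, punctuation and paragraph marks, not just letters. If you really want letters only, one possible approach is to take the document text and count with char.IsLetter. The sketch below works on a plain string; in the utility the text would presumably come from doc.Content.Text, which is my assumption and not something the code in this post does. The names LetterCountSketch and CountLetters are made up.

```csharp
using System;

static class LetterCountSketch
{
    // Counts only Unicode letters, ignoring digits, spaces, punctuation
    // and paragraph marks.
    public static long CountLetters(string text)
    {
        long letters = 0;
        foreach (char c in text)
        {
            if (char.IsLetter(c))
            {
                letters++;
            }
        }
        return letters;
    }

    static void Main()
    {
        // Stand-in for text read from a document (assumption: doc.Content.Text).
        string sample = "Hello, Word 2010!\r";
        Console.WriteLine("Characters: {0}, letters: {1}",
            sample.Length, CountLetters(sample));
    }
}
```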

but you can have as much fun as you like with the doc object. Do not forget to call

doc.Close(ref falseObj, ref nullobj, ref nullobj);

when you are done. Again, if you would like to edit the file, you will need to use a different set of flags. Press F1 on the function and read the MSDN help.

The file enumeration part is not as interesting: I just read all the doc files in one folder, since that was my task.
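One detail is worth a sketch, though. FileAttributes is a flags enum, so a file is rarely exactly equal to Hidden or System; you have to test the bits. Word also drops "~$" lock files next to open documents, and they match *.doc* too. The helper names below (IsCandidate, FindWordFiles) are hypothetical, and this is just one way to write the filter.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class WordFileFilterSketch
{
    // True if the file looks like a real document rather than a
    // temporary, hidden, system or Word lock file.
    public static bool IsCandidate(string fileName, FileAttributes attributes)
    {
        FileAttributes skipped =
            FileAttributes.Temporary | FileAttributes.Hidden | FileAttributes.System;

        // Bitwise test: true if any of the skipped flags is set.
        if ((attributes & skipped) != 0)
        {
            return false;
        }

        // Word's lock files are named "~$" plus part of the original file name.
        return !fileName.StartsWith("~$", StringComparison.Ordinal);
    }

    // Lists *.doc* files in a single directory (no recursion).
    public static List<FileInfo> FindWordFiles(string directory)
    {
        List<FileInfo> result = new List<FileInfo>();
        foreach (FileInfo fi in new DirectoryInfo(directory).GetFiles("*.doc*"))
        {
            if (IsCandidate(fi.Name, fi.Attributes))
            {
                result.Add(fi);
            }
        }
        return result;
    }

    static void Main(string[] args)
    {
        string dir = args.Length == 1 ? args[0] : ".";
        foreach (FileInfo fi in FindWordFiles(dir))
        {
            Console.WriteLine(fi.Name);
        }
    }
}
```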

Below is the full text of this small utility:


//-----------------------------------------------------------------------
// <copyleft file="WordFilesLetterCount.cs">
// Do what you wilt with this code and have fun on your own risk!
// </copyleft>
//-----------------------------------------------------------------------
namespace WordFilesLetterCount
{
   using System;
   using System.IO;
   using Microsoft.Office.Interop.Word;
   using Word = Microsoft.Office.Interop.Word;

   class Program
   {
       static void Main(string[] args)
       {
           if (args.Length == 1)
           {
               long totalCount = 0;
               long fileCount = 0;

               Word.ApplicationClass wordApp = new ApplicationClass();

               object nullobj = System.Reflection.Missing.Value;
               object trueObj = true;
               object falseObj = false;
              
               DirectoryInfo di = new DirectoryInfo(args[0]);
               FileInfo[] rgFiles = di.GetFiles("*.doc*");
               foreach (FileInfo fi in rgFiles)
               {
                   // FileAttributes is a flags enum, so test the individual bits.
                   if ((fi.Attributes & (FileAttributes.Temporary |
                                         FileAttributes.Hidden |
                                         FileAttributes.System)) == 0)
                   {
                       // Skip the "~$" lock files Word creates next to open documents.
                       if (!fi.Name.StartsWith("~$"))
                       {
                           try
                           {
                               object fileObj = fi.FullName;
                               Word.Document doc = wordApp.Documents.Open(ref fileObj, ref falseObj, ref trueObj,
                                          ref falseObj, ref nullobj, ref nullobj,
                                          ref nullobj, ref nullobj, ref nullobj,
                                          ref nullobj, ref nullobj, ref falseObj,
                                          ref nullobj, ref nullobj, ref nullobj, ref nullobj);

                               //doc.ActiveWindow.Selection.WholeStory();
                               //doc.ActiveWindow.Selection.Characters.Count;

                               long count = doc.Characters.Count;
                               totalCount += count;

                               Console.WriteLine("File: {0} Character Count: {1}", fi.Name, count);

                               doc.Close(ref falseObj, ref nullobj, ref nullobj);

                               fileCount++;
                           }
                           catch (Exception ex)
                           {
                               Console.WriteLine("Error {0} with file: {1}", ex.Message, fi.Name);
                           }
                       }
                   }
               }

               wordApp.Quit(ref falseObj, ref nullobj, ref nullobj);

               Console.WriteLine(
                   "Total: {0} letters counted in {1} files (*.doc* only, in the single directory {2})",
                   totalCount,
                   fileCount,
                   args[0]);
           }
           else
           {
               Console.WriteLine("Usage: WordFilesLetterCount [directoryWithWordFiles]");
           }
       }
   }
}