generating excel 2010 workbooks by using the open xml sdk 2.0

Upload: caroline-cameron

Post on 04-Jun-2018


  • 8/14/2019 Generating Excel 2010 Workbooks by Using the Open XML SDK 2.0


    Generating Excel 2010 Workbooks by Using the Open XML SDK 2.0

    Office 2010

    Summary: This document demonstrates how to use the Open XML SDK 2.0 to manipulate a Microsoft Excel 2010 workbook.

    Applies to: Microsoft Excel 2010

    Published: April 2011

    Provided by: Steve Hansen, Grid Logic

    Contents

    Introduction to the Open XML File Format

    Peeking Inside an Excel File

    Manipulating Open XML Files Programmatically

    Manipulating Workbooks using the Open XML SDK 2.0

    Conclusion

    Additional Resources

    About the Author

    Download the sample code

    Introduction to the Open XML File Format

    Open XML is an open file format for the core document-oriented Office applications. Open XML is designed to be a faithful replacement for existing word-processing documents, presentations, and spreadsheets that are encoded in binary formats defined by the Microsoft Office applications. Open XML file formats offer several benefits. One benefit is that they ensure that the data contained in documents can be accessed by any program that understands the file format. This helps assure organizations that the documents they create today will be available in the future. Another benefit is that the format facilitates document creation and manipulation in server environments, or other environments, where it is not possible to install the Office client applications.

    True to their name, Open XML files are represented by using XML. However, instead of representing a document by using a single, large XML file, an Open XML document is actually represented by using a collection of related files, called parts, that are stored in a package and then compressed in a ZIP archive. An Open XML document package complies with the Open Packaging Conventions (OPC) specification, a container-file technology for storing a combination of XML and non-XML files that collectively form a single entity.

    Peeking Inside an Excel File

    One of the best ways to gain an initial understanding of how everything works together is to open a workbook file and take a look at the pieces. To examine the parts of a Microsoft Excel 2010 workbook package, merely change the file name extension from .xlsx to .zip. As an example, consider the workbook shown in Figures 1 and 2.

    Figure 1. Simple workbook


    This workbook contains two worksheets: Figure 1 shows a worksheet containing sales by year while the

    worksheet shown in Figure 2 contains a simple chart.

    Figure 2. Basic chart in a workbook

    By changing the name of this workbook from Simple Sales Example.xlsx to Simple Sales Example.zip, you can

    inspect the structure of parts within the file container or package using Windows Explorer.
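    Windows Explorer is one way to look inside the package. Because an .xlsx file is an ordinary ZIP archive, code can do the same without renaming anything. The sketch below assumes .NET Framework 4.5 or later (or .NET Core) for System.IO.Compression.ZipArchive, which is not part of the Open XML SDK; PackageInspector is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;

// Hypothetical helper: lists the part names stored in an Open XML package such as an .xlsx file.
// The package is an ordinary ZIP archive, so ZipArchive can open it without renaming the file.
public static class PackageInspector
{
    public static List<string> ListParts(Stream package)
    {
        // leaveOpen: true leaves the caller's stream open after the archive is disposed.
        using (var zip = new ZipArchive(package, ZipArchiveMode.Read, leaveOpen: true))
        {
            // FullName is the part's path inside the package, e.g. "xl/workbook.xml".
            return zip.Entries
                      .Select(entry => entry.FullName)
                      .OrderBy(name => name, StringComparer.Ordinal)
                      .ToList();
        }
    }

    public static List<string> ListParts(string path)
    {
        using (FileStream file = File.OpenRead(path))
        {
            return ListParts(file);
        }
    }
}
```

    Pointed at a workbook such as Simple Sales Example.xlsx, the list would include names such as xl/workbook.xml and xl/worksheets/sheet1.xml, matching the folders shown in Windows Explorer.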

    Figure 3. Part structure of a simple workbook


    Figure 3 shows the primary folders inside the package along with the parts stored in the worksheets folder. Digging a bit deeper, Figure 4 provides a peek at the XML encountered in the part named sheet1.xml.

    Figure 4. Example of the XML inside a worksheet part

    The XML shown in Figure 4 provides the information that Excel needs to represent the worksheet shown in Figure 1. For example, within the sheetData node there are row nodes. There is a row node for every row that has at least one non-empty cell. Then, within each row, there is a node for each non-empty cell.
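    To make this sparse layout concrete, the following sketch reads a worksheet part with LINQ to XML and returns the reference of every cell element in sheetData. It assumes the part uses the standard SpreadsheetML namespace (http://schemas.openxmlformats.org/spreadsheetml/2006/main); SheetDataReader is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper: returns the r (reference) attribute of every c element in sheetData.
// Cells that have no value and no formatting are not written to sheetData at all,
// so they never appear in the result.
public static class SheetDataReader
{
    // Main SpreadsheetML namespace used by worksheet parts.
    static readonly XNamespace Main = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    public static List<string> CellReferences(string worksheetXml)
    {
        XElement sheetData = XDocument.Parse(worksheetXml).Root.Element(Main + "sheetData");
        if (sheetData == null) return new List<string>();

        return sheetData
            .Elements(Main + "row")   // one row element per row that has cells
            .Elements(Main + "c")     // one c element per stored cell
            .Select(c => (string)c.Attribute("r"))
            .ToList();
    }
}
```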

    Notice that cell C3 shown in Figure 1 contains the value 2008 in bold font. Cell C4, meanwhile, contains the value 182, but uses default formatting and does not contain bold font. The XML representation for each of these cells is shown in Figure 4. In particular, consider the XML for cell C3.

    To keep the size of Open XML files as compact as possible, many of the XML nodes and attributes have very short names. In the XML for cell C3, the c element represents a cell. This particular cell specifies two attributes: r (Reference) and s (Style Index). The reference attribute specifies a location reference for the cell.

    The style index is a reference to the style that is used to format the cell. Styles are defined in the styles part (styles.xml), which is found in the xl folder (see the xl folder in Figure 3). Compare cell C3's XML with cell C4's XML.

    Because cell C4 uses default formatting, you do not have to specify a value for the style index attribute.

    Later in this article, you learn a little more about how to use style indexes in an Open XML document.
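    Reading these attributes takes only a few lines of LINQ to XML. In the sketch below, CellInfo is a hypothetical helper that returns a cell's reference, style index, and stored value. It treats a missing s attribute as style index 0, the default that SpreadsheetML applies when the attribute is omitted, which is why cell C4 needs no s attribute.

```csharp
using System;
using System.Xml.Linq;

// Hypothetical helper: reads the reference (r), style index (s), and stored value (v) of one cell.
public static class CellInfo
{
    static readonly XNamespace Main = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    public static (string Reference, uint StyleIndex, string Value) Read(XElement cell)
    {
        string reference = (string)cell.Attribute("r");
        // An omitted s attribute means style index 0, the workbook's default cell format.
        uint styleIndex = (uint?)cell.Attribute("s") ?? 0u;
        // The v child holds the stored value; it is null when the cell has none.
        string value = (string)cell.Element(Main + "v");
        return (reference, styleIndex, value);
    }
}
```

    For a cell in the SpreadsheetML namespace written as c r="C4" with a single v child containing 182, Read returns ("C4", 0, "182"). The style index of a bold cell such as C3 depends on the workbook's styles part, so no particular number can be assumed.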

    Although it is very helpful to learn more about the nuances of the Open XML file formats, the real purpose of this article is to show how to use the Open XML SDK 2.0 for Microsoft Office to programmatically manipulate Open XML documents, specifically Excel workbooks.

    Manipulating Open XML Files Programmatically

    One way to programmatically create or manipulate Open XML documents is to use the following high-level

    pattern:

    1. Open/create an Open XML package

    2. Open/create package parts

    3. Parse the XML in the parts that you need to manipulate

    4. Manipulate the XML as required

    5. Save the part

    6. Repackage the document

    Everything except steps three and four can be achieved fairly easily using the classes found in the System.IO.Packaging namespace. These classes are designed to make it easy to work with Open XML packages and to handle the tasks associated with high-level part manipulation.
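    The six steps can also be sketched without System.IO.Packaging. The following swaps in ZipArchive for the package and LINQ to XML for the part, so it is not the approach this article uses. RawPartEditor.SetNumber is a hypothetical helper that overwrites the value of one existing cell in one existing worksheet part; because it only rewrites a part that is already there, it does not touch [Content_Types].xml or any relationships. Even this small change shows why step four is the hard one: the code has to know the namespace, the c and v elements, and the t attribute.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper that walks the six steps with ZipArchive and LINQ to XML.
public static class RawPartEditor
{
    static readonly XNamespace Main = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Writes a number into an existing cell of an existing worksheet part.
    // Returns false if the part or the cell is not found.
    public static bool SetNumber(Stream package, string partName, string cellRef, double number)
    {
        // 1. Open the package (Update mode needs a readable, writable, seekable stream).
        using (var zip = new ZipArchive(package, ZipArchiveMode.Update, leaveOpen: true))
        {
            // 2. Open the part.
            ZipArchiveEntry entry = zip.GetEntry(partName);
            if (entry == null) return false;

            // 3. Parse the XML in the part.
            XDocument doc;
            using (Stream input = entry.Open()) doc = XDocument.Load(input);

            // 4. Manipulate the XML: locate the c element by its r attribute.
            XElement cell = doc.Descendants(Main + "c")
                               .FirstOrDefault(c => (string)c.Attribute("r") == cellRef);
            if (cell == null) return false;
            cell.SetAttributeValue("t", null);       // remove any data type so v is read as a number
            cell.Element(Main + "is")?.Remove();     // remove any inline string
            cell.SetElementValue(Main + "v", number.ToString(CultureInfo.InvariantCulture));

            // 5. Save the part by replacing the entry with the edited XML.
            entry.Delete();
            using (Stream output = zip.CreateEntry(partName).Open()) doc.Save(output);
        }   // 6. Disposing the archive writes the repackaged ZIP back to the stream.
        return true;
    }
}
```

    Like any raw edit of a value, this leaves cached results of formulas that depend on the cell unchanged, an issue this article returns to later.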

    The hardest part of this process is step four, manipulating the XML. For this step, the developer needs a thorough understanding of the many tedious details required to work successfully with the nuances of the Open XML file formats. For example, previously you learned that formatting information for a cell is not stored with the cell. Instead, the formatting details are defined as a style in a different document part, and the style index associated with that style is what Excel stores inside the cell.
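    Following a style index to its definition means reading that second part. In the styles part, s is a zero-based index into the cellXfs list, and each xf entry refers to a font, fill, border, and number format through further indexes. The hypothetical StyleResolver below follows a style index to its font and reports whether that font is bold, which is one way the bold formatting of cell C3 could be traced.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper that resolves a cell's style index against the styles part (xl/styles.xml).
public static class StyleResolver
{
    static readonly XNamespace Main = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Returns the fontId of the xf entry at styleIndex in cellXfs, or null if there is no such entry.
    public static uint? FontIdFor(string stylesXml, uint styleIndex)
    {
        XElement xf = XDocument.Parse(stylesXml).Root
            .Element(Main + "cellXfs")?
            .Elements(Main + "xf")
            .ElementAtOrDefault((int)styleIndex);
        return (uint?)xf?.Attribute("fontId");
    }

    // True if the font entry at fontId contains a b element.
    // Simplification: a b element with val="0" (explicitly not bold) is not handled.
    public static bool IsBold(string stylesXml, uint fontId)
    {
        XElement font = XDocument.Parse(stylesXml).Root
            .Element(Main + "fonts")?
            .Elements(Main + "font")
            .ElementAtOrDefault((int)fontId);
        return font?.Element(Main + "b") != null;
    }
}
```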

    Even with a generous knowledge of the Open XML specification, the thought of manipulating so much raw XML programmatically is not a task that many developers look forward to. That is where the Open XML SDK 2.0 comes in.

    The Open XML SDK 2.0 was developed to simplify manipulating Open XML packages and the underlying Open XML schema elements inside a package. The Open XML SDK 2.0 encapsulates many common tasks that developers perform on Open XML packages so that, instead of working with raw XML, you can use .NET classes that give you many design-time advantages such as IntelliSense support and a type-safe development experience.



    Setting up the Project

    To create a portfolio report generator, open up Microsoft Visual Studio 2010 and create a new Console

    application named PortfolioReportGenerator.

    To download the sample C# and Visual Basic .NET projects, click Download the Code Sample.

    Figure 6. Create the Portfolio Report Generator Solution


    Next, add two classes to the project: PortfolioReport and Portfolio. The PortfolioReport class is the key class that performs all of the document manipulation using the Open XML SDK 2.0. The Portfolio class is basically a data structure that contains the necessary properties to represent a client portfolio.

    The Portfolio class is a data container, together with some test data, and has no code related to Open XML or the Open XML SDK 2.0.
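    The Portfolio class itself is not listed in this copy of the article. The sketch below is a hypothetical reconstruction: the property names are taken from the CreateReport method later in this article, while the types (decimal for amounts, int for the account number), the computed MarketValue property, and the sample data are assumptions.

```csharp
using System.Collections.Generic;

// Hypothetical sketch: property names match those used by CreateReport;
// types and test data are assumptions.
public class PortfolioItem
{
    public string Description { get; set; }
    public string Ticker { get; set; }
    public decimal CurrentPrice { get; set; }
    public decimal SharesHeld { get; set; }
    public decimal Cost { get; set; }
    public decimal High52Week { get; set; }
    public decimal Low52Week { get; set; }
    // Assumed to be derived rather than stored.
    public decimal MarketValue => CurrentPrice * SharesHeld;
}

public class Portfolio
{
    public string Name { get; }
    public int AccountNumber { get; set; }
    public decimal BeginningValueQTR { get; set; }
    public decimal BeginningValueYTD { get; set; }
    public decimal ContributionsQTR { get; set; }
    public decimal ContributionsYTD { get; set; }
    public decimal WithdrawalsQTR { get; set; }
    public decimal WithdrawalsYTD { get; set; }
    public decimal DistributionsQTR { get; set; }
    public decimal DistributionsYTD { get; set; }
    public decimal FeesQTR { get; set; }
    public decimal FeesYTD { get; set; }
    public decimal GainLossQTR { get; set; }
    public decimal GainLossYTD { get; set; }
    public List<PortfolioItem> Holdings { get; } = new List<PortfolioItem>();

    public Portfolio(string client)
    {
        Name = client;
        // Made-up test data so that a report has something to show.
        AccountNumber = 12345;
        BeginningValueQTR = 100000m;
        BeginningValueYTD = 95000m;
        Holdings.Add(new PortfolioItem
        {
            Description = "Example Fund", Ticker = "EXMPL", CurrentPrice = 25m,
            SharesHeld = 100m, Cost = 2000m, High52Week = 30m, Low52Week = 18m
        });
    }
}
```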

    Before you write any code, the first step in any project involving Open XML and the Open XML SDK 2.0 is to add the necessary references to the project. Two specific references are needed: DocumentFormat.OpenXml and WindowsBase.

    DocumentFormat.OpenXml contains the classes that are installed with the Open XML SDK 2.0. If you cannot find this reference after you install the Open XML SDK 2.0, you can browse for it. By default it is located at C:\Program Files (x86)\Open XML SDK\V2.0\lib\. This reference is required only if you plan to use the Open XML SDK 2.0. If you would rather manipulate Open XML documents by tweaking raw XML, you do not need this reference.

    WindowsBase includes the classes in the System.IO.Packaging namespace. This reference is required for all Open XML projects, whether you are using the Open XML SDK 2.0 or not. The classes in the System.IO.Packaging namespace provide functionality to open Open XML packages. In addition, there are classes that enable you to manipulate (add, remove, edit) parts inside an Open XML package.

    At this point, your project should resemble Figure 7.

    Figure 7. Portfolio Report Generator project after initial project setup


    Initializing the Portfolio Report

    As mentioned earlier, the report generation process works by creating a copy of the report template and then adding data to the report. The report template is a pre-formatted Excel workbook named PortfolioReport.xlsx. Add a constructor to the PortfolioReport class that performs this process. To copy the file, you must also import the System.IO namespace. While adding the System.IO namespace, add the namespaces related to the Open XML SDK 2.0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;
using DocumentFormat.OpenXml;

namespace PortfolioReportGenerator
{
    class PortfolioReport
    {
        string path = "c:\\example\\";
        string templateName = "PortfolioReport.xlsx";

        public PortfolioReport(string client)
        {
            // Each report is a copy of the template, named after the client.
            string newFileName = path + client + ".xlsx";
            CopyFile(path + templateName, newFileName);
        }

        private string CopyFile(string source, string dest)
        {
            string result = "Copied file";
            try
            {
                // Overwrites existing files
                File.Copy(source, dest, true);
            }
            catch (Exception ex)
            {
                result = ex.Message;
            }
            return result;
        }
    }
}
```

    Notice that the PortfolioReport constructor requires a single parameter that represents the client the report is being generated for.

    To avoid the need to pass parameters into methods or constantly re-open the document and extract the workbook part, add two class-scoped private variables to the PortfolioReport class. Likewise, add a class-scoped private variable to hold a reference to the current Portfolio object whose data is being used to generate the report. With these variables in place, you can initialize them inside the PortfolioReport constructor as shown in the following example.

```csharp
string path = "c:\\example\\";
string templateName = "PortfolioReport.xlsx";

WorkbookPart wbPart = null;
SpreadsheetDocument document = null;
Portfolio portfolio = null;

public PortfolioReport(string client)
{
    string newFileName = path + client + ".xlsx";
    CopyFile(path + templateName, newFileName);
    // Open the copy for editing (true = editable) and keep its workbook part.
    document = SpreadsheetDocument.Open(newFileName, true);
    wbPart = document.WorkbookPart;
    portfolio = new Portfolio(client);
}
```

    This code segment highlights how easy it is to open a document and extract a part using the Open XML SDK 2.0. In the PortfolioReport constructor, the workbook file is opened by using the Open method of the SpreadsheetDocument class. SpreadsheetDocument is part of the DocumentFormat.OpenXml.Packaging namespace and provides convenient access to the workbook part within the document package via the property named WorkbookPart. At this point in the process, the report generator has:

    1. Created a copy of the PortfolioReport.xlsx file

    2. Named the copy after the name of the client

    3. Opened the client report for editing

    4. Extracted the workbook part

    Modifying Worksheet Cell Values using the Open XML SDK

    The main task that needs to be solved in order to complete the report generator is to figure out how to modify values inside an Excel workbook by using the Open XML SDK 2.0. When using Excel's object model with Microsoft Visual Basic for Applications (VBA) or .NET, changing a cell's value is easy. To change the value of a cell (which is a Range object in Excel's object model), you modify the value of the Value property. For example, to change the value of cell B4 on a worksheet named Sales to the value of 250, you


    Another difference between using Excel's object model and manipulating an Open XML document is that when you use the Excel object model, the data type of the value that you supply to the cell or range is irrelevant. When changing the value of a cell using Open XML, however, the process varies depending on the data type of the value. For numeric values, the process is somewhat similar to using Excel's object model: the Cell class in the Open XML SDK 2.0 has a property named CellValue that you can use to assign numeric values to a cell.

    Storing strings, or text, in a cell works differently. Rather than storing text directly in a cell, Excel stores it in something called a shared string table. The shared string table is merely a listing of all the unique strings within the workbook, where each unique string is associated with an index. To associate a cell with a string, the cell holds a reference to the string index instead of the string itself. When you change a cell's value to a string, you first need to see whether the string is in the shared string table. If it is in the table, you look up the shared string index and store that in the cell. If the string is not in the shared string table, you need to add it, retrieve its string index, and then store the string index in the cell. The following example shows a method named UpdateValue that changes a cell's value, along with InsertSharedStringItem, which updates the shared string table.

```csharp
public bool UpdateValue(string sheetName, string addressName, string value,
                        UInt32Value styleIndex, bool isString)
{
    // Assume failure.
    bool updated = false;

    Sheet sheet = wbPart.Workbook.Descendants<Sheet>().Where(
        (s) => s.Name == sheetName).FirstOrDefault();

    if (sheet != null)
    {
        Worksheet ws = ((WorksheetPart)(wbPart.GetPartById(sheet.Id))).Worksheet;
        Cell cell = InsertCellInWorksheet(ws, addressName);

        if (isString)
        {
            // Either retrieve the index of an existing string,
            // or insert the string into the shared string table
            // and get the index of the new item.
            int stringIndex = InsertSharedStringItem(wbPart, value);

            cell.CellValue = new CellValue(stringIndex.ToString());
            cell.DataType = new EnumValue<CellValues>(CellValues.SharedString);
        }
        else
        {
            cell.CellValue = new CellValue(value);
            cell.DataType = new EnumValue<CellValues>(CellValues.Number);
        }

        // A style index of 0 leaves the cell's existing style untouched.
        if (styleIndex > 0)
            cell.StyleIndex = styleIndex;

        // Save the worksheet.
        ws.Save();
        updated = true;
    }

    return updated;
}

// ...
            refCell = cell;
            break;
        }
    }

    cellResult = new Cell();
    cellResult.CellReference = address;

    row.InsertBefore(cellResult, refCell);
    return cellResult;
}

// Return the row at the specified rowIndex located within
// the sheet data passed in via wsData. If the row does not
// exist, create it.
private Row GetRow(SheetData wsData, UInt32 rowIndex)
{
    var row = wsData.Elements<Row>().
        Where(r => r.RowIndex.Value == rowIndex).FirstOrDefault();
    if (row == null)
    {
        row = new Row();
        row.RowIndex = rowIndex;
        // Rows must stay in ascending RowIndex order, so insert the new row
        // before the first existing row with a larger index.
        Row refRow = wsData.Elements<Row>().
            Where(r => r.RowIndex.Value > rowIndex).FirstOrDefault();
        if (refRow != null)
            wsData.InsertBefore(row, refRow);
        else
            wsData.Append(row);
    }
    return row;
}

// Given an Excel address such as E5 or AB128, GetRowIndex
// parses the address and returns the row index.
private UInt32 GetRowIndex(string address)
{
    string rowPart;
    UInt32 l;
    UInt32 result = 0;

    for (int i = 0; i < address.Length; i++)
    {
        // The first digit marks the start of the row number.
        if (UInt32.TryParse(address.Substring(i, 1), out l))
        {
            rowPart = address.Substring(i, address.Length - i);
            if (UInt32.TryParse(rowPart, out l))
            {
                result = l;
                break;
            }
        }
    }
    return result;
}
```
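    GetRowIndex extracts only the row number, while the end of the cell-insertion helper shown above places the new cell before a reference cell (refCell), so the cell's position within its row matters. Any code that chooses that reference cell has to compare column positions numerically: compared as plain text, "AA1" sorts before "B1" even though column AA comes after column B. The hypothetical CellRef helper below (not part of the article or the SDK) parses an A1-style reference into a column number and a row number and compares two references in sheet order. It is a sketch that accepts plain references such as B7 or AB128, not absolute ones such as $B$7.

```csharp
using System;

// Hypothetical helper for A1-style cell references.
public static class CellRef
{
    // Splits a reference such as "AB128" into a 1-based column number and a row number.
    public static (int Column, uint Row) Parse(string reference)
    {
        int i = 0, column = 0;
        // Column letters form a bijective base-26 number: A=1 ... Z=26, AA=27.
        while (i < reference.Length && char.IsLetter(reference[i]))
        {
            column = column * 26 + (char.ToUpperInvariant(reference[i]) - 'A' + 1);
            i++;
        }
        if (column == 0 || !uint.TryParse(reference.Substring(i), out uint row) || row == 0)
            throw new FormatException("Not an A1-style reference: " + reference);
        return (column, row);
    }

    // Negative if a comes before b (row first, then column), positive if after, 0 if equal.
    public static int Compare(string a, string b)
    {
        var (colA, rowA) = Parse(a);
        var (colB, rowB) = Parse(b);
        return rowA != rowB ? rowA.CompareTo(rowB) : colA.CompareTo(colB);
    }
}
```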

```csharp
// Given the main workbook part, and a text value, insert the text into
// the shared string table. Create the table if necessary. If the value
// already exists, return its index. If it doesn't exist, insert it and
// return its new index.
private int InsertSharedStringItem(WorkbookPart wbPart, string value)
{
    int index = 0;
    bool found = false;
    var stringTablePart = wbPart
        .GetPartsOfType<SharedStringTablePart>().FirstOrDefault();

    // If the workbook has no shared string table part yet, create one.
    if (stringTablePart == null)
    {
        stringTablePart = wbPart.AddNewPart<SharedStringTablePart>();
    }

    var stringTable = stringTablePart.SharedStringTable;
    if (stringTable == null)
    {
        // Attach the new table to the part so that Save() writes it there.
        stringTable = new SharedStringTable();
        stringTablePart.SharedStringTable = stringTable;
    }

    // Iterate through all the items in the SharedStringTable.
    // If the text already exists, return its index.
    foreach (SharedStringItem item in stringTable.Elements<SharedStringItem>())
    {
        if (item.InnerText == value)
        {
            found = true;
            break;
        }
        index += 1;
    }

    // Otherwise append it; its index is the number of items that preceded it.
    if (!found)
    {
        stringTable.AppendChild(new SharedStringItem(new Text(value)));
        stringTable.Save();
    }

    return index;
}
```

    One area of interest in the previous code example deals with formatting a cell. As mentioned earlier in this article, a cell's format is not stored within the cell node. Instead, a cell stores a style index that points to a style that is defined in a different part (styles.xml). When using the template pattern demonstrated in this document and Excel's object model via VBA or .NET, you typically apply the formatting that you want to a range of one or more cells. As you add data to the workbook programmatically, any formatting that you applied within the range is faithfully applied.

    Because Open XML files only contain information related to cells that contain data, any time that you add a new cell to the file, if the cell requires any formatting, you must update the style index. Consequently, the UpdateValue method accepts a styleIndex parameter that indicates which style index to apply to the cell. If you pass in a value of zero, no style index is set and the cell uses Excel's default formatting.

    One simple method for determining the appropriate style index for each cell is to format the workbook template file as you want, then open the appropriate workbook parts in XML mode (shown in Figure 4) and observe the style index of the cells that you formatted.

    With the methods from the previous code listing in place, generating the report is now a process of getting the portfolio data and repeatedly calling UpdateValue to create the report. Indeed, if you add the necessary code to do this, things seem to work fine except for one problem: any cell that contains a formula that refers to a cell whose value was changed via Open XML manipulation does not show the correct result. This is because Excel caches the result of a formula within the cell. Because Excel thinks it has the correct value cached, it does not recalculate the cell. Even if you have automatic calculation turned on, or if you press F9 to force a manual recalculation, Excel does not recalculate the cell.

    The solution is to remove the cached value from these cells so that Excel recalculates the value as soon as the file is opened in Excel. Add the RemoveCellValue method shown in the following example to the PortfolioReport class to provide this functionality.

```csharp
// This method is used to force a recalculation of cells containing formulas. The
// CellValue has a cached value of the evaluated formula. This
// prevents Excel from recalculating the cell even if
// calculation is set to automatic.
private bool RemoveCellValue(string sheetName, string addressName)
{
    bool returnValue = false;

    Sheet sheet = wbPart.Workbook.Descendants<Sheet>().
        Where(s => s.Name == sheetName).FirstOrDefault();
    if (sheet != null)
    {
        Worksheet ws = ((WorksheetPart)(wbPart.GetPartById(sheet.Id))).Worksheet;
        Cell cell = InsertCellInWorksheet(ws, addressName);

        // If there is a cell value, remove it to force a recalculation
        // on this cell.
        if (cell.CellValue != null)
        {
            cell.CellValue.Remove();
        }

        // Save the worksheet.
        ws.Save();
        returnValue = true;
    }

    return returnValue;
}
```

    To complete the PortfolioReport class, add the CreateReport method shown in the following example to the PortfolioReport class. CreateReport uses the UpdateValue method to put portfolio information into the desired cells. After updating all of the necessary cells, it calls RemoveCellValue on each cell that needs to be recalculated. Finally, CreateReport calls the Close method on the SpreadsheetDocument to save all the changes and close the file.

```csharp
// Create a new Portfolio report
public void CreateReport()
{
    string wsName = "Portfolio Summary";

    // Numeric values are converted with ToString(), which uses the current culture;
    // a culture with a comma decimal separator would need
    // ToString(CultureInfo.InvariantCulture) to produce a valid numeric cell value.
    UpdateValue(wsName, "J2", "Prepared for " + portfolio.Name, 0, true);
    UpdateValue(wsName, "J3", "Account # " + portfolio.AccountNumber.ToString(), 0, true);
    UpdateValue(wsName, "D9", portfolio.BeginningValueQTR.ToString(), 0, false);
    UpdateValue(wsName, "E9", portfolio.BeginningValueYTD.ToString(), 0, false);
    UpdateValue(wsName, "D11", portfolio.ContributionsQTR.ToString(), 0, false);
    UpdateValue(wsName, "E11", portfolio.ContributionsYTD.ToString(), 0, false);
    UpdateValue(wsName, "D12", portfolio.WithdrawalsQTR.ToString(), 0, false);
    UpdateValue(wsName, "E12", portfolio.WithdrawalsYTD.ToString(), 0, false);
    UpdateValue(wsName, "D13", portfolio.DistributionsQTR.ToString(), 0, false);
    UpdateValue(wsName, "E13", portfolio.DistributionsYTD.ToString(), 0, false);
    UpdateValue(wsName, "D14", portfolio.FeesQTR.ToString(), 0, false);
    UpdateValue(wsName, "E14", portfolio.FeesYTD.ToString(), 0, false);
    UpdateValue(wsName, "D15", portfolio.GainLossQTR.ToString(), 0, false);
    UpdateValue(wsName, "E15", portfolio.GainLossYTD.ToString(), 0, false);

    int row = 7;
    wsName = "Portfolio Holdings";

    UpdateValue(wsName, "J2", "Prepared for " + portfolio.Name, 0, true);
    UpdateValue(wsName, "J3", "Account # " + portfolio.AccountNumber.ToString(), 0, true);

    // One row per holding, starting at row 7. The last-but-one argument of each
    // call is a style index that matches the formatting in the template.
    foreach (PortfolioItem item in portfolio.Holdings)
    {
        UpdateValue(wsName, "B" + row.ToString(), item.Description, 3, true);
        UpdateValue(wsName, "D" + row.ToString(), item.CurrentPrice.ToString(), 24, false);
        UpdateValue(wsName, "E" + row.ToString(), item.SharesHeld.ToString(), 27, false);
        UpdateValue(wsName, "F" + row.ToString(), item.MarketValue.ToString(), 24, false);
        UpdateValue(wsName, "G" + row.ToString(), item.Cost.ToString(), 24, false);
        UpdateValue(wsName, "H" + row.ToString(), item.High52Week.ToString(), 28, false);
        UpdateValue(wsName, "I" + row.ToString(), item.Low52Week.ToString(), 28, false);
        UpdateValue(wsName, "J" + row.ToString(), item.Ticker, 11, true);
        row++;
    }

    // Force re-calc when the workbook is opened
    this.RemoveCellValue("Portfolio Summary", "D17");
    this.RemoveCellValue("Portfolio Summary", "E17");

    // All done! Close and save the document.
    document.Close();
}
```
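    CreateReport writes the same "Prepared for" and "Account #" text to cells J2 and J3 on both worksheets. Because InsertSharedStringItem returns the index of an existing entry when the text is already in the table, the second worksheet reuses the entries added for the first instead of duplicating them. That lookup-or-append rule can be sketched without the SDK. SharedStrings below is a hypothetical in-memory model; its Dictionary is an addition for constant-time lookups, whereas InsertSharedStringItem scans the items in order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory model of a shared string table: each unique string is stored
// once, and cells refer to it by its zero-based index.
public class SharedStrings
{
    private readonly List<string> items = new List<string>();
    // Ordinal comparison, like the == comparison of InnerText in InsertSharedStringItem.
    private readonly Dictionary<string, int> indexOf = new Dictionary<string, int>(StringComparer.Ordinal);

    public IReadOnlyList<string> Items => items;

    // Returns the index of an existing string, or appends it and returns its new index.
    public int GetOrAdd(string value)
    {
        if (indexOf.TryGetValue(value, out int index))
            return index;
        index = items.Count;
        items.Add(value);
        indexOf.Add(value, index);
        return index;
    }
}
```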
