Shigarov A.O. A Method for Table Detection in Metafiles // Presentation for IMT-2008


Page 1: Shigarov A.O. A Method for Table Detection in Metafiles // Presentation for IMT-2008

A Method for Table Detection in Documents of Various Formats

Alexey Olegovich Shigarov, [email protected]

Institute for System Dynamics and Control Theory, Siberian Branch of the Russian Academy of Sciences (ISDCT SB RAS)

IMT, July 2008

Khmelnov A.E., Shigarov A.O. (ISDCT SB RAS). A Method for Table Detection in Documents of Various Formats. 1 / 15

Page 2: Shigarov A.O. A Method for Table Detection in Metafiles // Presentation for IMT-2008

Tasks in extracting tables from documents

Figure from: e Silva A.C., Jorge A.M., Torgo L. Design of an end-to-end method to extract information from tables // International Journal on Document Analysis and Recognition. 2006. Vol. 8, No. 2. pp. 144-171.

- Table detection: locating tables in documents.
- Segmentation: splitting tables into individual rows, columns and cells.
- Functional analysis: determining the role of each cell in the table.
- Structural analysis: determining the dependencies between cells.
- Interpretation: converting the tabular information into the required form.


Page 3: Shigarov A.O. A Method for Table Detection in Metafiles // Presentation for IMT-2008

Input data formats in table detection methods

- Source document formats:
  - DOC, RTF, XLS, PDF, HTML, ASCII text, raster images
- Most existing table detection methods take as input:
  - raster images
  - ASCII text
- Exchange (page description language, PDL) formats can also be used for table extraction:
  - PostScript*
  - PDF**
  - EMF metafiles

* Ramel J.-Y., Crucianu M., Vincent N., Faure C. Detection, Extraction and Representation of Tables // In Proc. 7th International Conference on Document Analysis and Recognition (ICDAR 2003), IEEE Computer Society, 2003, Vol. 2, pp. 374-379.

** Hassan T., Baumgartner R. Table Recognition and Understanding from PDF Files // In Proc. 9th International Conference on Document Analysis and Recognition (ICDAR 2007), IEEE Computer Society, 2007. P. 1143-1147.


Page 4: Shigarov A.O. A Method for Table Detection in Metafiles // Presentation for IMT-2008

Features of statistical tables

Example of a Rosstat statistical table (translated; in the original the ruling is drawn with text characters):

                            | Grain threshed, total | Grain threshed, per 1 ha |
                            |   2004    |   2005    |    2004    |    2005     |
All categories of farms
Irkutsk Oblast                  7250        9334          30           20
Bratsky District                 640         977          18           16
Zalarinsky District              100         141          17           13
Ziminsky District                292        1309          25           28
Irkutsky District                799         942          16           18
Kachugsky District                61          98          20           15
Kuytunsky District               414         722          19           20
Agricultural enterprises
Irkutsk Oblast                  3221        5237          23           24
Bratsky District                 159         488          19           17
Zalarinsky District               56         121          18           22

Parts of the table labelled in the figure:

- body;
- header (box head);
- stub;
- cut-in heading;
- spanning row heading and nested row heading;
- spanning column heading and nested column heading;
- text ruling.


Page 5: Shigarov A.O. A Method for Table Detection in Metafiles // Presentation for IMT-2008

Example pages from a government statistical report of Japan (Statistical Handbook of Japan 2006)

[Figure: two pages from the chapter "Agriculture, Forestry, and Fisheries". Page 65 contains Table 5.7 "Supply of Cereal Grains" (rice and wheat by fiscal year: area planted, production, yield per hectare, imports, supplies for domestic consumption) and Figure 5.4 "Self-Sufficiency Rates for Selected Categories of Agricultural Produce". Page 61 contains two paragraphs of body text, Table 5.3 "Forest Land Area and Forest Resources (2002)" and Table 5.4 "Supply of Industrial Roundwood (Thousand cubic meters)". Source of all tables: Ministry of Agriculture, Forestry and Fisheries.]


Page 6: Shigarov A.O. A Method for Table Detection in Metafiles // Presentation for IMT-2008

Example pages from the financial statements of an open joint-stock company (Consolidated Financial Statements, Open Joint Stock Company "Aeroflot - Russian Airlines", 2006)

[Figure: two pages from the statements. Page 23 contains Note 12 "Income Tax" with tables of the income tax charge for 2006 and 2005, the reconciliation of profit before income tax to the tax charge, and the tax effects of temporary differences (deferred tax assets and liabilities). Page 5 contains the "Consolidated Balance Sheet as of December 31, 2006" (amounts in millions of US dollars) with assets, liabilities, and capital and reserves for 2006 and 2005.]


Page 7: Shigarov A.O. A Method for Table Detection in Metafiles // Presentation for IMT-2008

Obtaining data from metafiles

- Metafiles are interpreted using the GDI API.
- Text elements are extracted from EMR_EXTTEXTOUTW and EMR_SMALLTEXTOUT records.

[Figure: example of a text element (the word "Total") with labels: inter-character spacing, descent, ascent, internal leading, external leading, bounding rectangle.]

- Ruling lines are extracted from EMR_BITBLT records.
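As an illustration of what a text element might carry after extraction, here is a minimal sketch in Python. It does not call GDI; the `TextElement` class, the `make_text_element` helper and all parameter names are hypothetical, and the sketch assumes device coordinates with the Y axis pointing down and one advance width per character.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TextElement:
    """A run of text with its bounding rectangle (hypothetical structure)."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def make_text_element(text: str, x: float, baseline_y: float,
                      advances: Sequence[float],
                      ascent: float, descent: float) -> TextElement:
    """Build a text element from a reference point and font metrics.

    advances holds the inter-character spacing, one value per character.
    """
    if len(advances) != len(text):
        raise ValueError("one advance per character is required")
    width = sum(advances)
    # Y grows downwards, so the ascent is subtracted from the baseline.
    return TextElement(text, x, baseline_y - ascent,
                       x + width, baseline_y + descent)


# Illustrative values only.
element = make_text_element("Total", x=100, baseline_y=50,
                            advances=[8, 6, 4, 6, 3], ascent=10, descent=3)
print(element)
```

The bounding rectangle is the only geometry the later steps of this sketch would need, so leading values are left out here.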


Page 8: Shigarov A.O. A Method for Table Detection in Metafiles // Presentation for IMT-2008

Preprocessing

1) Removing the text ruling from the text.

[Figure: three views of a table header fragment ("Grain threshed, total" and "Grain threshed, per 1 ha", each with columns 2004 and 2005).]

(a) Fragment of a table with text ruling.

Text elements are highlighted in grey:

(b) before removing the text ruling;

(c) after removing the text ruling.

2) Word recovery, needed in two cases:

- several text elements correspond to one word;
- one text element corresponds to several words.
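The two preprocessing steps can be sketched roughly as follows. This is not the authors' implementation: it assumes that ruling is drawn with Unicode box-drawing characters, "|" or "¦", that elements on one line are given as hypothetical `(left, right, text)` tuples, and that `max_gap` is a tuning parameter chosen by the caller.

```python
import re
from typing import List, Tuple

# Assumption: every character that is not ruling belongs to the text.
NON_RULING = re.compile(r"[^\u2500-\u257F|¦]+")


def strip_ruling(text: str) -> List[Tuple[int, str]]:
    """Return (character offset, fragment) pairs left after removing ruling."""
    fragments = []
    for match in NON_RULING.finditer(text):
        piece = match.group()
        stripped = piece.strip()
        if stripped:
            leading = len(piece) - len(piece.lstrip())
            fragments.append((match.start() + leading, stripped))
    return fragments


def restore_words(elements: List[Tuple[float, float, str]],
                  max_gap: float = 1.0) -> List[str]:
    """Join touching elements into one word, split elements at spaces."""
    words: List[str] = []
    current = ""
    prev_right = None
    for left, right, text in sorted(elements):
        if prev_right is not None and left - prev_right > max_gap:
            words.extend(current.split())
            current = ""
        current += text
        prev_right = right
    words.extend(current.split())
    return words


header = strip_ruling("│ 2004 │ 2005 │")
words = restore_words([(0, 10, "Fore"), (10, 20, "st"),
                       (30, 60, "land area")], max_gap=2)
print(header, words)
```

The character offsets returned by `strip_ruling` could be converted to page coordinates with the inter-character spacing of the element; that conversion is omitted here.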


Page 9: Shigarov A.O. A Method for Table Detection in Metafiles // Presentation for IMT-2008

Forming text blocks

[Figure: page 61 of the Statistical Handbook of Japan (body text, Table 5.3 "Forest Land Area and Forest Resources (2002)" and Table 5.4 "Supply of Industrial Roundwood"), shown in three views.]

(a) Original page.

(b) Text elements highlighted.

(c) Text blocks highlighted.


Page 10: Shigarov A.O. A Method for Table Detection in Metafiles // Presentation for IMT-2008

Lines on a page

[Figure: the same page shown in three views, (a), (b) and (c).]

(a) All lines.

(b) These lines are most likely lines of text*.

(c) These lines are most likely lines of tables*.

* Mandal S., Chowdhury S.P., Das A.K., Chanda B. A simple and effective table detection system from document images // International Journal on Document Analysis and Recognition. 2006. Vol. 8, No. 2. P. 172-182.


Page 11: Shigarov A.O. A Method for Table Detection in Metafiles // Presentation for IMT-2008

Forming table regions

- Each table region covers a sequence of consecutive lines.
- A line of a table region must satisfy the following conditions:
  - the line must contain at least two text blocks;
  - the width of the non-empty space in the line is bounded by a predefined threshold;
  - the bottom boundary of every vertical gap in the line must coincide with the bottom boundary of the line.
- The lines in a table region must have similar distributions of the projections of their vertical gaps onto the X axis.

[Figure: fragment of Table 5.4 "Supply of Industrial Roundwood" showing the header and the rows for 2000 and 2001.]

- The lines of the page are scanned from top to bottom in search of table regions.
- Once a table region has been found, its lines are excluded from further search.
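The top-to-bottom scan can be sketched as below. This is a simplified illustration under stated assumptions, not the method itself: each line is a list of hypothetical `(left, right)` text-block intervals, the threshold on non-empty space is expressed as a fraction `max_filled` of the page width, and a region needs at least `min_lines` lines. The conditions on the bottom boundaries of gaps and on the similarity of gap projections are omitted for brevity.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (left, right) of a text block on the X axis


def is_table_line(blocks: List[Interval], page_width: float,
                  max_filled: float = 0.8) -> bool:
    """Check two of the line conditions: block count and filled width."""
    if len(blocks) < 2:
        return False
    filled = sum(right - left for left, right in blocks)
    return filled <= max_filled * page_width


def find_table_regions(lines: List[List[Interval]], page_width: float,
                       min_lines: int = 2) -> List[List[int]]:
    """Scan lines top to bottom and group consecutive table-like lines."""
    regions: List[List[int]] = []
    current: List[int] = []
    for index, blocks in enumerate(lines):
        if is_table_line(blocks, page_width):
            current.append(index)
        else:
            if len(current) >= min_lines:
                regions.append(current)
            current = []
    if len(current) >= min_lines:
        regions.append(current)
    return regions


# Illustrative page: a text line, two table-like lines, another text line.
page = [
    [(0, 500)],
    [(0, 80), (200, 260), (300, 360)],
    [(0, 90), (200, 260), (300, 360)],
    [(0, 480)],
]
regions = find_table_regions(page, page_width=500)
print(regions)
```

Because each line index is appended to at most one region, lines that already belong to a region are never reconsidered, which mirrors the exclusion rule in the last bullet.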


Page 12: Shigarov A.O. A Method for Table Detection in Metafiles // Presentation for IMT-2008

Merging table regions into tables

- Table regions that belong to one table have similar distributions of the projections of their vertical gaps onto the X axis.

[Figure: example table with the columns Quantity and Value (2004, 2005) for "January - July" and "July", and country rows grouped under the headings "European Union" and "Other Markets".]

- This property, together with several heuristics about the lines located between table regions, is used to merge table regions into tables.
- In rare cases the resulting tables may share lines; the user must then refine the boundaries of the overlapping tables.
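The slides do not say how the similarity of gap projections is measured. One possible choice, shown here purely as an assumption, is a Jaccard-style ratio of overlapping length to combined length of the gap intervals; the function names and the `threshold` value are hypothetical, and the heuristics about the lines between regions are not modelled.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # X-axis projection of one vertical gap


def gap_similarity(gaps_a: List[Interval], gaps_b: List[Interval]) -> float:
    """Overlap length divided by union length of two sets of gaps.

    Assumes the intervals inside each set do not overlap one another.
    """
    overlap = sum(max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
                  for a in gaps_a for b in gaps_b)
    total = (sum(r - l for l, r in gaps_a)
             + sum(r - l for l, r in gaps_b))
    union = total - overlap
    return overlap / union if union > 0 else 0.0


def merge_regions(regions: List[List[Interval]],
                  threshold: float = 0.5) -> List[List[List[Interval]]]:
    """Merge each region into the previous table if their gaps are similar."""
    tables: List[List[List[Interval]]] = []
    for gaps in regions:
        if tables and gap_similarity(tables[-1][-1], gaps) >= threshold:
            tables[-1].append(gaps)
        else:
            tables.append([gaps])
    return tables


# Illustrative gap projections of three regions.
region_a = [(80, 200), (260, 300)]
region_b = [(90, 200), (260, 300)]
region_c = [(400, 450)]
tables = merge_regions([region_a, region_b, region_c])
print(gap_similarity(region_a, region_b), len(tables))
```

With these values the first two regions share almost all of their gap length and end up in one table, while the third region starts a new one.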


Page 13: Shigarov A.O. A Method for Table Detection in Metafiles // Presentation for IMT-2008

Detection results

[Figure: detection results on three example pages, listed in the order in which they appear: (a) page 61 of the Statistical Handbook of Japan, with Table 5.3 "Forest Land Area and Forest Resources (2002)" and Table 5.4 "Supply of Industrial Roundwood"; (b) page 9 of the consolidated financial statements of OJSC "Aeroflot - Russian Airlines" for the year ended December 31, 2006, Note 1 "Nature of the Business", with tables of the principal subsidiary undertakings and of the significant entities in which the Group holds more than 20% but less than 50% of equity; (c) page 2 of the Chevron Corporation Financial Review (millions of dollars), with the tables "Income by Major Operating Area", "Selected Balance Sheet Account Data" and "Capital and Exploratory Expenditures".]


Page 14: Shigarov A.O. A Method for Table Detection in Metafiles // Presentation for IMT-2008

Experimental evaluation

- Experimental data:
  - government statistical reports of Russia (www.gks.ru), the USA (www.fedstats.gov), the European Union (Eurostat yearbook 2006-07) and Japan (Statistical Handbook of Japan 2007); financial reports of various companies: Boeing, Aeroflot, Transneft;
  - 345 pages in total, containing 440 tables (134 of the tables had text ruling).
- Measures*:
  - detection precision: the percentage of correctly detected tables among all detected tables;
  - detection recall: the percentage of correctly detected tables among all tables actually present.
- Measurements:
  - precision: 86.4%;
  - recall: 92.6%.

* Hu J., Kashi R., Lopresti D., Wilfong G. Medium-Independent Table Detection // Document Recognition and Retrieval VII. IS&T/SPIE Electronic Imaging, San Jose, 2000. P. 291-302.
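The two measures defined above can be computed as in the following sketch. The function name is hypothetical and the counts in the example are made up for illustration; they are not the counts behind the reported 86.4% and 92.6%.

```python
from typing import Tuple


def precision_recall(correct: int, detected: int,
                     existing: int) -> Tuple[float, float]:
    """Return (precision, recall) in percent.

    correct  - tables detected correctly
    detected - all tables reported by the detector
    existing - all tables actually present in the documents
    """
    if detected <= 0 or existing <= 0:
        raise ValueError("detected and existing must be positive")
    precision = 100.0 * correct / detected
    recall = 100.0 * correct / existing
    return precision, recall


# Hypothetical counts, for illustration only.
precision, recall = precision_recall(correct=90, detected=100, existing=120)
print(f"precision {precision:.1f} %, recall {recall:.1f} %")
```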


Page 15: Shigarov A.O. A Method for Table Detection in Metafiles // Presentation for IMT-2008

Conclusion

- The similarity of statistical tables made it possible to formulate the heuristics used in the proposed method.
- Using metafiles makes the method applicable to documents of various formats.
- The experimental evaluation shows that the method is applicable to detecting a wide range of statistical tables.
- The detection precision of the method can be improved after functional analysis of the tables has been performed.
- An application for extracting tables from metafiles has been developed on the basis of this method.
