
TERM PAPER

OF

SUBJECT: System Software (CAP-658)

TOPIC: Flex - Fast Lexical Analyzer Generator

Submitted To

Lovely Professional University, Phagwara

For the partial fulfillment of the degree of MCA, III Semester


Submitted To: Miss Neha Malhotra (Lecturer)

Submitted By: Ajay Kumar
Roll No: RD1206a19
Reg. No: 11203127

ACKNOWLEDGEMENT

It is not until you undertake a project like this one that you realize how massive an effort it really is, or how much you must rely upon the selfless efforts and goodwill of others. There are many who helped us with this project, and we want to thank them all from the core of our hearts.

We owe special words of thanks to our teacher Miss Neha Malhotra for her vision, thoughtful counselling and encouragement at every step of the project. We are also thankful to the teachers of the Department for giving us the best of knowledge and guidance throughout the project.

And last but not the least, we find no words to acknowledge the moral support rendered by our parents in making this effort a success. All this has become reality because of their blessings and above all by the grace of God.

Ajay Kumar


TABLE OF CONTENTS

Certificate
Declaration
Abstract
1.0 INTRODUCTION
  1.1.0 Introduction to Lexical Grammar
  1.2.0 Introduction to Token
  1.3.0 How Scanner and Tokenizer Work
  1.4.0 Platform Used
2.0 PROPOSED METHODOLOGY
  2.1.0 Block Diagram
  2.2.0 Data Flow Diagram
  2.3.0 Flow Chart
  2.4.0 Code and Screenshots
3.0 APPROACHED RESULT AND CONCLUSION
4.0 APPLICATIONS AND FUTURE WORK
REFERENCES


ABSTRACT

The lexical analyzer is responsible for scanning the source input file and translating lexemes (strings) into small objects that the compiler for a high-level language can easily process. These small values are often called "tokens". The lexical analyzer is also responsible for converting sequences of digits into their numeric form, for processing other literal constants, for removing comments and white space from the source file, and for taking care of many other mechanical details. In short, the lexical analyzer converts a stream of input characters into a stream of tokens. For classifying lexemes into identifiers and keywords we incorporate a symbol table which initially consists of predefined keywords. The tokens are read from an input file; the output file consists of all the tokens present in the input file along with their respective token values.

KEYWORDS: Lexeme, Lexical Analysis, Compiler, Parser, Token

1.0 INTRODUCTION

In computer science, lexical analysis is the process of converting a sequence of characters into a sequence of tokens. A program or function which performs lexical analysis is called a lexical analyzer, lexer or scanner. A lexer often exists as a single function which is called by a parser or another function, and works alongside other components to make compilation of a high-level language possible. This complete setup is what we call a compiler.

To define what a compiler is, one must first define what a translator is. A translator is a program that takes a program written in one language, known as the source language, and outputs a program written in another language, known as the target language. A compiler is then a translator whose source language is a high-level language such as Java or Pascal and whose target language is a low-level language such as assembly or machine code.

There are five parts of compilation (or phases of the compiler):
1. Lexical Analysis
2. Syntax Analysis
3. Semantic Analysis
4. Code Optimization
5. Code Generation

Lexical Analysis is the act of taking an input source program and outputting a stream of tokens. This is done by the Scanner. The Scanner can also place identifiers into the symbol table and strings into the string table, and can report trivial errors such as invalid characters in the input file.

Syntax Analysis is the act of taking the token stream from the Scanner and comparing it against the rules and patterns of the specified language. Syntax Analysis is done by the Parser. The Parser produces a tree, which can come in many formats but is referred to as the parse tree. It reports errors when the tokens do not follow the syntax of the specified language; errors that the Parser can report are syntactical errors such as missing parentheses, semicolons, and keywords.

Semantic Analysis is the act of determining whether or not the parse tree is relevant and meaningful. The output is intermediate code, also known as an intermediate representation (IR). Most of the time this IR is closely related to assembly language, but it is machine independent. Intermediate code allows different code generators for different machines and promotes abstraction and portability away from specific machines and languages (arguably the most famous example is Java's bytecode and the JVM). Semantic Analysis finds more meaningful errors such as undeclared variables, type incompatibility, and scope-resolution problems.

Code Optimization makes the IR more efficient, usually in a sequence of steps. Some optimizations include code hoisting (moving constant values to better places within the code), redundant-code discovery, and removal of useless code.

Code Generation is the final step in the compilation process. The input to the Code Generator is the IR and the output is machine language code.

1.1.0 Introduction to Lexical Grammar

The specification of a programming language will often include a set of rules which defines the lexer. These rules are usually called regular expressions and they define the set of possible character sequences that are used to form tokens or lexemes. White space (i.e. characters that are ignored) is also defined in the regular expressions.
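Since flex is the subject of this paper, a minimal flex specification may make the idea of a lexical grammar concrete. This sketch is illustrative only and is not part of the project code; the token names are our own.

```lex
%{
/* Sketch of a flex lexical grammar: each regular expression (pattern)
   is paired with an action that runs when the pattern matches. */
#include <stdio.h>
%}

DIGIT   [0-9]
ID      [a-zA-Z_][a-zA-Z_0-9]*

%%
{DIGIT}+   { printf("NUMBER(%s)\n", yytext); }
{ID}       { printf("IDENTIFIER(%s)\n", yytext); }
"="        { printf("ASSIGN\n"); }
"+"        { printf("PLUS\n"); }
";"        { printf("SEMICOLON\n"); }
[ \t\n]+   { /* white space: matched but ignored, as described above */ }
.          { printf("invalid character: %s\n", yytext); }
%%

int main(void) { yylex(); return 0; }
int yywrap(void) { return 1; }
```

Built with `flex scanner.l && cc lex.yy.c`, feeding this scanner the input `sum=3+2;` should report IDENTIFIER(sum), ASSIGN, NUMBER(3), PLUS, NUMBER(2), SEMICOLON.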

1.2.0 Introduction to Token

A token is a string of characters, categorized according to the rules as a symbol (e.g. IDENTIFIER, NUMBER, COMMA, etc.). The process of forming tokens from an input stream of characters is called tokenization, and the lexer categorizes the tokens according to a symbol type. A token can look like anything that is useful for processing an input text stream or text file. A lexical analyzer generally does nothing with combinations of tokens, a task left for a parser. For example, a typical lexical analyzer recognizes parentheses as tokens, but does nothing to ensure that each '(' is matched with a ')'.

Consider this expression in the C programming language:

    sum = 3 + 2;

Tokenized, it gives the following table:

    Lexeme    Token type
    sum       IDENTIFIER
    =         ASSIGNMENT OPERATOR
    3         NUMBER
    +         ADDITION OPERATOR
    2         NUMBER
    ;         STATEMENT TERMINATOR

Tokens are frequently defined by regular expressions, which are understood by a lexical analyzer generator such as lex. The lexical analyzer (either generated automatically by a tool like lex, or hand-crafted) reads in a stream of characters, identifies the lexemes in the stream, and categorizes them into tokens. This is called "tokenizing". If the lexer finds an invalid token, it will report an error. Following tokenizing is parsing. From there, the interpreted data may be loaded into data structures for general use, interpretation, or compiling.

1.3.0 How Scanner and Tokenizer Work

The first stage, the scanner, is usually based on a finite state machine. It has encoded within it information on the possible sequences of characters that can be contained within any of the tokens it handles (individual instances of these character sequences are known as lexemes). For instance, an integer token may contain any sequence of numerical digit characters. In many cases, the first non-white-space character can be used to deduce the kind of token that follows, and subsequent input characters are then processed one at a time until reaching a character that is not in the set of characters acceptable for that token (this is known as the maximal munch rule, or longest match rule). In some languages the lexeme creation rules are more complicated and may involve backtracking over previously read characters.

Tokenization is the process of demarcating and possibly classifying sections of a string of input characters. The resulting tokens are then passed on to some other form of processing. The process can be considered a sub-task of parsing input. Take, for example, the following string:

    The quick brown fox jumps over the lazy dog

Unlike humans, a computer cannot intuitively 'see' that there are 9 words; to a computer this is only a series of 43 characters. A process of tokenization could be used to split the sentence into word tokens. Although the following example is given as XML, there are many ways to represent tokenized input:

    <sentence>
      <word>The</word><word>quick</word><word>brown</word><word>fox</word>
      <word>jumps</word><word>over</word><word>the</word><word>lazy</word>
      <word>dog</word>
    </sentence>

A lexeme, however, is only a string of characters known to be of a certain kind (e.g. a string literal, a sequence of letters). In order to construct a token, the lexical analyzer needs a second stage, the evaluator, which goes over the characters of the lexeme to produce a value. The lexeme's type combined with its value is what properly constitutes a token, which can be given to a parser. (Some tokens such as parentheses do not really have values, and so the evaluator function for these can return nothing. The evaluators for integers, identifiers, and strings can be considerably more complex. Sometimes evaluators can suppress a lexeme entirely, concealing it from the parser, which is useful for white space and comments.)

For example, in the source code of a computer program the string


    net_worth_future = (assets - liabilities);

might be converted (with white space suppressed) into the lexical token stream:

    NAME   net_worth_future
    EQUALS
    OPEN_PARENTHESIS
    NAME   assets
    MINUS
    NAME   liabilities
    CLOSE_PARENTHESIS
    SEMICOLON

Though it is possible and sometimes necessary to write a lexer by hand, lexers are often generated by automated tools. These tools generally accept regular expressions that describe the tokens allowed in the input stream. Each regular expression is associated with a production in the lexical grammar of the programming language that evaluates the lexemes matching the regular expression. The tools may generate source code that can be compiled and executed, or construct a state table for a finite state machine (which is plugged into template code for compilation and execution).

Regular expressions compactly represent patterns that the characters in lexemes might follow. For example, for an English-based language, a NAME token might be any English alphabetical character or an underscore, followed by any number of instances of any ASCII alphanumeric character or an underscore. This could be represented compactly by the string [a-zA-Z_][a-zA-Z_0-9]*, meaning "any character a-z, A-Z or _, followed by 0 or more of a-z, A-Z, _ or 0-9".

Regular expressions and the finite state machines they generate are not powerful enough to handle recursive patterns, such as "n opening parentheses, followed by a statement, followed by n closing parentheses". They are not capable of keeping count and verifying that n is the same on both sides, unless there is a finite set of permissible values for n. It takes a full-fledged parser to recognize such patterns in their full generality: a parser can push parentheses on a stack and then try to pop them off and see if the stack is empty at the end.

The Lex programming tool and its compiler are designed to generate code for fast lexical analyzers based on a formal description of the lexical syntax.
Lex is not generally considered sufficient for applications with a complicated set of lexical rules and severe performance requirements; for instance, the GNU Compiler Collection uses hand-written lexers.

1.4.0 Platform Used

In computing, C is a general-purpose computer programming language originally developed in 1972 by Dennis Ritchie at the Bell Telephone Laboratories to implement the Unix operating system. Although C was designed for writing architecturally independent system software, it is also widely used for developing application software.

Worldwide, C is the first or second most popular language in terms of number of developer positions or publicly available code. It is widely used on many different software platforms, and there are few computer architectures for which a C compiler does not exist. C has greatly influenced many other popular programming languages, most notably C++, which originally began as an extension to C, and Java and C#, which borrow C's lexical conventions and operators.

i. Characteristics

Like most imperative languages in the ALGOL tradition, C has facilities for structured programming and allows lexical variable scope and recursion, while a static type system prevents many unintended operations. In C, all executable code is contained within functions. Function parameters are always passed by value; pass-by-reference is achieved in C by explicitly passing pointer values. Heterogeneous aggregate data types (struct) allow related data elements to be combined and manipulated as a unit. C program source text is free-format, using the semicolon as a statement terminator (not a delimiter).

C also exhibits the following more specific characteristics:
· non-nestable function definitions
· variables may be hidden in nested blocks
· partially weak typing; for instance, characters can be used as integers
· low-level access to computer memory by converting machine addresses to typed pointers
· function and data pointers supporting ad hoc run-time polymorphism
· array indexing as a secondary notion, defined in terms of pointer arithmetic
· a pre-processor for macro definition, source code file inclusion, and conditional compilation
· complex functionality such as I/O, string manipulation, and mathematical functions consistently delegated to library routines
· a relatively small set of reserved keywords (originally 32, now 37 in C99)
· a large number of compound operators, such as += and ++

ii. Features

The relatively low-level nature of the language affords the programmer close control over what the computer does, while allowing special tailoring and aggressive optimization for a particular platform. This allows the code to run efficiently on very limited hardware, such as embedded systems.

iii. Turbo C++

Turbo C++ is a C++ compiler and integrated development environment (IDE) from Borland. The original Turbo C++ product line was put on hold after 1994, and was revived in 2006 as an introductory-level IDE, essentially a stripped-down version of their flagship C++ Builder. Turbo C++ 2006 was released on September 5, 2006 and is available in 'Explorer' and 'Professional' editions. The Explorer edition is free to download and distribute, while the Professional edition is a commercial product; the Professional edition is no longer available for purchase from Borland.

Turbo C++ 3.0 was released in 1991 (shipping on November 20), and came in amidst expectations of the coming release of Turbo C++ for Microsoft Windows. Initially released as an MS-DOS compiler, 3.0 supported C++ templates, Borland's inline assembler, and generation of MS-DOS mode executables for both 8086 real mode and 286 protected mode (as well as the Intel 80186). 3.0 implemented AT&T C++ 2.1, the most recent specification at the time. The separate Turbo Assembler product was no longer included, but the inline assembler could stand in as a reduced-functionality version.

2.0 PROPOSED METHODOLOGY

The aim of the project is to develop a Lexical Analyzer that can generate tokens for the further processing of the compiler. The job of the lexical analyzer is to read the source program one character at a time and produce as output a stream of tokens. The tokens produced by the lexical analyzer serve as input to the next phase, the parser. Thus, the lexical analyzer's job is to translate the source program into a form more conducive to recognition by the parser. The goal of this program is to create tokens from the given input stream.

2.1.0 Block Diagram


2.2.0 Data Flow Diagram

A data flow diagram (DFD) is a graphical representation of the "flow" of data through an information system. It differs from the flowchart in that it shows the data flow instead of the control flow of the program. A data flow diagram can also be used for the visualization of data processing (structured design).


2.3.0 Flow Chart

A flowchart is a common type of chart that represents an algorithm or a process, showing the steps as boxes of various kinds, and their order by connecting these with arrows. Flowcharts are used in analyzing, designing, documenting or managing a process or program in various fields.

Flowcharts are used in designing and documenting complex processes. Like other types of diagram, they help visualize what is going on and thereby help the viewer to understand a process, and perhaps also find flaws, bottlenecks, and other less-obvious features within it. There are many different types of flowcharts, and each type has its own repertoire of boxes and notational conventions. The two most common types of boxes in a flowchart are:
· a processing step, usually called an activity, denoted as a rectangular box
· a decision, usually denoted as a diamond.


2.4.0 Code and Screenshots

Form code

using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Linq;
using System.Text;
using System.Windows.Forms;
using System.Threading;
using System.IO;

namespace LexicalAnalyzer
{
    public partial class frmLexicalAnalyzer : Form
    {
        public frmLexicalAnalyzer()
        {
            InitializeComponent();
        }

        int count = 0;

        public void displayOutput() // display output in the DataGridView
        {
            try
            {
                AnalyzerBackWork.fileN = txtPath.Text; // read file and store to array
                AnalyzerBackWork.myMain();

                for (int i = 0; i < AnalyzerBackWork.librariesCount; i++)
                {
                    dgvOutput.Rows.Add();
                    dgvOutput.Rows[dgvOutput.RowCount - 1].Cells["Libraries"].Value =
                        AnalyzerBackWork.libraries[i].ToString();
                    count++;
                }

                for (int i = 0; i < AnalyzerBackWork.keywordsCount; i++)
                {
                    dgvOutput.Rows.Add();
                    dgvOutput.Rows[dgvOutput.RowCount - count - 1].Cells["ReserveWords"].Value =
                        AnalyzerBackWork.keywordsArray[i].ToString();
                }

                for (int i = 0; i < AnalyzerBackWork.operatorCount; i++)
                {
                    dgvOutput.Rows.Add();
                    dgvOutput.Rows[dgvOutput.RowCount - count -
                        AnalyzerBackWork.keywordsCount - 1].Cells["Operators"].Value =
                        AnalyzerBackWork.operators[i].ToString();
                }

                for (int i = 0; i < AnalyzerBackWork.varCount; i++)
                {
                    dgvOutput.Rows.Add();
                    dgvOutput.Rows[dgvOutput.RowCount - count -
                        AnalyzerBackWork.operatorCount -
                        AnalyzerBackWork.keywordsCount - 1].Cells["VariableNames"].Value =
                        AnalyzerBackWork.originalVariables[i];
                }
            }
            catch (Exception) { }
        }

        private void btnOpen_Click(object sender, EventArgs e)
        {
            if (openFileDialog1.ShowDialog() == DialogResult.OK)
            {
                // The original compared against ".cpp".ToUpper() (i.e. ".CPP"),
                // which rejects lower-case names; a case-insensitive check is intended.
                if (openFileDialog1.FileName.EndsWith(".cpp", StringComparison.OrdinalIgnoreCase))
                {
                    txtPath.Text = openFileDialog1.FileName;
                    rTextSource.Text = AnalyzerBackWork.readFullFile(txtPath.Text);
                }
                else
                {
                    MessageBox.Show("Must open a .cpp file");
                }
            }
        }

        private void frmLexicalAnalyzer_Load(object sender, EventArgs e)
        {
            this.Size = new Size(532, 455);
        }

        private void btnBack_Click(object sender, EventArgs e)
        {
            // shrink the form back to its original width
            int width = 1013;
            while (width >= 532)
            {
                this.Size = new Size(width, 455);
                this.Refresh();
                this.SetStyle(ControlStyles.AllPaintingInWmPaint, true);
                this.SetStyle(ControlStyles.OptimizedDoubleBuffer, true);
                this.SetStyle(ControlStyles.UserPaint, true);
                Thread.Sleep(10);
                width -= 10;
            }
        }

        private void btnGenerate_Click(object sender, EventArgs e)
        {
            // spread the form to reveal the output grid
            int width = 532;
            while (width <= 1013)
            {
                this.Size = new Size(width, 455);
                this.Refresh();
                this.SetStyle(ControlStyles.OptimizedDoubleBuffer, true);
                this.SetStyle(ControlStyles.AllPaintingInWmPaint, true);
                this.SetStyle(ControlStyles.UserPaint, true);
                Thread.Sleep(10);
                width += 10;
            }

            displayOutput();
        }

        private void btnCancel_Click(object sender, EventArgs e)
        {
            Application.Exit();
        }

        private void dgvOutput_CellContentClick(object sender, DataGridViewCellEventArgs e) { }

        private void txtPath_TextChanged(object sender, EventArgs e) { }

        private void rTextSource_TextChanged(object sender, EventArgs e) { }
    }
}

Back-work code

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Windows.Forms;

namespace LexicalAnalyzer
{
    public class AnalyzerBackWork
    {
        public static char[] Token = new Char[500];
        public static string[] keyWords = new String[] { "cout", "cin", "if", "else",
            "for", "while", "void", "int", "float", "char", "double" };

        public static string[] keywordsArray = new String[25];
        public static int keywordsCount = 0;

        public static string[] operators = new String[25];
        public static int operatorCount = 0;

        public static string[] libraries = new String[25];
        public static int librariesCount = 0;

        public static char[] variables = new Char[500];
        public static string[] originalVariables = new String[25];
        public static int varCount = 0;

        public static int index = 0;
        public static int cnt = 0;
        public static string fileN;

        public static string readFullFile(String fileName)
        {
            try
            {
                TextReader tr = File.OpenText(fileName);
                return tr.ReadToEnd();
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.Message);
            }
            return null;
        }

        public static void ReadFile()
        {
            try
            {
                TextReader tr = File.OpenText(fileN);
                string line = tr.ReadLine();

                while (line != null)
                {
                    for (int i = 0; i < line.Length; i++)
                    {
                        Token[index++] = line[i];
                    }
                    line = tr.ReadLine();
                }
            }
            catch (Exception ex)
            {
                MessageBox.Show(ex.Message);
            }
        }

        public static String charArrayToString(char[] tempStr)
        {
            string merged = "";
            foreach (char str in tempStr)
            {
                merged += str;
            }
            return merged;
        }

        public static void extractKeyWords(String toSearch) // search reserved words
        {
            for (int i = 0; i < keyWords.Length; i++)
            {
                if (keyWords[i].Equals(toSearch))
                {
                    keywordsArray[keywordsCount++] = toSearch; // KEYWORD STORAGE
                }
            }
        }

        public static void extractVariablesAndDataTypes(Char[] word)
        {
            extractKeyWords(charArrayToString(word));

            char[] varArr = new Char[500];
            int varCounter = 0;

            if (string.Compare(charArrayToString(word), "int") == 0 ||
                string.Compare(charArrayToString(word), "char") == 0 ||
                string.Compare(charArrayToString(word), "float") == 0 ||
                string.Compare(charArrayToString(word), "double") == 0)
            {
                // collect everything up to the terminating semicolon
                while (Token[cnt] != ';')
                    varArr[varCounter++] = Token[cnt++];

                char[] tempVariable = new Char[varCounter];
                for (int a = 0; a < varCounter; a++)
                    tempVariable[a] = varArr[a];

                for (int v = 0; v < varCounter; v++)
                {
                    if (tempVariable[v] == ',')
                    {
                        variables[varCount] = ' ';
                        varCount = varCount + 1;
                    }
                    else if (tempVariable[v] == '=')
                    {
                        // skip the initializer up to the next comma
                        while (tempVariable[v] != ',' && v < varCounter - 1)
                        {
                            v = v + 1;
                        }
                        variables[varCount] = ' ';
                        varCount = varCount + 1;
                    }
                    else
                    {
                        variables[varCount] = tempVariable[v];
                        varCount = varCount + 1;
                    }
                }

                originalVariables = charArrayToString(variables)
                    .Split(new String[] { " " }, StringSplitOptions.RemoveEmptyEntries);

                for (int b = 0; b < varCounter; b++)
                {
                    varArr[b] = ' ';
                }
                varCounter = 0;
            }
        }

        public static void myMain()
        {
            char[] word = new Char[500];
            int counter = 0;
            char[] lib = new Char[500];
            int libCounter = 0;

            ReadFile();

            for (cnt = 0; cnt < index; cnt++)
            {
                // CHECK FOR OPERATORS (two-character forms such as ++, --, <=, >=,
                // != and == are matched by looking one character ahead)
                if (Token[cnt] == '+' || Token[cnt] == '-' || Token[cnt] == '*' ||
                    Token[cnt] == '/' || Token[cnt] == '%' || Token[cnt] == '&' ||
                    Token[cnt] == '^' || Token[cnt] == '<' || Token[cnt] == '>' ||
                    Token[cnt] == '!' || Token[cnt] == '=')
                {
                    if (Token[cnt] == '+')
                    {
                        if (Token[cnt + 1] == '+')
                            operators[operatorCount++] = Token[cnt] + "" + Token[cnt + 1];
                        else
                            operators[operatorCount++] = Token[cnt].ToString();
                    }
                    else if (Token[cnt] == '-')
                    {
                        if (Token[cnt + 1] == '-')
                            operators[operatorCount++] = Token[cnt] + "" + Token[cnt + 1];
                        else
                            operators[operatorCount++] = Token[cnt].ToString();
                    }
                    else if (Token[cnt] == '<')
                    {
                        if (Token[cnt + 1] == '=')
                            operators[operatorCount++] = Token[cnt] + "" + Token[cnt + 1];
                        else
                            operators[operatorCount++] = Token[cnt].ToString();
                    }
                    else if (Token[cnt] == '>')
                    {
                        if (Token[cnt + 1] == '=')
                            operators[operatorCount++] = Token[cnt] + "" + Token[cnt + 1];
                        else
                            operators[operatorCount++] = Token[cnt].ToString();
                    }
                    else if (Token[cnt] == '!')
                    {
                        if (Token[cnt + 1] == '=')
                            operators[operatorCount++] = Token[cnt] + "" + Token[cnt + 1];
                        else
                            operators[operatorCount++] = Token[cnt].ToString();
                    }
                    else if (Token[cnt] == '=')
                    {
                        if (Token[cnt + 1] == '=')
                            operators[operatorCount++] = Token[cnt] + "" + Token[cnt + 1];
                        else
                            operators[operatorCount++] = Token[cnt].ToString();
                    }
                    else
                        operators[operatorCount++] = Token[cnt].ToString();
                }

                // CHECK FOR LIBRARIES (#include <...> directives)
                if (Token[cnt] == '#')
                {
                    while (Token[cnt] != '>')
                        lib[libCounter++] = Token[cnt++];
                    lib[libCounter++] = Token[cnt];

                    char[] tempLib = new Char[libCounter];
                    for (int a = 0; a < libCounter; a++)
                        tempLib[a] = lib[a];

                    libraries[librariesCount++] = charArrayToString(tempLib); // LIBRARY STORAGE

                    for (int b = 0; b < libCounter; b++)
                    {
                        lib[b] = ' ';
                    }
                    libCounter = 0;
                }

                // CHECK THE REST: a delimiter ends the current word
                if (Token[cnt] == ' ' || Token[cnt] == '(' || Token[cnt] == '{' ||
                    Token[cnt] == ',' || Token[cnt] == '<' || Token[cnt] == '>' ||
                    Token[cnt] == ';' || Token[cnt] == '}' || Token[cnt] == ')')
                {
                    word[counter] = '\0';
                    char[] tempWord = new Char[counter];
                    for (int a = 0; a < counter; a++)
                        tempWord[a] = word[a];

                    extractVariablesAndDataTypes(tempWord); // extracts variables and data types

                    for (int j = 0; j < counter; j++)
                        word[j] = ' ';
                    counter = 0;
                }
                else
                {
                    word[counter++] = Token[cnt];
                }
            }
        }
    }
}

Screenshot: Starting design form


Screenshot: Code listing


Screenshot: Output screen

Screenshot: Program input


Screenshot: Tokens generated from the program


3.0 APPROACHED RESULT AND CONCLUSION

Lexical analysis is a stage in the compilation of any program. In this phase we generate tokens from the input stream of data, and for performing this task we need a Lexical Analyzer. We have therefore designed a lexical analyzer that generates tokens from a given input in a high-level-language statement. We have not used any database for storing the symbol table in this project, as parsing the entire statement is beyond the scope of this project. This makes our lexical analyzer portable and independent of a DBMS. Although this reduces the number of keywords and special characters identifiable by the lexical analyzer and increases the length of the code, it also reduces the program complexity and increases the overall speed of the system.

The main features of this lexical analyzer can be summarized as:
· Simple implementation.
· Fast lexical analysis.
· Efficient resource utilization.
· Portable.

4.0 APPLICATIONS AND FUTURE WORK

This lexical analyzer can be used as a stand-alone string-analysis tool, which can analyze a given set of strings and check their lexical correctness. It can also be used to analyze the string sequences delimited by white space in a C/C++ source code (*.c / *.cpp) file and output all the results in a text file, provided proper file-handling functionality is added to the source code of the lexical analyzer; this functionality is not part of the present project but may appear in an upgraded version, if time permits its development. Furthermore, the applications of a lexical analyzer include:
1. Text Editing
2. Text Processing
3. Pattern Matching
4. File Searching

An enhanced version of this lexical analyzer can be incorporated with a Parser having the functionality of syntax-directed translation, to make a complete Compiler in the future. The lexical assembly of the keywords and special characters can be appropriately modified in the source code to create a new high-level language like C++.


REFERENCES

[1] Yashwant Kanetkar, "Let Us C", 8th edition, ISBN 10:81-8333-163-7, pp. 424-437.
[2] Wikipedia, http://en.wikipedia.org/wiki/Lexical_analysis
[3] http://dragonbook.stanford.edu/lecture-notes/Stanford-CS143/03-Lexical-Analysis.pdf
[4] http://docstore.mik.ua/orelly/java/langref/ch02_01.htm
[5] http://www.isi.edu/~pedro/Teaching/CSCI565-Spring10/Lectures/LexicalAnalysis.part3.6p.pdf
[6] http://www.cs.berkeley.edu/~hilfingr/cs164/public_html/lectures/note2.pdf
[7] John E. Hopcroft, J. D. Ullman, "Introduction to Automata Theory, Languages, and Computation", 18th edition, ISBN 81-85015-96-1, p. 9.
[8] Pankaj Jalote, "An Integrated Approach To Software Engineering", 2nd edition, ISBN 81-7319-271-5, pp. 13-17.

Thanks.