Code Scanner

A "Code Scanner" is a simple parser: a system that performs syntax analysis. The code scanner in this tutorial is implemented as a class. The language used is Pascal, but the code can easily be translated to other languages. The comments in the code explain what each part does.

Requirements

  • You must understand pointers. Two kinds of pointers will be used:
    • Pointers to character variables;
    • Pointers to functions/procedures;
  • A "text class" to manipulate lists of strings (in Object Pascal, there is TStringList).

TCodeScanner Class Full Body

This is how the class looks when finished, but we will build it step by step; for now, just take a look at the full picture. To make the code easier to understand, variables with the prefix "F" are FIELD variables, declared inside the class. Variables with the prefix "B" are BLOCK variables, which are local.

The following type is a pointer to a procedure (a method pointer, because it is declared "of object"):

  // TGetWordEvent is an important pointer to a procedure;
  TGetWordEvent = procedure(Sender: TObject; BWord: String; var BIndexText: Integer; var BIndexStringFirst : Integer; var BIndexStringLast : Integer) of object;
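
To show how a pointer to a procedure "of object" is assigned and called, here is a minimal sketch, assuming Free Pascal in objfpc mode. The names TWordEvent, TEmitter and TListener are hypothetical and exist only for this illustration; they are not part of the tutorial's class.

```pascal
program MethodPointerDemo;
{$mode objfpc}{$H+}

type
  // Hypothetical event type, simpler than TGetWordEvent;
  TWordEvent = procedure(Sender: TObject; BWord: String) of object;

  // Hypothetical class that calls the pointed procedure;
  TEmitter = class(TObject)
  public
    FOnWord : TWordEvent;
    procedure Emit(BWord : String);
  end;

  // Hypothetical class that owns the procedure being pointed to;
  TListener = class(TObject)
  public
    FCount : Integer;
    procedure HandleWord(Sender: TObject; BWord: String);
  end;

procedure TEmitter.Emit(BWord : String);
begin
  // Call the pointed procedure only if one was assigned;
  if Assigned(Self.FOnWord) then
    Self.FOnWord(Self, BWord);
end;

procedure TListener.HandleWord(Sender: TObject; BWord: String);
begin
  Inc(Self.FCount);
  WriteLn('Got word: ', BWord);
end;

var
  Emitter : TEmitter;
  Listener : TListener;
begin
  Emitter := TEmitter.Create;
  Listener := TListener.Create;
  try
    Emitter.Emit('ignored'); // No procedure assigned yet, nothing happens;
    Emitter.FOnWord := @Listener.HandleWord; // In Delphi mode the "@" is omitted;
    Emitter.Emit('PROGRAM');
    WriteLn('Words handled: ', Listener.FCount);
  finally
    Listener.Free;
    Emitter.Free;
  end;
end.
```

Because the type is declared "of object", only a method of a class instance can be assigned to it; a plain stand-alone procedure would be rejected by the compiler.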

The body of the class TCodeScanner:

  TCodeScanner = class(TObject)
  public
    FIndexCharP : PChar; // Index and Pointer to Char;
    FIndexChar : Integer; // Only Index to Char;
    FIndexStringFirst, FIndexStringLast : Integer; // Indexes and/or counters; 
    FListKeyWord : TStringList; //List of string containing the key words;
    FListSeparator : TStringList; //List of string containing the separators(symbols);
 
    //FOnGetWord is used to make the analysis of the syntax;
    FOnGetWord : TGetWordEvent;
 
    FConsiderBlanckSpace : Boolean; //Determines if blank spaces are considered during the scanning;
    FBreak : Boolean; //Forces the scanning to stop;
    FJumpLine : Boolean; //Forces the scanner to jump to the next line; useful if the current line is not of interest;
 
    {Important variables, they are used to analyse the code the way the user wants to}
    FParamTypeToFind : String;
    FParamTypeResult : String;
 
    FParamTypeNameToFind : String;
    FParamTypeNameResult : String;
 
    FParamLeftToFind : String;
    FParamLeftResult : String;
 
    FParamLeftTypeToFind : String;
    FParamLeftTypeResult : String;
 
    FParamRightToFind : String;
    FParamRightResult : String;
 
    FParamBlockToFind : String;
    FParamBlockResult : String;
 
    function IsSeparator(BChar : String) : Boolean; //Check if BChar is a separator;
    function IsKeyWord(BWord : String) : Boolean; //Check if BWord is in FListKeyWord;
    function IsNumber(BWord : String) : Boolean; //Check if BWord is a number;
 
    function GetWord : String; // Get next word of the text to be analysed;
 
    procedure ScanStringList(BStringList : TStrings); // Start the process of scanning;
    constructor Create;
    destructor Destroy; override; // "override" is needed so that Free calls this destructor;
  end;

Building the Class - Part 1

The first function we will create is GetWord. GetWord gets the next word in the code, and nothing more. This function does not perform the syntax analysis.

  TCodeScanner = class(TObject)
  public
    FIndexCharP : PChar; // Index and Pointer to Char;
    FIndexChar : Integer; // Only Index to Char;
    FIndexStringFirst, FIndexStringLast : Integer; // Indexes and/or counters; 
 
    FConsiderBlanckSpace : Boolean; //Determines if blank spaces are considered during the scanning;
    function IsSeparator(BChar : String) : Boolean; //Used by GetWord, implemented further below;
    function GetWord : String; // Get next word of the text to be analysed;
  end;
 
function TCodeScanner.GetWord: String;
var
  BString : String; // Our temporary string.
begin
  BString := '';
  // GetWord works in two different ways: ignoring or considering blank spaces;
 
  // Make the scanning, but ignoring any blank space; 
  if (Self.FConsiderBlanckSpace = False) then
  begin
    while (Self.FIndexCharP^ = #32) do
    begin
      Inc(Self.FIndexCharP);
      Inc(Self.FIndexChar);
      // The first "while-begin-end" block is important. If the character at FIndexCharP
      // is a blank space [#32], the address is increased by 1, moving to the
      // next character. It repeats until the character at FIndexCharP is not a blank space;
    end;
 
    // If the character is not a blank space, but is a separator;
    if (IsSeparator(Self.FIndexCharP^) = True) then
    begin
      BString := Self.FIndexCharP^;
      // Returns the separator, increases the address of the character by 1,
      // and exits the function GetWord;
      Result := BString;
      Inc(Self.FIndexCharP);
      Inc(Self.FIndexChar);
      Exit;
    end;
    // If the character found is not a separator, then while it is not a
    // blank space, a separator, or the end of the line [#0], add the character
    // to the temporary string, because it is part of a word or a number!
 
    while (Self.FIndexCharP^ <> #0) and (Self.FIndexCharP^ <> #32) and (Self.IsSeparator(Self.FIndexCharP^) = False) do
    begin
      BString := BString + Self.FIndexCharP^;
      Inc(Self.FIndexCharP);
      Inc(Self.FIndexChar);
      // Don't need to return the value now, it is returned at the end of the function;
    end;
  end;
 
  // Make the scanning, considering blank space; 
  if (Self.FConsiderBlanckSpace = True) then
  begin
    // If the character is a separator;
    if (IsSeparator(Self.FIndexCharP^) = True) then
    begin
      BString := Self.FIndexCharP^;
      // Returns the separator, increases the address of the character by 1,
      // and exits the function GetWord;
      Result := BString;
      Inc(Self.FIndexCharP);
      Inc(Self.FIndexChar);
      Exit;
    end;
    // While the character is not a separator and not the end of the line [#0]
    while (Self.FIndexCharP^ <> #0) and (Self.IsSeparator(Self.FIndexCharP^) = False) do
    begin
      // Add the character to the temporary string, even if it is a blank space;
      BString := BString + Self.FIndexCharP^;
      Inc(Self.FIndexCharP);
      Inc(Self.FIndexChar);
    end;
  end;
 
  // The result is either a separator, or a word that ends just before
  // a separator, a blank space (if blank spaces are ignored), or the end of the line;
 
  Result := BString;
end;

As you can see, the function GetWord has no parameter that points to the line we want to scan. The variable FIndexCharP does that job: before a line is scanned, it is pointed to the first character of that line. You will see this later.
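
To see the pointer walk in isolation, here is a minimal stand-alone sketch, assuming Free Pascal. NextWord is a hypothetical, simplified stand-in for GetWord that always ignores blank spaces, and the Separators set is a shortened assumption, not the tutorial's full list.

```pascal
program GetWordDemo;
{$mode objfpc}{$H+}

const
  // Assumed, shortened separator set for this sketch only;
  Separators = [';', '>', '(', ')', ','];

var
  BLine : String;
  BIndexCharP : PChar;

// Simplified stand-in for TCodeScanner.GetWord (blank spaces are ignored);
function NextWord : String;
begin
  Result := '';
  // Skip the blank spaces before the word;
  while (BIndexCharP^ = #32) do
    Inc(BIndexCharP);
  // A separator is returned alone, as a one-character word;
  if (BIndexCharP^ in Separators) then
  begin
    Result := BIndexCharP^;
    Inc(BIndexCharP);
    Exit;
  end;
  // Collect characters until a blank space, a separator, or the end of the line;
  while (BIndexCharP^ <> #0) and (BIndexCharP^ <> #32)
    and not (BIndexCharP^ in Separators) do
  begin
    Result := Result + BIndexCharP^;
    Inc(BIndexCharP);
  end;
end;

begin
  BLine := 'FUNCTION MyClassName>MyFunction;';
  BIndexCharP := PChar(BLine); // Point to the first character of the line;
  while (BIndexCharP^ <> #0) do
    WriteLn('[', NextWord, ']');
end.
```

Run on the line above, the sketch prints [FUNCTION], [MyClassName], [>], [MyFunction] and [;], one per line.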

Now we implement the functions IsKeyWord, IsNumber, and IsSeparator:

function TCodeScanner.IsKeyWord(BWord: String): Boolean;
begin
  // Look if the BWord string is in the FListKeyWord; if it is not, the result is false.
  // IndexOf returns -1 when the string is not found;
  Result := (Self.FListKeyWord.IndexOf(BWord) > -1);
end;
 
function TCodeScanner.IsNumber(BWord: String): Boolean;
var
  BNumberResult : Integer;
begin
  // Try to convert the BWord string to an integer; if it fails, the result is false;
  Result := TryStrToInt(BWord, BNumberResult);
end;
 
function TCodeScanner.IsSeparator(BChar: String): Boolean;
begin
  Result := False;
  // I have created the FListSeparator, but I will not use it.
  // The following code checks if BChar is one of these separators.
  if (BChar = ';') or (BChar = #39) or (BChar = '!') or (BChar = '-') or (BChar = '=') or (BChar = '+')
  or (BChar = '[') or (BChar = ']') or (BChar = '/') or (BChar = '@') or (BChar = '(') or (BChar = ')')
  or (BChar = '.') or (BChar = ':') or (BChar = '*') or (BChar = '#') or (BChar = '$') or (BChar = '%')
  or (BChar = '&') or (BChar = '?') or (BChar = '\') or (BChar = '<') or (BChar = '>') or (BChar = ',') then
  begin
    Result := True;
  end;
end;
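
The tutorial declares FListSeparator but checks the separators with a chain of comparisons. As a possible alternative, the sketch below (assuming Free Pascal with the Classes and SysUtils units) keeps the separators in a TStringList and checks them with IndexOf, the same call IsKeyWord uses. It also shows how TryStrToInt behaves. BListSeparator and BNumber are illustrative names only.

```pascal
program SeparatorDemo;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

var
  BListSeparator : TStringList;
  BNumber : Integer;
begin
  BListSeparator := TStringList.Create;
  try
    // Fill the list once; more separators can be added without touching the check;
    BListSeparator.Add(';');
    BListSeparator.Add('>');

    // IndexOf returns -1 when the string is not in the list;
    WriteLn('">" is a separator: ', BListSeparator.IndexOf('>') > -1);
    WriteLn('"x" is a separator: ', BListSeparator.IndexOf('x') > -1);

    // TryStrToInt returns False instead of raising an exception;
    WriteLn('"42" is a number: ', TryStrToInt('42', BNumber));
    WriteLn('"4x2" is a number: ', TryStrToInt('4x2', BNumber));
  finally
    BListSeparator.Free;
  end;
end.
```

A list makes the separators configurable at run time, at the cost of a lookup per character; the hard-coded comparison used in the tutorial needs no setup.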

Building the Class - Part 2

Now, let's create the procedure ScanStringList. This is the procedure you call when you want to scan a text.

procedure TCodeScanner.ScanStringList(BStringList: TStrings);
var
  BProjectSourceCode : TStringList; // Private copy of the text to be scanned;
  BIndexText : Integer;
  BWord : String; // BWord is the word returned from GetWord;
begin
  BProjectSourceCode := TStringList.Create;
  BProjectSourceCode.Assign(BStringList);
  Self.FIndexChar := 0;
  BIndexText := 0;
  Self.FIndexStringFirst := -1;
  Self.FIndexStringLast := -1;
  Self.FBreak := False;
 
  // A loop inside another loop. One loops through the lines of the text and,
  // for each line, the other loops through the characters;
 
  // From the first line to the last line of the text (unless it was forced to break), do this:
  while (BIndexText <= BProjectSourceCode.Count - 1) and (Self.FBreak = False) do
  begin
    Self.FJumpLine := False;
 
    // Initialize FIndexCharP, pointing to the first character of the line.
    // The line is read from the private copy, which nobody else can modify during the scan;
    Self.FIndexCharP := PChar(BProjectSourceCode.Strings[BIndexText]);
    Self.FIndexChar := 0;
 
    // While FIndexCharP is not at the end of the line (remember, the last char of a
    // string is a null character [#0]), and it was not forced to
    // jump the line, do this:
    while (Self.FIndexCharP^ <> #0) and (Self.FJumpLine = False) do
    begin
      BWord := Self.GetWord;
 
      // Remember, the variable FOnGetWord is a pointer to a procedure. You will create
      // this procedure yourself (I will give an example), because this procedure
      // defines how the language is interpreted;
      // If FOnGetWord is not nil/null, do this:
      if Assigned(Self.FOnGetWord) then
      begin
        // Call the pointed procedure, setting the parameters.
        // 1 Param = THIS object, the TCodeScanner;
        // 2 Param = The word returned by GetWord;
        // 3 Param = The index of the current line of the text;
        // 4&5 Param = First and last line of a part of the code; you
        // define yourself how to get them (there will be an example);
        Self.FOnGetWord(Self, BWord, BIndexText, Self.FIndexStringFirst, Self.FIndexStringLast);
      end;
    end;
    // Finish the process in this line, and jump to the next;
    BIndexText := BIndexText + 1;
  end;
  // Release the private copy of the text;
  BProjectSourceCode.Free;
end;

Making the Class Work - Example

For the example, let's write a simple piece of code in an invented language:

PROGRAM MyProgramName;

TYPES  
  CLASS MyClassName;
    FUNCTION MyFunction;

  END MyClassName;

IMPLEMENTS

FUNCTION MyClassName>MyFunction;
  // Line 01
  // Line 02
  // Line 03
  // Line 04
FUNCEND MyClassName>MyFunction;

PROGRAMEND;

Here, PROGRAM, TYPES, and IMPLEMENTS are our blocks. Let's say that we want to copy part of the source code to another TStringList, for example the function MyFunction. First, we create the procedure that the variable FOnGetWord will point to:

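// TGetWordEvent is declared "of object", so this procedure must be a method of
// the class that owns FCodeScanner (the class name prefix is omitted here);
procedure OnGetWordReadClasFunction(Sender: TObject; BWord: String;
  var BIndexText: Integer; var BIndexStringFirst : Integer; var BIndexStringLast : Integer);
begin
  // See which block we are in;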
  if (BWord = 'PROGRAM') or (BWord = 'TYPES') or (BWord = 'IMPLEMENTS') then
  begin
    // Store the block name in the result param FParamBlockResult!
    FCodeScanner.FParamBlockResult := BWord;
  end;
 
  // Since we want the part of the text that contains the function code,
  // and it is in the block "IMPLEMENTS", the next block runs only if
  // the block result is "IMPLEMENTS". Pay attention: the following code does
  // nothing in the same call that stores the block name, because BWord is still
  // the block name and we did not call GetWord again. First, the code above is
  // executed and this procedure exits. The next time this procedure is called,
  // the block result is already "IMPLEMENTS", and the following code can match the words!
 
  if (Self.FCodeScanner.FParamBlockResult = Self.FCodeScanner.FParamBlockToFind) then
  begin
    if (BWord = 'FUNCTION') then
    begin
      BWord := Self.FCodeScanner.GetWord;
      // Check if the function belongs to MyClassName; to make this possible
      // we set FParamLeftToFind to "MyClassName" before starting the scanning;
      if (BWord = Self.FCodeScanner.FParamLeftToFind) then
      begin
        BWord := Self.FCodeScanner.GetWord;
        if (BWord = '>') then
        begin
          BWord := Self.FCodeScanner.GetWord;
          if (BWord = Self.FCodeScanner.FParamRightToFind) then
          begin
            BWord := Self.FCodeScanner.GetWord;
            if (BWord = ';') then
            begin
              // If ";" is found at the end of the line, then the code is correct,
              // and the index of the first line of the function is stored;
              Self.FCodeScanner.FIndexStringFirst := BIndexText;
            end;
          end;
        end;
      end;
    end;
 
    // Now, let's take the index of the last line
    if (BWord = 'FUNCEND') then
    begin
      BWord := Self.FCodeScanner.GetWord;
      // Check if the function belongs to MyClassName; to make this possible
      // we set FParamLeftToFind to "MyClassName" before starting the scanning;
      if (BWord = Self.FCodeScanner.FParamLeftToFind) then
      begin
        BWord := Self.FCodeScanner.GetWord;
        if (BWord = '>') then
        begin
          BWord := Self.FCodeScanner.GetWord;
          if (BWord = Self.FCodeScanner.FParamRightToFind) then
          begin
            BWord := Self.FCodeScanner.GetWord;
            if (BWord = ';') then
            begin
              // If ";" is found at the end of the line, then the code is correct,
              // and the index of the last line of the function is stored;
              Self.FCodeScanner.FIndexStringLast := BIndexText;
            end;
          end;
        end;
      end;
    end;
 
    // Note: When and how to use FConsiderBlanckSpace?
    // It works like the code above. When the scanner finds a "'" (BWord = #39),
    // you set FConsiderBlanckSpace to True, because strings usually have spaces;
    // Let the code run until it finds another "'" and then set
    // FConsiderBlanckSpace to False again.
 
  end;
 
  // Since we want to check the code only inside the "IMPLEMENTS" block, we
  // make it jump the lines inside the other blocks. This check must be outside
  // the "IMPLEMENTS" test above, otherwise it could never match:
  if (Self.FCodeScanner.FParamBlockResult = 'PROGRAM') or (Self.FCodeScanner.FParamBlockResult = 'TYPES') then
  begin
    Self.FCodeScanner.FJumpLine := True;
  end;
end;

Now we must put this procedure to use. This is how:

  FCodeScanner.FParamBlockToFind := 'IMPLEMENTS';
  FCodeScanner.FParamLeftToFind := 'MyClassName';
  FCodeScanner.FParamRightToFind := 'MyFunction'; 
 
  FCodeScanner.FOnGetWord := OnGetWordReadClasFunction;
  // Start the scanning;
  FCodeScanner.ScanStringList(FSourceCode);
 
  // Now you can write code to copy the lines of FSourceCode
  // from FCodeScanner.FIndexStringFirst to FCodeScanner.FIndexStringLast

This is just an example that copies part of the code to another TStringList, but it could be anything. The code could be interpreted, since you could put code inside the FOnGetWord procedure that would be executed according to the scanned words. It could be a resource reader. Use your imagination.
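
The copy step itself is left to the reader in the example. A minimal sketch of it could look like the following, assuming Free Pascal with the Classes unit. CopyLines is a hypothetical helper, and the indexes 1 and 3 passed to it stand in for the values the scanner would have stored in its first and last line indexes.

```pascal
program CopyLinesDemo;
{$mode objfpc}{$H+}

uses
  Classes;

// Hypothetical helper: copies the lines BFirst..BLast (inclusive) of BSource to BTarget;
procedure CopyLines(BSource, BTarget : TStrings; BFirst, BLast : Integer);
var
  BIndex : Integer;
begin
  BTarget.Clear;
  // Nothing to copy if the scanner did not find both ends (-1 means "not found");
  if (BFirst < 0) or (BLast < BFirst) or (BLast > BSource.Count - 1) then
    Exit;
  for BIndex := BFirst to BLast do
    BTarget.Add(BSource.Strings[BIndex]);
end;

var
  BSourceCode, BFunctionCode : TStringList;
begin
  BSourceCode := TStringList.Create;
  BFunctionCode := TStringList.Create;
  try
    BSourceCode.Add('IMPLEMENTS');
    BSourceCode.Add('FUNCTION MyClassName>MyFunction;');
    BSourceCode.Add('  // Line 01');
    BSourceCode.Add('FUNCEND MyClassName>MyFunction;');
    BSourceCode.Add('PROGRAMEND;');

    // 1 and 3 stand in for the indexes found by the scanner;
    CopyLines(BSourceCode, BFunctionCode, 1, 3);
    WriteLn(BFunctionCode.Text);
  finally
    BFunctionCode.Free;
    BSourceCode.Free;
  end;
end.
```

The range check matters because the scanner initializes both indexes to -1, so a function that was not found leaves the target list empty instead of raising an index error.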

Full Source-Code

You can see the code here: [1]

The Source-Code by Felipe Ferreira is licensed under Creative Commons License 3.0 Unported. Tutorial and Source-Code by FelipeFS.

You can use, distribute, and modify this code. Please just keep the author's name.

Final Note

Notice
This class is about to be implemented.