Thursday, February 21, 2013

Grammar, why the formal language?

In formal language theory, a grammar consists of
  • A set of characters (terminal symbols), the alphabet of the language
  • A set of variables (non-terminal symbols)
  • A non-terminal symbol (the start symbol or variable)
  • A set of productions or rules consisting of pairs of finite strings, where the strings consist of terminal and non-terminal symbols and the first string (the left-hand side) contains at least one non-terminal symbol

A production is usually written with a right-pointing arrow between the left-hand and right-hand sides and a right-hand side that is the null string is represented by a lower-case Greek epsilon.

A grammar is classified according to its productions
  • If the left-hand side of each production consists of a single non-terminal symbol and the right-hand side consists of
    1. one terminal symbol or
    2. the empty string or
    3. one terminal symbol followed by one non-terminal symbol, and all other productions that do not fit in (1) or (2) fit in (3), or
    4. one non-terminal symbol followed by one terminal symbol, and all other productions that do not fit in (1) or (2) fit in (4)
    the grammar is regular (type 3). 
  •  If the left-hand side of each production consists of a single non-terminal symbol and the grammar is not regular, the grammar is context-free (type 2).
  • If the left-hand side of each production that does not consist of a single non-terminal symbol contains a non-terminal symbol and the right-hand side consists of
    • the substring preceding that non-terminal symbol in the left-hand side, followed by
    • at least two (terminal or non-terminal) symbols, followed by
    • the substring following that non-terminal symbol in the left-hand side
    the grammar is context-sensitive (type 1).
  •  If the grammar is not regular, context-free, or context-sensitive, it is type 0.
  • If the sentential variable does not appear in any right-hand side and the only production with an empty right-hand side has a left-hand side consisting solely of the sentential variable, the grammar is said to be epsilon-free.
An example of a grammar:
  • the terminal symbols are a and b
  • The non-terminal symbols are S, A, and B
  • The sentential variable is S
  • The productions are S -> a, S -> aA, S -> b, S -> bB, A -> a, A -> aA, B -> b, B -> bB
This grammar generates the language { a, aa, aaa, . . ., b, bb, bbb, . . . } which can be represented by the regular expression a+ | b+.

A direct derivation is a pair of strings where the second string is the result of replacing an occurrence within the first string of the left-hand side of some production with the right-hand side of that production (for example aaaA => aaaaA).  A derivation is a sequence of strings where each pair of consecutive string is a direct derivation (for example aaA => aaaA => aaaaA => aaaaa), the derivation can be summarized by its first and last strings (in this example aaA =>* aaaaa). The language generated by a grammar is the set of strings of terminals that can be derived from the sentential variable.

A language that can be generated by a regular grammar is regular, a language that can be generated by a context-free grammar is context-free, and a language that can be generated by a context-sensitive grammar is context-sensitive.

0 Comments:

Post a Comment

Please enter your comment here. Comments wil be reviewed before being published.

Subscribe to Post Comments [Atom]

<< Home