Lexicon, syntax and semantics of a programming language. Syntax and semantics of Pascal. Extended Backus-Naur form

Syntax is checked at an early stage of translation. In interpreted programming languages, syntax checking is performed either during interpretation (execution) or during preliminary compilation into intermediate code. In addition, syntax can be checked directly while the source code is being edited in an IDE.

Function syntax

Function syntax - the formal rules that a function definition or call must satisfy; the notation in which a function is written. If the syntax of a function is incorrect, the compiler reports an error and the program will not be built until the error is fixed.

For example, syntax errors in writing a function include:

- spelling the function name in a call in a way that does not correspond to the grammar of the language (for example, the wrong letter case in a case-sensitive language);
- using symbols in a function call or definition that do not correspond to the grammar of the language (the wrong kind of brackets, the wrong argument separator);
- omitting the data type returned by the function (in languages whose grammar requires it).
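
A minimal sketch of such checks, written in Python purely for illustration (the helper name `is_valid_syntax` is made up for this example). It relies on Python's built-in `compile`, which parses source text without running it and raises `SyntaxError` when the text breaks the grammar:

```python
def is_valid_syntax(source: str) -> bool:
    """Return True if `source` parses as Python code; nothing is executed."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False


# A well-formed call: parentheses around the arguments, comma as separator.
good_call = "max(1, 2)"
# The wrong kind of brackets around the arguments.
wrong_brackets = "max{1, 2}"
# The wrong argument separator.
wrong_separator = "max(1; 2)"

results = {
    source: is_valid_syntax(source)
    for source in (good_call, wrong_brackets, wrong_separator)
}
```

Only the first call is accepted. The other two are rejected before any code runs, which matches the idea that syntax is checked early in translation.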

Syntax and semantics of programming languages

Every programming language, like any natural language, has its own syntax and semantics.

Syntax - a set of rules of a language that determine how its elements are formed. In other words, it is a set of rules for forming semantically significant sequences of characters in that language. Syntax is specified using rules that describe the concepts of the language. Examples of concepts are: variable, expression, statement, procedure. The sequence of concepts and their admissible use in the rules determines the syntactically correct structures that make up programs. Syntax defines the hierarchy of objects, not how they interact with each other. For example, a statement can only appear in a procedure, an expression can only appear in a statement, a variable can consist of a name and optional indices, and so on. Syntax is not concerned with such phenomena in a program as "type mismatch" or "variable with the given name is not defined". That is the job of semantics.

Semantics - rules and conditions that determine the relationship between elements of the language and their meanings, as well as the interpretation of the meaning of syntactic structures of the language. Objects of a programming language are not only placed in the text in accordance with a certain hierarchy, but are also linked to each other through other concepts that form various associations. For example, a variable, for which the syntax defines a valid location only in declarations and some statements, has a certain type, can be used with a limited set of operations, has an address and a size, and must be declared before being used in a program.
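
The difference can be sketched in Python (chosen here only as a convenient illustration language). The statement below is syntactically correct, so the standard `ast.parse` function accepts it, but it uses a variable that was never defined. Note that Python reports this semantic problem only when the statement is executed, whereas a Pascal compiler would report it during translation.

```python
import ast

source = "total = count + 1"  # `count` is never defined anywhere

# Syntax check only: the text is a well-formed assignment, so parsing succeeds.
tree = ast.parse(source)
parsed_ok = isinstance(tree, ast.Module)

# Running the statement requires the meaning of `count`, which does not exist.
try:
    exec(compile(tree, "<example>", "exec"), {})
    error_kind = None
except NameError:
    error_kind = "NameError"
```

After this runs, `parsed_ok` is `True` and `error_kind` is `"NameError"`: the text passed the syntax check and failed on its meaning.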

The source code of a program in a high-level language is an ordinary text file. To "read" it and turn it into a sequence of machine instructions, the program text is first parsed.

Syntax analyzer (parser) - a compiler component that checks the source statements for compliance with the syntactic rules and semantics of a given programming language. Despite its name, the analyzer checks both syntax and semantics. It consists of several blocks, each of which solves its own task.

Programming languages differ considerably in purpose, structure, semantic complexity and implementation methods. This imposes specific features on the development of particular translators. The structure of a language characterizes the hierarchical relationships between its concepts, which are described by syntactic rules. Programming languages can differ greatly in how individual concepts are organized and in the relationships between them. For example, C++ allows a variable to be declared at any point in the program before its first use, while in Pascal variables must be declared in a special declaration section. Depending on the decision made, the translator can analyze the program in one or several passes, which affects the translation speed.

The semantics of programming languages vary widely. Languages differ not only in how individual operations are implemented, but also in programming paradigms, which lead to fundamental differences in how programs are developed. The specifics of the implementation of operations can relate both to the structure of the data being processed and to the rules for processing the same data types. Even when adding two integers, languages such as C and Pascal can behave differently.

The same language can be implemented in several ways. This is because the theory of formal grammars allows different methods of parsing the same sentences. Accordingly, translators can obtain the same result (the object program) from the same source code in different ways. For example, there are several implementations of Pascal, including Turbo Pascal, Microsoft Pascal, Object Pascal and Delphi. However, all programming languages share a number of common characteristics and parameters. This commonality also determines the principles of organizing translators, which are similar for all languages.

For any language, its creators define:

- the set of symbols that can be used to write correct programs (alphabet);

- the set of correct programs (syntax);

- the "meaning" of each correct program (semantics).

Let's look at an example of parsing. Let the formula a + (b + c) * d appear in the source code of the program. In most programming languages, such a formula defines a hierarchy of software objects that can be displayed in the form of a tree (Fig. 17.1). The circles represent symbols used as elementary constructions, and the rectangles represent compound concepts that have a hierarchical and possibly recursive structure.

A syntactic structure that is correct in one language may be wrong in another. For example, in Lisp the expression above will not be recognized. In that language the same computation is written in prefix form as (+ a (* (+ b c) d)).

Figure: 21.1. Parse tree.
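
The parsing of a + (b + c) * d can be sketched with a small recursive-descent parser. This is a simplified illustration, not how any particular compiler does it: the grammar (only identifiers, `+`, `*` and parentheses), the function names and the tuple representation of the tree are all assumptions made for the example. The last function prints the tree in Lisp-style prefix notation.

```python
import re


def tokenize(text):
    """Split an expression into identifiers and the symbols + * ( )."""
    return re.findall(r"[A-Za-z_]\w*|[+*()]", text)


def parse_expression(tokens, pos=0):
    """expression ::= term { '+' term }"""
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_term(tokens, pos + 1)
        node = ("+", node, right)
    return node, pos


def parse_term(tokens, pos):
    """term ::= factor { '*' factor }"""
    node, pos = parse_factor(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "*":
        right, pos = parse_factor(tokens, pos + 1)
        node = ("*", node, right)
    return node, pos


def parse_factor(tokens, pos):
    """factor ::= identifier | '(' expression ')'"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of expression")
    token = tokens[pos]
    if token == "(":
        node, pos = parse_expression(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return node, pos + 1
    if token in "+*)":
        raise SyntaxError(f"unexpected symbol {token!r}")
    return token, pos + 1


def to_prefix(node):
    """Render a parse tree in Lisp-style prefix notation."""
    if isinstance(node, str):
        return node
    operator, left, right = node
    return f"({operator} {to_prefix(left)} {to_prefix(right)})"


tree, _ = parse_expression(tokenize("a + (b + c) * d"))
prefix = to_prefix(tree)
```

Here `tree` is `("+", "a", ("*", ("+", "b", "c"), "d"))` and `prefix` is `(+ a (* (+ b c) d))`. The nesting of the tuples mirrors the hierarchy of the tree: multiplication binds tighter than addition, and the parenthesized sum becomes a subtree of the product.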

Another characteristic feature of all languages is their semantics. It determines the meaning of the operations of the language and the correctness of the operands. Strings that have the same syntactic structure in different programming languages may differ in semantics (which is observed, for example, in C++, Pascal and Basic for the fragment of an arithmetic expression above). Knowing the semantics of a language makes it possible to separate it from the syntax and use it for conversion to another language (to generate code). Describing the semantics and checking its correctness is usually the most laborious and voluminous part of a translator, since it is necessary to enumerate and analyze the many admissible combinations of operations and operands.

Lesson summary

Two aspects of languages

A programming language has two aspects:
- Syntax (coding rules, spelling, word order)
  Some programming languages have similar syntax
  Some languages have exotic, unusual syntax
- Semantics (meaning) is harder to see; it is implicit

Syntax and semantics

In modern languages, good code means semantics that are easy to understand
If it is hard to understand what the code does, the code is not very good.
The syntax is easy to learn
You need to know the syntax, but it's not enough

Language selection

It is not that important which language you start with
You will switch between languages and use multiple languages and technologies at the same time. This is the reality of modern programming.
We chose JavaScript because it is simple, very popular, and works almost everywhere.
JavaScript is commonly used for writing websites, mobile apps, server software, and more.
This site is using JavaScript this very second.
JavaScript programs are running on your computers now.

Additionally

You will be writing programs in a modern programming language, and most of the time you will not have to deal with the binary system - those zeros and ones, or bits. But you must understand the idea that underlies binary numbers.

Lesson transcript

We called the button-pressing system "language". The lever, apparently, is a separate thing, it is like the "START" command. We enter the code with the buttons and START it with the lever.

Do you know how linguists discuss grammar, word structure and the like? They are not particularly interested in novels, songs or stories, they are more interested in the language that is used for these novels, songs and stories. They are interested in the code. Most people, by contrast, are interested in stories and meaning. Not only in books and films, but also in life. When I ask my girlfriend to buy me a new album, because I make ridiculous drawings for these lessons, I am interested in the result, the goal, and not the etymology and structure of the word "album".

We can think of a language as having two components or two features: grammar and purpose. Programming languages are similar in this sense, but since they are much simpler than human languages, what matters in them is not grammar as a whole but syntax - word order and word formation. And for the purpose, for the concept of "meaning", programmers use the buzzword "semantics".

Let's try to compare the language of Tota's magic box with some modern programming language.

This box has a very complex syntax, and the X and O characters are difficult to work with. But this modern code looks ... hmm, like English! This syntax is much easier to learn, at least you can guess what each word means.

The set of rules that describe how symbols and words can be used is syntax.

You will see that some programming languages have similar syntax, and some have exotic, unusual syntax.

The semantics or meaning is harder to see because it is implicit. What is the purpose of this code? This is a fiery flash, as we have already understood. What is the purpose of this code? You might have guessed it: it prints the phrase backwards. The meaning, the end result of running code is semantics.

In modern programming languages, the relationship between code and its apparent purpose can be used to judge the quality of the code. If you look at the code and quickly grasp its purpose, then this is good code. If, when you look at the code, you think "what the hell is this ?!", it is probably not very good. This brings us to an important idea: code is written for people. Computers don't care if the code is easy to read: for them, any code is easy to read.

You might be thinking - well, I want to write applications and create websites, so naturally the purpose is important to me - semantics, just as for a writer - plot, not linguistics. So why bother with syntax? A programming language is a tool that you use to tell your story, whether it be a website, an app, or a bot. And the better you know your instrument, the less you think about it and the more you can do. Just like a writer should be able to express ideas in the right words and use syntactic constructions that people will understand.

Fortunately, programming languages have very simple syntax compared to the languages people speak. So don't worry, even though we have to learn the syntax, this task will be quite simple.

So ... is programming easy? If computers are dumb and do only what we tell them to do, and the syntax of a programming language is simple, it should all be easy enough together, right?

Um ... no. To be honest, programming is not that easy. Well, yes, writing a school essay is easy compared to War and Peace. And a doctoral dissertation on quantum physics is a whole different level. So don't generalize. Any of the listed activities are variants of written presentation, but it is not advisable to compare them and judge "written presentation". Programming can be simple or difficult, depending on who is doing what.

You will quickly find that the syntax is easy to learn, but it won't help you on its own. This is a necessary thing, but not self-sufficient.

Over the course of the next lessons, we will focus on semantics, purpose, and look at some of the cool ideas that allowed computers, the Internet, robots, and mobile phones to emerge. In parallel, we will explore the syntax.

The last thing we will touch on before diving is, uh, which language to choose? There are so many of them and it can seem like a critical moment. The moment, of course, is critical, but not because "we need to make a final decision that will affect the rest of our lives," but because we must understand that choosing a programming language is like choosing a tool for typing text, not a human language.

You can write something with a pen on paper, use a typewriter, computer, or blackboard. Each tool has its own capabilities and limitations. And if you want to become a writer, it doesn't really matter what you use to enter text - pen or keyboard buttons. We want to learn programming, not just a programming language.

Choose a language that is good enough, simple enough, well-known and with good capabilities. In the process of professional growth, you WILL switch between languages, use several languages and technologies at once, and this will not be a problem for you, just as switching from a typewriter to Microsoft Word is not a problem.

We choose JavaScript as the first programming language and as a tool for learning programming. JavaScript programs run on your computer almost all the time, since most websites, including the one you watch this video on, use JavaScript. It is incredibly popular and is becoming more popular every year.

Well, let's start programming!

The basic elements of any programming language are its alphabet, syntax and semantics.

Alphabet - a set of characters displayed on printing devices and screens and / or entered from the terminal keyboard. This is usually the Latin-1 character set with the exception of control characters. Sometimes this set includes non-displayable characters with indication of the rules for writing them (combining into tokens).

Lexicon - a set of rules for forming strings of symbols (tokens) that make up identifiers (variables and labels), statements, operations and other lexical components of the language. This also includes the reserved words (keywords) of the programming language, which denote statements, built-in functions, etc. Depending on the programming language, equivalent tokens can be denoted by one or more characters of the alphabet. For example, the assignment operation is written as "=" in C and as ":=" in Pascal. Statement brackets in C are written with the symbols "{" and "}", and in Pascal with begin and end. The boundary between the lexicon and the alphabet is therefore quite arbitrary, especially since the compiler usually replaces recognized keywords with internal codes during lexical analysis (for example, begin - 512, end - 513) and from then on treats them as single characters.
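
The replacement of keywords with internal codes can be sketched as a toy lexical analyzer. This is only an illustration in Python: the codes 512 and 513 are the ones from the example in the text, while the token categories, the regular expression and the function name `tokenize` are assumptions made for this sketch and cover only a small part of Pascal.

```python
import re

# Internal codes for keywords, following the begin/end example in the text.
KEYWORD_CODES = {"begin": 512, "end": 513}

# Identifiers or keywords, unsigned numbers, ":=", and a few one-character symbols.
LEXEME_PATTERN = r"[A-Za-z_]\w*|\d+|:=|[-+*/=;().]"


def tokenize(source):
    """Turn a fragment of Pascal-like text into (category, value) pairs."""
    tokens = []
    for lexeme in re.findall(LEXEME_PATTERN, source):
        code = KEYWORD_CODES.get(lexeme.lower())  # Pascal keywords ignore case
        if code is not None:
            tokens.append(("keyword", code))
        elif lexeme[0].isalpha() or lexeme[0] == "_":
            tokens.append(("identifier", lexeme))
        elif lexeme.isdigit():
            tokens.append(("number", lexeme))
        else:
            tokens.append(("symbol", lexeme))
    return tokens


tokens = tokenize("Begin x := x + 1 end")
```

The result starts with `("keyword", 512)` and ends with `("keyword", 513)`. From this point on, later stages of the translator can treat each keyword as a single symbol rather than as a sequence of letters.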

Syntax - a set of rules for forming the structures, or sentences, of a programming language: blocks, procedures, compound statements, conditional statements, loop statements, etc. A characteristic feature of syntax is the nesting (recursiveness) of the rules for constructing structures. This means that the definition of a syntax element contains that same element, directly or indirectly, as one of its parts. For example, in the definition of a loop statement, the body of the loop is a statement, and one special case of a statement is the loop statement itself.

Strict adherence to the rules of spelling (syntax) of the program is required. In particular, Pascal clearly defines the purpose of punctuation marks. The semicolon (;) is placed at the end of the program header, at the end of the variable declaration section, after each statement. You can omit the semicolon before the word End. The comma (,) is a separator of elements in various lists: the list of variables in the description section, the list of input and output values.

A strict syntax in a programming language is required primarily for the translator. A translator is a program that works formally. If, for example, a comma must be the separator in a list of variables, then any other character will be treated as an error. If a semicolon is the separator of statements, the translator interprets the entire part of the program text from one semicolon to the next as a single statement. If you forget to put this sign between two statements, the translator will take them as one, which will inevitably lead to an error.

The main purpose of syntactic rules is to give unambiguous meaning to language constructions. If any construction can be interpreted ambiguously, then it must contain an error. It is better not to rely on intuition, but to learn the rules of the language.

To describe the syntax of a programming language, you also need some kind of language. In this case, we are talking about a metalanguage ("language above a language") intended for describing other languages. The most common metalanguages in the programming literature are Backus-Naur metalinguistic formulas (BNF) and syntax diagrams. The language of syntax diagrams is more visual and easier to understand.

In BNF, any syntactic concept is described by a formula consisting of a left and a right part connected by the sign ::=, which means "is by definition". To the left of the sign ::= is the name of the concept being defined (a metavariable), enclosed in angle brackets < >. On the right is a formula or diagram that defines the entire set of values that the metavariable can take.

The syntax of a language is described by the sequential complication of concepts: first, the simplest (basic) ones are determined, then more and more complex ones, which include the previous concepts as components.

In this sequence, obviously, the final defined concept must be the concept of a program.

Certain conventions are adopted in meta-formula records. For example, the BNF formula defining the concept of "binary digit" is as follows:

<binary digit> ::= 0 | 1

The symbol "|" is equivalent to the word "or".

In diagrams, arrows indicate the sequence of syntax elements; the symbols that are present in the construction are circled.

BNF describes the concept of "binary code" as a non-empty sequence of binary digits as follows:

<binary code> ::= <binary digit> | <binary code> <binary digit>

A definition in which a certain concept is defined through itself is called recursive. Recursive definitions are typical for BNF.
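
The recursive definition of a binary code can be followed almost literally in code. The sketch below is in Python and the function names are invented for this illustration. Each function corresponds to one BNF rule, and the recursive call mirrors the recursive alternative of the rule.

```python
def is_binary_digit(text: str) -> bool:
    """<binary digit> ::= 0 | 1"""
    return text in ("0", "1")


def is_binary_code(text: str) -> bool:
    """<binary code> ::= <binary digit> | <binary code> <binary digit>"""
    # First alternative: a single binary digit.
    if is_binary_digit(text):
        return True
    # Second alternative needs at least two characters:
    # a shorter binary code followed by one more binary digit.
    if len(text) < 2:
        return False
    return is_binary_code(text[:-1]) and is_binary_digit(text[-1])
```

For example, `is_binary_code("1011")` is true, while the empty string and `"102"` are rejected: the empty string matches neither alternative, and `"102"` ends in a character that is not a binary digit.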

In a syntax diagram, a backward arrow indicates repetition. The diagram is more illustrative than BNF.

Syntax diagrams were introduced by N. Wirth and used to describe the Pascal language he created.

Semantics - the meaning of the structures, or sentences, of the language. Semantic analysis is a check of the semantic correctness of a structure. For example, if a variable is used in an expression, it must be defined earlier in the program text, and its type can be obtained from that definition. Based on the type of the variable, we can decide whether an operation on this variable is admissible. Semantic errors occur when operations, arrays, functions, statements, etc. are used inappropriately.

Any language, including a programming language, obeys a number of rules. They are usually divided into rules that determine the syntax of the language and rules that determine its semantics.

Language syntax - a set of rules that determine the admissible constructions (words, sentences) of a language, that is, its form.

Semantics of the language - a set of rules that determine the meaning of syntactically correct language constructions, its content.

Programming languages belong to the group of formal languages, for which, in contrast to natural languages, syntax and semantics are uniquely defined. The description of the syntax of a language includes the definition of the alphabet and the rules for building the various constructions of the language from the symbols of the alphabet and from simpler constructions. For this, Backus-Naur form (BNF) or syntax diagrams are usually used. The description of a construction in BNF consists of the symbols of the alphabet of the language, the names of simpler constructions and two special symbols:

· "::=" - read as "can be replaced by",

· "|" - read as "or".

The symbols of the alphabet of the language, often called terminal symbols or terminals, are written unchanged. The names of language constructions (nonterminal symbols or nonterminals), which are defined through other symbols, are enclosed in angle brackets ("<", ">").

BNF example

The rules for building the construction <Integer>, written in BNF, may look like this:

<Integer> ::= <Sign> <Unsigned integer> | <Unsigned integer>

<Unsigned integer> ::= <Unsigned integer> <Digit> | <Digit>

<Digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

<Sign> ::= + | -

To indicate that the construction <Unsigned integer> can include an unlimited number of digits, a rule with left recursion is used. Applying this rule repeatedly makes it possible to build an integer with any number of digits.
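
The rules for an integer can be turned into a small recognizer. This is a sketch in Python with function names invented for the illustration. One design choice is worth noting: a left-recursive rule cannot be called recursively as written, because the function would call itself on the same input forever, so the repetition "one digit, then any number of further digits" is expressed as a loop over the characters instead.

```python
DIGITS = "0123456789"


def is_unsigned_integer(text: str) -> bool:
    """<Unsigned integer> ::= <Unsigned integer> <Digit> | <Digit>"""
    # The left-recursive rule describes one or more digits in a row.
    return len(text) > 0 and all(character in DIGITS for character in text)


def is_integer(text: str) -> bool:
    """<Integer> ::= <Sign> <Unsigned integer> | <Unsigned integer>"""
    # First alternative: an optional leading sign, then an unsigned integer.
    if text[:1] in ("+", "-"):
        return is_unsigned_integer(text[1:])
    # Second alternative: an unsigned integer on its own.
    return is_unsigned_integer(text)
```

With these definitions, `"-42"`, `"+7"` and `"2024"` are accepted, while `""`, `"-"`, `"+-1"` and `"4-2"` are rejected.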

Syntax diagrams display the rules for building constructions in a more visual form. In such a diagram, the symbols of the alphabet are drawn in oval frames, the names of constructions in rectangular frames, and the construction rules as lines with arrows at the ends. If a line enters a block, the corresponding symbol must be included in the construction being described. A fork in a line means that there are alternatives when building the construction. Fig. 2.1 shows a syntax diagram illustrating the first two rules of the description of the construction <Integer>. The diagram shows that an integer can be written with or without a sign and can include an arbitrary number of digits.

N. Wirth used syntax diagrams to describe the syntactic constructions of his language. Therefore, in cases where a verbal description of the syntax of a construction is long and unclear, we will use syntax diagrams.

The Borland Pascal 7.0 programming language alphabet includes:

1. lowercase and uppercase letters of the Latin alphabet (a..z, A..Z) and the underscore (_), which in many cases is also considered a letter (lowercase and uppercase letters are not distinguished);

2. digits (0..9);

3. special symbols consisting of one or two characters:

. , + - * / = : < > { } ( ) ^ @ $ # <> <= >= := (* *)

4. reserved words (each of these combinations is treated as a single unit and cannot be used in a program in any other capacity): (examples)

Various constructions are built from the symbols of the alphabet in accordance with the rules of syntax. The simplest of these is the construction <Identifier>.

This construct is used in many more complex constructs to denote the names of program objects (data fields, procedures, functions, etc.).

In Borland Pascal, an identifier is a sequence of letters of the Latin alphabet (including the underscore character) and digits that must begin with a letter.
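
This rule is simple enough to express as a regular expression. The sketch below is in Python, with helper names invented for the illustration. It checks only the shape of an identifier as described above and does not reject reserved words, which a real compiler would also have to do.

```python
import re

# A letter or underscore first, then any number of letters, digits or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text: str) -> bool:
    """Check that `text` has the shape of a Borland Pascal identifier."""
    return IDENTIFIER.fullmatch(text) is not None


def same_identifier(first: str, second: str) -> bool:
    """Lowercase and uppercase letters are not distinguished in Pascal."""
    return first.lower() == second.lower()
```

For example, `max_value1` and `_tmp` pass the check, `1st` and `a-b` do not, and `Count` and `COUNT` name the same object.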

The syntax diagram of the identifier is shown in Fig. 2.2. The rest of the constructions will be discussed in the following sections. The semantics of a programming language are built into its compiler. Thus, a syntactically correct program written in the language, once converted into a sequence of machine instructions, makes the computer perform the required operations.

Program structure

A Borland Pascal program consists of three parts: a header, a declaration section, and a statements section.

The program header is optional. It consists of the reserved word program and an identifier, the name of the program.
The declaration section contains the declarations of all resources used by the program (data fields, routines, etc.).
The statements section is enclosed in the so-called statement brackets begin ... end and ends with a period. The statements of the program are written between the brackets and are separated by a special character, the semicolon ";". If a semicolon comes before end, it is considered to be followed by an "empty" statement.
Comments can be placed in the text of the program; they are enclosed in curly braces.

An example program that implements Euclid's algorithm for finding the greatest common divisor of two natural numbers:

Program example; {program header}

{declaration section}
Uses crt;
Var a, b: integer; {variable declaration}

{statements section}
Begin
  Write('Enter two natural numbers: '); {prompt for input}
  Readln(a, b); {read the values}
  while a <> b do {loop while a <> b}
    if a > b then a := a - b {if a > b, then a := a - b}
    else b := b - a; {otherwise b := b - a}
  Writeln('The greatest common divisor is ', a); {display the result}
End. {end of program}

The program is named "example". The description section in this case includes only the description of the variables (see section 2.3). The operators section contains operators for inputting initial data, calculating and outputting results. Let's start our consideration of the peculiarities of programming in the Borland Pascal language with the problem of describing data.
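
For comparison, the loop from the Pascal program above can be written as a Python function. This is a sketch for illustration only: the function name and the input check are additions that are not part of the original program, which reads its values from the keyboard.

```python
def gcd_by_subtraction(a: int, b: int) -> int:
    """Greatest common divisor by repeated subtraction (Euclid's algorithm)."""
    if a <= 0 or b <= 0:
        # The subtraction form of the algorithm only terminates for natural numbers.
        raise ValueError("both arguments must be natural numbers")
    while a != b:  # corresponds to: while a <> b do
        if a > b:
            a = a - b
        else:
            b = b - a
    return a
```

Calling `gcd_by_subtraction(48, 18)` returns 6. The algorithm is the same in both languages. Only the notation differs: `:=` becomes `=`, `<>` becomes `!=`, and indentation takes the place of semicolons and the begin ... end brackets.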