Written Languages of Arts

Alexander Liss


Introduction

  • A Need

    A Hope

    Feast of Senses

  • Reference Point

  • Patterns and Constraints

    Limitation of Perception

    Unfolding in Time Messages

    Mapping

    Palette

    Local Palette and Map

  • Tools of Arts

  • Rhythm

    Harmony

  • Riding Stream of Sounds

  • Music

    Elements

    Notation

    Speech

    States and Graces

    Basic States

    Classification of States

    Palette

    Changing the Palette

    Rhythm

    Emphasis

    States of Singing

    Notation

    English Phonetic Notation

    Chain of Graces

    Mapping

    Singing

  • Beyond Music

  • General Description

    Moving Color

    Performance by Computer

    Palette Management

  • Appendix I: Russian Phonetics



    Introduction

    Written forms for presenting images, ideas, etc. in a particular area of social activity are a sign of that area's maturity. They are valuable in themselves: they are part of the wealth of society.

    There are areas, such as dance, that are still waiting for an accepted written language. There are areas, such as the combination of music and color, that are served only partially by written languages, or where the existing languages are too different to be used together to describe a complex phenomenon. Introducing written languages in these areas gives an opportunity for fast development and for the general enrichment of society. For example, writing down the plan of a performance in minute detail (the play script) enriched theater and allowed great plays to be created and passed down through generations.

    Written languages, such as musical notation, develop only when a "critical mass" of people wants to preserve messages of this sort and pass them along.

    Obviously, areas that allow the written message to be read directly, without an intermediary, such as literature or even drama, have a better chance of developing early.

    However, the spread of a particular form of culture is driven by the presence of a written language. Hence, the culture and the language grow together: the earlier a written language is introduced, the faster the area develops. Therefore, in some cases, the introduction of a written language can precede a clear demand for it.

    The introduction of computers into our lives affects all spheres of creating and communicating images and ideas: movies, television, landscaping, interior design, music, etc. However, we observe a noticeable lag in the development of written languages that allow computers to be used effectively in creating and communicating artistic images. What we observe is mostly support for existing languages: editors for the current notations.

    With computers, we have already seen demand created by offering interesting functionality. Now it is time to offer written languages that support the creation of new art forms, especially forms that can be performed with the help of a computer and communicated across the Internet.


    A Need

    Music already has a written language; however, it is very inconvenient to type, and this needlessly limits its use. It is easy to come up with a typable variant equivalent to the existing one. However, we will do better: we extend it to describe the techniques of singing and the unusual handling of sound specific to contemporary music.

    Dance is an area that needs a written language. It would support education in new forms of dance, whose language can be taught much more easily if it can be written down. It would fill the gap between the simple forms of dance we all enjoy and sophisticated dance, so far available only to a few.

    As we organize our living areas, we need a common written language that allows us to describe decisions in interior design and landscaping. Now we have to draw pictures to explain them. We need something faster for screening variants.

    Animation plays an increasingly important role in advertising and in ordinary presentations. This trend is accelerated by advertising on the Internet. To find a good animation with a good rhythm, we need to be able to screen a few variants of it cheaply. This should be done with a written language.


    A Hope

    Matisse liberated the line. He found a way to convey images with lines seemingly independent of figures. We suspect that he was able to tap into deep structures of the mind where lines-boundaries are analyzed separately from figures, and the image of a figure emerges from them. It was a long road of studying the possibilities of such messages and of their gradual acceptance by the public.

    Kandinsky liberated color. Similar to the deep, basic perception of lines-boundaries independent of figures, color is analyzed by the mind separately from lines, and only afterwards does it form the image of a figure. Here, too, it was a long road to finding a properly balanced presentation of "color only" pictures that evoke meaningful images. Again, the process went together with the discovery and acceptance of this new artistic phenomenon by the public.

    It is only natural to expect that someone will create an art of moving line and an art of moving color liberated from moving figures. However, it is hard to expect that this can be done quickly: it took years, and considerable talent and courage, to achieve the previous breakthroughs with static line and color.

    There is a serious obstacle on the way to such an achievement. Painters have readily accessible, well-developed tools. There are no such tools for experimenting widely with moving line and moving color.

    We hope that a written language that allows the creation of computer-generated "movies", in which the creator has considerable control over the movement of lines and color independent of figures, will open up a new area of creative development.

    Note that movement tied to figures is something we know well: movies, animation, etc. We need a different capability: direct control over line and color. We need accessible tools for such control.


    Feast of Senses

    There is a common underlying structure in music and in the way we speak, in dance and in the way we walk, in architecture and in the way we arrange things in the living room, in painting and in a few-second television commercial. This is the common way we address the senses to catch and organize attention; this structure is the basis of the message we convey.

    If we want someone's attention, we have to please that person's senses. From music we know the basic tools of such pleasing: rhythm in time, and harmony. From painting, we know the importance of a carefully chosen palette. From architecture, we know rhythm in space. These elements are present in any captivating message (image), especially in art forms. The message (image) is much more than these elements, but they are the basic blocks used to construct it.

    Now more than ever, we have an expanding ability to create images with tools affecting many senses. These tools are getting more developed and more affordable every day. Judging from history, this will lead to mass involvement in creating images and messages with tools that manipulate different media. This in turn will lead to the artful design of interiors, furniture, dress, etc. It will affect the way we speak and gesticulate; it will organize our workplace and everyday life. For example, music has already penetrated our lives: it has affected speech, which has become more rhythmic and melodic, and it has developed our hearing, so that we can analyze sounds better.

    With new tools come new opportunities for creating messages and images aimed at many senses simultaneously. However, we have developed representations only for separate areas: sound (music), color (painting), moving form (dance, movies), etc. Hence, we need a universal representation that allows all these forms to be bound together.

    Creating synergy, a combination of elements usually handled by different arts into one coherent art form, is a tough undertaking. There is little experience in producing it and in perceiving it. A set of languages that share a common basis and can be used together can help in this work.


    Reference Point

    Here we describe some basic principles common to successful written languages. We put these principles at the foundation of the new written languages.


    Patterns and Constraints

    A written message is a special hint that governs performance. In the case of English, the written phrase, when read aloud, governs speech. In the case of music, the written musical phrase governs instrumental performance, singing, etc. There are variations of proper performance, but the important characteristics stay the same. However, improper performance is also possible, and we discard it as non-performance.

    This shows that the written message by itself does not actually define the performance sufficiently well. Something else is involved: context, circumstances.

    There is another important observation. The performer usually does not get clear guidance for the performance from the written message. The performer always has to derive this guidance from the message and the circumstances (which include the possibilities of performance), and sometimes this is not easy. It is always a kind of "problem solving" activity.

    We describe this phenomenon in an abstract form. The message is a description of some pattern. There are constraints of possible performance (vocal limitations, for example) and constraints of circumstances. The performance is a two-step process:

    1. Derivation of clear guidance from the pattern and the constraints, which is a problem-solving process,
    2. Performance according to this guidance.

    The description of such patterns can differ substantially from language to language. Compare, for example, an English phrase and a Hebrew phrase.

    Usually the combination of a written message and constraints leaves room for deriving many different performances. The degree of variability of possible performances determines how restrictive a particular language is.

    Sometimes this variability is counterproductive: many playwrights have particular actors in mind for particular characters (later the play takes on a life of its own).

    Some languages, English for example, serve as tools for coordinating efforts and building consensus. They have to be restrictive in our sense.

    The languages of the arts leave a lot of freedom to the performer, because the arts are used to expand our understanding and have to be tuned to circumstances. This places a heavy burden on the performer, who is supposed to convey the "spirit" of the message, which is not written in it. Hence, the performer needs an additional, special cultural background that is not required of the receiver of the message.


    Limitation of Perception

    The way we perceive information is strictly limited. Our mind binds information into units, in which the elements of information are organized hierarchically. We can juggle only a very small number of such units simultaneously (actually, no more than nine). We manage fine nevertheless, by binding independent units together into a bigger unit and thus automatically reducing the number of independent units we have to juggle.

    When we work with a message, we also have to pay attention to other surrounding (potentially dangerous) information; hence, we have even fewer resources left for analyzing the message.

    Therefore, the message inevitably has a strict hierarchical structure. On the top level, we have a few big units (no more than nine!). Each unit can be presented as a complex of units of the next level and their interconnections. Again, each top-level unit contains no more than nine units of the next level.
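    The nine-unit limit can be read as a constraint on a tree. A minimal sketch, assuming a message is modeled as nested units (the `Unit` class and the `overloaded_units` function are hypothetical names, not part of any existing tool), checks every level for units with too many parts:

```python
from dataclasses import dataclass, field

MAX_UNITS = 9  # the limit stated in the text: no more than nine units at a time


@dataclass
class Unit:
    """A unit of a message; `parts` are its units of the next level."""
    name: str
    parts: list["Unit"] = field(default_factory=list)


def overloaded_units(unit, limit=MAX_UNITS):
    """Names of units, at any level, that contain more than `limit` parts."""
    found = [unit.name] if len(unit.parts) > limit else []
    for part in unit.parts:
        found.extend(overloaded_units(part, limit))
    return found


# A message with three top-level units; "body" is split into ten parts.
message = Unit("message", [
    Unit("intro", [Unit(f"i{k}") for k in range(4)]),
    Unit("body", [Unit(f"b{k}") for k in range(10)]),
    Unit("coda"),
])
print(overloaded_units(message))  # ['body']
```

    A better-structured message would regroup the ten parts of "body" into two or three intermediate units.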

    In many cases, units of the top level do not share units of the next level, and they are defined uniquely.

    However, sometimes the message can be structured in different ways, and sometimes units of a given level share units of a lower level. Good "writing style" prevents such ambiguity, but sometimes it is introduced on purpose to create special effects.


    Unfolding in Time Messages

    Messages that unfold in time, such as music, have common features.

    There are distinctive characteristics that allow us to trace a chain of message elements belonging to one class: for example, the sounds of one instrument in an orchestra, or the sounds of one voice in a street quarrel.

    Each message element has a beginning and an end: it is a span in time. Its duration is its major characteristic. Different elements in a chain can partially overlap, which produces special transitional effects.

    Even if message elements do not overlap, they overlap in our memory: the mind automatically compares each element with the previous one.

    An overlap of elements (real or in memory) creates a special relationship between consecutive elements, similar to the one we observe between concurrent elements. Thus, the harmonic combinations of musical tones are the same in a chord and in consecutive tones.

    Different chains of elements combine and interplay. Some elements in different chains become strongly connected. We call such a set of combined chains of elements a tress.

    Tresses are limited in time: they have a beginning and an end. They are perceived as units of a higher level in the hierarchical organization of the message.

    This is the common structure on which we build our written languages.


    Mapping

    There is a universal method of describing a pattern: mapping. The pattern is described as some finite structure (using some universal representation), and the appearance of this pattern in a particular area is presented as a correspondence between the elements of this finite structure and the elements of that area. Thus, the same pattern can be seen in different areas.

    In music, a simple song that uses only three tones can be presented with two constructions. One is a pattern, in which we operate with three elements A, B, C and a pause D, taken with their durations. The other is a mapping of elements A, B, and C into musical tones.

    This tool is especially helpful with the languages of the arts, which are not restrictive. Elements A, B, and C from the example above can be mapped into musical tones produced by different instruments, and hence having different "timbres". If we play this song on different instruments, we still recognize it as the same song. We can even play it by mapping into non-musical sounds that hint at the relationship between tones A, B, and C of the original performance, and we are still able to recognize the song. Hence, this representation is useful.
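    The two constructions, pattern and mapping, can be modeled directly. This sketch assumes the pattern is a list of (element, duration) pairs and a mapping is a dictionary from element names to frequencies in Hz, with `None` for the pause D; the frequencies are illustrative values, not taken from the text. Transposing the mapping changes every tone but keeps the ratios between consecutive tones, which is one way to express why we still recognize the song:

```python
from fractions import Fraction

# The pattern: abstract elements with durations in rhythmic units; D is a pause.
pattern = [("A", 1), ("B", 1), ("C", 2), ("D", 1), ("A", 1), ("C", 2)]


def perform(pattern, mapping):
    """Apply a mapping: replace each abstract element by its frequency (None = silence)."""
    return [(mapping[symbol], duration) for symbol, duration in pattern]


def interval_profile(performance):
    """Frequency ratios between consecutive sounding tones, ignoring pauses."""
    tones = [freq for freq, _ in performance if freq is not None]
    return [Fraction(b) / Fraction(a) for a, b in zip(tones, tones[1:])]


# Two illustrative mappings; the second transposes the first up by a ratio of 3:2.
mapping_1 = {"A": 264, "B": 297, "C": 330, "D": None}
mapping_2 = {k: None if v is None else v * Fraction(3, 2) for k, v in mapping_1.items()}

song_1 = perform(pattern, mapping_1)
song_2 = perform(pattern, mapping_2)
print(song_1 == song_2)                                      # False: different tones
print(interval_profile(song_1) == interval_profile(song_2))  # True: same relations
```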

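    We present the message unit through two independent constructions:

    1. The pattern: abstract elements (such as A, B, C and the pause D) with their durations,
    2. The mapping of these abstract elements into elements of a particular area (such as musical tones).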

    The potential mapping itself can be restricted. For example, if we look at a written English phrase as guidance for speech, we see that the potential streams of sounds vary insignificantly. Sound elements are well determined: there is not much freedom in mapping a pattern element given in the written phrase to the sound element chosen by the speaker.

    Mapping gives us an additional tool for describing a message (pattern).


    Palette

    Only relatively low-level elements of a message are subject to mapping. Higher-level units are perceived on an abstract level, independent of any particular mapping.

    All potential elements subject to mapping form the palette of a particular form of communication.

    As we mentioned above, English has a very restricted mapping and a big palette.

    If a musician allows free experimentation with the forms of sound used, the mapping has fewer limitations. Also, the palette is usually small in this case.

    This is not accidental.

    The palette has to be kept in mind all at once. If the mapping is not restrictive, the elements of the palette have weak interrelations (we have to be ready for unusual, unexpected new relations between the elements of the real world onto which the elements of the message are projected). If the elements of the palette are independent, we can keep only a few of them in mind simultaneously. This does not bother an artist whose main work is in special mapping, for example in creating unusual special sounds.

    Restrictive mapping allows relationships between the elements of the palette to accumulate. Mostly these are associative relationships, but they bind the palette together and let our mind manipulate it better. They allow the palette to be perceived as one unit of information (or a small number of units).

    Over generations, musical culture developed an extensive palette that can be used to construct sophisticated messages. Here the artist finds an area of expression not in special sound effects, but in the huge variety of potential constructions based on rigid mapping and an extensive palette.

    Such rigid mapping is not unique. Usually a few different mappings are in use. For example, we have the cultures of Western music and of Chinese music. Both are well developed, and they are different.

    In Western art forms, the artist often presents a palette and a mapping first and builds the structure on them as a foundation. In some other forms, a sophisticated palette and a rigid mapping are assumed to be known to the receiver, and the artist takes off from that point. Some researchers claim that classical Chinese art follows this second approach.


    Local Palette and Map

    A particular art message often uses only a part of the available palette and, necessarily, some map. It is customary to arrange a kind of introduction to this basis of communication. In music, the beginning of a message or of a series of messages is actually dedicated to such an introduction.

    Sometimes a similar approach is applied to distinctive parts of the message: each part has its local palette and local map, which are introduced at the beginning of that part.


    Tools of Arts


    Rhythm

    Rhythm is an important part of an artful message. We need a general description of this phenomenon.

    In a broad sense, rhythm is the repetition of a pattern. Repetitive patterns are at the very essence of our learning; hence rhythms are important to us, and they are captivating. Rhythms are the main structure of art forms; other structures are built on the rhythmic structure. Whenever we want an artificial object to be pleasing, and hence more acceptable, we build some rhythmic structure into it. The rhythm of windows on a building, the rhythm of buildings along a street, the rhythm of changing traffic lights, etc. produce the rhythmic structure of a city. The rhythm of chapters, paragraphs, sentences and parts of sentences produces the rhythmic structure of a publication, especially a publication that is an art form, literature.

    Messages that unfold in time (speech, music, movies, etc.) rely on rhythm as the structure that assures the integrity of the message. These forms need the freedom to incorporate rhythm into the message. They need a variability of the message that does not affect its "semantic" structure and can be used to create rhythm.

    We find this variability in the relationships between parameters of message elements, such as their duration in time, and in the placement of accents.

    In music, the periods of elements are coordinated: there is one period such that the periods of all elements can be presented as its multiples. In many buildings, the windows are of equal size and the distances between them are equal, etc. This provides a basic rhythmic structure.

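    Generally, this structure can be presented in a simple form. If $p_1, p_2, \ldots$ are the values of a given parameter of the elements, then there are natural numbers $n_1, n_2, \ldots$ such that

    $$p_0 = \frac{p_1}{n_1} = \frac{p_2}{n_2} = \cdots,$$

    that is, every value $p_i = n_i p_0$ is a multiple of one common period $p_0$. Usually, for a rhythm to be recognizable, the numbers $n_1, n_2, \ldots$ should be small.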
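    The condition that every value is a small multiple of one common period can be checked mechanically when the values are exact. This sketch assumes the values are given as exact fractions (for example, durations in seconds or window widths in centimeters); `common_unit` returns the largest common period and the multiples, and the bound of 8 for "small" is an arbitrary illustrative choice, not a figure from the text:

```python
from fractions import Fraction
from math import gcd, lcm


def common_unit(values):
    """Largest p0 such that every (positive) value is a whole multiple n_i * p0.

    For fractions in lowest terms, gcd(a/b, c/d) = gcd(a, c) / lcm(b, d).
    """
    fracs = [Fraction(v) for v in values]
    num, den = 0, 1
    for f in fracs:
        num, den = gcd(num, f.numerator), lcm(den, f.denominator)
    p0 = Fraction(num, den)
    return p0, [int(f / p0) for f in fracs]


def is_recognizable(values, max_multiple=8):
    """A rhythm is taken as recognizable when all multiples n_i are small."""
    _, multiples = common_unit(values)
    return max(multiples) <= max_multiple


print(common_unit(["1/4", "1/2", "3/4", "1/4"]))  # (Fraction(1, 4), [1, 2, 3, 1])
print(common_unit([120, 180, 240]))               # (Fraction(60, 1), [2, 3, 4])
print(is_recognizable(["1", "1.01"]))             # False
```

    In the last case the common period is 1/100, so the multiples are 100 and 101: two almost equal values that do not form a recognizable rhythm.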

    Different parameters, such as the height and the width of a window, can be used to create different coexisting rhythms in a message.

    Accents are built into the message by altering some of the message parameters.

    Only alterations of a special nature are used to build accents. We call them strongly harmonic changes.

    For sound, such a strongly harmonic change is a change in loudness without a change in tone. Hence, accented message elements are louder or quieter than the rest. Pauses can also be used as accented message elements.

    For color, it could be a change of color intensity, or a change of color saturation (as when we add white paint to a chromatic paint), or both.

    The important point is that there are types of changes in the parameters of message elements that we can use to build accents.

    Accents themselves can be of different types, such as different drum tones or different architectural elements. A particular rhythm is associated with a set of accents of the same type. Different sets of accents of the same type form different rhythmic groups.

    We want to formulate a special property that allows us to recognize a particular set of accents as a rhythmic group. It should be defined in general terms, applicable both to temporal rhythms (rhythms of messages that unfold in time, such as narration, music, dance, movies, etc.) and to the rhythms of messages in space, which we observe in pictures, architecture, dance, etc.

    We define a periodic set. It needs a basic figure and a basic movement. The elements of a periodic set can be generated from the basic figure by the basic movement.

    In a rhythmic group of a message unfolding in time, the basic figure is the period between two accented elements, and the basic movement is its translation.

    In a rhythmic message on a plane or in space, we look at the placement of accent points. We look for a basic figure with its accent points (such as a triangle and its corners) and a simple set of basic movements (such as attaching a similar triangle to an existing one) such that we can reach every accent point using only these tools. If such a figure and such movements exist, we recognize the rhythm.
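    For accents on a plane, one simple reading of "basic figure plus basic movements" is a parallelogram cell repeated by two translations u and v (a triangular arrangement works the same way with non-perpendicular u and v). This sketch, with hypothetical function names and integer coordinates, checks whether every accent point, such as the corner of a window on a facade, can be reached from an origin by whole translations:

```python
from fractions import Fraction


def lattice_coords(point, origin, u, v):
    """Solve point = origin + i*u + j*v for (i, j) by Cramer's rule."""
    dx, dy = point[0] - origin[0], point[1] - origin[1]
    det = u[0] * v[1] - u[1] * v[0]
    if det == 0:
        raise ValueError("basic movements u and v must not be parallel")
    i = Fraction(dx * v[1] - dy * v[0], det)
    j = Fraction(u[0] * dy - u[1] * dx, det)
    return i, j


def is_periodic(points, origin, u, v):
    """True if every accent point is a whole number of translations from origin."""
    return all(
        c.denominator == 1
        for p in points
        for c in lattice_coords(p, origin, u, v)
    )


# Windows every 150 cm along a floor, floors 200 cm apart (illustrative numbers).
windows = [(x, y) for y in (0, 200) for x in (0, 150, 300)]
print(is_periodic(windows, (0, 0), (150, 0), (0, 200)))                 # True
print(is_periodic(windows + [(170, 200)], (0, 0), (150, 0), (0, 200)))  # False
```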

    It is in the nature of human perception that a periodic set has to have at least four elements to be perceived as a rhythm (and not as a set of unrelated objects or events).

    In rhythmic messages unfolding in time, we measure the period from the beginning of one accented element to the beginning of the next. We get a set of numbers $p_1, p_2, \ldots$, and if these numbers satisfy the requirement described above, we have a rhythm.
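    For the temporal case, the periods between accent onsets can be tested the same way, together with the four-element minimum stated above. In this sketch the onset times are assumed to be exact values in any time unit, and `MAX_MULTIPLE` is an arbitrary illustrative bound for "small":

```python
from fractions import Fraction
from math import gcd, lcm

MIN_ACCENTS = 4   # from the text: at least four elements to be perceived as rhythm
MAX_MULTIPLE = 4  # illustrative bound for "small" multiples of the common unit


def accent_periods(onsets):
    """Periods from the beginning of each accented element to the next one."""
    times = sorted(Fraction(t) for t in onsets)
    return [b - a for a, b in zip(times, times[1:])]


def is_rhythmic_group(onsets):
    if len(onsets) < MIN_ACCENTS:
        return False
    periods = accent_periods(onsets)
    if any(p <= 0 for p in periods):   # two accents at the same moment
        return False
    num, den = 0, 1
    for p in periods:
        num, den = gcd(num, p.numerator), lcm(den, p.denominator)
    unit = Fraction(num, den)          # the common period of all accent periods
    return all(p / unit <= MAX_MULTIPLE for p in periods)


print(is_rhythmic_group([0, 3, 6, 9]))              # True: periods 3, 3, 3
print(is_rhythmic_group([0, 3, 6]))                 # False: only three accents
print(is_rhythmic_group([0, 1, "2.5", 4]))          # True: 2, 3, 3 half-units
print(is_rhythmic_group(["0", "1", "1.9", "3.1"]))  # False: multiples 10, 9, 12
```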

    Types of accents can be interrelated, as when the drum tones are harmonically related, or when the height of the windows of a building is exactly twice their width. This creates an internal structure of the rhythm.

    In the arts, the parameters of rhythm have a point of reference: the human body. Time periods are compared to the rhythm of the human heart or of breathing; space periods are compared to the size of the human body and of its limbs. The rhythms of our body create a context for the rhythms of the message. Thus, we perceive rhythms as fast or slow, and we have main units of rhythm. In artful messages, rhythmic units form a rhythm with main units: if the rhythmic unit is $u$ and the main unit is $m$, then there exist small natural numbers $n_u$ and $n_m$ such that

    $$\frac{u}{n_u} = \frac{m}{n_m}.$$

    Now we can write down the rhythmic structure of a message.

    1. We separate the rhythms of parameters of message elements from the rhythms of accents.
    2. We separate each rhythmic group; usually each group is easy to distinguish, since it is designed to be recognizable.
    3. In each rhythmic group, we define the parameter to be measured and the rhythmic unit. The unit can be chosen in many ways (a few times bigger or smaller), and all these choices are equivalent. The measured values of the parameter for this group should be multiples or divisors of this unit.
    4. We present the relationship between the rhythmic units and the main units.
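    The four steps can be recorded as a small data structure. In this sketch, `RhythmicGroup` and its fields are hypothetical names, and the example values (a main unit of one heartbeat taken as 1, and a rhythmic unit of a quarter of it) are illustrative assumptions. Step 4 reduces to finding the smallest natural numbers n_u and n_m with u / n_u = m / n_m, that is, writing u / m as a fraction in lowest terms:

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class RhythmicGroup:
    parameter: str         # step 3: what is measured (duration, width, ...)
    unit: Fraction         # step 3: the rhythmic unit u
    multiples: list[int]   # step 3: measured values as multiples of u
    main_unit: Fraction    # the main unit m, in the same measure as u

    def values(self):
        """Measured values reconstructed from the unit and its multiples."""
        return [n * self.unit for n in self.multiples]

    def relation_to_main(self):
        """Step 4: smallest natural n_u, n_m with u / n_u == m / n_m."""
        ratio = self.unit / self.main_unit   # u / m == n_u / n_m
        return ratio.numerator, ratio.denominator


beats = RhythmicGroup("duration", Fraction(1, 4), [1, 2, 1], main_unit=Fraction(1))
print(beats.values())            # [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
print(beats.relation_to_main())  # (1, 4)
```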

    For example, in waltz (3:4 rhythm),


    Harmony

    Harmony is an important relation between elements of a palette. Message elements inevitably have it too.

    Harmony creates a structure in the palette, and this allows the palette to be big while we can still keep all of it in the focus of our attention at once.

    We want to describe harmony in general, for different types of palettes.

    We are familiar with harmony in music: the basic relation between our perception of harmony in sound and the physical parameters of sound was discovered two millennia ago. Around the same time, the basic relation between our perception of the harmony of points on a plane and the distances between these points was discovered. We have some understanding of the harmony of colors.

    Harmony is a kind of relation between objects that our minds recognize easily; hence it serves as a basis for analyzing more complicated relations.

    A musical tone is a sound (a vibration of air) that has a leading frequency.

    Two tones are harmonious if there is a third tone whose frequency divides the frequencies of both, with small quotients. That is, there is a main vibration with this slower third frequency, and our two vibrations are its "overtones". We perceive as harmonious the tones that can be presented as different tones generated by one vibrating body.
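    For frequencies with an exact rational ratio, the slower third frequency (the common fundamental) and the quotients follow from reducing the ratio of the two frequencies to lowest terms. This sketch assumes frequencies in Hz given as integers or fractions; the threshold `max_quotient=6` for "small" is an illustrative assumption:

```python
from fractions import Fraction


def common_fundamental(f1, f2):
    """Return (f0, q1, q2) with f1 = q1 * f0, f2 = q2 * f0 and q1, q2 as small as possible."""
    ratio = Fraction(f1) / Fraction(f2)   # in lowest terms this is q1 / q2
    q1, q2 = ratio.numerator, ratio.denominator
    return Fraction(f1) / q1, q1, q2


def harmonious(f1, f2, max_quotient=6):
    """Two tones are taken as harmonious when both quotients are small."""
    _, q1, q2 = common_fundamental(f1, f2)
    return max(q1, q2) <= max_quotient


print(common_fundamental(440, 660))  # (Fraction(220, 1), 2, 3)
print(harmonious(440, 660))          # True: overtones 2 and 3 of 220 Hz
print(harmonious(440, 466))          # False: quotients 220 and 233
```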

    Harmonious pairs of distances A and B are those with special ratios:

    $$A : B = 1$$

    or

    $$A : B = (A+B) : A \quad \text{(the golden section).}$$

    We estimate distance A relative to distance B by the ratio of the times it takes the eye to scan each distance. This explains why these harmonious ratios are easy to spot and to use as a basis for further estimates.
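    The second harmonious ratio, A : B = (A+B) : A, determines A : B uniquely. Writing x = A/B and keeping the positive root, since distances are positive:

```latex
x = \frac{A}{B} = \frac{A+B}{A} = 1 + \frac{1}{x}
\;\Longrightarrow\; x^2 - x - 1 = 0
\;\Longrightarrow\; x = \frac{1 + \sqrt{5}}{2} \approx 1.618
```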

    Harmony of colors is a thorny issue. Here we give our definition and explanation.

    First, there is a general "neutral" color, white-gray-black, depending on how intense or dark it is. We call it white/black. This color goes with any chromatic color.

    Second, there is a natural ability to mix colors in our sensory apparatus and our minds: when we look at one color and immediately afterwards at another color, we perceive a third color, their mixture. (The traditional experiment is to look at a color for a long time and then switch to a white wall: the white wall will not look white. The mechanics of this phenomenon are not important here.)

    When there is a boundary between colors, we scan it as we scan any distinctive object in our view. While we scan it, we see three colors: the two original colors and a third one, their mixture.

    We insist that colors are harmonious when this mixture is white/black. The same reasoning applies to three or more colors: they are harmonious if their mixture is white/black.
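    The criterion "the mixture is white/black" can be tried numerically. This sketch makes a strong simplifying assumption that is not in the text: it models the perceived mixture as the plain average of RGB values (0 to 255) and treats a color as white/black when its three channels are nearly equal; the tolerance of 10 is arbitrary:

```python
def mixture(*colors):
    """Plain average of RGB colors: a crude stand-in for the mixing in perception."""
    return tuple(sum(c[i] for c in colors) / len(colors) for i in range(3))


def is_white_black(rgb, tolerance=10):
    """Neutral (white, gray or black) when the three channels are nearly equal."""
    return max(rgb) - min(rgb) <= tolerance


def harmonious_colors(*colors, tolerance=10):
    return is_white_black(mixture(*colors), tolerance)


RED, GREEN, BLUE, CYAN = (255, 0, 0), (0, 255, 0), (0, 0, 255), (0, 255, 255)
print(harmonious_colors(RED, CYAN))         # True: mixture (127.5, 127.5, 127.5)
print(harmonious_colors(RED, GREEN))        # False: mixture (127.5, 127.5, 0.0)
print(harmonious_colors(RED, GREEN, BLUE))  # True: mixture (85.0, 85.0, 85.0)
```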

    We are very forgiving in the definition of harmonious combinations because our perception is imprecise.

    In addition, there is a wide area of application for combinations that we perceive as non-harmonious but close to harmonious, as we know from music. Our mind supposedly analyzes them in two steps: first, it finds the closest harmonious combination of elements; second, it analyzes the small deviation of the original combination from the harmonious one. If the nature of this deviation is simple, the mental analysis is simple too, and the entire combination is well structured and can be kept in focus.
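    The two-step analysis can be illustrated with musical intervals. This sketch assumes the candidate harmonious combinations are frequency ratios with quotients up to 6 (an illustrative choice), picks the closest one on a logarithmic scale, and reports the deviation in cents (1200 times the base-2 logarithm of the ratio, the usual musical measure). An equal-tempered fifth and an equal-tempered major third serve as examples of "close to harmonious":

```python
import math
from fractions import Fraction


def closest_harmonious(ratio, max_quotient=6):
    """Step 1: nearest ratio q1/q2 with small quotients; step 2: deviation in cents."""
    candidates = {
        Fraction(a, b)
        for a in range(1, max_quotient + 1)
        for b in range(1, max_quotient + 1)
    }
    best = min(candidates, key=lambda c: abs(math.log2(ratio / c)))
    cents = 1200 * math.log2(ratio / best)
    return best, cents


fifth, dev = closest_harmonious(2 ** (7 / 12))   # equal-tempered fifth
print(fifth, round(dev, 2))                      # 3/2 -1.96
third, dev = closest_harmonious(2 ** (4 / 12))   # equal-tempered major third
print(third, round(dev, 2))                      # 5/4 13.69
```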

    This construction sometimes has a curious effect. The same palette can sometimes be presented in a few different ways as a harmonious set plus corrections. Hence, we feel that this palette is structurally related to a few very different palettes.


    Riding Stream of Sounds

    Before we move to unfamiliar areas, we give a universal description of the patterns of messages riding streams of sounds: music, speech and singing.


    Music


    Elements

    Music consists of chains of musical elements. Here we describe these elements, and the ways of chaining them, uniformly. We base our description on basic musical concepts, such as:

    The timbre of a sound is a parameter of a musical element. Usually it is described not through physical parameters but through classes, such as "the sound of a piano". However, other specifications are possible as well. A musical chord (with harmonious tones) has a leading tone and can often be interpreted as a tone with a special timbre.

    We call the set of characteristics of a musical chord (which can consist of only one tone) a state. To the states defined this way we add a basic state: silence.

    We define a hold as a sound of a given state and duration. A hold of the basic state is a pause.

    We define a swing as a movement of sound from one state to another over a given duration. Its pitch, loudness and timbre can change. Swings are widely used in singing and speech.

    While a swing is an element that is relatively easy to produce, a hold is a rarity in the natural world, and it captures attention by itself. Holds can rarely be produced in pure form: they have a preceding swing (a head) from silence (the basic state) to the given loudness, and a trailing swing (a tail) from the given loudness back to silence (the basic state). The head and the tail have the same pitch (or set of pitches) and the same timbre as their hold.

    We call the combination head-hold-tail a leap. We combine leaps into a staccato sequence, in which holds do not overlap; however, the tail of the previous leap can overlap with the head of the next leap, and this produces a specific motion of the sound.

    A few leaps of the same duration can form a leap-accord.

    A special swing, whose timbre stays the same and which gives a definite hint of its beginning and ending states, can be used as an independent musical element; we call it a grace. In its movement, it departs from the beginning state with zero speed and arrives at the other state with zero speed. The change accelerates at the beginning and decelerates at the end. In between, the speed of change gives no special hints.

    Note that the head and the tail, although they are not separate musical elements, have dynamics of change similar to a grace: zero speed of change at the beginning and at the end.
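    The text fixes only the shape of a grace at its ends: zero speed of change at the start and at the finish. This sketch assumes one curve with that property, cosine easing, and uses it both for a grace (here a pitch glide in Hz) and for the head and tail of a leap's loudness envelope; the curve and all function names are illustrative choices, not the author's definition:

```python
import math


def ease(t):
    """Progress from 0 to 1 with zero speed at t=0 and t=1 (cosine easing)."""
    t = min(max(t, 0.0), 1.0)
    return (1 - math.cos(math.pi * t)) / 2


def grace(start, end, duration, t):
    """Value of a parameter (e.g. pitch) at time t of a grace from start to end."""
    return start + (end - start) * ease(t / duration)


def leap_loudness(t, head, hold, tail, level):
    """Loudness envelope of a leap: head (swing up), hold, tail (swing down)."""
    if t < 0 or t > head + hold + tail:
        return 0.0
    if t < head:
        return level * ease(t / head)
    if t <= head + hold:
        return level
    return level * (1 - ease((t - head - hold) / tail))


print(round(grace(440, 660, 2.0, 1.0), 6))  # 550.0: halfway in time, halfway in pitch
print([round(leap_loudness(t, 0.1, 1.0, 0.2, 1.0), 3) for t in (0.0, 0.05, 0.5, 1.2, 1.3)])
# [0.0, 0.5, 1.0, 0.5, 0.0]
```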

    A grace can connect two states of the same type. It is similar to a hold, only it appears in a different context: in a chain of graces.

    It is possible to combine a few graces of the same duration into a grace-accord.

    It is possible to combine a grace-accord with a leap-accord.

    A sequence of musical elements is a chain.

    A set of simultaneously performed interrelated chains is a tress.


    Notation

    We can describe the set of states used in a particular piece of music through layers of indirection:

    To make the description complete, we need to combine the abstractly named elements {a,b,c,d,e} with a description of duration.

    In music, we describe duration as a ratio to some rhythmic unit.

    For example, a chain can be described like this:

    a(1), b(2), a(1), b(1), b(1), a(1), b(1), b(1), ...,

    or

    a 1, b 2, a 1, b 1, b 1, a 1, b 1, b 1, ...

    where the numbers give the duration of each element in rhythmic units.

    We have to add a description of the relation of the rhythmic unit to some main rhythmic unit, such as the heartbeat. In this case, we specify that one main unit contains four units of this rhythm.

    We add accents, marked with `, to this description:

    `a 1, b 2, `a 1, b 1, b 1, `a 1, b 1, b 1, ...

    The accents create their own rhythm, with a period of three units; since one main unit contains four of these units, we have the familiar 3:4 rhythm.
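    The accent rhythm of this chain can be computed from the durations alone. This sketch takes the prefix of the accented chain shown above (without the trailing "..."), writes each element as a hypothetical (accent level, name, duration) triple, and finds the onsets of accented elements and the periods between them:

```python
from fractions import Fraction

UNITS_PER_MAIN = 4  # from the text: one main unit contains four rhythmic units

# `a 1, b 2, `a 1, b 1, b 1, `a 1, b 1, b 1  as (accent level, name, duration)
chain = [(1, "a", 1), (0, "b", 2), (1, "a", 1), (0, "b", 1), (0, "b", 1),
         (1, "a", 1), (0, "b", 1), (0, "b", 1)]


def accent_onsets(chain, level=1):
    """Start times (in rhythmic units) of elements accented at the given level."""
    t, onsets = Fraction(0), []
    for accent, _name, duration in chain:
        if accent == level:
            onsets.append(t)
        t += duration
    return onsets


onsets = accent_onsets(chain)
periods = [b - a for a, b in zip(onsets, onsets[1:])]
print([str(t) for t in onsets])                    # ['0', '3', '6']
print([str(p / UNITS_PER_MAIN) for p in periods])  # ['3/4', '3/4']
```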

    If there is another rhythm in the chain, we mark its accents with ``, and so on. For example:

    `a 1, ``b 2, `a 1, b 1, b 1, `a 1, b 1/2, ``c 1, b 1/2

    As a shorthand, we drop the number "one" from the description:

    `a, ``b 2, `a, b, b, `a, b 1/2, ``c, b 1/2.
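    Because the chain notation uses only keyboard symbols, a program can read it directly. The sketch below is a hypothetical Python parser, not part of the notation itself: it accepts comma-separated elements with optional accent marks (` or ``), a name and an optional duration (an integer or a fraction such as 1/2), applies the "missing number means one" shorthand, and measures the spacing between accents of a given level. Chains of graces in "{ }" are not handled here.

```python
import re
from dataclasses import dataclass
from fractions import Fraction

# Optional accent marks, an element name, an optional duration such as 2 or 1/2.
TOKEN = re.compile(r"^(`*)([A-Za-z]\w*)(?:\s+(\d+(?:/\d+)?))?$")


@dataclass(frozen=True)
class Element:
    name: str
    duration: Fraction  # measured in rhythmic units
    accent: int         # 0 = none, 1 = `, 2 = ``, and so on


def parse_chain(text: str) -> list:
    """Parse a chain such as "`a, ``b 2, `a, b 1/2." into elements."""
    elements = []
    for raw in text.split(","):
        token = raw.strip().rstrip(".")  # drops "..." and the closing period
        if not token:
            continue
        match = TOKEN.match(token)
        if match is None:
            raise ValueError(f"cannot parse element {raw.strip()!r}")
        accents, name, duration = match.groups()
        # Shorthand: a missing duration means one rhythmic unit.
        elements.append(Element(name, Fraction(duration or "1"), len(accents)))
    return elements


def onsets(elements: list) -> list:
    """Start time of each element, in rhythmic units."""
    times, t = [], Fraction(0)
    for element in elements:
        times.append(t)
        t += element.duration
    return times


def accent_periods(elements: list, level: int) -> list:
    """Distances between successive elements carrying the given accent level."""
    accented = [t for t, e in zip(onsets(elements), elements) if e.accent == level]
    return [later - earlier for earlier, later in zip(accented, accented[1:])]


chain = parse_chain("`a 1, b 2, `a 1, b 1, b 1, `a 1, b 1, b 1, ...")
print(accent_periods(chain, 1))  # [Fraction(3, 1), Fraction(3, 1)]
```

    The period of three units found for the single accents is the three-unit accent rhythm described above.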

    A chain of graces is defined by its states and by the periods it takes the sound to move from one state to the next. We mark such a chain with "{ }":

    {a, 2, b, 1, a, `1, b}.

    We mark an accented grace the same way as a leap, with ` or ``, and so on. Again, as a shorthand, we can drop the number "one" from the description:

    {a, 2, b, a, `, b}.
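    Reading a chain of graces differs from reading leaps: tokens alternate between states and periods, and under the shorthand a missing period means one unaccented unit, while a bare ` means an accented period of one unit. The following hypothetical Python parser sketches that reading, assuming that state names never consist only of digits and accent marks.

```python
import re
from dataclasses import dataclass
from fractions import Fraction

# A period token: optional accent marks and an optional number ("2", "`1", "`").
PERIOD = re.compile(r"^(`*)(\d+(?:/\d+)?)?$")


@dataclass(frozen=True)
class Grace:
    start: str
    end: str
    duration: Fraction  # rhythmic units taken to move from start to end
    accent: int         # 0 = none, 1 = `, 2 = ``


def parse_grace_chain(text: str) -> list:
    """Parse "{a, 2, b, a, `, b}" into graces between consecutive states."""
    body = text.strip()
    if not (body.startswith("{") and body.endswith("}")):
        raise ValueError("a chain of graces must be enclosed in { }")
    graces, previous, pending = [], None, None
    for raw in body[1:-1].split(","):
        token = raw.strip()
        if not token:
            raise ValueError("empty element in chain of graces")
        period = PERIOD.match(token)
        if period:
            accents, number = period.groups()
            pending = (Fraction(number or "1"), len(accents))
            continue
        # Any other token is a state; connect it to the previous state.
        if previous is not None:
            # Shorthand: no period between two states means one unaccented unit.
            duration, accent = pending or (Fraction(1), 0)
            graces.append(Grace(previous, token, duration, accent))
        previous, pending = token, None
    return graces


for grace in parse_grace_chain("{a, 2, b, a, `, b}"):
    print(grace)
```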

    A chain can include both leaps and graces:

    `a 1, ``b 2, `a 1, ``{b, 1, c, 1, b},

    or in short:

    `a, ``b 2, `a, ``{b, c, b}.


    We describe a tress as a set of parallel lines, one line per chain. We separate tresses with comment lines (which can be empty) starting with "//", as in this tress of two chains:


    a 1, ``b 2, a 1, ``{b, 1, c, 1, b}, ...

    e 1, d 2, e 1, d 2, ...


    Synchronization of chains in a tress is achieved automatically, but to simplify reading we add visual synchronization markers: a few characters (a word or a number) enclosed between "/*" and "*/":


    /*0*/ a 1, ``b 2, /*1*/ a 1, ``{b, 1, c, 1, b}, /*2*/...

    /*0*/ `e 1, d 2, /*1*/ `e 1, d 2, /*2*/ ...
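    Although synchronization is automatic, a program could still use the markers to check a tress. The Python sketch below assumes that each chain has already been reduced to a list of marker labels (strings) and element durations in rhythmic units; in the upper chain, the grace group ``{b, 1, c, 1, b} contributes 1 + 1 = 2 units. The function names are hypothetical.

```python
from fractions import Fraction


def marker_times(chain: list) -> dict:
    """Time of each synchronization marker; markers themselves take no time."""
    times, t = {}, Fraction(0)
    for item in chain:
        if isinstance(item, str):
            times[item] = t
        else:
            t += Fraction(item)
    return times


def check_tress(chains: list) -> dict:
    """Raise ValueError if the chains place a shared marker at different times."""
    reference = marker_times(chains[0])
    for index, chain in enumerate(chains[1:], start=1):
        for label, t in marker_times(chain).items():
            if label in reference and reference[label] != t:
                raise ValueError(
                    f"chain {index}: marker {label} at {t}, expected {reference[label]}"
                )
    return reference


# /*0*/ a 1, ``b 2, /*1*/ a 1, ``{b, 1, c, 1, b}, /*2*/
upper = ["0", 1, 2, "1", 1, 2, "2"]
# /*0*/ `e 1, d 2, /*1*/ `e 1, d 2, /*2*/
lower = ["0", 1, 2, "1", 1, 2, "2"]
print(check_tress([upper, lower]))  # markers at 0, 3 and 6 units
```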


    In a tress, all chains share the same palette, map and abstract names of elements. The description of each tress has to be preceded by a general description of this shared structure. As a shorthand, this description can refer to some specially organized description.

    This notation is simple to type and easy to play.

    This description reveals the internal structure of the musical message. It can easily be translated into traditional notation, where the duration of an element is expressed in main units and the elements are tied to the palette.
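    The translation into main units is simple arithmetic. A minimal Python sketch, assuming the case described above of four rhythmic units per main unit, and a hypothetical palette that ties the abstract names a and b to tones written as C4 and E4:

```python
from fractions import Fraction

UNITS_PER_MAIN = 4                 # four rhythmic units in one main unit
PALETTE = {"a": "C4", "b": "E4"}   # hypothetical tones, for illustration only


def to_traditional(chain, units_per_main=UNITS_PER_MAIN, palette=PALETTE):
    """Replace abstract names by palette tones and durations by main units."""
    return [(palette[name], Fraction(duration) / units_per_main) for name, duration in chain]


# a 1, b 2, a 1, b 1, b 1, a 1, b 1, b 1
chain = [("a", 1), ("b", 2), ("a", 1), ("b", 1), ("b", 1), ("a", 1), ("b", 1), ("b", 1)]
for tone, length in to_traditional(chain):
    print(tone, length)  # first lines: "C4 1/4", then "E4 1/2"
```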


    Speech

    We possess a speech-producing apparatus (lips, tongue, nose, throat and vocal cords) and a fine system for controlling it, and we use it skillfully. The air stream, finely pushed by the muscles of the chest and diaphragm, is shaped by this apparatus into a stream of sounds with distinctive characteristics. These sounds are used as message elements.

    Speech is a rhythmic stream of distinctive sounds.

    States and Graces

    Here we offer a new phonetic notation. It is based on the concepts of state and swing, which we used above to describe music. It differs from traditional phonetic notation and is more powerful: it allows a close reproduction of the sound of speech, which traditional notation does not. It is also easy to type, since it uses only symbols found on a computer keyboard; this matters when the notation is used as a communication tool.

    We present speech as a chain of special graces, in which the sound moves from one state to another. The set of states is relatively small, and it defines the set of possible graces. The number of graces is about the square of the number of states (all possible pairs).

    For each particular language, we specify the states and graces that are actually used.

    Unlike in music, where tone is the main characteristic of a state, in speech the main characteristics of a state are those that in music we call timbre.

    These characteristics are difficult to describe, but we do not need to. We only establish classes of states and leave the particular performance of each state to the performers. This is similar to typing: we type characters, but their actual visual appearance is defined by the chosen palette of fonts. Here, we type our special symbols, but their actual sound is defined by the particular palette of graces, which is different for each individual.

    Basic States

    We start with the basic states. These are states with no meaningful sound, only noises unrelated to the message. Some basic states briefly block the stream of air; others do just the opposite and allow a free flow of air. The period between two basic states can be used by the speaker for "technical" activities, such as exhaling the rest of the air in the lungs and inhaling a new portion of air.

    Classification of States

    A few general factors can be used to classify states.

    Some parts of the stream of air and sound can be controlled by the speech-producing apparatus more than others. These parts are vocalized, and we can control their timbre, pitch, loudness and duration.

    Traditional phonetics often describes the combination of such a state with its preceding and following graces as a vowel.

    Other parts have sound produced by the vocal cords or by the movement of air through narrow passages, but we cannot control their loudness. The sound "m" is an example.

    Some parts are pure interruptions of the stream of air, of a particular type depending on which organs we use to produce them.

    Some states are distinguished by a special vibration of the sound (this is not used in English).

    The distinctive characteristics of the sound of a speech element are produced by special positioning and movement of the instruments of the speech-producing apparatus.

    Palette

    The positioning parameters of these instruments of the speech-producing apparatus vary from individual to individual. Only a limited number of combinations can be easily distinguished in spite of such variations. These combinations are specific to the language; they form the palette of the language.

    In speech, pitch is used in a particular way. If there is a part of the stream where the tone can be controlled, then the tone is either set to one of a few chosen tones (which are part of the palette) or swings from one of these tones to another. Hence, some states can differ only in tone.

    Different languages use tones differently. English uses two tones; Russian uses only one.

    Changing the Palette

    Within one message, the same speaker often uses several different palettes.

    As in a musical message, the tone part of a palette can change when we move to another "module" of the speech. Sometimes this is done to differentiate these "modules".

    When the speech goes through a part of a phrase that does not carry a substantial amount of new information, a "fuzzy" palette can be used, which is easier to produce but whose elements are not distinguished well enough. When a part of the phrase that conveys a new thought has to be spoken, the speaker uses another palette, in which all elements are distinctive.

    In languages with a high level of redundancy (such as Russian), a similar switch of palette happens in the middle of a word: the ending of the word is produced with the "fuzzy" palette.

    Rhythm

    The rhythm of a message is formed differently in different languages. Speech offers a few tools for producing a rhythm.

    Vowels are clearly distinctive in speech. They can be the basis of a rhythm: a rhythm of syllables. Russian uses it as the basic rhythmic structure of speech.

    Emphasis (accent) is a clear candidate for carrying a rhythm, as in Western music. In this case, the duration of some elements has to be adjustable, so that the accented elements can form a rhythm. English uses this mechanism to create the basic rhythmic structure of a message.

    Contemporary speech is fast, and its rhythm is dictated by the heartbeat. We have reason to believe that long ago speech was much slower and was governed by the rhythm of breathing. In contemporary speech, air intake is coordinated with the semantic pauses; in old speech, the pauses for air intake were part of the organization of speech.

    In artful speech, the rhythm is coordinated with both the heartbeat and the breathing. Because this is a highly artificial construction, it makes a strong impression on listeners.

    Interestingly, this form of speech makes a strong impression on the speaker as well.

    Coordination of the heartbeat and breathing brings the body and the mind into an unusual state. It is used consistently in yoga, poetic reading, singing and prayer. Players of wind instruments often achieve it as well.

    Emphasis

    One mechanism of emphasis is based on loudness.

    There is a fixed level of loudness in each "module" of speech. (Obviously, it can be observed only in vowels, whose loudness can be controlled.) Parts louder than this level are accented.

    English speech has an additional emphasis mechanism: if two states differ only in their tone, a grace moving from one of them to the other is an emphasis.

    States of Singing

    Singing extends the set of tones used in speech, and thereby the set of states: many states of speech give rise to a set of states of singing. In singing, the movement of pitch among the selected tones, the duration of graces and the rhythm are governed by the music alone, not by semantics.

    Notation

    Different languages require different notations. Even when the elements of their palettes are similar, the internal associations and logical relationships between palette elements are often different.

    Our phonetic descriptions share one common feature: we present a message as a chain of graces and write it down as a chain of states. We use special symbols to denote different accents.

    For each language, we present a palette of states specific to that language. We denote these states with symbols available on a computer keyboard: Latin letters, and combinations of a special symbol and a letter.


    English Phonetic Notation


    Chain of Graces

    Traditional phonetic notation uses the Latin alphabet to describe something similar to what we call states. This part of the notation is convenient for typing, and we will use it. Instead of the special symbols used in traditional notation, we use pairs of symbols: a Latin letter preceded by a symbol from a small set:

    ~ # * < > _

    We differentiate states that differ only in tone with "_":

    a _a

    Note that we have to differentiate even states that interrupt the stream of air, such as the state p, because the sound preceding or following the interruption has a tone (one or the other).

    Emphasis by loudness affects the sound preceding the state and the sound following it. We denote this by marking the state with the symbol "`" (primary emphasis) or "``" (secondary emphasis):

    `a ``a

    Emphasis by the movement from one tone to the other spans two swings, one preceding a state and one following it. Only a special class of states can carry such emphasis; for a state of this class, the combination swing-state-swing is called a vowel. When such emphasis is present, we mark the corresponding state with "^".


    We denote the basic state with a space.

    We separate rhythmic groups of graces with ".", as in:

    b_ob .g`iv mi.

    Strictly speaking, the state preceding the dot is the boundary of a rhythmic unit.

    In the example above, we have the following graces:

    first group:

    1. b_o with the change of tone
    2. _ob with the change of tone back
    3. pause

    second group:

    1. gi
    2. iv
    3. pause
    4. mi

    Another example:

    b`ob ._w`_o_r_k.

    or

    b^_ob ._w`_o_r_k.

    where the second rhythmic group has a tone different from that of the first group.

    It can also be

    b^_o_b ._w`_o_r_k.

    where the change of tone happens on the first vowel.
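    Because the notation uses only keyboard symbols, a program can read it. The sketch below is a hypothetical Python tokenizer, assuming the order of marks seen in the examples: an emphasis mark (``, ` or ^), then "_" for the second tone, then an optional modifier from ~ # < > *, then a letter or &. A space is the basic state, and "." closes a rhythmic group.

```python
import re
from dataclasses import dataclass

# Emphasis, second-tone mark, optional modifier, then the letter (or &).
STATE = re.compile(r"(``|`|\^)?(_)?([~#<>*]?)([a-z&])")


@dataclass(frozen=True)
class State:
    symbol: str        # e.g. "o", "~s", ">a"; " " is the basic state
    second_tone: bool  # True when the state is marked with "_"
    emphasis: str      # "", "`", "``" or "^"


def parse_groups(text: str) -> list:
    """Split a phonetic line into rhythmic groups of states."""
    groups, current, pos = [], [], 0
    while pos < len(text):
        char = text[pos]
        if char == ".":
            groups.append(current)  # "." closes a rhythmic group
            current, pos = [], pos + 1
        elif char == " ":
            current.append(State(" ", False, ""))
            pos += 1
        else:
            match = STATE.match(text, pos)
            if match is None:
                raise ValueError(f"unexpected {char!r} at position {pos}")
            emphasis, tone, modifier, letter = match.groups()
            current.append(State(modifier + letter, tone is not None, emphasis or ""))
            pos = match.end()
    if current:
        groups.append(current)
    return groups


def spell(group: list) -> str:
    """Write a group back without emphasis marks."""
    return "".join(("_" if s.second_tone else "") + s.symbol for s in group)


for group in parse_groups("b^_o_b ._w`_o_r_k."):
    print(repr(spell(group)))
```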


    Now we proceed with the description of the states themselves, without their tone and emphasis variants.


    & - Shewa, a hint of a vowel

    Interrupts (a grace that starts and ends with an interrupt is very short), 4 pairs:

    Without vocal support:

    k - tongue back

    t - tongue front

    f - lips-teeth

    p - lips

    With vocal support:

    g - tongue back

    d - tongue front

    v - lips-teeth

    b - lips

    Semi-interrupts, 4 pairs and two special:

    Without vocal support:

    #s - as in lunch, tongue

    ~s - as in shore, tongue

    s - tongue, front

    *s - as in three, tongue-teeth

    h - throat

    With vocal support:

    #z - as in judge, tongue

    ~z - as in leisure, tongue

    z - tongue, front

    *z - as in this, tongue-teeth

    j - tongue, middle


    Vocal semi-interrupts (often stretched in time), six:

    ~n - as in morning, back of tongue - nose

    n - front of tongue - nose

    l - tongue front

    r - tongue front

    m - lips - nose

    w - lips


    States that create vowels, 11 states:

    a - unrestricted

    e - throat

    o - tongue's back with help of lips

    u - tongue's back with help of lips

    i - tongue, middle

    >a - a constrained by the tongue's back

    >e - e constrained by the tongue's back and front

    >o - o constrained by the tongue's back

    >u - u constrained by the tongue's back

    >i - i constrained by the middle of the tongue

    <a - wider opening

    With tone differentiation, these give 70 states; adding the state & and the basic state, we have 72 states, which produce a palette of about 5,000 graces (72 × 72 = 5,184).
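    The count can be reproduced by listing the states above. In this Python sketch every ordered pair of states is counted as a grace, which gives an upper bound; the text only claims "about 5,000", and some pairs may be rare or unused in practice.

```python
from itertools import product

# Tone-neutral states from the lists above (35 in total).
INTERRUPTS = ["k", "t", "f", "p", "g", "d", "v", "b"]
SEMI_INTERRUPTS = ["#s", "~s", "s", "*s", "h", "#z", "~z", "z", "*z", "j"]
VOCAL_SEMI_INTERRUPTS = ["~n", "n", "l", "r", "m", "w"]
VOWEL_STATES = ["a", "e", "o", "u", "i", ">a", ">e", ">o", ">u", ">i", "<a"]

base = INTERRUPTS + SEMI_INTERRUPTS + VOCAL_SEMI_INTERRUPTS + VOWEL_STATES
# Each state exists in two tone variants: "x" and "_x".
toned = [prefix + s for s in base for prefix in ("", "_")]
states = toned + ["&", " "]               # shewa and the basic state
graces = list(product(states, repeat=2))  # every ordered pair of states

print(len(base), len(toned), len(states), len(graces))  # 35 70 72 5184
```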


    Mapping

    Our notation omits the duration of graces, because it can be computed.

    In artful Western speech, rhythmic blocks have about the same duration, and this duration is related to the heartbeat.

    The distribution of duration inside a rhythmic block is governed by rules, which vary from individual to individual. If we want to reproduce a particular style of speech, we describe these rules (as a special "map").

    These rules share some common features.

    A grace connecting two interrupts has minimal duration.
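    One such "map" can be sketched in Python under loud assumptions: the block length is normalized to 1, a grace between two interrupts (k, t, f, p, g, d, v, b, with or without the "_" tone mark) receives a fixed minimal share, and every other grace receives an equal part of the remainder. Only the minimal-duration rule comes from the text; the equal sharing, the value 1/20 and the function names are hypothetical, and emphasis marks are assumed to have been removed from the states.

```python
from fractions import Fraction

# Interrupt states from the English list (the tone mark "_" is ignored).
INTERRUPTS = set("ktfpgdvb")


def is_interrupt(state: str) -> bool:
    return state.lstrip("_") in INTERRUPTS


def distribute(states, block=Fraction(1), minimal=Fraction(1, 20)):
    """Hypothetical rule: graces between two interrupts get `minimal`;
    all other graces share the rest of the block equally."""
    pairs = list(zip(states, states[1:]))
    short = [is_interrupt(a) and is_interrupt(b) for a, b in pairs]
    n_long = len(pairs) - sum(short)
    if n_long == 0:
        raise ValueError("no grace can absorb the remaining duration")
    rest = (block - minimal * sum(short)) / n_long
    if rest < minimal:
        raise ValueError("block too short for this rule")
    return [(a + b, minimal if s else rest) for (a, b), s in zip(pairs, short)]


# The states a, k, t followed by the basic state.
for grace, duration in distribute(["a", "k", "t", " "]):
    print(repr(grace), duration)
```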

    In the distribution of duration between block elements

    Graces that differ only in how the tone is managed can also be described separately. There are four variants of such graces:

    ab, _a_b, a_b, _ab.

    Now we are left with mapping all graces that start with one state and end with another (or the same one), regardless of tone. States ' and & can form only limited combinations with other states, but the other 36 states (including the pause) can appear in almost any combination. This leaves about 1,300 basic variants to be mapped (36 × 36 = 1,296).
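    The decomposition into basic variants and tone variants can be sketched in Python. The 36 placeholder names below stand for the 35 states plus the pause and are hypothetical; the function tone_variants spells the four tone variants listed above for a given pair of states.

```python
from itertools import product


def tone_variants(start: str, end: str) -> list:
    """The four tone variants of a grace, in the order ab, _a_b, a_b, _ab."""
    marks = [("", ""), ("_", "_"), ("", "_"), ("_", "")]
    return [f"{p}{start}{q}{end}" for p, q in marks]


# Placeholder names for the 36 states (35 states plus the pause).
names = [f"s{i}" for i in range(36)]
base_graces = list(product(names, repeat=2))

print(len(base_graces))         # 1296
print(tone_variants("a", "b"))  # ['ab', '_a_b', 'a_b', '_ab']
```

    Multiplying the basic variants by the four tone variants gives 5,184, which is the order of magnitude of the palette size stated in the text; the product slightly overcounts, because the pause itself has no tone variants.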

    The basic palette of English speech (all basic variants of graces) includes about 5,000 elements.

    Singing

    We modify our phonetic notation to make it applicable to the description of singing.

    In singing we use more than two tones, so we do not use the symbol "_" to differentiate between two states that differ in tone; instead, we specify the tone of each state in "( )" in front of it, as in (b)b.


    We specify the tone according to the chosen notation (preferably our musical notation).

    We also need to specify the duration of graces; we write it between commas:

    (b)b,1,(c)o,1,(b)l,1,(b)l, ...
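    A program can recover the graces, with their tones and durations, from this notation. The Python sketch below assumes that tones are opaque names written in parentheses (such as b and c above) and that every pair of states is separated by an explicit duration; the function name is hypothetical.

```python
import re
from fractions import Fraction

SUNG_STATE = re.compile(r"^\((?P<tone>[^)]+)\)(?P<state>\S+)$")


def parse_sung(text: str) -> list:
    """Return graces as (from_state, from_tone, to_state, to_tone, duration)."""
    tokens = [t.strip() for t in text.split(",")]
    tokens = [t for t in tokens if t and t != "..."]
    graces, previous, duration = [], None, None
    for token in tokens:
        match = SUNG_STATE.match(token)
        if match is None:
            duration = Fraction(token)  # a duration between two states
            continue
        current = (match["state"], match["tone"])
        if previous is not None:
            if duration is None:
                raise ValueError(f"missing duration before {token!r}")
            graces.append((*previous, *current, duration))
        previous, duration = current, None
    return graces


for grace in parse_sung("(b)b,1,(c)o,1,(b)l,1,(b)l, ..."):
    print(grace)
```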

    We do not specify speech rhythmic groups in singing, because we specify the duration of graces explicitly.


    Beyond Music

    General Description

    From our observations of music and speech, we derive a general description of the structure of a message, which we present here.

    The concept of a state is very convenient for the general description of message elements.

    States are distinctive, highly artificial patterns used to create message elements.

    States are combined into groups. Inside a group, they are distinguished by parameters. This lets us produce states consistently.

    Some states are basic states: states from which a message emerges.

    All possible states form a State Space. A State Space can have a complex structure: it can consist of a few different groups of states.

    A hold is a performance closely related to a state; it is highly artificial and brings recognition of the unique state it represents.

    A swing is a performance-movement from one state to the other. It can serve as a connector or as a message element.

    In dance, a hold is a particular artificial position or movement (as rotation).

    In landscaping, a hold is an artificial flat horizontal area. Landscaped overpass is

    Some messages unfold in time: their elements have a beginning and an end and have a period of existence - duration. Other messages are presented on the surface or in space: their elements have boundaries, which confine them to some limited area on the surface or volume in space.

    Inevitably, elements have parameters (characteristics of time, surface, space, etc.), such as duration for elements unfolding in time, or size and shape for surface and space elements. These parameters are natural candidates for organization by rhythm or geometric harmony.

    Usually, a hold is supported by one or a few swings. If these swings connect it to some basic hold, then we call this combination a leap.

    A special swing that is restricted enough to be easily recognizable and to be used as a message element is called a grace.

    A grace unfolding in time carries a hint of two states. The speed of change in the grace is zero at the beginning (a hint of the first state), fast in between, and decelerates to zero at the end (a hint of the second state). A special type of grace is similar to a hold.

    We define a set of elements as a collection of elements that do not intersect. In a message unfolding in time, a set is a chain. A set can consist of a single element.

    A tress is a distinctive combination of overlapping sets. In music, it has a beginning and an end. In landscaping, it is restricted in area. In animation, it is restricted in both time and space.


    Moving Color

    Here we concentrate on designing a language that can describe an artful message conveyed by moving color on a rectangular screen.

    A state in this case is a particular color; the corresponding hold is a small area on the screen that has this color during some period. This is similar to music, where a state is an accord and a hold is that accord kept during some period.

    We have to arrive at this hold; hence, we need a basic state. To create it, we dedicate the area near the border of the screen to some color, which does not change during one performance-part. The basic color (basic state) is then a color strongly related to the color of this border.

    The small area on the screen that corresponds to the hold changes from the basic color to the color of this hold, stays like that for a while, and then changes back to the basic color.

    We have to connect this hold to other holds on the screen. Hence, we make the intensity of the color of the hold's small area change gradually toward the border.

    We have created a leap. It has a hold (a definite color in a dedicated area during a given period) and supportive swings. There are three swings here. The first occurs during the hold's period: the color on the screen does not change with time, but fades away as we move from the hold's area. The second is at the beginning, when the color "grows" over the entire screen from the basic color to the distribution described above (a head). The third is the opposite, when the color on the screen diminishes from this distribution back to the basic color (a tail).
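    A single leap of moving color can be sketched as a function of screen position and time. This Python sketch rests on assumptions that the text leaves open: colors are RGB triples, screen coordinates are normalized, the spatial fade is a Gaussian of the distance from the hold's area, and the head and tail follow a smoothstep curve with zero speed at both ends. The function and parameter names are hypothetical.

```python
import math


def smoothstep(t: float) -> float:
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def envelope(t: float, head: float, hold: float, tail: float) -> float:
    """Weight of the leap over time: rises in the head, 1 in the hold, falls in the tail."""
    if t < head:
        return smoothstep(t / head)
    if t < head + hold:
        return 1.0
    return 1.0 - smoothstep((t - head - hold) / tail)


def leap_color(x, y, t, center, radius, hold_color, base_color,
               head=1.0, hold=2.0, tail=1.0):
    """RGB color at screen point (x, y) and time t for a single leap."""
    distance = math.dist((x, y), center)
    falloff = math.exp(-((distance / radius) ** 2))  # fades away from the hold area
    weight = envelope(t, head, hold, tail) * falloff
    return tuple(b + (h - b) * weight for h, b in zip(hold_color, base_color))


base = (0.1, 0.1, 0.3)  # basic color, related to the border color
red = (0.9, 0.1, 0.1)   # color of the hold
print(leap_color(0.5, 0.5, 2.0, (0.5, 0.5), 0.1, red, base))
```

    A small radius keeps the leap distinctive, as required above; several leaps on one screen would need an additional mixing rule, which this sketch does not attempt.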

    If there are a few leaps on the same screen, their colors mix, producing a special mixture of colors between holds. This mixture can be unpleasant if the colors of different holds on the screen are not harmonious.

    The color should fade away fast enough as we move away from the hold's small area for the leap to remain distinctive.

    We can define a few types of graces. In the simplest grace, the geometric distribution of color and the intensity of color stay the same; only the quality of color changes according to the rules of a grace.

    A palette is a set of pairs of colors and points on the screen: the points where we place the holds of the message, and their colors. The colors have to be harmonious, and the points have to be in harmonious relationship with each other and with the border.


    Performance by Computer

    Our representation is ready for performance by computer, which is convenient. One can write music and send it by e-mail. One can write a complete verbal presentation in phonetic notation and let a computer read it; the listener can adjust the loudness, the speed, the voice that reads it, etc. The same applies to moving color. This can be part of a captivating web commercial.

    We now have written languages and notations that allow a simultaneous description of sound and display. This gives us a chance to develop a culture of synergetic art forms whose performance is driven by computer.

    Here we present the organization of a player for messages written in our notation. It is simple and does not require large resources.

    Such a player needs long-term storage (disk, flash memory) for the information related to the palette and mapping. This information does not change.

    The local palette and mapping are derived from the message and stored, preferably, in fast memory (RAM).

    The actual device that produces sound and visual images needs input written in its own specific "language". Hence, when we receive a message in our notation, we translate it into the "language" of the device and store it. If this information is large, it can be stored in slow memory.

    Now we can play this stored information.

    Often the two processes, translation and playing, can overlap, if translation has a sufficient head start.
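    The overlap of translation and playing is a producer-consumer arrangement. A minimal Python sketch using the standard queue and threading modules, assuming that the message is a list of elements, that translate is a hypothetical stand-in for translation into the device's "language", and that the head start is counted in translated elements:

```python
import queue
import threading


def translate(element: str) -> str:
    """Hypothetical translation into the device's "language"."""
    return element.upper()


def perform(message: list, head_start: int = 2) -> list:
    """Translate in a background thread and play once it has a head start."""
    buffer = queue.Queue()
    done = object()              # sentinel marking the end of the message
    started = threading.Event()

    def translator() -> None:
        for count, element in enumerate(message, start=1):
            buffer.put(translate(element))
            if count >= head_start:
                started.set()
        buffer.put(done)
        started.set()            # short messages: start when all is translated

    threading.Thread(target=translator, daemon=True).start()
    started.wait()               # give translation its head start
    played = []
    while (item := buffer.get()) is not done:
        played.append(item)      # a real player would send this to the device
    return played


print(perform(["a", "b", "c", "d"]))  # ['A', 'B', 'C', 'D']
```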


    Palette Management

    We designed the palette so that its elements can be chained into a message. This imposes restrictions on palette elements: the end of one element can be combined with the beginning of another, since they have the same loudness, intensity, etc., and the speed of change is zero at the beginning and at the end of each palette element.

    While the entire palette available for creating messages can be huge, the number of palette elements used in one semantic unit of a message is manageable, as shown above. Hence, it is reasonable to supply a new small palette when we move to another unit that uses a different palette. The set of such palettes used in a message is also manageable. It can be defined at the beginning of the message (or of its part), and each time we change the palette we supply a reference to the proper element of this set.
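    A minimal Python sketch of this scheme, assuming that a palette can be modelled as a dictionary from element names to performance data (plain strings here) and that the message body is a list of tagged events. The "fuzzy" and "clear" palettes echo the two palettes of speech discussed earlier; all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    palettes: list                            # the set of palettes, defined once
    body: list = field(default_factory=list)  # ("palette", index) or ("element", name)


def resolve(message: Message) -> list:
    """Look up each element in the currently selected palette."""
    active = message.palettes[0]
    output = []
    for kind, value in message.body:
        if kind == "palette":
            active = message.palettes[value]  # a change carries only a reference
        else:
            output.append(active[value])
    return output


fuzzy = {"a": "fuzzy-a", "b": "fuzzy-b"}
clear = {"a": "clear-a", "b": "clear-b"}
message = Message([fuzzy, clear], [("element", "a"), ("element", "b"),
                                   ("palette", 1), ("element", "a")])
print(resolve(message))  # ['fuzzy-a', 'fuzzy-b', 'clear-a']
```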

    Such sets of palettes can be supplied effectively if we have a method for generating them from a set of parameters.

    For example, in music, we can generate a palette of leaps and graces that corresponds to a given musical instrument and a given style of performance, including specified musical tones. In speech, we can generate a palette of graces that corresponds to a particular speaker, a given level of loudness, a given speed of pronunciation and given basic tones.

    Palette generation can be time-consuming, so it has to be done in advance, before the message is performed. If a palette generation method is available, it is enough to pass along with the message only the values of its parameters to specify the performance precisely.
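    The idea of passing only parameters can be sketched in Python with the standard functools.lru_cache, so that each distinct parameter set is generated once, before the performance. The generator below is a hypothetical stand-in that builds one placeholder entry per grace and tone over a tiny placeholder state set; a real generator would synthesize sound.

```python
from functools import lru_cache
from itertools import product

STATES = ("a", "b", " ")  # tiny placeholder state set


@lru_cache(maxsize=None)
def generate_palette(speaker: str, loudness: float, speed: float, tones: tuple) -> dict:
    """Stand-in for an expensive generator: one entry per grace and tone.
    The cached dictionary is shared, so callers should not modify it."""
    return {
        (start, end, tone): f"{speaker}:{start}{end}@{tone} loud={loudness} speed={speed}"
        for start, end in product(STATES, repeat=2)
        for tone in tones
    }


def prepare(parameter_sets: list) -> list:
    """Generate every palette in advance, before the performance starts."""
    return [generate_palette(*params) for params in parameter_sets]


# Only these parameter values need to travel with the message.
header = [("speaker-1", 0.8, 1.0, ("low", "high")), ("speaker-1", 0.5, 1.2, ("low",))]
palettes = prepare(header)
print([len(p) for p in palettes])  # [18, 9]
```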


    Appendix I: Russian Phonetics

    Unlike English speech, which uses two tones, Russian speech uses only one; however, emphasis can produce a change of tone.

    Unlike in English speech, the basic rhythm of Russian speech is governed by syllables, defined by vowels.

    Russian emphasis is semantic - it marks words.

    Russian words are long and have a clear structure (prefixes, root, suffixes, ending) with emphasis in the root, and the level of redundancy in Russian speech is substantially higher than in English speech. (The level of redundancy shows how many speech elements can be lost before we lose the meaning.) This causes an interesting effect: different mappings are applied to different parts of a word, a more distinctive mapping to the root and a less distinctive one to the suffixes and ending.

    Instead of the special symbols used in traditional notation, we use pairs of symbols: a Latin letter preceded by a symbol from a small set:

    ~ # * > /

    We denote emphasis with "`".


    We denote a reinforced semantic emphasis with "^".


    Consonant-forming states of Russian speech come in pairs. The second state of a pair is constrained by the tongue (this type of state is not present in English speech). We differentiate them with "/", as in:

    k - basic and /k - constrained.

    Because of this convention, we sometimes have to use three symbols to describe a state: two special symbols and a letter.

    Now we describe states, using Latin letters.

    Shewa "&" is used in Russian speech, as in English speech.

    Interrupts, four pairs of pairs:

    Without vocal support:

    k, /k - tongue back

    t, /t - tongue front (slightly different from English)

    f, /f - lips-teeth

    p, /p - lips

    With vocal support:

    g, /g - tongue back

    d, /d - tongue front (slightly different from English)

    v, /v - lips-teeth

    b, /b - lips


    Semi-interrupts:

    Without vocal support, six pairs:

    q, /q - tongue, back (similar to English h)

    *s, /*s - as in Russian word "/*suka" (pike)

    #s, /#s - as in lunch, tongue

    ~s, /~s - as in shore, tongue

    s, /s - tongue, front

    x, /x - tongue, front, as in the Russian word "xyp/lonok" (chick)


    With vocal support, two pairs and special:

    ~z, /~z - as in leisure, tongue

    z, /z - tongue, front

    j - tongue, middle


    Vocal semi-interrupts, four pairs:


    r, /r - tongue (jumps, different from English)

    l, /l - tongue (slightly different from English)

    n, /n - tongue-nose

    m, /m - lips-nose


    States that create vowels, six:

    e - throat

    o - lips (different from English)

    u - lips (different from English)

    i - tongue, middle (similar to English ~i)

    y - constrained by the tongue's back (similar to English i)

    a - unrestricted

    Altogether there are 49 states (including & and the basic state), which produce a palette of about 2,400 graces (49 × 49 = 2,401).

    The following states of English speech (written in the notation for English speech) are not present in Russian speech:

    *s, *z, h, i, r, o, u, >a, >o, >e, >u, <a

    They are difficult for a native speaker of Russian who learns to speak English.