The marker: a bridge between behavior and the real reward
To influence behavior, consequences must be immediate; otherwise the animal cannot associate them with the behavior. With people, we can use speech to explain the delayed consequences of their actions. With animals, the only option is to deliver reinforcement or punishment immediately.
For simple, slow behaviors this is possible: we can deliver a reinforcer (e.g., food) while the dog is sitting down, the horse is bowing, and so on. Many behaviors, however, cannot be reinforced immediately (i.e., while they are happening) because it is physically impossible. They are either too fast (e.g., a chicken pecking) or leave no way to deliver a reward (e.g., a horse jumping).
This is where a marker helps. A marker is a neutral stimulus that we can deliver at any moment (a whistle, a click, a flash of light, etc.) and that has previously been strongly associated with a reinforcer. With a marker you can convey precisely which behavior is being reinforced, while bridging the gap between the behavior and the real reward, which may be delivered later.
In principle, it does not matter whether the marker is auditory, visual, or tactile, although in a particular case some are clearly better than others (e.g., a clicker makes no sense as a marker for a deaf dog or a fish). When an auditory marker (clicker, whistle) can be used, it is usually more convenient, mainly because we do not need visual contact with the animal being trained. Thanks to its convenience and simple design, the most commonly used marker is the clicker, available in thousands of varieties, from the simplest mechanical box to electronic models with a choice of several sounds.
In theory, a word such as "good" can also become a marker. Studies show, however, that it is a very imperfect one, for two reasons. First, it is much less precise: try saying "good!" at exactly the moment the chicken's beak hits the table. Second, for the marker to become a conditioned stimulus whose processing does not involve the cerebral cortex (and is therefore faster), it must always be identical and neutral. Human speech is neither. It is by definition not neutral, because it is used to communicate information, and it is always subject to interpretation by the cerebral cortex. A verbal marker does work, but the results achieved with it are significantly worse.
