Tool for making captions from an audio stream

A new tool for extracting text from an audio stream has been released. The audio stream can be captured either from a microphone or from the speakers, by using a stereo mixer or a virtual cable.

Most of the settings are clear enough.
settings of the tool
The Length of phrase buffer setting limits the maximum length of an audio chunk sent for recognition; in most cases it can be set to 300. The Noise protection setting prevents speech recognition from jamming on noisy audio. It must be disabled when you use a microphone.

If you cannot find your language in the drop-down list, sign up and add the desired speech input language in your user account.

Speech input errors

Google speech engine errors

Voice notebook uses Google’s speech recognition engine, so the errors displayed in the Confidence level field come from Google.

The most frequent errors: blocked, no speech, network error, audio capture error, aborted.

The blocked error appears if the user pressed the Block button on the first visit to the site, or if the microphone is simply out of order.
blocking microphone use

If you pressed the Block button by mistake, go to the upper left corner of the browser and click the camera icon.
allowing microphone use

The no speech error occurs when, for some reason, there is no signal from the microphone. In this case, check that the microphone is turned on and that the signal level is sufficient. Sometimes this error is caused by a long silence, and sometimes the microphone is not connected to the browser. To check which microphone is connected to the browser, go to chrome://settings/content and scroll down to the microphone setting.
Chrome microphone setting

The network error means that there is no Internet connection to Google’s servers, so the sound cannot be sent to them and the text cannot be returned. Sometimes this error is also caused by text accumulating in the preview buffer (probably because too much data is then transferred over the network). The accumulation can be caused by slurred speech or by using a virtual audio cable (when transcribing audio). To prevent buffer overflow, either improve your diction or reduce the preview buffer size.

The audio capture and aborted errors mean that the Chrome speech recognition engine cannot process your voice. This may be because it is already processing someone else's request (voice), for example in another window. In this case, the Speech Pad window will blink. Closing the second window will help.
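
The errors above can be handled in one place. The sketch below assumes that they correspond to the error codes of the Web Speech API's SpeechRecognitionErrorEvent, where a blocked microphone is reported as not-allowed. The function name adviceForError and the advice strings are illustrative only and are not part of Voice notebook.

```javascript
// Map Web Speech API error codes to the advice given in this article.
// Assumption: the "blocked" error in the article is the API's "not-allowed" code.
const ADVICE = new Map([
  ['not-allowed', 'Allow microphone access via the camera icon in the browser.'],
  ['no-speech', 'Check that the microphone is on, loud enough, and selected in Chrome.'],
  ['network', 'Check the Internet connection or reduce the preview buffer size.'],
  ['audio-capture', 'Close other windows that are using speech recognition.'],
  ['aborted', 'Close other windows that are using speech recognition.'],
]);

// Return a hint for a given error code, or a generic message for unknown codes.
function adviceForError(code) {
  return ADVICE.get(code) || 'Unknown error: ' + code;
}

// In a browser this could be wired up roughly as:
//   recognition.onerror = (event) => console.log(adviceForError(event.error));
console.log(adviceForError('no-speech'));
```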

Other errors

The delay in transferring text from the preview field to the output field is more than 2-3 seconds. Such a delay may be caused by wrong microphone settings, for example a very low recording level. You can make the sound indicator visible on the UI settings page and use it to check the microphone level. Also, uncheck the Noise Suppression checkbox if it is checked in the microphone properties.
Noise Suppression

Linux integration – direct voice input in Ubuntu and other Linux distributions

What is Linux integration

This post is about Linux. If you are also interested in Windows integration, see this article.

Linux integration allows voice typing directly into Linux applications.


1. Install Google Chrome or Chromium browser.

2. Install the voice notebook extension from the Chrome webstore.

3. Download the Linux integration module suitable for your Linux: module for 32 bit Linux from 07.11.2016, module for 64 bit Linux from 07.11.2016. Unzip it to a folder, check the executable permissions of the script, and run it.

4. Register on the site and log in.
Login to site

5. Go to user account (the link will appear) and press the Try it! button.

6. Go back to the site, check the OS integration checkbox, select your language from the drop-down list, and press the Start recording button.

7. Go into Gedit or another application and start your dictation.

8. If you like it and want to continue using the integration after your free trial, place an order.

Install speech input module in Ubuntu

Remove the module

If you no longer want to use the integration module, check the executable permissions of the script in the Linux integration module folder, run the script, and then remove the folder.

Using the Linux integration mode

Using Linux integration is similar to using Windows integration, except that speech input depends on the keyboard state of your computer. For example, if your computer supports two languages, you must switch the keyboard layout to the desired language and then dictate text in that language. In most Linux distributions this language must also be the system default (first in the keyboard layout list); in Ubuntu this does not matter.

The voice shortcuts feature is not implemented in the Linux integration module.

Version history

13.06.2016. First release.

05.11.2016. A severe bug has been resolved.

07.11.2016. Improved punctuation and numbers handling.

A new utility for converting subtitles to speech

Tools for text to speech conversion

New tools, SRT Speaker and TTS Picker, have been added to the site. These tools can be useful for voicing video or text.

SRT Speaker

A new tool, SRT Speaker, has been added to the site. The utility is designed for converting subtitles in SubRip (SRT) format to speech in real time and for debugging them.
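
To make the SubRip format concrete, here is a minimal sketch of reading SRT text into timed cues. It is not the code of SRT Speaker; the function names parseTimestamp and parseSrt and the sample subtitles are made up for illustration, and the parser assumes well-formed input.

```javascript
// Convert an SRT timestamp "HH:MM:SS,mmm" to milliseconds.
function parseTimestamp(ts) {
  const m = /^(\d{2,}):(\d{2}):(\d{2}),(\d{3})$/.exec(ts.trim());
  if (!m) throw new Error('Bad SRT timestamp: ' + ts);
  const [h, min, s, ms] = m.slice(1).map(Number);
  return ((h * 60 + min) * 60 + s) * 1000 + ms;
}

// Parse SRT text into cues: { index, startMs, endMs, text }.
function parseSrt(srt) {
  return srt
    .replace(/\r\n/g, '\n')
    .trim()
    .split(/\n{2,}/) // cues are separated by blank lines
    .map((block) => {
      const lines = block.split('\n');
      const [start, end] = lines[1].split('-->');
      return {
        index: Number(lines[0]),
        startMs: parseTimestamp(start),
        endMs: parseTimestamp(end),
        text: lines.slice(2).join(' '), // join multi-line cue text
      };
    });
}

const sample =
  '1\n00:00:01,000 --> 00:00:03,500\nHello, world!\n\n' +
  '2\n00:00:04,000 --> 00:00:06,000\nSecond line\nof text.\n';
console.log(parseSrt(sample));
```

In a browser, each cue's text could then be passed to the speech synthesis API at its start time; that part is omitted here.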

The tool can be used with the Voice notebook transcription module to create video clips in foreign languages. For example, I can make a video clip in Russian, transcribe it, and translate the subtitles into English. Then I can play the English subtitles in SRT Speaker and record the audio with the help of a virtual audio cable and any sound recorder. After that, I can replace the audio track of my video with the new audio in a video editor.

You can see an example of this technique in this video.

TTS Picker

The Chrome application TTS Picker lets you select a paragraph and have it read aloud by the chosen voice.
TTS Picker utility

You can set keyboard shortcuts for the buttons on the chrome://extensions/ page.

Speech input languages

Authorized users can add custom speech recognition languages (“Speech languages” page in the user account). Language codes must be constructed according to the BCP 47 specification. For example, the code for US English is en-US.
speech recognition language setting
Pay attention to the case of the letters.
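
Since letter case matters here, a code can be brought into the conventional BCP 47 casing before it is entered. The following is a simplified sketch, not a full BCP 47 validator; the function name normalizeLangCode is hypothetical.

```javascript
// Normalize the letter case of a language code in the conventional BCP 47 style:
// language lowercase, 4-letter script Titlecase, 2-letter region UPPERCASE.
// Subtags after a private-use "x" are left lowercase (e.g. ar-x-gulf).
function normalizeLangCode(code) {
  let privateUse = false;
  return code
    .trim()
    .split('-')
    .map((part, i) => {
      const lower = part.toLowerCase();
      if (i === 0 || privateUse) return lower; // language or private-use subtag
      if (lower === 'x') {
        privateUse = true;
        return lower;
      }
      if (lower.length === 4) return lower[0].toUpperCase() + lower.slice(1); // script
      if (lower.length === 2) return lower.toUpperCase(); // region
      return lower; // e.g. numeric region "419"
    })
    .join('-');
}

console.log(normalizeLangCode('EN-us'));
console.log(normalizeLangCode('cmn-hans-cn'));
```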

You can hide the predefined languages from the drop-down list on the page by pressing the Hide predefined languages button. In this case, only your languages will be shown. The first added language will be selected when Voice notebook starts.

You can use the voice commands Change language 1 and Change language 2 to select the next language from the list (after the last language comes the first). For example, if we added two languages, English and French, we can use the keyword “change language” for the command while dictating in English, and “changer la langue” when French is used.
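
The wrap-around behaviour of the list can be pictured with a short sketch. The function name nextLanguage is hypothetical, and this only illustrates the cycling rule, not how the voice command itself is recognized.

```javascript
// Return the language that follows `current` in the user's list,
// wrapping from the last entry back to the first.
function nextLanguage(languages, current) {
  const i = languages.indexOf(current);
  return languages[(i + 1) % languages.length];
}

const userLanguages = ['en-US', 'fr-FR'];
console.log(nextLanguage(userLanguages, 'en-US'));
console.log(nextLanguage(userLanguages, 'fr-FR'));
```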

You can add the parameter pagelang=YourLangCode to the query string to start Voice notebook with the desired language. If the language was added by the user, the user must be logged in to the site (that is, must not have pressed Log out when leaving the site). For example, this link will open Voice notebook and set the German language
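
As an illustration of the pagelang parameter, the sketch below builds such a URL. The base address https://example.com/ is a placeholder and not the real Voice notebook address; the function name startUrl is hypothetical.

```javascript
// Build a start URL that carries the pagelang query parameter.
// The base URL passed in is a placeholder chosen for this example.
function startUrl(baseUrl, langCode) {
  const url = new URL(baseUrl);
  url.searchParams.set('pagelang', langCode);
  return url.toString();
}

console.log(startUrl('https://example.com/', 'de-DE'));
// prints "https://example.com/?pagelang=de-DE"
```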

Below are the language codes that you can use (the Notebook extension uses the same codes):

af-ZA          Afrikaans    
id-ID          Bahasa Indonesia    
ms-MY          Bahasa Melayu    
ca-ES          Català    
cs-CZ          Čeština    
da-DK          Dansk    
de-DE          Deutsch    
en-GB          English (United Kingdom)    
en-US          English (United States)    
es-ES          Español (España)    
es-419         Español (Latinoamérica)
eu-ES          Euskara
fil-PH         Filipino
fr-FR          Français    
gl-ES          Galego    
hr-HR          hrvatski    
zu-ZA          IsiZulu    
is-IS          Íslenska    
it-IT          italiano    
lt-LT          Lietuvių    
hu-HU          Magyar    
nl-NL          Nederlands    
nb-NO          Norsk (Bokmål)    
pl-PL          Polski    
pt-BR          Português (Brasil)    
pt-PT          Português (Portugal)    
ro-RO          Română    
sk-SK          Slovenčina
sl-SI          Slovenščina
fi-FI          Suomi    
sv-SE          Svenska    
vi-VN          Tiếng Việt    
tr-TR          Türkçe    
el-GR          Ελληνικά    
bg-BG          български    
ru-RU          Русский
sr-RS          Српски    
uk-UA          Українська    
he-IL          עברית    
ar-x-gulf      العربية     
fa-IR          فارسی     
hi-IN          हिन्दी     
th-TH          ไทย     
cmn-Hans-CN    中文(中国)    
cmn-Hant-TW    中文(台灣)    
yue-Hant-HK    中文(香港)    
ja-JP          日本語    
ko-KR          한국어    

You can also use the codes from the Google speech recognition demos. The language codes are in the second column, for example af-ZA. It seems to me that this list is now out of date, because newer language codes work too, for example uk-UA for Ukrainian.

[['Afrikaans',       ['af-ZA']],
 ['Bahasa Indonesia',['id-ID']],
 ['Bahasa Melayu',   ['ms-MY']],
 ['Català',          ['ca-ES']],
 ['Čeština',         ['cs-CZ']],
 ['Deutsch',         ['de-DE']],
 ['English',         ['en-AU', 'Australia'],
                     ['en-CA', 'Canada'],
                     ['en-IN', 'India'],
                     ['en-NZ', 'New Zealand'],
                     ['en-ZA', 'South Africa'],
                     ['en-GB', 'United Kingdom'],
                     ['en-US', 'United States']],
 ['Español',         ['es-AR', 'Argentina'],
                     ['es-BO', 'Bolivia'],
                     ['es-CL', 'Chile'],
                     ['es-CO', 'Colombia'],
                     ['es-CR', 'Costa Rica'],
                     ['es-EC', 'Ecuador'],
                     ['es-SV', 'El Salvador'],
                     ['es-ES', 'España'],
                     ['es-US', 'Estados Unidos'],
                     ['es-GT', 'Guatemala'],
                     ['es-HN', 'Honduras'],
                     ['es-MX', 'México'],
                     ['es-NI', 'Nicaragua'],
                     ['es-PA', 'Panamá'],
                     ['es-PY', 'Paraguay'],
                     ['es-PE', 'Perú'],
                     ['es-PR', 'Puerto Rico'],
                     ['es-DO', 'República Dominicana'],
                     ['es-UY', 'Uruguay'],
                     ['es-VE', 'Venezuela']],
 ['Euskara',         ['eu-ES']],
 ['Français',        ['fr-FR']],
 ['Galego',          ['gl-ES']],
 ['Hrvatski',        ['hr-HR']],
 ['IsiZulu',         ['zu-ZA']],
 ['Íslenska',        ['is-IS']],
 ['Italiano',        ['it-IT', 'Italia'],
                     ['it-CH', 'Svizzera']],
 ['Magyar',          ['hu-HU']],
 ['Nederlands',      ['nl-NL']],
 ['Norsk bokmål',    ['nb-NO']],
 ['Polski',          ['pl-PL']],
 ['Português',       ['pt-BR', 'Brasil'],
                     ['pt-PT', 'Portugal']],
 ['Română',          ['ro-RO']],
 ['Slovenčina',      ['sk-SK']],
 ['Suomi',           ['fi-FI']],
 ['Svenska',         ['sv-SE']],
 ['Türkçe',          ['tr-TR']],
 ['български',       ['bg-BG']],
 ['Русский',         ['ru-RU']],
 ['Српски',          ['sr-RS']],
 ['한국어',            ['ko-KR']],
 ['中文',             ['cmn-Hans-CN', '普通话 (中国大陆)'],
                     ['cmn-Hans-HK', '普通话 (香港)'],
                     ['cmn-Hant-TW', '中文 (台灣)'],
                     ['yue-Hant-HK', '粵語 (香港)']],
 ['日本語',           ['ja-JP']],
 ['Lingua latīna',   ['la']]];
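
A nested list of this shape can be flattened into plain code/label pairs, for example to fill a drop-down list. This is a sketch on a two-entry sample with the same structure; the names sampleLangs and listCodes are made up for the example.

```javascript
// Same shape as the list above: [name, [code, optionalRegion], ...].
const sampleLangs = [
  ['Deutsch', ['de-DE']],
  ['Italiano', ['it-IT', 'Italia'], ['it-CH', 'Svizzera']],
];

// Flatten the nested list into { code, label } pairs.
function listCodes(langs) {
  return langs.flatMap(([name, ...variants]) =>
    variants.map(([code, region]) => ({
      code,
      label: region ? name + ' (' + region + ')' : name,
    }))
  );
}

console.log(listCodes(sampleLangs));
```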

09.08.2016. Below are the language codes that the Google Cloud Speech API uses. It seems to me that we can use them too.

Language                        language_code  Language (English name)
Afrikaans (Suid-Afrika)         af-ZA          Afrikaans (South Africa)
Bahasa Indonesia (Indonesia)    id-ID          Indonesian (Indonesia)
Bahasa Melayu (Malaysia)        ms-MY          Malay (Malaysia)
Català (Espanya)                ca-ES          Catalan (Spain)
Čeština (Česká republika)       cs-CZ          Czech (Czech Republic)
Dansk (Danmark)                 da-DK          Danish (Denmark)
Deutsch (Deutschland)           de-DE          German (Germany)
English (Australia)             en-AU          English (Australia)
English (Canada)                en-CA          English (Canada)
English (Great Britain)         en-GB          English (United Kingdom)
English (India)                 en-IN          English (India)
English (Ireland)               en-IE          English (Ireland)
English (New Zealand)           en-NZ          English (New Zealand)
English (Philippines)           en-PH          English (Philippines)
English (South Africa)          en-ZA          English (South Africa)
English (United States)         en-US          English (United States)
Español (Argentina)             es-AR          Spanish (Argentina)
Español (Bolivia)               es-BO          Spanish (Bolivia)
Español (Chile)                 es-CL          Spanish (Chile)
Español (Colombia)              es-CO          Spanish (Colombia)
Español (Costa Rica)            es-CR          Spanish (Costa Rica)
Español (Ecuador)               es-EC          Spanish (Ecuador)
Español (El Salvador)           es-SV          Spanish (El Salvador)
Español (España)                es-ES          Spanish (Spain)
Español (Estados Unidos)        es-US          Spanish (United States)
Español (Guatemala)             es-GT          Spanish (Guatemala)
Español (Honduras)              es-HN          Spanish (Honduras)
Español (México)                es-MX          Spanish (Mexico)
Español (Nicaragua)             es-NI          Spanish (Nicaragua)
Español (Panamá)                es-PA          Spanish (Panama)
Español (Paraguay)              es-PY          Spanish (Paraguay)
Español (Perú)                  es-PE          Spanish (Peru)
Español (Puerto Rico)           es-PR          Spanish (Puerto Rico)
Español (República Dominicana)  es-DO          Spanish (Dominican Republic)
Español (Uruguay)               es-UY          Spanish (Uruguay)
Español (Venezuela)             es-VE          Spanish (Venezuela)
Euskara (Espainia)              eu-ES          Basque (Spain)
Filipino (Pilipinas)            fil-PH         Filipino (Philippines)
Français (France)               fr-FR          French (France)
Galego (España)                 gl-ES          Galician (Spain)
Hrvatski (Hrvatska)             hr-HR          Croatian (Croatia)
IsiZulu (Ningizimu Afrika)      zu-ZA          Zulu (South Africa)
Íslenska (Ísland)               is-IS          Icelandic (Iceland)
Italiano (Italia)               it-IT          Italian (Italy)
Lietuvių (Lietuva)              lt-LT          Lithuanian (Lithuania)
Magyar (Magyarország)           hu-HU          Hungarian (Hungary)
Nederlands (Nederland)          nl-NL          Dutch (Netherlands)
Norsk bokmål (Norge)            nb-NO          Norwegian Bokmål (Norway)
Polski (Polska)                 pl-PL          Polish (Poland)
Português (Brasil)              pt-BR          Portuguese (Brazil)
Português (Portugal)            pt-PT          Portuguese (Portugal)
Română (România)                ro-RO          Romanian (Romania)
Slovenčina (Slovensko)          sk-SK          Slovak (Slovakia)
Slovenščina (Slovenija)         sl-SI          Slovenian (Slovenia)
Suomi (Suomi)                   fi-FI          Finnish (Finland)
Svenska (Sverige)               sv-SE          Swedish (Sweden)
Tiếng Việt (Việt Nam)           vi-VN          Vietnamese (Vietnam)
Türkçe (Türkiye)                tr-TR          Turkish (Turkey)
Ελληνικά (Ελλάδα)               el-GR          Greek (Greece)
Български (България)            bg-BG          Bulgarian (Bulgaria)
Русский (Россия)                ru-RU          Russian (Russia)
Српски (Србија)                 sr-RS          Serbian (Serbia)
Українська (Україна)            uk-UA          Ukrainian (Ukraine)
עברית (ישראל)                   he-IL          Hebrew (Israel)
العربية (إسرائيل)               ar-IL          Arabic (Israel)
العربية (الأردن)                ar-JO          Arabic (Jordan)
العربية (الإمارات)              ar-AE          Arabic (United Arab Emirates)
العربية (البحرين)               ar-BH          Arabic (Bahrain)
العربية (الجزائر)               ar-DZ          Arabic (Algeria)
العربية (السعودية)              ar-SA          Arabic (Saudi Arabia)
العربية (العراق)                ar-IQ          Arabic (Iraq)
العربية (الكويت)                ar-KW          Arabic (Kuwait)
العربية (المغرب)                ar-MA          Arabic (Morocco)
العربية (تونس)                  ar-TN          Arabic (Tunisia)
العربية (عُمان)                 ar-OM          Arabic (Oman)
العربية (فلسطين)                ar-PS          Arabic (State of Palestine)
العربية (قطر)                   ar-QA          Arabic (Qatar)
العربية (لبنان)                 ar-LB          Arabic (Lebanon)
العربية (مصر)                   ar-EG          Arabic (Egypt)
فارسی (ایران)                   fa-IR          Persian (Iran)
हिन्दी (भारत)                   hi-IN          Hindi (India)
ไทย (ประเทศไทย)                 th-TH          Thai (Thailand)
한국어 (대한민국)                      ko-KR          Korean (South Korea)
國語 (台灣)                         cmn-Hant-TW    Chinese, Mandarin (Traditional, Taiwan)
廣東話 (香港)                        yue-Hant-HK    Chinese, Cantonese (Traditional, Hong Kong)
日本語(日本)                         ja-JP          Japanese (Japan)
普通話 (香港)                        cmn-Hans-HK    Chinese, Mandarin (Simplified, Hong Kong)
普通话 (中国大陆)                      cmn-Hans-CN    Chinese, Mandarin (Simplified, China)

Use voice input to activate hotkeys in Windows

You can now use voice input to activate hotkeys in the Windows integration mode. The sequence of keystrokes can be specified in the list of replacement words. Each virtual key press is written as the prefix \\0x (double backslash, zero, small Latin x) followed by the two hexadecimal digits of the key code (the key code is case insensitive).

For example, \\0x11 is the Ctrl key and \\0x1B is Esc. Spaces and other characters are not allowed in this sequence. The following figure shows an example of assigning such sequences.

Composing Shortcuts

The pattern \\0x14 activates the Caps Lock key. The pattern \\0x11\\0x10\\0x1b means Ctrl Shift Esc, which opens the Windows Task Manager. The following three lines open the search window (Ctrl F), switch the input language (Ctrl Shift), and open a help window (F1).

You can get the full list of all the virtual keys on the site (virtual keys for mouse pad will not work).
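
Composing such sequences by hand is error-prone, so here is a small sketch that builds them from key names. The key codes are the standard Windows virtual-key values for the keys mentioned above; the names VK_CODES and hotkeyPattern are hypothetical and not part of the integration module.

```javascript
// Windows virtual-key codes for the keys used in the examples above.
const VK_CODES = new Map([
  ['Ctrl', 0x11],
  ['Shift', 0x10],
  ['Esc', 0x1b],
  ['CapsLock', 0x14],
  ['F', 0x46],
  ['F1', 0x70],
]);

// Build a replacement string: each key becomes two backslashes, "0x",
// and the two hexadecimal digits of its key code, with no separators.
function hotkeyPattern(keys) {
  return keys
    .map((key) => {
      if (!VK_CODES.has(key)) throw new Error('Unknown key: ' + key);
      const hex = VK_CODES.get(key).toString(16).toUpperCase().padStart(2, '0');
      return '\\\\0x' + hex;
    })
    .join('');
}

console.log(hotkeyPattern(['Ctrl', 'Shift', 'Esc'])); // Task Manager
console.log(hotkeyPattern(['Ctrl', 'F']));            // search window
```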

P.S. You need to update the Windows integration module if it is dated prior to 06.03.2016. Download the zip archive and replace your ru-speechpad-host.exe with the new one.

Run voice notebook on Top of Windows

28.02.2016. A new option, Stay SpeechPad on Top of Windows, has been added to the extension options dialog. If this checkbox is checked, the Speech Pad window starts on top of other windows.

checkbox stay on top

Users must install the SpeechPad extension and the integration module to use this functionality, but they do not need paid OS integration in their accounts.

Running SpeechPad on top of other windows is useful for text input in office applications. Before the new option was developed, this could be accomplished in Windows with the help of special programs: DeskPins and Windows Topmost control (works in the latest Windows).

In Linux, you can keep a window on top with the help of built-in system tools (right-click the window title and select the “On Top” item in the shortcut menu).

Using Chrome shortcuts for SpeechPad URLs with parameters makes the SpeechPad window independent of other Chrome windows, and the SpeechPad window serves as a small “start/stop” panel in the integration mode. The picture below illustrates this capability.

Speechpad on Top of other windows

Speech recognition gets stuck periodically

15.11.2016. Users say that Google has now fixed the bug, so I have added a “disabled” item to the select list. Setting this parameter to 0 in the URL will disable it automatically –

Since 15.12.2015 there has been a problem in speech recognition services based on Google Chrome. After some time of dictation, depending on the language (about 300 symbols of text for Russian and about 1000 symbols for English), voice recognition gets stuck for 30-40 seconds or more.

Users can press the Start/Stop recording button to restart recognition in SpeechPad in dictation mode.

We have opened a bug in the Chromium group.

24.12.2015. We made some changes in our code to bypass this problem. You may need to press Ctrl + F5 to refresh the page so that the changes take effect.

01.04.2015. Added a drop-down list, Number of symbols before restart. This setting determines the number of symbols transferred to the output field before speech recognition is restarted. Set it to a smaller number if recognition gets stuck periodically.
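
The restart rule can be pictured as a simple counter check. This is only a sketch of the idea and not the actual SpeechPad code; the function name shouldRestart is hypothetical, and treating a limit of 0 as "disabled" follows the 15.11.2016 note above.

```javascript
// Decide whether recognition should be restarted once `transferred` symbols
// have reached the output field. A limit of 0 means the feature is disabled.
function shouldRestart(transferred, maxSymbols) {
  return maxSymbols > 0 && transferred >= maxSymbols;
}

console.log(shouldRestart(250, 300)); // below the limit
console.log(shouldRestart(300, 300)); // limit reached
console.log(shouldRestart(5000, 0));  // feature disabled
```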

You can set this number in the URL, using the maxsymb parameter, i.e.