BrowseHTMList is an application developed by Mahmoud Abunasser using MS Visual C++ 2005. Its main function is to load a list of HTML pages, each associated with an ID, and to let a user browse through the pages in the order in which they appear in the list. The application also tracks browsing times: a timer is started at the beginning of each session, and the application logs the start and end time of viewing each page relative to that timer. According to Microsoft MSDN™, the accuracy of the timer is in the range of ±16 milliseconds. The time log is later used to segment the recording automatically.
Part of a list provided as input to the BrowseHTMList application. The first column contains the page IDs, and the second column, separated from the first by a tab, contains the HTML page file names:
no_content_intro_SWADESH_012 SWADESH_012_01_intro.html
SWADESH_012_eng_utter_01 SWADESH_012_02_utter_1.html
SWADESH_012_eng_utter_02 SWADESH_012_03_utter_2.html
SWADESH_012_eng_utter_03 SWADESH_012_04_utter_3.html
no_content_other_var_SWADESH_012 SWADESH_012_05_other_varieties.html
SWADESH_012_var_utter_01 SWADESH_012_02_utter_1.html
SWADESH_012_var_utter_02 SWADESH_012_03_utter_2.html
SWADESH_012_var_utter_03 SWADESH_012_04_utter_3.html
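The two-column layout above can be read with a few lines of code. The following is a minimal Python sketch, not part of BrowseHTMList itself (which is written in Visual C++); the function name `parse_page_list` is hypothetical, and the sketch assumes that neither the page ID nor the file name contains whitespace other than the separating tab.

```python
def parse_page_list(text):
    """Return (page_id, html_file) pairs in the order they appear in the list."""
    pages = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # split(None, 1) splits on the first run of whitespace, so a tab works;
        # a line with only one column raises ValueError when unpacking.
        page_id, html_file = line.split(None, 1)
        pages.append((page_id, html_file.strip()))
    return pages


# Two entries copied from the sample list above, with a tab between the columns.
sample = (
    "no_content_intro_SWADESH_012\tSWADESH_012_01_intro.html\n"
    "SWADESH_012_eng_utter_01\tSWADESH_012_02_utter_1.html\n"
)
pages = parse_page_list(sample)
for page_id, html_file in pages:
    print(page_id, "->", html_file)
```

A list of pairs is used rather than a dictionary because the browsing order matters and the same HTML file can appear under more than one ID.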
Part of a list produced as output by the BrowseHTMList application. The first column contains the Swadesh list item IDs; the second and third columns contain the start and end timestamps, in seconds, of browsing each page, relative to the time the list was loaded:
no_content_intro_SWADESH_012 998.406000 1006.986000
SWADESH_012_eng_utter_01 1006.986000 1008.780000
SWADESH_012_eng_utter_02 1008.780000 1011.027000
SWADESH_012_eng_utter_03 1011.027000 1013.289000
no_content_other_var_SWADESH_012 1013.289000 1056.423000
SWADESH_012_var_utter_01 1056.423000 1059.231000
SWADESH_012_var_utter_02 1059.231000 1064.410000
SWADESH_012_var_utter_01 1064.410000 1069.090000
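As an illustration of how such a log can be consumed, the sketch below parses the three columns and computes how long each page was displayed. It is a hedged Python example rather than the code used in the project; the names `parse_log` and `durations` are hypothetical, and the durations are expressed in whatever time unit the log uses.

```python
def parse_log(text):
    """Return (item_id, start, end) tuples from a BrowseHTMList-style log."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        item_id, start, end = fields
        entries.append((item_id, float(start), float(end)))
    return entries


def durations(entries):
    """Return (item_id, viewing_time) pairs, rounded to three decimals."""
    return [(item_id, round(end - start, 3)) for item_id, start, end in entries]


# Three rows copied from the sample log above.
log = """no_content_intro_SWADESH_012 998.406000 1006.986000
SWADESH_012_eng_utter_01 1006.986000 1008.780000
SWADESH_012_eng_utter_02 1008.780000 1011.027000"""

entries = parse_log(log)
for item_id, viewing_time in durations(entries):
    print(item_id, viewing_time)
```

In the sample rows, each start time equals the end time of the previous row, so the entries tile the session without gaps.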
The BrowseHTMList application is designed and developed to be as generic as possible so that it can benefit researchers running similar data collection sessions. To that end, it is released as open source under a GNU license agreement. It is also designed to be easy to customize. Its most customizable feature is its ability to host HTML files: HTML provides extensive, application-independent formatting, so fonts can be changed, pictures added, tables inserted, and much more.
A data collection session facilitated by BrowseHTMList begins by starting the sound recording device and then loading the list of HTML pages to be browsed during the session. This list should be saved in a text file in the same folder as the HTML pages. The list is loaded through the “File --> Open” menu item. Loading the list instantiates the timer used to track time, loads the first page in the list, and shows in the status bar the number of pages to be browsed in the session along with the sequence number of the current page. After loading the list, I play a beep through the “Item --> SyncBeep” menu item. This beep is used to segment the WAV file, as discussed in section 2.4.2.
The participant or the researcher then browses through the list using the “Next” and “Previous” menu items. Mouse clicks were found to generate undesirable noise in the WAV signal. To avoid this noise, I added keyboard shortcuts for the menu items, since the keyboard was found to generate less noise. The shortcuts are “Ctrl+N” for “Next” and “Ctrl+P” for “Previous”.
To add more flexibility, I added three more items to the “Item” menu, in addition to “SyncBeep”, that allow the user to skip an item, redo an item, or mark an item as bad. These are accessed through “Item --> Skip” or “Ctrl+S”, “Item --> Retry” or “Ctrl+R”, and “Item --> MarkBad” or “Ctrl+B”, respectively. Skipping an item is useful when the participant is not recording anything related to the HTML page being viewed. After a mistake or mispronunciation, the participant can mark the current item as bad, which is shown in the status bar of the application, and then redo the recording of that item.
A 0.5-second delay is added to the transition between HTML pages when the user moves to the next or previous page. This delay enforces a pause by the participants, especially when they are asked to repeat the same word three times, because they must wait for the next page to load before each utterance. The data collection session ends by playing another beep and then stopping the recording device. The application generates a log file of the browsing session and saves it in the same folder as the HTML pages.
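The way a sync beep and the time log can be combined to cut a recording is sketched below in Python. This is only an illustration under stated assumptions, not the procedure of section 2.4.2: it assumes that the log times are in seconds, that the time of the beep on the application timer (`beep_log_time`) and the position of the beep in the WAV file (`beep_wav_time`) are both known, and that the recording has a fixed `sample_rate`. All of these names, and the function `to_sample_ranges`, are hypothetical.

```python
def to_sample_ranges(entries, beep_log_time, beep_wav_time, sample_rate):
    """Map (item_id, start, end) log times to sample index ranges in the WAV file.

    Assumption: the same beep is observed at beep_log_time on the application
    timer and at beep_wav_time in the recording, both in seconds.
    """
    # Shift that converts a timer value into a position in the recording.
    offset = beep_wav_time - beep_log_time
    ranges = []
    for item_id, start, end in entries:
        first = int(round((start + offset) * sample_rate))
        last = int(round((end + offset) * sample_rate))
        ranges.append((item_id, first, last))
    return ranges


# Hypothetical values chosen only to make the arithmetic easy to follow.
entries = [("item_a", 10.0, 12.5), ("item_b", 12.5, 14.0)]
ranges = to_sample_ranges(entries, beep_log_time=5.0, beep_wav_time=1.0, sample_rate=1000)
for item_id, first, last in ranges:
    print(item_id, first, last)
```

Using a single offset derived from the beep means the recorder and the application do not need to be started at exactly the same moment.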
Refer to Section 2.4 of Mahmoud Abunasser's dissertation "COMPUTATIONAL MEASURES OF LINGUISTIC VARIATION: A STUDY OF ARABIC VARIETIES" for more information about this application.
This project is partially funded by:
- National Science Foundation (NSF) grant BCS-0826672 (E. Benmamoun, PI)
- Qatar National Research Fund (QNRF) grant NPRP-09-410-1-069 (M. Hasegawa-Johnson, PI)