BrowseHTMList is an application developed by Mahmoud Abunasser using MS Visual C++ 2005. Its main function is to load a list of HTML pages, each associated with an ID, and to let a user browse through the pages in the order in which they appear in the list. The application also tracks browsing times: a timer is started at the beginning of each session, and the application logs the start and end time of viewing each page relative to that timer. According to Microsoft MSDN™, the accuracy of the timer is in the range of ±16 milliseconds. The time log is used later to segment the recording automatically.

Part of the list provided as input to the BrowseHTMList application. The first column contains the page IDs, and the second column, separated from the first by a tab, contains the HTML page file names:

no_content_intro_SWADESH_012      SWADESH_012_01_intro.html

SWADESH_012_eng_utter_01          SWADESH_012_02_utter_1.html

SWADESH_012_eng_utter_02          SWADESH_012_03_utter_2.html

SWADESH_012_eng_utter_03          SWADESH_012_04_utter_3.html

no_content_other_var_SWADESH_012  SWADESH_012_05_other_varieties.html

SWADESH_012_var_utter_01          SWADESH_012_02_utter_1.html

SWADESH_012_var_utter_02          SWADESH_012_03_utter_2.html

SWADESH_012_var_utter_03          SWADESH_012_04_utter_3.html
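As an illustration of the input format described above, the following is a minimal Python sketch of reading such a list outside the application. It is not part of BrowseHTMList; the function name `parse_page_list` is hypothetical, and the sketch assumes that neither the IDs nor the file names contain whitespace.

```python
from typing import List, Tuple


def parse_page_list(text: str) -> List[Tuple[str, str]]:
    """Return (page_id, html_file) pairs from the list text, skipping blank lines."""
    pages = []
    for line in text.splitlines():
        if not line.strip():
            continue  # the listing may contain empty lines between entries
        # The columns are described as tab separated; splitting on any run of
        # whitespace also tolerates copies in which tabs were turned into spaces.
        # Assumption: IDs and file names themselves contain no whitespace.
        page_id, html_file = line.split()
        pages.append((page_id, html_file))
    return pages


sample = (
    "no_content_intro_SWADESH_012\tSWADESH_012_01_intro.html\n"
    "\n"
    "SWADESH_012_eng_utter_01\tSWADESH_012_02_utter_1.html\n"
)

for page_id, html_file in parse_page_list(sample):
    print(page_id, "->", html_file)
```

Note that the same HTML file may appear under several IDs, as in the excerpt above, so the result is kept as an ordered list of pairs rather than a dictionary keyed by file name.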




Part of the list produced as output by the BrowseHTMList application. The first column contains the Swadesh list item IDs, and the second and third columns contain the start and end times of browsing the page, in seconds (recorded with millisecond resolution), relative to the time of loading the list:

no_content_intro_SWADESH_012      998.406000    1006.986000

SWADESH_012_eng_utter_01          1006.986000   1008.780000

SWADESH_012_eng_utter_02          1008.780000   1011.027000

SWADESH_012_eng_utter_03          1011.027000   1013.289000

no_content_other_var_SWADESH_012  1013.289000   1056.423000

SWADESH_012_var_utter_01          1056.423000   1059.231000

SWADESH_012_var_utter_02          1059.231000   1064.410000

SWADESH_012_var_utter_01          1064.410000   1069.090000
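To show how a log like the one above could be consumed when segmenting a recording, here is a minimal Python sketch. It is not the author's segmentation code. The names `LogEntry`, `parse_log`, and `content_entries` are hypothetical, and treating IDs that begin with `no_content` as pages without recorded material is an assumption based only on the naming in the excerpt. Times are kept in whatever unit the log uses.

```python
from typing import List, NamedTuple


class LogEntry(NamedTuple):
    page_id: str
    start: float  # in the log's own time unit, relative to loading the list
    end: float


def parse_log(text: str) -> List[LogEntry]:
    """Parse 'id start end' lines; blank lines and lines without three fields are skipped."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        entries.append(LogEntry(fields[0], float(fields[1]), float(fields[2])))
    return entries


def content_entries(entries: List[LogEntry], skip_prefix: str = "no_content") -> List[LogEntry]:
    """Drop entries whose ID starts with skip_prefix (assumed to carry no speech)."""
    return [e for e in entries if not e.page_id.startswith(skip_prefix)]


sample = """\
no_content_intro_SWADESH_012      998.406000    1006.986000

SWADESH_012_eng_utter_01          1006.986000   1008.780000

SWADESH_012_eng_utter_02          1008.780000   1011.027000
"""

for entry in content_entries(parse_log(sample)):
    # Print each remaining item with the length of its viewing interval.
    print(f"{entry.page_id}\t{entry.end - entry.start:.3f}")
```

Each remaining entry gives one candidate segment of the recording. Mapping these intervals onto the WAV file additionally requires the offset of the sync beep, which is outside the scope of this sketch.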



The BrowseHTMList application was designed and developed to be as generic as possible so that it can benefit the research community running similar data collection sessions. To achieve this goal, it is released as open source under a GNU license agreement. It is also designed to be easy to customize. Its most customizable feature is its ability to host HTML files: HTML provides extensive, application-independent formatting, so the font can be changed, pictures can be added, tables can be inserted, and much more.

A data collection session facilitated by BrowseHTMList begins with starting the sound recording device and then loading the list of HTML pages to be browsed during the session. This list should be saved in a text file in the same folder as the HTML pages. The list is loaded through the “File --> Open” menu item. Loading the list instantiates the timer that is used to track time, loads the first page in the list, and shows in the status bar of the application the number of pages to be browsed in the session along with the sequence number of the page currently being browsed. After loading the list, I play a beep through the “Item --> SyncBeep” menu item. This beep is used to segment the WAV file, as discussed in section 2.4.2.

The participant or the researcher then browses through the list by clicking the menu items “Next” and “Previous”. Mouse clicks were found to generate undesirable noise in the WAV signal. To avoid this noise, I added keyboard shortcuts for the menu items, since the keyboard was found to generate less noise. The keyboard shortcuts are “Ctrl+N” for “Next” and “Ctrl+P” for “Previous”. For more flexibility, I added three more menu items under the “Item” menu, in addition to “SyncBeep”, that allow the user to skip an item, redo an item, or mark an item as bad. These are accessed through “Item --> Skip” or “Ctrl+S”, “Item --> Retry” or “Ctrl+R”, and “Item --> MarkBad” or “Ctrl+B”, respectively. Skipping an item is useful when the participant is not recording anything related to the HTML page being viewed. After a mistake or mispronunciation, the participant can mark the current item as bad, which is shown in the status bar of the application, and then redo the recording of that item.

A 0.5-second delay is added to the transition between HTML pages when the user moves to the next or previous page. This delay enforces a pause by the participants, especially when they are asked to repeat the same word three times, because they must wait for the next page to load before each utterance. The data collection session ends by playing another beep and then stopping the recording device. The application generates a log file of the browsing session and saves it in the same folder as the HTML pages.
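The timing behaviour described above can be modelled in a few lines of Python. This is only a toy sketch under stated assumptions, not the application's Visual C++ code. The class `BrowsingSession` and its method names are hypothetical, and placing the delay after the transition timestamp (so that it counts toward the new page) is an assumption, since the description does not say where the delay falls. The clock and sleep functions are parameters so that the model can be exercised without real waiting.

```python
import time
from typing import Callable, List, Sequence, Tuple


class BrowsingSession:
    """Toy model of the session timer and the per-page log."""

    def __init__(self, pages: Sequence[str],
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep,
                 delay: float = 0.5) -> None:
        if not pages:
            raise ValueError("page list is empty")
        self.pages = list(pages)
        self.clock = clock
        self.sleep = sleep
        self.delay = delay
        self.t0 = clock()       # the timer starts when the list is loaded
        self.index = 0          # the first page is shown immediately
        self.page_start = 0.0
        self.log: List[Tuple[str, float, float]] = []

    def _close_current(self) -> None:
        now = self.clock() - self.t0
        self.log.append((self.pages[self.index], self.page_start, now))
        self.page_start = now   # the next entry starts where this one ends

    def _move(self, step: int) -> bool:
        target = self.index + step
        if not 0 <= target < len(self.pages):
            return False        # already at the first or last page
        self._close_current()
        self.sleep(self.delay)  # assumed position of the enforced pause
        self.index = target
        return True

    def go_next(self) -> bool:
        return self._move(+1)

    def go_previous(self) -> bool:
        return self._move(-1)

    def finish(self) -> List[Tuple[str, float, float]]:
        self._close_current()
        return self.log


session = BrowsingSession(["intro", "utter_01", "utter_02"], sleep=lambda _: None)
session.go_next()
session.go_next()
for page_id, start, end in session.finish():
    print(f"{page_id}\t{start:.6f}\t{end:.6f}")
```

Because a single timestamp is taken at each transition, the end time of one entry equals the start time of the next, which matches the pattern visible in the output excerpt.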

 


Refer to Section 2.4 of Mahmoud Abunasser's dissertation "COMPUTATIONAL MEASURES OF LINGUISTIC VARIATION: A STUDY OF ARABIC VARIETIES" for more information about this application.

 

This project is partially funded by:

- National Science Foundation (NSF) grant BCS-0826672 (E. Benmamoun, PI)

- Qatar National Research Fund (QNRF) grant NPRP-09-410-1-069 (M. Hasegawa-Johnson, PI)