qanalyze
Copyright (C) 2001 Marty White

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

Marty White
http://www.soc.cornell.edu/computing/
mailto:lmw22@cornell.edu
380 Uris Hall, Cornell University

--

TECH DETAILS

The `qanalyze` programs are being written in Perl by Marty White.
http://www.soc.cornell.edu/qanalyze/

Perl, while not an ideal programming language for academics, is an ideal language for text processing.

The package is split into several programs, plus several data files.

`qanalyze` is the interactive front-end. Type `perl qanalyze` to get help on how to use it. It displays text in a viewer of your choice and uses gnuplot to graph the results of an analysis.

`plex` automatically marks up an ASCII text in the mark-up notation that Professor Hayes developed for his earlier programs. For scientifically sound results, at least part of the mark-up must be done by a human, but plex can do the greater part of the work using a few simple rules of thumb. For instance, any word consisting entirely of two or more capital letters is probably an acronym, and therefore a proper noun, and should be marked "illegitimate" for the purposes of analysis.

`qx` performs the statistical analysis of the given text and generates report files.
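The acronym rule of thumb used by plex can be sketched as follows. This is a minimal illustration in Python, not plex itself: the function names are hypothetical, tokens are taken to be plain runs of letters, and instead of emitting Professor Hayes's mark-up notation (which is not reproduced here) each token is simply paired with an "illegitimate" flag.

```python
import re

# Hypothetical sketch of one plex rule of thumb: a token made up entirely of
# two or more capital letters is probably an acronym, hence a proper noun,
# hence "illegitimate" for analysis. Hayes's mark-up notation is NOT
# reproduced; each token just gets a True/False flag.
ACRONYM = re.compile(r"[A-Z]{2,}")


def looks_like_acronym(token: str) -> bool:
    """True if the token consists solely of two or more capital letters."""
    return ACRONYM.fullmatch(token) is not None


def flag_illegitimate(text: str) -> list[tuple[str, bool]]:
    """Split text into runs of letters and pair each with its flag.

    Simplification: apostrophes and hyphens split words here.
    """
    tokens = re.findall(r"[A-Za-z]+", text)
    return [(tok, looks_like_acronym(tok)) for tok in tokens]


if __name__ == "__main__":
    for word, illegit in flag_illegitimate("The NASA report cited IBM and a CEO."):
        print(word, "illegitimate" if illegit else "ok")
```

The "two or more" threshold matters: it keeps ordinary one-letter capitals such as "I" and "A" from being flagged, while "NASA", "IBM" and "CEO" are caught. A human still has to review the result, since some all-capital tokens (headings, emphasis) are not proper nouns.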
The data files consist of a common-word lexicon, containing the 10,000 most commonly used words and their U-values (usage per million), and one or more ordinary spell-checking files listing as many legitimate words as possible. The common-word lexicon is used for the statistical analysis, while the spell-checking files are used only to decide whether a given sequence of letters is a word. Capitalized words in the spell-checking files are assumed to be proper names, and plex uses this to help decide which words in a text are proper names.
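How the two kinds of data file divide the work can be sketched as below. This is a hypothetical Python illustration, not the qanalyze code: the file formats (one "word U-value" pair per lexicon line, one word per spell-checking line), the lower-casing, and the function names are all assumptions made for the example.

```python
from typing import Iterable, Optional

# ASSUMED file formats, for illustration only:
#   common-word lexicon: one "word u_value" pair per line (usage per million)
#   spell-checking file: one word per line; capitalized entries = proper names


def load_lexicon(lines: Iterable[str]) -> dict[str, float]:
    """Map each lexicon word (lower-cased) to its U-value."""
    lexicon: dict[str, float] = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            lexicon[parts[0].lower()] = float(parts[1])
    return lexicon


def load_spelling(lines: Iterable[str]) -> tuple[set[str], set[str]]:
    """Return (every listed word lower-cased, capitalized entries as listed)."""
    words: set[str] = set()
    proper: set[str] = set()
    for line in lines:
        entry = line.strip()
        if not entry:
            continue
        words.add(entry.lower())
        if entry[0].isupper():
            proper.add(entry)
    return words, proper


def classify(token: str, lexicon: dict[str, float],
             words: set[str], proper: set[str]) -> tuple[str, Optional[float]]:
    """Return (category, U-value or None) for one token."""
    if token in proper:
        # Only a hint: a human should still check ambiguous cases.
        return "proper name", None
    key = token.lower()
    if key in lexicon:
        return "common word", lexicon[key]  # the statistics use the U-value
    if key in words:
        return "word", None  # legitimate, but not among the common words
    return "not a word", None


if __name__ == "__main__":
    # Made-up sample data; these U-values are placeholders, not real frequencies.
    lexicon = load_lexicon(["the 100.0", "river 5.0"])
    words, proper = load_spelling(["the", "river", "estuary", "Ithaca"])
    for tok in ["the", "estuary", "Ithaca", "xqzt"]:
        print(tok, classify(tok, lexicon, words, proper))
```

The lexicon is checked before the spell-checking words so that only common words carry a U-value into the statistics, while the spell-checking files merely answer "is this a word at all?". Exact-case matching against capitalized entries is what lets the capitalization in those files serve as a proper-name hint.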