qanalyze Copyright (C) 2001 Marty White

This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License
as published by the Free Software Foundation; either version 2
of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.

Marty White
http://www.soc.cornell.edu/computing/
mailto:lmw22@cornell.edu
380 Uris Hall, Cornell University

--

TECH DETAILS

The `qanalyze` programs are being written in Perl by Marty White.
  http://www.soc.cornell.edu/qanalyze/

Perl, while not an ideal programming language for academics, is
an ideal language for text processing.

The program is split into three parts (`qanalyze`, `plex`, and `qx`), plus
several data files.

`qanalyze` is the interactive front-end.  Type `perl qanalyze` to get help on
how to use it.  It displays text in a viewer of your choice and uses gnuplot
to display graphs of analysis results.

`plex` is a program that automatically marks up an ASCII text in the markup
notation developed by Professor Hayes for use in his earlier programs.  Such
markup must be at least partly done by a human for proper scientific results,
but `plex` can do the better half of the work by using a few simple rules of
thumb.  For instance, any word consisting of two or more capital letters is
probably an acronym, and therefore a proper noun, and should be marked
"illegitimate" for the purposes of analysis.
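
The acronym rule of thumb can be illustrated with a short sketch.  This is
not the actual `plex` code (which is written in Perl); it is a minimal Python
illustration that assumes "two or more capital letters" means the whole token
is uppercase A-Z.  The function name `probable_acronyms` is hypothetical, and
the sketch only reports the candidate words; it does not attempt to reproduce
the markup notation itself.

```python
import re

# Assumption: a token counts as a probable acronym only if it consists
# entirely of two or more capital letters (A-Z).
ACRONYM_RE = re.compile(r"[A-Z]{2,}")

def probable_acronyms(text):
    """Return the tokens in `text` made up solely of 2+ capital letters."""
    # Split the text into runs of ASCII letters; punctuation and digits
    # act as separators.
    tokens = re.findall(r"[A-Za-z]+", text)
    return [t for t in tokens if ACRONYM_RE.fullmatch(t)]
```

Under this reading a single capital letter, such as the article "A" at the
start of a sentence, is not flagged, and neither are ordinary capitalized
words.  Dotted abbreviations such as "U.S." also split into single letters and
are missed, which is one reason a human still has to review the result.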

`qx` does the statistical analysis of the given text and generates report
files.

The data files consist of a common-word lexicon, containing the 10,000 most
commonly used words and their associated U-values (usage per million), and one
or more ordinary spell-checking files listing as many legitimate words as can
be had.  The common-word lexicon is used for statistical analysis, while the
spell-checking words are used simply to decide whether a given sequence of
letters is a word or not.  Capitalized words in the spell-checking files are
assumed to be proper names, and `plex` uses this to help decide which words
are proper names.
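
How the two kinds of data files divide the work can be sketched as follows.
This is a hypothetical Python illustration, not the Perl code: the on-disk
file formats are not described here, so the sketch takes an in-memory
dictionary (lowercase word to U-value) and a set of spell-checking words as
given.  The function name `classify` and the result keys are invented for the
example.  It also assumes that a word counts as a proper name only when the
spell-checking list has the capitalized form and not the lowercase form.

```python
def classify(word, lexicon, spelling_words):
    """Classify one word against the two kinds of data files.

    lexicon        -- dict mapping lowercase common words to U-values
    spelling_words -- set of words exactly as listed in the spelling files
    """
    lower = word.lower()
    capitalized = lower.capitalize()

    # The spelling list only answers "is this a word at all?".
    is_word = lower in spelling_words or capitalized in spelling_words

    # Assumption: listed only in capitalized form means proper name.
    proper_name = capitalized in spelling_words and lower not in spelling_words

    # The common-word lexicon supplies the U-value used for the statistics;
    # words outside the lexicon have no U-value (None).
    return {
        "is_word": is_word,
        "proper_name": proper_name,
        "u_value": lexicon.get(lower),
    }
```

A word can therefore be legitimate (present in the spelling list) and still
have no U-value, because the lexicon covers only the most common words.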