This page is for people who want to search or research the EIR using a special set of indices. The three indices were compiled using a set of computer programs developed by Viridian Technologies specifically for the El Toro Airport.org website. To make use of the indices you need to have Microsoft Access installed on your machine (sorry, a discussion of installing and using Access is beyond the scope of this write-up).
There are three databases in the index collection. Two of the files are approximately 10 megabytes zipped, so they will take a little while to download. The third file is only about 200K. Below is a description of the files and ideas for using them to search the EIR.
kwic-all-words: This Microsoft Access database contains a table that is approximately 75 megabytes. It contains a list of approximately 400,000 words and numbers extracted from Volumes 1, 2A, 2B and Appendix F of the EIR. Each word is listed "in context", hence the name of the index - Keyword In Context (KWIC). Each indexed word is shown with the 40 characters preceding and following it. This makes it easy to see whether the word you are looking up is being used in the context you intended.
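For the curious, here is a rough Python sketch of how a KWIC table like this can be built. It is not the Viridian programs, just an illustration of the idea: the function name `build_kwic`, the `width` parameter and the rule for what counts as a "word" (any run of letters and digits) are assumptions made for this example.

```python
import re

def build_kwic(text, width=40):
    """Return one (word, context) row for every word or number in text.

    The context is the word plus up to `width` characters on each side.
    Treating any run of letters and digits as a "word" is an assumption.
    """
    rows = []
    for match in re.finditer(r"[A-Za-z0-9]+", text):
        start = max(0, match.start() - width)       # clamp at start of text
        end = min(len(text), match.end() + width)   # clamp at end of text
        rows.append((match.group(), text[start:end]))
    return rows

# A made-up sentence, with a narrow window so the result is easy to read.
sample = "The noise exceedance at the runway was judged insignificant."
rows = build_kwic(sample, width=10)
for word, context in rows:
    print(f"{word:15} | {context}")
```

With the default width of 40, each context fragment is the word plus up to 80 surrounding characters, which is why a fragment usually carries enough text to judge how the word is being used.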
kwic-keywords: This second Access database is a subset of the kwic-all-words table. Approximately 70,000 numeric references were deleted from the kwic-all-words table, leaving about 330,000 words. We had intended to filter this index down further, but have not yet decided on a way to eliminate unimportant or low-value words.
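As an illustration only, removing the numeric references could look something like the Python sketch below. Exactly what was treated as a "numeric reference" is not spelled out here, so the rule used in this example (an entry made up entirely of digits and the separators `.`, `,` and `-`) is an assumption, as are the names `is_numeric_reference` and `drop_numeric`.

```python
import re

def is_numeric_reference(word):
    """True for entries such as 1994, 65 or 3.5 (assumed definition)."""
    return (re.fullmatch(r"[\d.,-]+", word) is not None
            and any(ch.isdigit() for ch in word))

def drop_numeric(rows):
    """Keep only the (word, context) rows whose word is not purely numeric."""
    return [row for row in rows if not is_numeric_reference(row[0])]

# Made-up rows in (word, context) form.
all_rows = [
    ("exceedance", "..."),
    ("1994", "..."),
    ("65", "..."),
    ("2A", "..."),    # mixed letters and digits, so it is kept
    ("3.5", "..."),
]
keyword_rows = drop_numeric(all_rows)
print([word for word, _ in keyword_rows])
```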
kwic-frequencies: This third Access database contains a frequency count of each word in the kwic-all-words index.
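A frequency table of this kind is simple to produce. The Python sketch below counts each word and lists the most frequent first; the function name `word_frequencies` is made up for the example, and folding upper and lower case together is an assumption (the actual table may well keep "Ozone" and "ozone" as separate entries).

```python
from collections import Counter

def word_frequencies(words):
    """Return (word, count) pairs, most frequent first.

    Case is folded before counting (an assumption); ties are broken
    alphabetically so the order is predictable.
    """
    counts = Counter(word.lower() for word in words)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Made-up word list standing in for the Word column of kwic-all-words.
words = ["Ozone", "ozone", "noise", "Ozone", "could"]
table = word_frequencies(words)
for word, count in table:
    print(f"{count:6}  {word}")
```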
Ideas for researching the EIR and generating questions for the County:
1. If you just want to look up a word like "exceedance", you could go straight to the kwic-keywords table, click on the Word column, open Find (Ctrl+F) and type in "exceedance". There you'll find 94 references to this word. Usually you can tell from the 100-character context fragment whether or not a given reference is interesting to you. If it is interesting, then click on the blue hypertext link in the "link" column next to the word "exceedance". (You need to be connected to the net and have your browser running for this step to work.) Access will talk to your browser, download the volume of the EIR in question, and jump you to the page where that word is used. The first time you perform a lookup, it will take a few seconds (depending on your internet connection speed) to download the full HTML document. Once you have the document cached on your machine, subsequent jumps should be sub-second. You should be able to check out 94 references in 94 seconds.
2. If you suspect the EIR contains euphemisms for air/noise/water/traffic problems, then look down the kwic-frequencies table until you find a word that jumps out at you. This table is sorted in descending order by word frequency. There are 17,210 unique words in this table, covering the 4 volumes mentioned above. It will only take a minute or so of skimming this list before you find a word that jumps off the page - words like solvents, could, should, nonattainment, significant, severe, Assumptions, Ozone, acceptable, Cleanup, remediation, contaminant, Musick, Police, tanker, insignificant, etc. If you see something with 100 references it might be worth checking out. Switch over to the kwic-keywords table, which is sorted in word order, and follow the procedure described above. Again, you should be able to check out 100 references across all 4 documents in 100 seconds.
3. Once you find some interesting text in the HTML version of the EIR volume, you should double-check that text against the Acrobat version. Remember that the HTML pages and the database indices were computer generated after reverse-engineering the EIR pages using Optical Character Recognition (OCR) software. We did not want to introduce human errors by attempting to fix up mis-recognized OCR words. Before submitting your question to the County, you should check the Acrobat version of the page to see if the non-OCR version is a little more readable. As mentioned elsewhere, the OCR software has a particularly hard time recognizing italicized words in the EIR, words in tables, bold letters, special symbols like [ ] ( ) - & # @, and words that contain letters that touch (e.g. the simple word "from" is almost always unrecognizable because the "f" and "r" touch).
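If you prefer to think of the lookup in step 1 in programming terms, it amounts to filtering the table on the Word column and then following each link. Here is a small Python sketch of that filter. The row layout (`word`, `context`, `link`), the function name `find_references` and the sample links are all made up for illustration; they are not the actual column names or addresses in the database.

```python
from typing import NamedTuple

class KwicRow(NamedTuple):
    word: str      # the indexed word
    context: str   # the surrounding text fragment
    link: str      # where to jump in the HTML volume (hypothetical format)

def find_references(rows, word):
    """Return every row whose word matches, ignoring upper/lower case."""
    target = word.lower()
    return [row for row in rows if row.word.lower() == target]

# Made-up rows standing in for the kwic-keywords table.
table = [
    KwicRow("exceedance", "a projected exceedance of the standard", "volume-a.html#page-12"),
    KwicRow("ozone", "a nonattainment area for ozone", "volume-a.html#page-40"),
    KwicRow("Exceedance", "Exceedance of the noise threshold", "volume-b.html#page-7"),
]
hits = find_references(table, "Exceedance")
for row in hits:
    print(row.link, "|", row.context)
```

Ignoring case in the match is a choice made for this sketch so that capitalized and lowercase entries are found together.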
If you have questions, suggestions or need technical help in accessing any of these files please don't hesitate to e-mail the Viridian help desk at John@ViridianTech.com.