By downloading this code you agree to the
Source Code Use License (PDF).
{ version = 2.40; (* of imgalt.p 2017 Apr 06}
(* begin module describe.imgalt *)
(*
name
imgalt: html image alt detection and upgrading for 508 requirements
synopsis
imgalt(orihtml: in, imgaltp: in,
imagereport: out, imagenames: out,
althtml: out, altstrings: out,
stop: out,
output: out)
files
orihtml: the original html file to be analyzed.
imgaltp: parameters to control the program. The file must contain the
following parameters, one per line:
parameterversion: The version number of the program. This allows the
user to be warned if an old parameter file is used.
Second line: showconversion (character),
showparse (character):
If the first character is a 'c', then each conversion name
and alt tag is shown as it is read in. This could be a long
list. They look like: 'new.gif -> "new"'.
If the second character is an 's', then the orihtml file is
copied to the output with parsing information displayed.
Each line number being examined is given and then a set of
characters show the parsing information:
not         '-'
in comment  'C'
in quote    'Q'
in image    'I'
found src   's'
found src=  '='
found alt   'a'
found alt=  '='
Third line: temporarytag (string): This string is the default
'temporary tag' alt string to provide if the user does not
specify one and there is no alt tag. Suggestion:
ALTERNATE_TEXT
The following lines of the file contain pairs consisting of a
file name (e.g. colorbar.gif) and an alt string in double quotes
to use for that file name (e.g. "a color bar").
imagereport: report of the results
imagenames: names of images.
The first line begins with '*' and identifies the program.
The second line begins with '*' and identifies the columns. The
rest of the file has these columns:
1. type: If the line begins with
'alt ' then the image already has an alt and the alt is not
listed in imgaltp.
'--- ' then the image does not have an alt and none is listed
in imgaltp.
'+++ ' then the image alt will be supplied from imgaltp in
creating althtml from orihtml.
2. url: The image url is given as reported in the orihtml. Note
that this may not be a full url.
3. Ourl: The line in the orihtml file where the image url is
given.
4. Oalt: The line in the orihtml file where the alternative text
string is.
5. Aurl: The line in the althtml file where the image url is
given. This is different from the lines in the orihtml file
because extra lines are added to althtml to put each alternative
text string on a fresh line.
6. Aalt: The line in the althtml file where the alternative text
string is.
althtml: a copy of orihtml with alt strings inserted as needed.
There are four cases to handle depending on whether or not there
already is an alt tag and whether or not what to do is specified
in the imgaltp.
---  no alt tag, alternative not supplied: fill in with temporary tag
+++  no alt tag, alternative supplied:     use tag supplied in imgaltp
alt  alt tag,    alternative not supplied: leave alone
+++  alt tag,    alternative supplied:     use tag supplied in imgaltp
For the first case, when a temporary tag is generated, it will
contain the string specified by the user in imgaltp, for example
"ALTERNATE_TEXT".
Note that an empty alt tag is replaced with data from imgaltp
only if there is a file name match. This preserves empty alt
tags.
In this version of imgalt, the <img ... > is rewritten in
althtml so that src and alt are at the end of the <img ... >.
Both the src and alt are on their own lines. Also, the alt
string is wrapped and the current indentation is retained.
Any spaces at the ends of orihtml lines are removed from the lines of
althtml. This allows comparison of the results. If the imgaltp
is not changed, then moving althtml back to orihtml and running
imgalt should, in theory, make orihtml and althtml identical.
altstrings: image file names and alt strings in quotes, one per line
These are taken from both imgaltp and orihtml and duplicates
are removed. Priority is given to the first file name/alt
string found in the list so that what a user defines overrides
values in the orihtml.
stop: The stop file is written only if there is a program error.
This allows other programs to know that imgalt has crashed and
handle the situation gracefully.
output: messages to the user
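
The four cases listed under althtml can be sketched as a small decision
function. This is an illustration only, not the code in imgalt.p: the
name classify and its arguments are hypothetical, and None is assumed to
stand for "no alt tag" or "file name not listed in imgaltp".

```python
def classify(existing_alt, listed_alt, temporary_tag):
    """Return (type marker, alt string to write) for one image.

    existing_alt:  alt string found in orihtml, or None if there is no alt tag
    listed_alt:    alt string listed in imgaltp for this file name, or None
    temporary_tag: default string from the third line of imgaltp
    """
    if listed_alt is not None:
        # A file name match in imgaltp wins, with or without an existing alt.
        return ("+++", listed_alt)
    if existing_alt is not None:
        # An existing alt, even an empty one, is left alone.
        return ("alt", existing_alt)
    # No alt tag and nothing listed: fill in the temporary tag.
    return ("---", temporary_tag)
```

For example, classify("", None, "ALTERNATE_TEXT") keeps the empty alt
tag, which matches the note above about preserving empty alt tags.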
description
The 508 requirements specify that every image (img) in an html page
have an alternative (alt) description tag. This program, imgalt,
reads an html page (orihtml), detects the images, identifies the
source of the image and whether there is an alt tag. It rewrites
the page so that the image has the source followed by alt tags that
the user specifies in a master list (provided in imgaltp). A
revised master list is also written (altstrings). If there is a
problem, the stop file is created which can be used to halt
recursive reading through directories (see the tree script in my
toolbox). Other files give information about the image tags.
The program identifies images in an html web page, orihtml, that do
not have "alt" (alternative) descriptions. It creates a revised html,
called althtml, in which missing alts are filled in with a temporary
tag that can be corrected later. The initial list of alt tags to use
is given in the parameter file imgaltp.
The program outputs a list of the images, called imagenames, marked
with whether or not they have an alt description. This list gives
the line numbers for the url and alt tag in both the orihtml
and althtml files.
NOTE: only the actual file name, following all slashes ('/') in the
URL is used to identify the image name in orihtml, but the whole
URL is passed to althtml.
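
The file name rule in the NOTE above can be sketched in one line. The
function name image_name is hypothetical, and the sketch assumes the url
carries no query string or fragment, a case this page does not describe.

```python
def image_name(url):
    """Return the text after the last '/' in url, or all of url if it has none."""
    return url.rsplit("/", 1)[-1]
```

With this sketch, both "icons/new.gif" and "new.gif" give "new.gif", so
they would be matched against the same imgaltp entry.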
The program also writes a list of all the file names with their alt
tags into the altstrings file. The list starts with pairs given by
imgaltp and is followed by ones found in the file. This list can
be appended to the end of the imgaltp parameter file. Duplicate
file names are removed from the altstrings list. Duplicates that
are further down the list are removed, so priority is given to the
top of the list. This allows a user to define a new alt tag for a
given file, and this will override all cases subsequently found.
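
The priority rule for altstrings can be sketched as a first-wins merge.
The function name merge_alt_pairs and the use of (file name, alt string)
tuples are assumptions made for this illustration.

```python
def merge_alt_pairs(from_imgaltp, from_html):
    """Merge two lists of (file name, alt string) pairs.

    Pairs from imgaltp come first; a later pair whose file name has
    already been seen is dropped, so the top of the list has priority.
    """
    merged = []
    seen = set()
    for name, alt in list(from_imgaltp) + list(from_html):
        if name in seen:
            continue  # duplicate further down the list: removed
        seen.add(name)
        merged.append((name, alt))
    return merged
```

For example, if imgaltp lists new.gif as "new" and orihtml has new.gif
with alt "NEW!", only the pair from imgaltp survives.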
The program does a single pass through the html file. To replace
an alt tag, the file has to be identified before the alt tag is
given. One approach would be to require that the src string tag be
before the alt tag. However, the program is smarter than that.
When it encounters an img that is not inside a comment, it copies
the contents but skips and remembers the alt and src. Then when
the end of the img is found, it prints them on their own lines with
the same indentation as the original code. The src is given before
the alt.
examples
Example imgaltp file:
1.36 version of imgalt that this parameter file is designed for.
sn 's' means show the conversion lines as read in
ALTERNATE_TEXT
colorbar.gif "a color bar"
donor.gif "sequence logo for human donor splice junctions"
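
A minimal sketch of reading a file laid out like the example above. It
is not the reader used by imgalt.p. The name parse_imgaltp is
hypothetical, and the sketch assumes that the version is the first word
of line one, that the two control characters are the first two
characters of line two, and that the whole third line is the temporary
tag.

```python
def parse_imgaltp(text):
    """Return (version, flags, temporary_tag, pairs) from imgaltp text."""
    lines = text.splitlines()
    version = float(lines[0].split()[0])   # e.g. 1.36; the rest is a comment
    flags = lines[1][:2]                   # showconversion, showparse
    temporary_tag = lines[2].strip()       # e.g. ALTERNATE_TEXT
    pairs = []
    for line in lines[3:]:
        line = line.strip()
        if not line:
            continue
        name, _, rest = line.partition(" ")
        first = rest.find('"')
        last = rest.rfind('"')
        if first == -1 or last <= first:
            continue  # no quoted alt string; this sketch skips the line
        pairs.append((name, rest[first + 1:last]))
    return version, flags, temporary_tag, pairs
```

Applied to the example file, this would give the pairs
("colorbar.gif", "a color bar") and ("donor.gif", "sequence logo for
human donor splice junctions").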
documentation
see also
google search for 'html alt'
http://www.google.com/search?hl=en&source=hp&q=html+alt
A useful tutorial, "The Rules of ALT":
http://html.com/images/rules-of-alt/
A useful tutorial, "Guidelines on alt texts in img elements":
http://www.cs.tut.fi/~jkorpela/html/alt.html
Program that removes blanks from ends of lines: rembla.p
**** Related Scripts ****
mh                 sets up for correcting alts in a single html file
alt                called by mh or directly to correct alts
mkalttags          uses tree to analyze all htmls in a directory
mkalttagsfunction  called by mkalttags through the tree script
tree               general recursive directory processing
masteralt          provides the alt string for a given image
author
Thomas Dana Schneider
bugs
technical notes
The program imgalt reads through the original html (orihtml) and
copies to the altered html (althtml). When it is inside an image
'<img' it copies until it sees a source file 'src=' and it then
captures that. Likewise it captures an alt string 'alt='. Case of
the src and alt and spacing to the equals do not matter. These are
captured but not sent to the output. Instead, they are held until
the end of the img is found at '>'. At this point the src is
output followed by the alt. So if the src is after the alt, the
order will be reversed.
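
The hold-and-emit idea above can be sketched for a single tag. This is
an illustration, not the code in imgalt.p: the name reorder_img is
hypothetical, the input is assumed to be one complete string that starts
with '<img' and ends with '>', and alt wrapping, comments, and line
counting are left out.

```python
def reorder_img(tag, indent=""):
    """Rewrite one '<img ...>' so src and alt come last, src before alt."""
    body = tag[4:-1]                  # text between '<img' and '>'
    n = len(body)
    kept = []                         # other attributes, copied unchanged
    held = dict(src=None, alt=None)   # captured, output only at the end
    i = 0
    while i < n:
        if body[i].isspace():
            i += 1
            continue
        start = i
        while i < n and not body[i].isspace() and body[i] != "=":
            i += 1
        name = body[start:i].lower()  # case of src and alt does not matter
        j = i
        while j < n and body[j].isspace():
            j += 1                    # spacing before '=' does not matter
        value = None
        if j < n and body[j] == "=":
            j += 1
            while j < n and body[j].isspace():
                j += 1
            if j < n and body[j] == '"':
                end = body.find('"', j + 1)
                if end == -1:
                    end = n           # unterminated quote: take the rest
                value = body[j + 1:end]
                i = end + 1
            else:
                first = j
                while j < n and not body[j].isspace():
                    j += 1
                value = body[first:j]
                i = j
        if name in held and value is not None:
            held[name] = value
        else:
            kept.append(body[start:i])
    lines = [" ".join(["<img"] + kept)]
    if held["src"] is not None:
        lines.append(indent + 'src="' + held["src"] + '"')
    if held["alt"] is not None:
        lines.append(indent + 'alt="' + held["alt"] + '"')
    return "\n".join(lines) + ">"
```

Given '<img ALT = "a color bar" width=10 src="icons/colorbar.gif">',
the sketch puts width=10 on the first line, then src, then alt, each on
its own line, so an alt that came before the src ends up after it.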
*)
(* end module describe.imgalt *)
{This manual page was created by makman 1.45}
{created by htmlink 1.62}