kses 0.2.2 README  [kses strips evil scripts!]
=================


* INTRODUCTION *


Welcome to kses - an HTML/XHTML filter written in PHP. It removes all unwanted
HTML elements and attributes, no matter how malformed the HTML input you give
it. It also does several checks on attribute values. kses can be used to avoid
Cross-Site Scripting (XSS), Buffer Overflows and Denial of Service attacks,
among other things.

The program is released under the terms of the GNU General Public License. You
should look into what that means before using kses in your programs. You can
find the full text of the license in the file COPYING.


* FEATURES *


Some of kses' current features are:

* It will only allow the HTML elements and attributes that it was explicitly
told to allow.

* Element and attribute names are case-insensitive (a href vs A HREF).

* It will understand and process whitespace correctly.

* Attribute values can be surrounded with quotes, apostrophes or nothing.

* It will accept valueless attributes with just names and no values (selected).

* It will accept XHTML's closing " /" marks.

* Attribute values that are surrounded with nothing will get quotes to avoid
producing non-W3C conforming HTML
(<a href=http://sourceforge.net/projects/kses> works but isn't valid HTML).

* It handles lots of types of malformed HTML, by interpreting the existing
code the best it can and then rebuilding new code from it. That's a better
approach than trying to process existing code, as you're bound to forget about
some weird special case somewhere. It handles problems like never-ending
quotes and tags gracefully.

* It will remove additional "<" and ">" characters that people may try to
sneak in somewhere.

* It supports checking attribute values for minimum/maximum length and
minimum/maximum value, to protect against Buffer Overflows and Denial of
Service attacks against WWW clients and various servers. You can stop
<iframe src= width= height=> from having too high values for width and height,
for instance.

* It has a system for whitelisting URL protocols. You can say that
attribute values may only start with http:, https:, ftp: and gopher:, but no
other URL protocols (javascript:, java:, about:, telnet:..). The functions that
do this work handle whitespace, upper/lower case, HTML entities
("jav&#97;script:") and repeated entries ("javascript:javascript:alert(57)").
It also normalizes HTML entities as a nice side effect.

* It removes Netscape 4's JavaScript entities ("&{alert(57)};").

* It handles NULL bytes and Opera's chr(173) whitespace characters.

* There is a procedural version and two object-oriented versions (for PHP 4
  and PHP 5) of kses.
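As a rough illustration of the value checks and the protocol whitelist
mentioned in the list above, a whitelist might be set up like this. It is only
a sketch: the limits, the sample input and the location of kses.php are made up
for the example, and it assumes that the per-attribute keys maxlen and maxval
and the optional third argument of kses() work as described in the docs/
directory.

```php
<?php

# Elements and attributes to allow. An attribute set to 1 is allowed as-is;
# an array attaches checks to the attribute's value (limits are made up).
$allowed = array('a'      => array('href'  => array('maxlen' => 200),
                                   'title' => 1),
                 'iframe' => array('src'    => 1,
                                   'width'  => array('maxval' => 800),
                                   'height' => array('maxval' => 600)));

# URL protocols that attribute values may start with.
$protocols = array('http', 'https', 'ftp', 'gopher');

# Sample input with a disallowed protocol and an oversized width.
$val = '<iframe src="javascript:alert(57)" width="99999" height="100"></iframe>';

# Only run the filter when kses.php is really there (the path is an assumption).
if (is_readable(__DIR__ . '/kses.php')) {
    include __DIR__ . '/kses.php';
    $val = kses($val, $allowed, $protocols);
}

echo $val, "\n";
```

Leaving out the third argument makes kses fall back to its built-in list of
protocols, so pass your own list only when you want something stricter or
looser than the default.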


* USE IT *


It's very easy to use kses in your own PHP web application! Basic usage looks
like this:


<?php

include 'kses.php';

$allowed = array('b' => array(),
                 'i' => array(),
                 'a' => array('href' => 1, 'title' => 1),
                 'p' => array('align' => 1),
                 'br' => array());

$val = isset($_POST['val']) ? $_POST['val'] : '';

# You must strip slashes from magic quotes, or kses will get confused.
# get_magic_quotes_gpc() was removed in PHP 8, so check that it exists first.
if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc())
  $val = stripslashes($val);

$val = kses($val, $allowed); # The filtering takes place here.

# Do something with $val.

?>


This definition of $allowed means that only the elements B, I, A, P and BR are
allowed (along with their closing tags /B, /I, /A, /P and /BR). B, I and BR
may not have any attributes. A may only have the attributes HREF and TITLE,
while P may only have the attribute ALIGN. You can list the elements and
attributes in the array in any mixture of upper and lower case. kses will also
recognize HTML code that uses both lower and upper case.
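To make the point about upper and lower case concrete, here is a small sketch.
The helper lowercase_allowed() is hypothetical and not part of kses; it only
shows that two definitions that differ in case describe the same whitelist.

```php
<?php

# Hypothetical helper, not part of kses: lower-cases the element names and
# the attribute names of an $allowed array.
function lowercase_allowed($allowed)
{
    $out = array();
    foreach ($allowed as $element => $attributes) {
        $out[strtolower($element)] = array_change_key_case($attributes, CASE_LOWER);
    }
    return $out;
}

$mixed = array('B' => array(),
               'A' => array('HREF' => 1, 'Title' => 1),
               'p' => array('ALIGN' => 1));

$lower = array('b' => array(),
               'a' => array('href' => 1, 'title' => 1),
               'p' => array('align' => 1));

# Both arrays describe the same whitelist once case is ignored.
var_dump(lowercase_allowed($mixed) == $lower); # bool(true)
```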

It's important to select the right allowed attributes, so you won't open up
an XSS hole by mistake. Some important attributes that you mustn't allow
include but are not limited to: 1) style, and 2) all intrinsic event
attributes (onMouseOver and so on, on* really). I'll write more about this in
the documentation that will be distributed with future versions of kses.
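One way to guard against that mistake is to check the array before handing it
to kses. The helper below, find_risky_attributes(), is a hypothetical sketch
and not part of kses; it flags style and anything starting with "on", which is
deliberately broad, and it does not claim to catch every dangerous attribute.

```php
<?php

# Hypothetical helper, not part of kses: lists every allowed attribute that
# is either "style" or looks like an intrinsic event attribute (on*).
function find_risky_attributes($allowed)
{
    $risky = array();
    foreach ($allowed as $element => $attributes) {
        foreach (array_keys($attributes) as $attribute) {
            $name = strtolower($attribute);
            if ($name === 'style' || strpos($name, 'on') === 0) {
                $risky[] = strtolower($element) . '/' . $name;
            }
        }
    }
    return $risky;
}

# A deliberately careless whitelist, made up for the example.
$allowed = array('a' => array('href' => 1, 'onMouseOver' => 1),
                 'p' => array('align' => 1, 'STYLE' => 1),
                 'b' => array());

print_r(find_risky_attributes($allowed)); # lists a/onmouseover and p/style
```

An empty result only means that none of the attributes named above were found,
not that the whitelist is safe.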

It's also important to note that kses' HTML input must be cleaned of all
slashes coming from magic quotes. If the rest of your code requires these
slashes to be present, you can always add them again after calling kses with
a simple addslashes() call.
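The strip, filter, re-add sequence described above could be wrapped up as
follows. Both filter_slashed() and the use of strip_tags() as a stand-in
filter are assumptions made for this sketch, so that it runs without kses.php;
in real code the callback would call kses() with your $allowed array.

```php
<?php

# Hypothetical wrapper, not part of kses. $filter is any callable that takes
# the unslashed HTML and returns the filtered HTML.
function filter_slashed($slashed, $filter)
{
    $plain = stripslashes($slashed);           # remove the magic quotes slashes
    $clean = call_user_func($filter, $plain);  # filter the real HTML
    return addslashes($clean);                 # put the slashes back
}

# Input as it would arrive with magic quotes switched on.
$slashed = 'She said \\"hi\\" <b>loudly</b>';

# strip_tags() is only a stand-in so the sketch runs on its own.
echo filter_slashed($slashed, 'strip_tags'), "\n"; # She said \"hi\" loudly
```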

You should take a look at the documentation in the docs/ directory and the
examples in the examples/ directory, to get more information on how to use
kses. The object-oriented versions of kses are also worth checking out, and
they're included in the oop/ directory.


* UPGRADING TO 0.2.2 *


kses 0.2.2 is backwards compatible with all previous releases, so upgrading
should just be a matter of using a new version of kses.php instead of an old
one.


* NEW VERSIONS, MAILING LISTS AND BUG REPORTS *


If you want to download new versions, subscribe to the kses-general mailing
list or even take part in the development of kses, we refer you to its
homepage at  http://sourceforge.net/projects/kses . New developers and beta
testers are more than welcome!

If you have any bug reports, suggestions for improvement or simply want to tell
us that you use kses for some project, feel free to post to the kses-general
mailing list. If you have found any security problems (particularly XSS,
naturally) in kses, please contact Ulf privately at  metaur at users dot
sourceforge dot net  so he can correct them before you or someone else tells
the public about them.

(No, it's not a security problem in kses if some program that uses it allows a
bad attribute, silly. If kses is told to accept the element body with the
attributes style and onLoad, it will accept them, even if that's a really bad
idea, securitywise.)


* OTHER HTML FILTERS *


Here are the other stand-alone, open source HTML filters that we currently know
of:

* Htmlfilter for PHP - the filter from Squirrelmail
  PHP
  Konstantin Riabitsev
  http://linux.duke.edu/projects/mini/htmlfilter/

* HTML::StripScripts and related CPAN modules
  Perl
  Nick Cleaton
  http://search.cpan.org/perldoc?HTML%3A%3AStripScripts

* SafeHtmlChecker [is this really open source?]
  PHP
  Simon Willison
  http://simon.incutio.com/archive/2003/02/23/safeHtmlChecker

There are also a lot of HTML filters that were written specifically for some
program. Some of them are better than others.

Please write to the kses-general mailing list if you know of any other
stand-alone, open-source filters.


* DEDICATION *


kses 0.2.2 is dedicated to Audrey Tautou and Jean-Pierre Jeunet.


* MISC *


The kses code is based on an HTML filter that Ulf wrote on his own back in 2002
for the open-source project Gnuheter
( http://savannah.nongnu.org/projects/gnuheter ). Gnuheter is a fork of
PHP-Nuke. The HTML filter has been improved a lot since then.

To stop people from having sleepless nights, we feel the urgent need to state
that kses doesn't have anything to do with the KDE project, despite having a
name that starts with a K.

In case someone was wondering, Ulf is available for kses-related consulting.

Finally, the name kses comes from the terms XSS and access. It's also a
recursive acronym (every open-source project should have one!) for "kses
strips evil scripts".


// Ulf and the kses development group, February 2005