kses 0.2.2 README [kses strips evil scripts!]
=================


* INTRODUCTION *


Welcome to kses - an HTML/XHTML filter written in PHP. It removes all unwanted
HTML elements and attributes, no matter how malformed the HTML input you give
it is. It also does several checks on attribute values. kses can be used to
avoid Cross-Site Scripting (XSS), Buffer Overflows and Denial of Service
attacks, among other things.

The program is released under the terms of the GNU General Public License. You
should look into what that means before using kses in your programs. You can
find the full text of the license in the file COPYING.


* FEATURES *


Some of kses' current features are:

* It will only allow the HTML elements and attributes that it was explicitly
  told to allow.

* Element and attribute names are case-insensitive (a href vs A HREF).

* It will understand and process whitespace correctly.

* Attribute values can be surrounded with quotes, apostrophes or nothing.

* It will accept valueless attributes with just names and no values (selected).

* It will accept XHTML's closing " /" marks.

* Attribute values that are surrounded with nothing will get quotes to avoid
  producing non-W3C conforming HTML
  (<a href=http://sourceforge.net/projects/kses> works but isn't valid HTML).

* It handles lots of types of malformed HTML, by interpreting the existing
  code the best it can and then rebuilding new code from it. That's a better
  approach than trying to repair the existing code in place, as you're bound
  to forget about some weird special case somewhere. It handles problems like
  never-ending quotes and tags gracefully.

* It will remove additional "<" and ">" characters that people may try to
  sneak in somewhere.

* It supports checking attribute values for minimum/maximum length and
  minimum/maximum value, to protect against Buffer Overflows and Denial of
  Service attacks against WWW clients and various servers. You can stop
  <iframe src= width= height=> from having too high values for width and
  height, for instance.

* It has a system for whitelisting URL protocols. You can say that
  attribute values may only start with http:, https:, ftp: and gopher:, but no
  other URL protocols (javascript:, java:, about:, telnet:..). The functions
  that do this work handle whitespace, upper/lower case, HTML entities
  ("&#106;avascript:") and repeated entries
  ("javascript:javascript:alert(57)"). It also normalizes HTML entities as a
  nice side effect.

* It removes Netscape 4's JavaScript entities ("&{alert(57)};").

* It handles NULL bytes and Opera's chr(173) whitespace characters.

* There is a procedural version and two object-oriented versions (for PHP 4
  and PHP 5) of kses.
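
The protocol whitelisting mentioned in the list above can be illustrated with
a small, self-contained sketch. This is not kses' actual code: the function
name sketch_strip_bad_protocols() and all of its details are hypothetical, and
it only shows the general idea of decoding entities, ignoring case and leading
whitespace, and repeating until no disallowed protocol remains.

```php
<?php

// Hypothetical helper for illustration only; kses' own functions differ.
function sketch_strip_bad_protocols($value, array $allowed_protocols)
{
    // Decode entities so that "&#106;avascript:" is seen as "javascript:".
    // The result is decoded text and must be escaped again before output.
    $value = html_entity_decode($value, ENT_QUOTES, 'UTF-8');

    // Drop NULL bytes that could be used to split a protocol name.
    $value = str_replace("\0", '', $value);

    // Strip disallowed protocols repeatedly, so that stacked entries such
    // as "javascript:javascript:alert(57)" are removed completely.
    do {
        $before = $value;
        if (preg_match('/^\s*([a-z][a-z0-9+.\-]*)\s*:/i', $value, $match)
            && !in_array(strtolower($match[1]), $allowed_protocols, true)) {
            $value = (string) substr($value, strlen($match[0]));
        }
    } while ($value !== $before);

    return $value;
}

$protocols = array('http', 'https', 'ftp', 'gopher');

echo sketch_strip_bad_protocols('JavaScript:javascript:alert(57)', $protocols), "\n";
echo sketch_strip_bad_protocols('http://sourceforge.net/projects/kses', $protocols), "\n";
```

The first call prints "alert(57)" and the second prints the URL unchanged. In
kses itself the protocol list is understood to be passed as an optional third
argument to kses(); treat that as an assumption and check kses.php before
relying on it.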


* USE IT *


It's very easy to use kses in your own PHP web application! Basic usage looks
like this:


<?php

include 'kses.php';

$allowed = array('b' => array(),
                 'i' => array(),
                 'a' => array('href' => 1, 'title' => 1),
                 'p' => array('align' => 1),
                 'br' => array());

$val = isset($_POST['val']) ? $_POST['val'] : '';
# get_magic_quotes_gpc() was removed in PHP 8, hence the function_exists() check.
if (function_exists('get_magic_quotes_gpc') && get_magic_quotes_gpc())
  $val = stripslashes($val);
# You must strip slashes from magic quotes, or kses will get confused.

$val = kses($val, $allowed); # The filtering takes place here.

# Do something with $val.

?>


This definition of $allowed means that only the elements B, I, A, P and BR are
allowed (along with their closing tags /B, /I, /A, /P and /BR). B, I and BR
may not have any attributes. A may only have the attributes HREF and TITLE,
while P may only have the attribute ALIGN. You can list the elements and
attributes in the array in any mixture of upper and lower case. kses will also
recognize HTML code that uses both lower and upper case.
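
The attribute value checks listed under FEATURES are configured in the same
$allowed array. The fragment below is a sketch only: the check names maxlen
and maxval are written from memory of the kses documentation and are an
assumption here, and the limits 200, 800 and 600 are arbitrary, so verify the
names against the docs/ directory before using them.

```php
<?php

// Assumed check names (maxlen, maxval); verify them against docs/.
$allowed = array(
    // Limit the length of link targets.
    'a'      => array('href' => array('maxlen' => 200), 'title' => 1),
    // Keep iframes from claiming an absurdly large area.
    'iframe' => array('src'    => 1,
                      'width'  => array('maxval' => 800),
                      'height' => array('maxval' => 600)),
);

// $val = kses($val, $allowed);  # called exactly as in the basic example
```

An attribute mapped to 1 is accepted without further checks, while an
attribute mapped to an array is accepted only if its value passes the checks
named in that array.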

It's important to select the right allowed attributes, so you won't open up
an XSS hole by mistake. Some important attributes that you mustn't allow
include, but are not limited to: 1) style, and 2) all intrinsic event
attributes (onMouseOver and so on; really anything starting with "on"). I'll
write more about this in the documentation that will be distributed with
future versions of kses.
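
One way to catch such mistakes early is to scan $allowed before handing it to
kses. The helper below is a hypothetical sketch and not part of kses; it only
flags style and on* attributes, which is not a complete list of dangerous
attributes.

```php
<?php

// Hypothetical helper, not part of kses: lists risky attributes in $allowed.
function sketch_find_risky_attributes(array $allowed)
{
    $risky = array();
    foreach ($allowed as $element => $attributes) {
        foreach (array_keys($attributes) as $attribute) {
            // Names are case-insensitive, so compare in lower case.
            $name = strtolower($attribute);
            // "style" and every intrinsic event attribute (on*) can carry script.
            if ($name === 'style' || strncmp($name, 'on', 2) === 0) {
                $risky[] = strtolower($element) . '/' . $name;
            }
        }
    }
    return $risky;
}

$allowed = array('b'    => array(),
                 'a'    => array('href' => 1, 'onMouseOver' => 1),
                 'body' => array('style' => 1, 'onLoad' => 1));

print_r(sketch_find_risky_attributes($allowed));
```

For this $allowed the helper reports a/onmouseover, body/style and
body/onload. An empty result does not prove that a configuration is safe.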

It's also important to note that kses' HTML input must be cleaned of all
slashes coming from magic quotes. If the rest of your code requires these
slashes to be present, you can always add them again after calling kses with
a simple addslashes() call.
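
The strip-then-restore round trip described above might look like the sketch
below. The helper name sketch_filter_slashed() is hypothetical, and $filter
stands in for the real filtering call (for example a function that returns
kses($html, $allowed)), so that the sketch runs without kses.php.

```php
<?php

// Hypothetical helper: $filter is any callable that cleans an HTML string.
function sketch_filter_slashed($slashed, $filter)
{
    $html = stripslashes($slashed);          // undo the magic quotes first
    $html = call_user_func($filter, $html);  // filter the real HTML
    return addslashes($html);                // restore slashes for legacy code
}

// strip_tags is only a stand-in filter for this demonstration.
echo sketch_filter_slashed('<b>O\\\'Reilly</b>', 'strip_tags'), "\n";
```

With the stand-in filter this prints O\'Reilly: the tags are gone and the
slash in front of the apostrophe has been added back.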

You should take a look at the documentation in the docs/ directory and the
examples in the examples/ directory, to get more information on how to use
kses. The object-oriented versions of kses are also worth checking out, and
they're included in the oop/ directory.


* UPGRADING TO 0.2.2 *


kses 0.2.2 is backwards compatible with all previous releases, so upgrading
should just be a matter of using a new version of kses.php instead of an old
one.


* NEW VERSIONS, MAILING LISTS AND BUG REPORTS *


If you want to download new versions, subscribe to the kses-general mailing
list or even take part in the development of kses, we refer you to its
homepage at http://sourceforge.net/projects/kses . New developers and beta
testers are more than welcome!

If you have any bug reports, suggestions for improvement or simply want to tell
us that you use kses for some project, feel free to post to the kses-general
mailing list. If you have found any security problems (particularly XSS,
naturally) in kses, please contact Ulf privately at metaur at users dot
sourceforge dot net so he can correct them before you or someone else tells
the public about them.

(No, it's not a security problem in kses if some program that uses it allows a
bad attribute, silly. If kses is told to accept the element body with the
attributes style and onLoad, it will accept them, even if that's a really bad
idea, securitywise.)


* OTHER HTML FILTERS *


Here are the other stand-alone, open source HTML filters that we currently know
of:

* Htmlfilter for PHP - the filter from Squirrelmail
  PHP
  Konstantin Riabitsev
  http://linux.duke.edu/projects/mini/htmlfilter/

* HTML::StripScripts and related CPAN modules
  Perl
  Nick Cleaton
  http://search.cpan.org/perldoc?HTML%3A%3AStripScripts

* SafeHtmlChecker [is this really open source?]
  PHP
  Simon Willison
  http://simon.incutio.com/archive/2003/02/23/safeHtmlChecker

There are also a lot of HTML filters that were written specifically for some
program. Some of them are better than others.

Please write to the kses-general mailing list if you know of any other
stand-alone, open-source filters.


* DEDICATION *


kses 0.2.2 is dedicated to Audrey Tautou and Jean-Pierre Jeunet.


* MISC *


The kses code is based on an HTML filter that Ulf wrote on his own back in 2002
for the open-source project Gnuheter
( http://savannah.nongnu.org/projects/gnuheter ). Gnuheter is a fork from
PHP-Nuke. The HTML filter has been improved a lot since then.

To stop people from having sleepless nights, we feel the urgent need to state
that kses doesn't have anything to do with the KDE project, despite having a
name that starts with a K.

In case someone was wondering, Ulf is available for kses-related consulting.

Finally, the name kses comes from the terms XSS and access. It's also a
recursive acronym (every open-source project should have one!) for "kses
strips evil scripts".


// Ulf and the kses development group, February 2005