
cleanse_string()

Parameters

Column                 Type   Default   Description
$string
$preserve_separators
$preserve_hyphen              false
$is_html                      false

Location

include/search_functions.php lines 2203 to 2248

Definition

 
function cleanse_string($string, $preserve_separators, $preserve_hyphen = false, $is_html = false)
{
    # Removes characters from a string prior to keyword splitting, for example full stops
    # Also makes the string lower case ready for indexing.
    global $config_separators;
    $separators = $config_separators;

    // Replace some HTML entities with empty space
    // Most of them should already be in $config_separators
    // but others, like &shy; don't have an actual character that we can copy and paste
    // to $config_separators
    $string = htmlentities($string, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    $string = str_replace('&nbsp;', ' ', $string);
    $string = str_replace('&shy;', ' ', $string);
    $string = str_replace('&lsquo;', ' ', $string);
    $string = str_replace('&rsquo;', ' ', $string);
    $string = str_replace('&ldquo;', ' ', $string);
    $string = str_replace('&rdquo;', ' ', $string);
    $string = str_replace('&ndash;', ' ', $string);

    // Revert the htmlentities as otherwise we lose ability to identify certain text e.g. diacritics
    $string = html_entity_decode($string, ENT_QUOTES, 'UTF-8');

    if (
        $preserve_hyphen
        && (substr($string, 0, 1) == "-" || strpos($string, " -") !== false) /*support minus as first character for simple NOT searches */
        && strpos($string, " - ") === false
    ) {
        # Preserve hyphen - used when NOT indexing so we know which keywords to omit from the search.
        $separators = array_diff($separators, array("-")); # Remove hyphen from separator array.
    }
    if (substr($string, 0, 1) == "!" && strpos(substr($string, 1), "!") === false) {
        // If we have the exclamation mark configured as a config separator but we are doing a special search we don't want to remove it
        $separators = array_diff($separators, array("!"));
    }

    if ($preserve_separators) {
        return mb_strtolower(trim_spaces(str_replace($separators, " ", $string)), 'UTF-8');
    } else {
        # Also strip out the separators used when specifying multiple field/keyword pairs (comma and colon)
        $s = $separators;
        $s[] = ",";
        $s[] = ":";
        return mb_strtolower(trim_spaces(str_replace($s, " ", $string)), 'UTF-8');
    }
}
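
Example

The entity round trip at the start of the definition can be tried in isolation. The following is a minimal sketch, not ResourceSpace code: replace_named_entities_with_space() is a hypothetical helper name introduced here for illustration, and it covers only the encode, replace, decode steps. It leaves out the separator stripping, trim_spaces() and lower-casing that cleanse_string() goes on to perform.

```php
<?php
// Hypothetical helper (not part of ResourceSpace): isolates the entity
// round trip that cleanse_string() performs before stripping separators.
function replace_named_entities_with_space(string $string): string
{
    // Encode so that characters such as the soft hyphen become named entities
    $encoded = htmlentities($string, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

    // Swap the selected entities for a plain space
    $entities = ['&nbsp;', '&shy;', '&lsquo;', '&rsquo;', '&ldquo;', '&rdquo;', '&ndash;'];
    $encoded = str_replace($entities, ' ', $encoded);

    // Decode again so diacritics and all other characters are restored
    return html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');
}

// Input contains e-acute (U+00E9), a soft hyphen (U+00AD) and curly quotes
$input = "caf\u{00E9} co\u{00AD}op \u{201C}menu\u{201D}";
echo replace_named_entities_with_space($input), PHP_EOL;
```

In this sketch the soft hyphen and the curly quotes are turned into spaces while the accented character is preserved, which is the reason the definition decodes the string again after the replacements.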

This article was last updated 26th October 2025 19:05 Europe/London time based on the source file dated 24th October 2025 09:15 Europe/London time.