Get First Sentence in PHP (Improved Solution)

Finding the first sentence in a string of text is trickier than it appears at face value. A sentence typically ends with a full stop, but it could also end with an exclamation mark or a question mark. And what if the first sentence contains abbreviations with full stops, such as "Dr.", "Mr." or "e.g."?
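
To make the abbreviation problem concrete, here is a minimal sketch of the naive approach, which cuts at the first full stop it finds. The function name naiveFirstSentence is hypothetical and used only for illustration.

```php
<?php
// Hypothetical helper for illustration only: cut at the first full stop,
// without checking whether that full stop belongs to an abbreviation.
function naiveFirstSentence(string $text): string
{
  $pos = strpos($text, '.');

  // No full stop at all: treat the whole text as one sentence.
  return $pos === false ? $text : substr($text, 0, $pos + 1);
}

echo naiveFirstSentence('Dr. John is here. He is late.'); // prints "Dr."
```

The full stop in "Dr." is taken as the end of the sentence, which is exactly the case a more careful solution has to rule out.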

To solve this problem (to a certain extent), we can use two PHP functions together: one finds the position of the first stop character that does not belong to an ignored abbreviation, and the other uses that position to cut the first sentence out of the original text.

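function strposArray($haystack, $needles, $ignore, $offset = 0)
{
  // A single needle is a plain strpos() lookup.
  if (! is_array($needles)) {
    return strpos($haystack, $needles, $offset);
  }

  // Pad with a leading space so that ' Dr.' also matches at the very start.
  $padded = ' ' . $haystack;
  $best = false;

  foreach ($needles as $needle) {
    $pos = strpos($haystack, $needle, $offset);

    // Step past every match whose stop character ends an ignored abbreviation.
    while ($pos !== false && endsWithAny($padded, $pos + 1, $ignore)) {
      $pos = strpos($haystack, $needle, $pos + 1);
    }

    // Keep the earliest valid match across all needles.
    if ($pos !== false && ($best === false || $pos < $best)) {
      $best = $pos;
    }
  }

  return $best;
}

// True if the text ending at index $end (inclusive) ends with one of $suffixes.
function endsWithAny($text, $end, $suffixes)
{
  foreach ($suffixes as $suffix) {
    $start = $end + 1 - strlen($suffix);

    if ($start >= 0 && substr($text, $start, strlen($suffix)) === $suffix) {
      return true;
    }
  }
  return false;
}

function firstSentence($string)
{
  $string = html_entity_decode(strip_tags($string));

  $stop = [". ", "! ", "? ", ".\r\n", "!\r\n", "?\r\n"];
  $ignore = [' Dr.', 'Mrs.', ' Mr.', 'e.g.'];

  $pos = strposArray($string, $stop, $ignore);

  // No sentence end found: return the whole text.
  if ($pos === false) {
    return $string;
  }

  // Include the stop character itself, but not the whitespace after it.
  return substr($string, 0, $pos + 1);
}

$sentence = 'This, Dr. John is a sentence of text! This is another sentence.';

print(firstSentence($sentence));

Output:

This, Dr. John is a sentence of text!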

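It is difficult to say how reliable the above solution will be, especially when the supplied text is unpredictably messy: an abbreviation that is missing from the $ignore list (say, "Prof."), or a sentence that ends with a bare "\n" rather than "\r\n", will not be handled. For government work, though, it should be sufficient.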
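
For comparison, the same idea can be sketched with a regular expression instead of strpos(). This is a different technique from the solution above, not a drop-in replacement. The function name firstSentenceRegex and its $abbreviations parameter are assumptions made for this example.

```php
<?php
// Hypothetical alternative, not part of the original solution:
// split once at whitespace that follows ., ! or ?, unless the stop
// character ends one of the listed abbreviations.
function firstSentenceRegex(string $text, array $abbreviations = ['Dr', 'Mr', 'Mrs', 'e.g']): string
{
  $text = html_entity_decode(strip_tags($text));

  // One fixed-length negative lookbehind per abbreviation, e.g. (?<!\bDr\.)
  $lookbehinds = '';
  foreach ($abbreviations as $abbr) {
    $lookbehinds .= '(?<!\b' . preg_quote($abbr, '/') . '\.)';
  }

  // Limit of 2: only the first sentence boundary matters.
  $parts = preg_split('/(?<=[.!?])' . $lookbehinds . '\s+/', $text, 2);

  // preg_split() returns false on a regex error; fall back to the full text.
  return $parts === false ? $text : $parts[0];
}

echo firstSentenceRegex('Is this ok? Yes. Good.'); // prints "Is this ok?"
```

Because the boundary is matched with \s+, this variant also copes with bare line feeds. The abbreviation list is still maintained by hand, so the same reliability caveat applies.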
