Removing all characters within an HTML tag from a string (iOS Development)

Many applications need to retrieve a specific portion of data from a string, or to remove pieces of text that appear at various places inside it.

This can be done in several ways. Here I take an HTML string and, using three different methods, replace the data between a start tag and an end tag with an empty string.

The three methods are defined below.

Method – 1

- (void) usingNSRange
{
    NSString* str = [NSString stringWithFormat:@"  string1 1111 string 2 22222 string 3 33333 "];
    NSString* startTag = @"";
    NSString* endTag = @"";
    NSString* replacementString = @"";

    // Loop while the string still contains both a start tag and an end tag.
    while ([str rangeOfString:startTag].length != 0 && [str rangeOfString:endTag].length != 0)
    {
        NSRange range1 = [str rangeOfString:startTag];
        NSRange range2 = [str rangeOfString:endTag];

        // An end tag that precedes the start tag cannot close it, so stop.
        if (range1.location > range2.location)
            break;

        // Remove everything from the start tag through the end of the end tag.
        NSRange newRange;
        newRange.location = range1.location;
        newRange.length = range2.location - range1.location + range2.length;
        str = [str stringByReplacingCharactersInRange:newRange withString:replacementString];
    }
}

The logic of the above code is as follows:

- The while loop keeps running as long as the string contains both a start tag and an end tag. The two checks are joined by the AND operator (&&), so the loop stops as soon as either one returns false.

- Next, we get the range of the first occurrence of the start tag and of the end tag.

- If the location of the start tag is greater than the location of the end tag, we break out of the loop. Take this case:

@" test test test test "

Here the end tag appears before the start tag, so the condition range1.location > range2.location is TRUE and the loop exits.

- Next, using the start and end locations, we calculate the length of the text to be removed:

LengthOfString = End Tag Location - Start Tag Location + Length of End Tag

- The start location of the text to be removed is the location of the start tag.

- Finally, we replace the text in the calculated range with an empty string.
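As a worked example of the range arithmetic above, here is a minimal sketch that wraps the same loop in a standalone C function. The function name StripBetweenTags and the <b>/</b> tags used in the usage note are assumptions made for illustration; they do not come from the original listing.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name is an assumption): removes every span that runs
// from startTag through the end of endTag.
static NSString *StripBetweenTags(NSString *input, NSString *startTag, NSString *endTag) {
    NSString *str = input;
    while (YES) {
        NSRange range1 = [str rangeOfString:startTag];
        NSRange range2 = [str rangeOfString:endTag];

        // Stop when either tag is missing, or when the first end tag
        // appears before the first start tag.
        if (range1.location == NSNotFound || range2.location == NSNotFound ||
            range1.location > range2.location) {
            break;
        }

        // length = end tag location - start tag location + length of end tag
        NSRange newRange = NSMakeRange(range1.location,
                                       range2.location - range1.location + range2.length);
        str = [str stringByReplacingCharactersInRange:newRange withString:@""];
    }
    return str;
}
```

With the assumed input @"<b>string1</b> 1111 <b>string 2</b> 22222" and the tags @"<b>" and @"</b>", the first pass finds the start tag at location 0 and the end tag at location 10, so the removed length is 10 - 0 + 4 = 14. After the second pass the function returns @" 1111  22222".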

Method – 2

- (void) usingNSScanner
{
    NSString* str = [NSString stringWithFormat:@"  string1 1111 string 2 22222 string 3 33333 "];
    NSString* startTag = @"";
    NSString* endTag = @"";
    NSString* text = nil;
    NSScanner* theScanner;

    while ([str rangeOfString:startTag].length != 0 && [str rangeOfString:endTag].length != 0)
    {
        theScanner = [NSScanner scannerWithString:str];

        // Skip the text before the start tag; it is not needed.
        [theScanner scanUpToString:startTag intoString:NULL];
        // Capture from the start tag up to, but not including, the end tag.
        [theScanner scanUpToString:endTag intoString:&text];

        if (text.length > 0)
        {
            // The captured text plus the end tag is the portion to remove.
            NSString* replacementString = [NSString stringWithFormat:@"%@%@", text, endTag];
            str = [str stringByReplacingOccurrencesOfString:replacementString withString:@""];
        }
        text = nil;
    }
}

- This approach is similar to the first one, but instead of calculating the start location and length of the text, we scan the string and extract the portion of text that has to be removed.

- The while condition is the same as in the first approach. The new element is NSScanner, which is initialized with the HTML string.

- First, it scans the string up to the start tag. We don't need that text, so we pass NULL as the destination.

- Next, starting from the point where the start tag was found, it scans up to the end tag and stores the scanned text in the variable text. This text begins with the start tag and stops just before the end tag.

- That text, with the end tag appended, is then replaced with an empty string in the original string.
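The scanner-based steps above can be sketched as a standalone function. This is a minimal sketch under stated assumptions, not the article's listing: the name StripWithScanner is hypothetical, and it adds two things the steps above do not mention. It turns off the scanner's default whitespace skipping, and it stops when a pass removes nothing, because an end tag that appears only before the start tag would otherwise keep the loop running forever.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name is an assumption): removes each
// startTag...endTag span using NSScanner.
static NSString *StripWithScanner(NSString *input, NSString *startTag, NSString *endTag) {
    NSString *str = input;
    while ([str rangeOfString:startTag].location != NSNotFound &&
           [str rangeOfString:endTag].location != NSNotFound) {
        NSScanner *scanner = [NSScanner scannerWithString:str];
        // Assumption: treat whitespace as significant so the scanned text
        // matches the source string exactly.
        scanner.charactersToBeSkipped = nil;

        NSString *text = nil;
        // Skip everything before the start tag; that text is not needed.
        [scanner scanUpToString:startTag intoString:NULL];
        // Capture from the start tag up to (not including) the end tag.
        [scanner scanUpToString:endTag intoString:&text];
        if (text.length == 0) {
            break;
        }

        NSString *target = [text stringByAppendingString:endTag];
        NSString *updated = [str stringByReplacingOccurrencesOfString:target withString:@""];
        // Added guard: if nothing was removed, stop instead of looping forever.
        if ([updated isEqualToString:str]) {
            break;
        }
        str = updated;
    }
    return str;
}
```

Note that stringByReplacingOccurrencesOfString:withString: removes every identical copy of the captured span in one pass, so repeated spans are handled together.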

Method – 3

- (void) usingRegularExpression
{
    NSString* str = [NSString stringWithFormat:@"  string1 1111 string 2 22222 string 3 33333 "];
    NSString* startTag = @"";
    NSString* endTag = @"";
    NSError* error = nil;

    // Build the pattern: start tag, as few characters as possible (.*?), end tag.
    NSString* regularExpressionString = [NSString stringWithFormat:@"%@.*?%@", startTag, endTag];
    NSRegularExpression* regex = [NSRegularExpression regularExpressionWithPattern:regularExpressionString
                                                                           options:NSRegularExpressionCaseInsensitive
                                                                             error:&error];

    // Replace every match with an empty string.
    NSString* modifiedString = [regex stringByReplacingMatchesInString:str
                                                               options:0
                                                                 range:NSMakeRange(0, [str length])
                                                          withTemplate:@""];
}

Here we use a regular expression and replace every portion of text that matches it with an empty string. The .*? in the pattern is a lazy (non-greedy) quantifier, also known as the lazy star: it matches as few characters as possible, so each match ends at the first end tag after a start tag rather than at the last end tag in the string.

In this way, all text between the start and end tags, along with the tags themselves, is removed.
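The pattern-building step can be sketched as a standalone function. This is a minimal sketch under stated assumptions, not the article's listing: the name StripWithRegex is hypothetical, the tags are passed through escapedPatternForString: so that any regex metacharacters inside a tag are matched literally, and NSRegularExpressionDotMatchesLineSeparators is added so that .*? can also span line breaks.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name is an assumption): removes each
// startTag...endTag span with a lazy regular expression.
static NSString *StripWithRegex(NSString *input, NSString *startTag, NSString *endTag) {
    // Escape the tags so metacharacters inside them are matched literally.
    NSString *pattern = [NSString stringWithFormat:@"%@.*?%@",
                         [NSRegularExpression escapedPatternForString:startTag],
                         [NSRegularExpression escapedPatternForString:endTag]];

    NSError *error = nil;
    NSRegularExpressionOptions options = NSRegularExpressionCaseInsensitive |
                                         NSRegularExpressionDotMatchesLineSeparators;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                           options:options
                                                                             error:&error];
    if (regex == nil) {
        // The pattern failed to compile; return the input unchanged.
        return input;
    }

    return [regex stringByReplacingMatchesInString:input
                                           options:0
                                             range:NSMakeRange(0, input.length)
                                      withTemplate:@""];
}
```

Checking the regex for nil before use is a design choice in this sketch; it keeps a malformed pattern from silently turning the whole result into nil.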

Same output, different ways.

To remove all tags from an HTML string, use < as the start tag and > as the end tag.
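For the all-tags case, a negated character class is a common alternative to the lazy star. The sketch below swaps in that technique and is an illustration rather than the article's code: the name StripAllTags is hypothetical, and the pattern <[^>]*> treats every run from < to the next > as a tag, so it would also remove such a pair in ordinary text and knows nothing about comments or script content.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (name is an assumption): removes every <...> run.
static NSString *StripAllTags(NSString *html) {
    NSError *error = nil;
    // "<", then any characters except ">", then ">".
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:@"<[^>]*>"
                                                                           options:0
                                                                             error:&error];
    if (regex == nil) {
        // The pattern failed to compile; return the input unchanged.
        return html;
    }

    return [regex stringByReplacingMatchesInString:html
                                           options:0
                                             range:NSMakeRange(0, html.length)
                                      withTemplate:@""];
}
```

For the assumed input @"<p>Hello <b>world</b></p>", this returns @"Hello world".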
