Regular Expressions in Python:

Regular Expressions 1 (i2tutorials)

 

 

In this article I will go through regular expressions, which are very popular among programmers and are supported in languages such as C++, Java, Python and PHP. They are mostly used in data cleaning and data wrangling; for many text-cleaning tasks they are simply the default tool.

A regular expression is a sequence of characters that defines a search pattern. Regular expressions are mainly used to find or replace patterns in a string or file.

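A regular expression is built from two types of characters:

1. Metacharacters, which have a special meaning (for example . ^ $ * + ? \ [ ] ( )).

2. Literals, which match themselves (for example 1, 2, a, b).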

In Python, regular expressions are provided by the standard library module “re”, which we import before using it.

 

The most common uses of regular expressions are:

1. Search a string (search and match)

2. Find all occurrences of a pattern (findall)

3. Break a string into substrings (split)

4. Replace part of a string (sub)

 

The “re” library provides different methods to perform these tasks. The most common methods are:

– match()

– search()

– findall()

– split()

– sub()

– compile()

Let's discuss them one by one.

 

re.match():

This is used to find a match at the start of a string. For example, given the string “regular expressions in machinelearning”, calling match() with the pattern “regular” finds a match because “regular” is at the start of the string. Looking for “machinelearning” does not match, because it is not at the start of the string.

If you try to find “nlp” in the given string, match() returns None, because the pattern is not at the start of the string.
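The original code for this step is not shown, so here is a minimal sketch of the match() behaviour described above, using the example string from the text:

```python
import re

text = "regular expressions in machinelearning"

# match() only succeeds when the pattern occurs at the very start of the string
at_start = re.match(r"regular", text)
print(at_start.group())  # regular

# Patterns that are not at the start of the string give None
not_at_start = re.match(r"machinelearning", text)
missing = re.match(r"nlp", text)
print(not_at_start, missing)  # None None
```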

 

re.search():

It is similar to match(), but it is not restricted to the start of the string: search() finds the first occurrence of the pattern at any position in the string.

 

For example:
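A short sketch (our own example, reusing the string from the match() section) showing that search() finds a pattern that match() misses:

```python
import re

text = "regular expressions in machinelearning"

# search() scans the whole string and returns the first match, wherever it is
found = re.search(r"machinelearning", text)
print(found.group())  # machinelearning
print(found.start())  # 23 (index where the match begins)

# If the pattern does not occur anywhere, search() returns None
print(re.search(r"nlp", text))  # None
```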

 

re.findall():

The findall() method returns a list of all matches of the pattern. Like search(), it can find matches at any position in the string.

For example:
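A minimal sketch with a made-up sentence in which the pattern occurs several times:

```python
import re

text = "regular expressions are regular, and regular expressions are useful"

# findall() returns every non-overlapping match as a list of strings
matches = re.findall(r"regular", text)
print(matches)  # ['regular', 'regular', 'regular']

# When nothing matches, the result is an empty list (not None)
print(re.findall(r"nlp", text))  # []
```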

 

re.split():

This is used to split a string wherever the pattern matches.

For example:
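A small sketch, assuming we want to split the example sentence on whitespace (the pattern \s is our choice here):

```python
import re

text = "regular expressions in machinelearning"

# Split wherever a whitespace character (\s) occurs
words = re.split(r"\s", text)
print(words)  # ['regular', 'expressions', 'in', 'machinelearning']
```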

The split() method has one more parameter, maxsplit. It defaults to 0, which means there is no limit on the number of splits; setting it to a positive number sets the maximum number of splits that can be done.
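A sketch of the maxsplit parameter on the same example sentence; the value 2 is an arbitrary choice for illustration:

```python
import re

text = "regular expressions in machinelearning"

# maxsplit=0 (the default) means no limit on the number of splits
unlimited = re.split(r"\s", text, maxsplit=0)
print(unlimited)  # ['regular', 'expressions', 'in', 'machinelearning']

# maxsplit=2 performs at most two splits; the rest stays in the last item
limited = re.split(r"\s", text, maxsplit=2)
print(limited)  # ['regular', 'expressions', 'in machinelearning']
```

Passing maxsplit as a keyword argument keeps the call readable and avoids confusing it with the flags parameter.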

 

re.sub():

This is used to search for a pattern and replace each match with a new string. If the pattern is not found, the string is returned unchanged.

For example:
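A minimal sketch; the replacement word "nlp" and the absent word "deeplearning" are hypothetical values chosen for illustration:

```python
import re

text = "regular expressions in machinelearning"

# Replace every match of the pattern with the new string
replaced = re.sub(r"machinelearning", "nlp", text)
print(replaced)  # regular expressions in nlp

# If the pattern is not found, the original string comes back unchanged
unchanged = re.sub(r"deeplearning", "nlp", text)
print(unchanged == text)  # True
```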

 

re.compile():

This compiles a regular expression into a pattern object, which can then be reused for pattern matching with methods such as match(), search() and findall().
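A short sketch of compiling a pattern once and reusing it on different strings (the strings are our own examples):

```python
import re

# Compile the pattern once, then reuse the pattern object
pattern = re.compile(r"regular")

first = pattern.match("regular expressions in machinelearning")
print(first.group())  # regular

all_matches = pattern.findall("regular expressions are regular")
print(all_matches)  # ['regular', 'regular']
```

Compiling mainly helps readability and reuse when the same pattern is applied many times.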

Next, here are some common patterns, built from metacharacters, for extracting characters and words conveniently. The most commonly used are:

 

1. Get the first word of a string:

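Extracting each character with the pattern . also extracts the spaces, because . matches any character except a newline. To avoid the spaces, use \w (letters, digits and underscore) instead of .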
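The original example is not shown; this sketch, using our own sample sentence, compares . and \w:

```python
import re

text = "regular expressions in machinelearning"

# "." matches any character except a newline, so spaces are extracted too
all_chars = re.findall(r".", text)
# "\w" matches only word characters (letters, digits, underscore)
word_chars = re.findall(r"\w", text)

print(" " in all_chars)   # True
print(" " in word_chars)  # False
print(word_chars[:7])     # ['r', 'e', 'g', 'u', 'l', 'a', 'r']
```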

2. Extract the words:

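Here, too, we have to avoid the spaces: \w* matches zero or more word characters, so it also returns empty strings between the words. Use \w+ (one or more) instead of \w*.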
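A sketch contrasting \w* and \w+ on our sample sentence. The exact placement of the empty strings produced by \w* can differ between Python versions, so the sketch only checks that they appear:

```python
import re

text = "regular expressions in machinelearning"

# "\w*" means zero or more word characters, so it also produces empty matches
star = re.findall(r"\w*", text)
print("" in star)  # True

# "\w+" means one or more word characters, so only the real words come back
words = re.findall(r"\w+", text)
print(words)  # ['regular', 'expressions', 'in', 'machinelearning']
```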

 

3. Extract the first and last words (using ^ and $):

To fetch the first word of the sentence, anchor the pattern to the start of the string with ^\w+.

To fetch the last word, use \w+$ instead of ^\w+.
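A minimal sketch of the ^ and $ anchors on our sample sentence:

```python
import re

text = "regular expressions in machinelearning"

# "^" anchors the match at the start of the string
first_word = re.findall(r"^\w+", text)
print(first_word)  # ['regular']

# "$" anchors the match at the end of the string
last_word = re.findall(r"\w+$", text)
print(last_word)  # ['machinelearning']
```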

4. First two characters of each word:

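Using \w\w splits every word into consecutive two-character pairs, not only the first two characters.

To fetch only the first two characters of each word, anchor the pattern at a word boundary with \b\w\w instead of \w\w.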
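A sketch comparing the two patterns on our sample sentence; \b matches the empty position at the start (or end) of a word:

```python
import re

text = "regular expressions in machinelearning"

# "\w\w" takes consecutive pairs of characters from every word
pairs = re.findall(r"\w\w", text)
print(pairs[:5])  # ['re', 'gu', 'la', 'ex', 'pr']

# "\b" anchors each pair at a word boundary, so only the first two characters match
first_two = re.findall(r"\b\w\w", text)
print(first_two)  # ['re', 'ex', 'in', 'ma']
```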

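5. Extract all the characters after @:

The pattern @\w+ extracts the characters after @ up to the first dot. If you want the .com part as well, use @\w+\.\w+, escaping the dot with a backslash so that it matches a literal dot rather than any character.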
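The original sample data is not shown, so this sketch uses hypothetical e-mail addresses:

```python
import re

# Hypothetical sample addresses, used only for illustration
emails = "alice@example.com, bob@test.org"

# "@\w+" stops at the dot, because "." is not a word character
without_ext = re.findall(r"@\w+", emails)
print(without_ext)  # ['@example', '@test']

# "\." matches a literal dot, so the extension is included
with_ext = re.findall(r"@\w+\.\w+", emails)
print(with_ext)  # ['@example.com', '@test.org']
```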

 

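6. Fetch only the domain extension (such as com) using the group @\w+\.(\w+). When a pattern contains a group, findall() returns only the text captured by the group.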
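A sketch with the same hypothetical addresses, showing that moving the parentheses changes which part findall() returns:

```python
import re

emails = "alice@example.com, bob@test.org"

# With one group in the pattern, findall() returns only the captured text
extensions = re.findall(r"@\w+\.(\w+)", emails)
print(extensions)  # ['com', 'org']

# Grouping the part before the dot captures the domain name instead
names = re.findall(r"@(\w+)\.\w+", emails)
print(names)  # ['example', 'test']
```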

 

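7. Fetch the words that start with a vowel (using the character class [aeiouAEIOU]):

A pattern such as [aeiouAEIOU]\w+ also matches from the middle of words, returning meaningless fragments such as ‘egular’ from “regular”. We can drop these by adding the word boundary \b at the start of the pattern.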
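A sketch, assuming the vowel character class is the pattern being described (the original example is not shown):

```python
import re

text = "regular expressions in machinelearning"

# Without a boundary, the class also matches vowels in the middle of words
loose = re.findall(r"[aeiouAEIOU]\w+", text)
print(loose)  # ['egular', 'expressions', 'in', 'achinelearning']

# "\b" requires the vowel to be the first character of a word
strict = re.findall(r"\b[aeiouAEIOU]\w+", text)
print(strict)  # ['expressions', 'in']
```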

 

8. Extract information from an HTML file between the <tr> and <td> tags:

Here is the sample HTML file.
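The original sample file is not shown, so this sketch uses a small hypothetical HTML table in which every row has three <td> cells separated by a single space:

```python
import re

# Hypothetical HTML snippet: each row holds a rank and two names
html = """
<tr align="center"><td>1</td> <td>Noah</td> <td>Emma</td></tr>
<tr align="center"><td>2</td> <td>Liam</td> <td>Olivia</td></tr>
"""

# Each (\w+) group captures the text of one <td> cell;
# with several groups, findall() returns one tuple per row
rows = re.findall(r"<td>(\w+)</td>\s<td>(\w+)</td>\s<td>(\w+)</td>", html)
print(rows)  # [('1', 'Noah', 'Emma'), ('2', 'Liam', 'Olivia')]
```

This works for small, predictable snippets like this one; for real-world HTML, a parser such as Python's html.parser module is usually more robust.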

 

In this article we explained regular expressions, metacharacters and the main methods of the re module. We covered the most commonly used patterns and methods for solving typical regular-expression problems.