Match URLs using regular expressions in Python

April 28, 2020

In this article, I will explain how to match URLs using regular expressions in Python.

What Are Regular Expressions?

A regular expression, usually abbreviated as regex or regexp, is a sequence of characters that defines a search pattern. A pattern can describe dates, numbers, lowercase and uppercase text, URLs, and so on, and in practice we usually see a combination of these. Regular expressions are used for real-world string operations such as input validation, for example checking an email address or a phone number.

Regular expressions are also widely used to find and extract text, for example when scraping web pages. Almost every programming language supports them, either built in or through an installable library.
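
As a small illustration of input validation, here is a minimal sketch in Python. The phone number format (ten digits, optionally split 3-3-4 with hyphens) and the names PHONE_PATTERN and is_valid_phone are assumptions made for this example, not a general rule for phone numbers.

```python
import re

# Assumed format for illustration only: ten digits, optionally
# written as 3-3-4 groups separated by hyphens.
PHONE_PATTERN = re.compile(r'\d{3}-?\d{3}-?\d{4}')

def is_valid_phone(text):
    """Return True only if the whole string matches the assumed format."""
    # fullmatch requires the pattern to cover the entire string,
    # so extra text before or after the number is rejected.
    return PHONE_PATTERN.fullmatch(text) is not None

print(is_valid_phone('555-123-4567'))       # True
print(is_valid_phone('call 555-123-4567'))  # False
```

Using fullmatch rather than search is the important choice here: validation should reject input that merely contains a valid-looking piece.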

Here, we will learn how to build a pattern and match URLs using the Python library “re”.

“re” is the regular expression module in Python's standard library. It provides functions and methods for working with many kinds of patterns.

Our main aim is to extract the URLs from a given piece of text.

First, import the “re” module, which ships with Python by default.
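
To give a feel for the functions the module offers, the short sketch below tries three of them (re.search, re.findall, and re.sub) on a made-up sentence. The sample text is an assumption chosen only for illustration.

```python
import re

# Made-up sample text for illustration
text = 'Order 66 shipped on 2020-04-28'

# re.search returns the first match anywhere in the string (or None)
first_number = re.search(r'\d+', text)
print(first_number.group())        # 66

# re.findall returns every non-overlapping match as a list of strings
print(re.findall(r'\d+', text))    # ['66', '2020', '04', '28']

# re.sub replaces every match with the given replacement string
print(re.sub(r'\d', '#', text))    # Order ## shipped on ####-##-##
```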

import re

Next, assign the text to a variable:

var= 'You can understand the regular expressions in this link https://en.wikipedia.org/wiki/Regular_expression and you can get more practice using http://www.i2tutorials.com and you can get the python documentation from http://python.org'

Use the “findall” function in re to extract URLs with the http, ftp, or https protocol. The pattern also accounts for the symbols that appear in a URL, such as the colon (:) and forward slash (/).

re.findall(r'(http|ftp|https):\/\/([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?', var)
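
The pattern is dense, so it may help to see it split into commented pieces. The sketch below rewrites the same pattern with the re.VERBOSE flag and drops a few backslashes that are not required. The name URL_PATTERN and the sample sentence are assumptions made for illustration.

```python
import re

# The article's pattern, spread over several lines with re.VERBOSE.
# In verbose mode, whitespace and "#" comments outside character
# classes are ignored, so each piece can be annotated.
URL_PATTERN = re.compile(r"""
    (http|ftp|https)                 # group 1: the protocol
    ://                              # literal separator after the protocol
    ([\w\-_]+(?:(?:\.[\w\-_]+)+))    # group 2: host name containing at least one dot
    ([\w\-.,@?^=%&:/~+#]*            # group 3 (optional): path, query string, fragment
     [\w\-@?^=%&/~+#])?              # group 3 may not end in a dot, comma or colon
""", re.VERBOSE)

# Hypothetical sample text; note the full stop right after the last URL
sample = 'See https://en.wikipedia.org/wiki/Regular_expression and http://python.org.'
print(URL_PATTERN.findall(sample))
# [('https', 'en.wikipedia.org', '/wiki/Regular_expression'), ('http', 'python.org', '')]
```

Because the last character class of group 3 leaves out the dot, comma, and colon, punctuation that directly follows a URL in a sentence is not captured as part of it.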

Below is the complete program:

import re

# Sample text containing three URLs
var = 'You can understand the regular expressions in this link https://en.wikipedia.org/wiki/Regular_expression and you can get more practice using http://www.i2tutorials.com and you can get the python documentation from http://python.org'

# findall returns one tuple of captured groups per matched URL
print(re.findall(r'(http|ftp|https):\/\/([\w\-_]+(?:(?:\.[\w\-_]+)+))([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?', var))

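The output shows the extracted URL parts as a list of tuples. Each tuple holds the three capture groups of the pattern: the protocol, the host name, and the rest of the URL (an empty string when nothing follows the host name).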

Output:

[('https', 'en.wikipedia.org', '/wiki/Regular_expression'),
 ('http', 'www.i2tutorials.com', ''),
 ('http', 'python.org', '')]
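
Because the pattern contains capture groups, findall returns the pieces of each URL rather than the URL itself. If whole URLs are wanted, one option is re.finditer together with group(0), which holds the complete matched text. The sketch below assumes the same pattern; the names URL_PATTERN and extract_urls are hypothetical helpers introduced for this example and are not part of the re module.

```python
import re

# Same pattern as in the article, compiled once so it can be reused
URL_PATTERN = re.compile(
    r'(http|ftp|https)://([\w\-_]+(?:(?:\.[\w\-_]+)+))'
    r'([\w\-.,@?^=%&:/~+#]*[\w\-@?^=%&/~+#])?'
)

def extract_urls(text):
    """Return every complete URL found in text (hypothetical helper)."""
    # group(0) is the entire match, regardless of the capture groups
    return [match.group(0) for match in URL_PATTERN.finditer(text)]

var = 'You can understand the regular expressions in this link https://en.wikipedia.org/wiki/Regular_expression and you can get more practice using http://www.i2tutorials.com and you can get the python documentation from http://python.org'

print(extract_urls(var))
# ['https://en.wikipedia.org/wiki/Regular_expression', 'http://www.i2tutorials.com', 'http://python.org']
```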
