Python Extract Emails from Text

By Minh Vu

Updated Jan 01, 2024

Figure: Python Extract Emails from Text

Extracting email addresses from text is a common task in Python, especially when you are cleaning data or building a list of addresses from a text document.

In this tutorial, I will show you how to extract emails from text in Python using regular expressions (RegEx).

Extracting Emails from Text using RegEx

To extract emails from text in Python using RegEx, we can use the re module, which provides support for regular expressions. Here's a simple example:

main.py
import re


def extract_emails(text):
    # [A-Za-z] matches the top-level domain; a "|" inside [...] would match a literal pipe
    email_regex = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
    return re.findall(email_regex, text)


sample_text = "Please contact me at dminhvu.work@gmail.com or wisecode@gmail.com."
emails = extract_emails(sample_text)
print(emails)

This code snippet defines a function extract_emails that:

  • takes a string text as input,
  • returns a list of the email addresses found in text, based on the RegEx pattern \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b.

You can run this by using the python main.py command in the Terminal (or Command Prompt on Windows).

console
python main.py
['dminhvu.work@gmail.com', 'wisecode@gmail.com']

To understand how this email_regex pattern is constructed, you can learn more here; the page has an explanation section on the right-hand side.
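As a rough guide to how the pattern is put together, the sketch below writes the same expression with the re.VERBOSE flag so that each part can carry a comment. The name EMAIL_REGEX_VERBOSE is only an illustration. The top-level domain class is written as [A-Za-z], because a | inside square brackets is a literal pipe character rather than an "or".

```python
import re

# The tutorial's pattern, split into commented parts with re.VERBOSE.
# EMAIL_REGEX_VERBOSE is a hypothetical name used only for this sketch.
EMAIL_REGEX_VERBOSE = re.compile(
    r"""
    \b                    # word boundary, so the match starts at the edge of a word
    [A-Za-z0-9._%+-]+     # local part: letters, digits and . _ % + -
    @                     # the literal @ separator
    [A-Za-z0-9.-]+        # domain name: letters, digits, dots and hyphens
    \.                    # the dot before the top-level domain
    [A-Za-z]{2,}          # top-level domain: at least two letters
    \b                    # word boundary at the end
    """,
    re.VERBOSE,
)

sample_text = "Please contact me at dminhvu.work@gmail.com or wisecode@gmail.com."
emails = EMAIL_REGEX_VERBOSE.findall(sample_text)
print(emails)
```

The trailing \b is what keeps the full stop at the end of the sentence out of the second address.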

Extracting Emails from a Text File

To extract emails from a text file, we'll read the file's content into a string using the read() method and then use the same extract_emails function defined earlier.

main.py
import re


def extract_emails(text):
    # [A-Za-z] matches the top-level domain; a "|" inside [...] would match a literal pipe
    email_regex = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
    return re.findall(email_regex, text)


def extract_emails_from_file(file_path):
    # Read the whole file into one string, then reuse extract_emails
    with open(file_path, 'r') as file:
        content = file.read()
    return extract_emails(content)


file_path = 'example.txt'
emails = extract_emails_from_file(file_path)
print(emails)

In this example, extract_emails_from_file reads the entire content of the file located at file_path and then uses the extract_emails function to find all email addresses.
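Real files often repeat the same address and are not always in the platform's default encoding. The variant below is a minimal sketch under two assumptions that the original example does not make: the file is read as UTF-8 with undecodable bytes replaced, and duplicates are dropped while keeping the order of first appearance. The helper name extract_unique_emails_from_file is hypothetical, and the temporary file exists only so the sketch runs on its own.

```python
import re
import tempfile
from pathlib import Path

EMAIL_REGEX = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')


def extract_unique_emails_from_file(file_path, encoding='utf-8'):
    """Hypothetical helper: return each address once, in order of first appearance."""
    # errors='replace' swaps undecodable bytes for U+FFFD instead of raising an error.
    with open(file_path, 'r', encoding=encoding, errors='replace') as file:
        content = file.read()
    # dict.fromkeys keeps insertion order and drops later duplicates.
    return list(dict.fromkeys(EMAIL_REGEX.findall(content)))


# Demo with a throwaway file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / 'example.txt'
    demo_path.write_text(
        'Write to wisecode@gmail.com.\n'
        'Cc: dminhvu.work@gmail.com, wisecode@gmail.com\n',
        encoding='utf-8',
    )
    unique_emails = extract_unique_emails_from_file(demo_path)

print(unique_emails)
```

The comparison is case-sensitive, so User@Example.com and user@example.com would count as two different addresses here.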

Extracting Emails from a Large Text File

When dealing with large text files, reading the entire file into memory might not be feasible. In such cases, we can process the file line by line:

extract_emails_from_large_file.py
import re


def extract_emails(text):
    # [A-Za-z] matches the top-level domain; a "|" inside [...] would match a literal pipe
    email_regex = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'
    return re.findall(email_regex, text)


def extract_emails_from_large_file(file_path):
    emails = []
    with open(file_path, 'r') as file:
        # Iterating over the file object yields one line at a time
        for line in file:
            emails.extend(extract_emails(line))
    return emails


file_path = 'large_example.txt'
emails = extract_emails_from_large_file(file_path)
print(emails)

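This function iterates over each line in the file, extracts the emails from that line, and adds them to the emails list. Because only one line of the file is held in memory at a time, this approach is more memory-efficient for large files, although the emails list itself still grows with the number of matches.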
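If the list of results is itself too large to hold comfortably, one possible refinement is to yield addresses as they are found instead of collecting them. The sketch below assumes a hypothetical generator named iter_emails and uses io.StringIO in place of a real file so that it runs on its own; an open file object can be passed in the same way.

```python
import io
import re

# Compiled once and reused for every line.
EMAIL_REGEX = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')


def iter_emails(lines):
    """Hypothetical helper: yield addresses one at a time from any iterable of lines."""
    for line in lines:
        for match in EMAIL_REGEX.finditer(line):
            yield match.group(0)


# io.StringIO stands in for an open file; iterating over it yields lines.
fake_file = io.StringIO(
    "first@example.com\n"
    "no address on this line\n"
    "second@example.org and third@example.net\n"
)
streamed = list(iter_emails(fake_file))
print(streamed)
```

In practice the caller would loop over iter_emails(file) directly, for example writing each address to an output file, rather than wrapping it in list() as the demo does.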

Conclusion

In this tutorial, we've learned how to extract email addresses from strings and text files using Python.

In general, to extract email addresses from strings in Python:

  • Use the re module to extract emails,
  • with the \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b RegEx pattern to match email addresses.

If you find that the pattern does not work, please comment below so I can fix it. Thank you!
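One quick way to see where the pattern is strict or permissive is to run it against a few probe strings. The strings below are hypothetical examples chosen only for illustration, and the pattern is written with [A-Za-z] for the top-level domain.

```python
import re

EMAIL_REGEX = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')

# Hypothetical probe strings, chosen only to illustrate the pattern's limits.
probes = [
    "user@localhost",            # no dot-separated top-level domain
    "a@b.c",                     # top-level domain shorter than two letters
    "john..doe@example.com",     # consecutive dots in the local part
    "admin@mail.example.co.uk",  # several domain labels
]

results = {probe: EMAIL_REGEX.findall(probe) for probe in probes}
for probe, found in results.items():
    print(f"{probe!r} -> {found}")
```

The first two probes produce empty lists, while the last two are returned whole. The third case shows that the pattern is looser than the formal address grammar, which does not allow consecutive dots in an unquoted local part, so a stricter check may be needed when validity matters.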
