Text Processing in Python

Language: English
Format: CHM
Author: David Mertz
File size: 848.4 KB
Publisher: Addison-Wesley Professional
Publish date: June 2003

Text Processing in Python is an example-driven, hands-on tutorial that carefully teaches programmers how to accomplish numerous text processing tasks using the Python language. Filled with concrete examples, this book provides efficient and effective solutions to specific text processing problems and practical strategies for dealing with all types of text processing challenges.

Text Processing in Python begins with an introduction to text processing and contains a quick Python tutorial to get you up to speed. It then delves into essential text processing subject areas, including string operations, regular expressions, parsers and state machines, and Internet tools and techniques. Appendixes cover such important topics as data compression and Unicode. A comprehensive index and plentiful cross-referencing offer easy access to available information. In addition, exercises throughout the book provide readers with further opportunity to hone their skills either on their own or in the classroom. A companion Web site (http://gnosis.cx/TPiP) contains source code and examples from the book.

Here is some of what you will find in this book:

* When do I use formal parsers to process structured and semi-structured data?

* How do I work with full text indexing?

* What patterns in text can be expressed using regular expressions?

* How do I find a URL or an email address in text?

* How do I process a report with a concrete state machine?

* How do I parse, create, and manipulate Internet formats?

* How do I handle lossless and lossy compression?

* How do I find codepoints in Unicode?

Table of contents
  • Chapter 1. Python Basics
    • Section 1.1. Techniques and Patterns
    • Section 1.2. Standard Modules
    • Section 1.3. Other Modules in the Standard Library
  • Chapter 2. Basic String Operations
    • Section 2.1. Some Common Tasks
    • Section 2.2. Standard Modules
    • Section 2.3. Solving Problems
  • Chapter 3. Regular Expressions
    • Section 3.1. A Regular Expression Tutorial
    • Section 3.2. Some Common Tasks
    • Section 3.3. Standard Modules
  • Chapter 4. Parsers and State Machines
    • Section 4.1. An Introduction to Parsers
    • Section 4.2. An Introduction to State Machines
    • Section 4.3. Parser Libraries for Python
  • Chapter 5. Internet Tools and Techniques
    • Section 5.1. Working with Email and Newsgroups
    • Section 5.2. World Wide Web Applications
    • Section 5.3. Synopses of Other Internet Modules
    • Section 5.4. Understanding XML
  • Appendix A. A Selective and Impressionistic Short Review of Python
    • Section A.1. What Kind of Language Is Python?
    • Section A.2. Namespaces and Bindings
    • Section A.3. Datatypes
    • Section A.4. Flow Control
    • Section A.5. Functional Programming
  • Appendix B. A Data Compression Primer
    • Section B.1. Introduction
    • Section B.2. Lossless and Lossy Compression
    • Section B.3. A Data Set Example
    • Section B.4. Whitespace Compression
    • Section B.5. Run-Length Encoding
    • Section B.6. Huffman Encoding
    • Section B.7. Lempel-Ziv Compression
    • Section B.8. Solving the Right Problem
    • Section B.9. A Custom Text Compressor
    • Section B.10. References
  • Appendix C. Understanding Unicode
    • Section C.1. Some Background on Characters
    • Section C.2. What Is Unicode?
    • Section C.3. Encodings
    • Section C.4. Declarations
    • Section C.5. Finding Codepoints
    • Section C.6. Resources
  • Appendix D. A State Machine for Adding Markup to Text
  • Appendix E. Glossary