Tuesday, July 24, 2012

Shingling in Python

As a short companion piece to my post about Shingling in XQuery, and as an exercise to keep my Python skills sharp, I rewrote my XQuery example in Python.

Character Shingling in Python

It is slightly easier to find examples of shingling in Python than in XQuery. For example, I found this one-line example of character shingling in a blog post:

[word[i:i + n] for i in range(len(word) - n + 1)]

This uses a Python list comprehension to produce every n-character shingle of a given word: each slice word[i:i + n] starts one position further along than the previous one. I used that as inspiration for my word-based w-shingle method.
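To see the one-liner in action, here is a minimal sketch that wraps it in a function. The function name char_shingles and the sample word are my own choices for illustration and are not taken from the post I found.

```python
def char_shingles(word, n):
    """Return every run of n consecutive characters in word."""
    # There are len(word) - n + 1 starting positions; if the word is
    # shorter than n, range() is empty and the result is an empty list.
    return [word[i:i + n] for i in range(len(word) - n + 1)]


print(char_shingles("shingle", 3))
# ['shi', 'hin', 'ing', 'ngl', 'gle']
```

A seven-character word yields five 3-shingles, one for each starting position from 0 to 4.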

Word Shingling in Python

As before, I am only interested in shingles that start with what look like stop words (approximated as being words consisting of fewer than 4 characters).

theString = "the quick brown fox jumps over the lazy dog. now is the time for all good men to come to the aid of the party"
shingleLength = 3
tokens = theString.split()  # split on whitespace

# Keep only the shingles whose first token is shorter than 4 characters
print([tokens[i:i+shingleLength] for i in range(len(tokens) - shingleLength + 1) if len(tokens[i]) < 4])
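
The same idea can be packaged as a reusable function. This is only a sketch: the name word_shingles and the parameters shingle_length and stop_word_max are hypothetical names I introduced here, with defaults chosen to match the values used above.

```python
def word_shingles(text, shingle_length=3, stop_word_max=3):
    """Return word shingles whose first token looks like a stop word.

    A token "looks like" a stop word when it has at most stop_word_max
    characters, which is the same rough approximation used above.
    """
    tokens = text.split()  # split on whitespace; punctuation stays attached
    return [
        tokens[i:i + shingle_length]
        for i in range(len(tokens) - shingle_length + 1)
        if len(tokens[i]) <= stop_word_max
    ]


for shingle in word_shingles("the quick brown fox jumps over the lazy dog"):
    print(" ".join(shingle))
# the quick brown
# fox jumps over
# the lazy dog
```

Because the text is split on whitespace only, a token such as "dog." counts as four characters and so does not start a shingle. Stripping punctuation before tokenizing would change which shingles are kept.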

