vocabulary - TensorFlow VocabularyProcessor

- April 15, 2011

I am following the WildML blog post on text classification using TensorFlow. I am not able to understand the purpose of max_document_length in this statement:

vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)

Also, how can I extract the vocabulary from vocab_processor?

I have figured out how to extract the vocabulary from the VocabularyProcessor object. This worked for me:

import numpy as np
from tensorflow.contrib import learn

x_text = ['this cat', 'this must boy', 'this a dog']
max_document_length = max([len(x.split(" ")) for x in x_text])

# Create the VocabularyProcessor object, setting the max length of the documents.
vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)

# Transform the documents using the vocabulary.
x = np.array(list(vocab_processor.fit_transform(x_text)))

# Extract the word:id mapping from the object.
vocab_dict = vocab_processor.vocabulary_._mapping

# Sort the vocabulary dictionary on the basis of values (ids).
# Both statements perform the same task (the first also needs "import operator").
# sorted_vocab = sorted(vocab_dict.items(), key=operator.itemgetter(1))
sorted_vocab = sorted(vocab_dict.items(), key=lambda x: x[1])

# Treat the ids as list indices and create a list of words in ascending order of id:
# the word with id i goes at index i of the list.
vocabulary = list(list(zip(*sorted_vocab))[0])

print(vocabulary)
print(x)
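As for the purpose of max_document_length: as far as I understand it, every document is mapped to a row of exactly that many word ids, so shorter documents are padded with 0 and longer ones are cut off. The sketch below imitates that behaviour in plain Python so it runs without tensorflow.contrib. The function fit_transform_sketch is a hypothetical stand-in, not the library's code, and it assumes simple splitting on spaces rather than the library's own tokenizer.

```python
def fit_transform_sketch(docs, max_document_length):
    """Hypothetical stand-in for VocabularyProcessor.fit_transform (illustration only)."""
    # Assumption: id 0 is reserved for unknown words and doubles as padding.
    vocab = {"<UNK>": 0}
    rows = []
    for doc in docs:
        # Assign the next free id to each word the first time it is seen.
        ids = [vocab.setdefault(word, len(vocab)) for word in doc.split(" ")]
        # Truncate documents longer than max_document_length ...
        ids = ids[:max_document_length]
        # ... and pad shorter ones with zeros to the same fixed length.
        ids += [0] * (max_document_length - len(ids))
        rows.append(ids)
    return rows, vocab


x_text = ['this cat', 'this must boy', 'this a dog']

rows_len3, vocab = fit_transform_sketch(x_text, 3)
rows_len2, _ = fit_transform_sketch(x_text, 2)

# Words listed in ascending order of id, as in the answer above.
vocabulary = [word for word, _ in sorted(vocab.items(), key=lambda kv: kv[1])]

print(rows_len3)   # [[1, 2, 0], [1, 3, 4], [1, 5, 6]]
print(rows_len2)   # [[1, 2], [1, 3], [1, 5]]
print(vocabulary)  # ['<UNK>', 'this', 'cat', 'must', 'boy', 'a', 'dog']
```

With max_document_length set to 3 the two-word document gets one trailing 0. With 2, the third word of the longer documents is dropped from the rows but still stays in the vocabulary.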
