Tensorflow VocabularyProcessor


I am following the WildML blog on text classification using TensorFlow. I am not able to understand the purpose of max_document_length in this code statement:

vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)

Also, how can I extract the vocabulary from vocab_processor?
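As for max_document_length: as far as I can tell, it is the fixed length every transformed document ends up with. fit_transform turns each document into exactly max_document_length word ids, cutting off longer documents and padding shorter ones with 0, which is also the id used for unknown words. That is why it is usually computed as the word count of the longest document, so nothing gets truncated. Below is a rough pure-Python sketch of that behaviour, not the library's actual code: `fit_vocabulary` and `transform` are hypothetical helpers, and splitting on single spaces is a simplification of the tokenizer VocabularyProcessor really uses.

```python
def fit_vocabulary(documents):
    """Hypothetical helper: give each new word the next free id, in order of first
    appearance. Id 0 is reserved for padding and unknown words."""
    mapping = {"<UNK>": 0}
    for doc in documents:
        for word in doc.split(" "):  # simplification: split on single spaces only
            if word not in mapping:
                mapping[word] = len(mapping)
    return mapping


def transform(documents, mapping, max_document_length):
    """Hypothetical helper: map every document to exactly max_document_length ids,
    truncating long documents and zero-padding short ones."""
    rows = []
    for doc in documents:
        ids = [mapping.get(word, 0) for word in doc.split(" ")][:max_document_length]
        ids += [0] * (max_document_length - len(ids))
        rows.append(ids)
    return rows


x_text = ["this cat", "this must boy", "this a dog"]
max_document_length = max(len(x.split(" ")) for x in x_text)  # longest document: 3 words
vocab = fit_vocabulary(x_text)
print(transform(x_text, vocab, max_document_length))
# [[1, 2, 0], [1, 3, 4], [1, 5, 6]]
```

The fixed length matters because the rows are stacked into one 2-D array that is fed to the model as a batch, and that only works if every row has the same number of ids.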

I have figured out how to extract the vocabulary from the VocabularyProcessor object. This worked for me:

```python
import numpy as np
from tensorflow.contrib import learn  # TensorFlow 1.x only; tf.contrib was removed in 2.x

x_text = ['this cat', 'this must boy', 'this a dog']
max_document_length = max([len(x.split(" ")) for x in x_text])

## Create the VocabularyProcessor object, setting the max length of documents.
vocab_processor = learn.preprocessing.VocabularyProcessor(max_document_length)

## Transform the documents using the vocabulary.
x = np.array(list(vocab_processor.fit_transform(x_text)))

## Extract the word:id mapping object.
vocab_dict = vocab_processor.vocabulary_._mapping

## Sort the vocabulary dictionary on the basis of the values (ids).
## Both statements perform the same task (the first one needs `import operator`).
#sorted_vocab = sorted(vocab_dict.items(), key=operator.itemgetter(1))
sorted_vocab = sorted(vocab_dict.items(), key=lambda x: x[1])

## Treat the ids as list indices and create a list of words in ascending order of id:
## the word with id i goes at index i of the list.
vocabulary = list(list(zip(*sorted_vocab))[0])

print(vocabulary)
print(x)
```
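The sort-and-unzip step works because the ids in the mapping run contiguously from 0, so after sorting by id the word with id i lands at index i. Here is a small pure-Python sketch of the same idea that runs without TensorFlow; the hand-written `vocab_dict` is a stand-in I made up for `vocab_processor.vocabulary_._mapping` (a private attribute, so its name may differ between versions), and the id rows in `x` are likewise illustrative. It also shows a direct-indexing alternative and how the list turns id rows back into words.

```python
# Hand-written stand-in for vocab_processor.vocabulary_._mapping (word -> id).
vocab_dict = {"<UNK>": 0, "this": 1, "cat": 2, "must": 3, "boy": 4, "a": 5, "dog": 6}

# Approach from the answer: sort (word, id) pairs by id, then keep only the words.
sorted_vocab = sorted(vocab_dict.items(), key=lambda item: item[1])
vocabulary = list(list(zip(*sorted_vocab))[0])

# Equivalent alternative: the ids are 0..n-1, so put each word straight at its id.
vocabulary_direct = [None] * len(vocab_dict)
for word, idx in vocab_dict.items():
    vocabulary_direct[idx] = word

print(vocabulary)
# ['<UNK>', 'this', 'cat', 'must', 'boy', 'a', 'dog']

# Illustrative id rows; decode them back to text, skipping the padding id 0.
x = [[1, 2, 0], [1, 3, 4], [1, 5, 6]]
decoded = [" ".join(vocabulary[i] for i in row if i != 0) for row in x]
print(decoded)
# ['this cat', 'this must boy', 'this a dog']
```

The direct-indexing version avoids the sort, but it silently relies on the ids having no gaps; the sorted version still produces an ordered word list if they do.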
