
Sentence representation in Transformer



I am trying my hand at BERT. I have gotten as far as feeding a sentence into BertTokenizer and running it through the BERT model, which gives me the output layers back. The code below is modified from the PyTorch code over at HuggingFace.



import logging

import torch
from torch.utils.data import TensorDataset, DataLoader, SequentialSampler

from pytorch_pretrained_bert.tokenization import BertTokenizer
from pytorch_pretrained_bert.modeling import BertModel

logging.basicConfig(format='%(asctime)s - %(levelname)s - %(name)s - %(message)s',
                    datefmt='%m/%d/%Y %H:%M:%S',
                    level=logging.INFO)
logger = logging.getLogger(__name__)


class InputFeatures(object):
    def __init__(self, tokens, input_ids, input_mask, input_type_ids):
        self.tokens = tokens
        self.input_ids = input_ids
        self.input_mask = input_mask
        self.input_type_ids = input_type_ids


def convert_sentences_to_features(sentences, max_seq_length, tokenizer):
    features = []
    for sentence in sentences:
        # tokenizer will also separate on punctuation
        # see https://github.com/google-research/bert#tokenization
        tokens = tokenizer.tokenize(sentence)

        # limit size of tokens, leaving room for [CLS] and [SEP]
        if len(tokens) > max_seq_length - 2:
            tokens = tokens[0:(max_seq_length - 2)]

        # add [CLS] and [SEP], as expected by BERT
        tokens = ['[CLS]', *tokens, '[SEP]']

        input_type_ids = [0] * len(tokens)
        input_ids = tokenizer.convert_tokens_to_ids(tokens)

        # The mask has 1 for real tokens and 0 for padding tokens. Only real
        # tokens are attended to.
        input_mask = [1] * len(input_ids)

        # Zero-pad up to the sequence length.
        while len(input_ids) < max_seq_length:
            input_ids.append(0)
            input_mask.append(0)
            input_type_ids.append(0)

        features.append(InputFeatures(tokens=tokens,
                                      input_ids=input_ids,
                                      input_mask=input_mask,
                                      input_type_ids=input_type_ids))
    return features


def main(sentences, layers='-1, -2, -3, -4', max_seq_length=512, bert_model='bert-large-uncased',
         do_lower_case=True, batch_size=32, no_cuda=False):
    device = torch.device('cuda' if torch.cuda.is_available() and not no_cuda else 'cpu')

    # 'layers' indicates which layers we want to concatenate
    layer_idxs = [int(l) for l in layers.split(',')]

    # init tokenizer
    tokenizer = BertTokenizer.from_pretrained(bert_model, do_lower_case=do_lower_case)

    # returns a list of 'InputFeatures'
    features = convert_sentences_to_features(sentences, max_seq_length, tokenizer)

    # init model and move to device
    model = BertModel.from_pretrained(bert_model)
    model.to(device)

    # extract IDs and mask from the features
    all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long)
    all_input_mask = torch.tensor([f.input_mask for f in features], dtype=torch.long)

    # prepare dataset and dataloader
    eval_data = TensorDataset(all_input_ids, all_input_mask)
    eval_sampler = SequentialSampler(eval_data)
    eval_dataloader = DataLoader(eval_data, sampler=eval_sampler, batch_size=batch_size)

    model.eval()

    for input_ids, input_mask in eval_dataloader:
        input_ids = input_ids.to(device)
        input_mask = input_mask.to(device)

        all_encoder_layers, _ = model(input_ids, token_type_ids=None, attention_mask=input_mask)

        # put the layers to concatenate in a list, and use torch.cat
        layers_to_concat = [all_encoder_layers[idx] for idx in layer_idxs]
        concat = torch.cat(layers_to_concat, dim=-1)

        logger.info(concat.size())
        logger.info(concat)


if __name__ == "__main__":
    proc_args = {
        'sentences': ['I saw Bert today !', 'Do you like bananas ?', 'Some sentences are really horrendous to parse .'],
        'max_seq_length': 32
    }

    main(**proc_args)


This works and gives me an output of size 3 × 32 × 4096 in this case, i.e. batch_size × seq_length × (layers * hidden_size). In practice this means that I have a representation of each token in its context. But I would like to extract a sentence representation from this. In a (bidirectional) RNN you would typically take the output of the top-layer final node, which contains the last hidden state and output, but I am not sure whether this carries over to Transformers, as the architecture is different.
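To make this concrete, the closest analogue to the RNN case that I can think of is slicing out a single position from `concat` inside the batch loop above, for example the prepended [CLS] token at position 0. This is only a sketch of one candidate, not something I know to be correct:

# One candidate (inside the batch loop above): take the vector at position 0,
# i.e. the [CLS] token that was prepended to every sentence.
# concat has shape (batch_size, seq_length, len(layer_idxs) * hidden_size).
cls_vectors = concat[:, 0, :]  # (batch_size, len(layer_idxs) * hidden_size)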



What is the best way to extract the sentence representation from a sequence of representations in a transformer? The size would be batch_size, hidden_size.
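Purely to illustrate the kind of reduction I am after (and not as a claim that this is the right way), something like masked mean-pooling over the token dimension would give that shape, assuming `concat` and `input_mask` from the batch loop above:

# Masked mean-pooling over the token dimension, only to illustrate the target
# shape; padding positions are excluded via input_mask.
mask = input_mask.unsqueeze(-1).float()                # (batch_size, seq_length, 1)
summed = (concat * mask).sum(dim=1)                    # (batch_size, layers * hidden_size)
mean_pooled = summed / mask.sum(dim=1).clamp(min=1.0)  # (batch_size, layers * hidden_size)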










      deep-learning pytorch natural-language-process transformer bert






asked Mar 22 at 11:37 by Bram Vanroy



















