Sentence representation in Transformer
I am trying my hand at BERT. So far I can feed a sentence into BertTokenizer and run it through the BERT model, which gives me the output layers back. The code below is modified from the PyTorch code over at HuggingFace.
import logging

import torch
from torch.utils.data import TensorDataset, DataLoader, SequentialSampler
from pytorch_pretrained_bert.tokenization import BertTokenizer
from pytorch_pretrained_bert.modeling import BertModel

logging.basicConfig(format='%(asctime)s - %(levelname)s - %(name)s - %(message)s',
                    datefmt='%m/%d/%Y %H:%M:%S',
                    level=logging.INFO)
logger = logging.getLogger(__name__)


class InputFeatures(object):
    def __init__(self, tokens, input_ids, input_mask, input_type_ids):
        self.tokens = tokens
        self.input_ids = input_ids
        self.input_mask = input_mask
        self.input_type_ids = input_type_ids


def convert_sentences_to_features(sentences, max_seq_length, tokenizer):
    features = []
    for sentence in sentences:
        # tokenizer will also separate on punctuation
        # see https://github.com/google-research/bert#tokenization
        tokens = tokenizer.tokenize(sentence)
        # limit size of tokens
        if len(tokens) > max_seq_length - 2:
            tokens = tokens[0:(max_seq_length - 2)]
        # add [CLS] and [SEP], as expected in BERT
        tokens = ['[CLS]', *tokens, '[SEP]']
        input_type_ids = [0] * len(tokens)
        input_ids = tokenizer.convert_tokens_to_ids(tokens)
        # The mask has 1 for real tokens and 0 for padding tokens. Only real
        # tokens are attended to.
        input_mask = [1] * len(input_ids)
        # Zero-pad up to the sequence length.
        while len(input_ids) < max_seq_length:
            input_ids.append(0)
            input_mask.append(0)
            input_type_ids.append(0)
        features.append(InputFeatures(tokens=tokens,
                                      input_ids=input_ids,
                                      input_mask=input_mask,
                                      input_type_ids=input_type_ids))
    return features


def main(sentences, layers='-1, -2, -3, -4', max_seq_length=512, bert_model='bert-large-uncased',
         do_lower_case=True, batch_size=32, no_cuda=False):
    device = torch.device('cuda' if torch.cuda.is_available() and not no_cuda else 'cpu')
    # 'layers' indicates which layers we want to concatenate
    layer_idxs = [int(l) for l in layers.split(',')]
    # init tokenizer
    tokenizer = BertTokenizer.from_pretrained(bert_model, do_lower_case=do_lower_case)
    # returns a list of 'InputFeatures'
    features = convert_sentences_to_features(sentences, max_seq_length, tokenizer)
    # init model and move to device
    model = BertModel.from_pretrained(bert_model)
    model.to(device)
    # extract IDs and mask from the features
    all_input_ids = torch.tensor([f.input_ids for f in features], dtype=torch.long)
    all_input_mask = torch.tensor([f.input_mask for f in features], dtype=torch.long)
    # prepare dataset and dataloader
    eval_data = TensorDataset(all_input_ids, all_input_mask)
    eval_sampler = SequentialSampler(eval_data)
    eval_dataloader = DataLoader(eval_data, sampler=eval_sampler, batch_size=batch_size)

    model.eval()
    for input_ids, input_mask in eval_dataloader:
        input_ids = input_ids.to(device)
        input_mask = input_mask.to(device)
        all_encoder_layers, _ = model(input_ids, token_type_ids=None, attention_mask=input_mask)
        # put layers to concatenate in list, and use torch.cat
        layers_to_concat = [all_encoder_layers[idx] for idx in layer_idxs]
        concat = torch.cat(layers_to_concat, dim=-1)
        logger.info(concat.size())
        logger.info(concat)


if __name__ == "__main__":
    proc_args = {
        'sentences': ['I saw Bert today !', 'Do you like bananas ?',
                      'Some sentences are really horrendous to parse .'],
        'max_seq_length': 32
    }
    main(**proc_args)
This works and gives me an output of size (3, 32, 4096) in this case, which corresponds to (batch_size, seq_length, layers * hidden_size). In practice, this means that I have a representation of each token in its context. But I would like to extract a single sentence representation from this. In a (bidirectional) RNN, you would typically take the output of the top leftmost node, which contains the latest hidden state and the latest output, but I am not sure whether the same idea applies to Transformers, since the architecture is different.
What is the best way to extract a sentence representation, of size (batch_size, hidden_size), from a sequence of token representations in a Transformer?
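For context, two approaches I have seen suggested elsewhere (this is not part of my code above, just a sketch of what I am considering) are taking the hidden state of the [CLS] token, or mean-pooling the token representations using the attention mask. Assuming final_layer = all_encoder_layers[-1] of shape (batch_size, seq_length, hidden_size) and the input_mask from my loop above:

# Sketch only: assumes final_layer = all_encoder_layers[-1] with shape
# (batch_size, seq_length, hidden_size), and input_mask of shape
# (batch_size, seq_length) with 1 for real tokens and 0 for padding.

# Option 1: take the hidden state of the [CLS] token (position 0) as the sentence vector.
cls_repr = final_layer[:, 0, :]                    # (batch_size, hidden_size)

# Option 2: mask-aware mean pooling over the non-padding tokens.
mask = input_mask.unsqueeze(-1).float()            # (batch_size, seq_length, 1)
summed = (final_layer * mask).sum(dim=1)           # sum hidden states of real tokens only
counts = mask.sum(dim=1).clamp(min=1e-9)           # number of real tokens per sentence
mean_repr = summed / counts                        # (batch_size, hidden_size)

If I read pytorch_pretrained_bert correctly, the second value returned by BertModel (which I currently discard in the loop) is already a pooled [CLS] representation passed through the model's pooler layer, but I am not sure whether that is the intended way to get a general-purpose sentence embedding.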
deep-learning pytorch natural-language-process transformer bert
asked Mar 22 at 11:37 by Bram Vanroy