Think2SQL-3B / dataset_examples.txt
Training dataset length: 9428
Training Example 0:
{'db_id': 'movie_platform', 'question': 'Name movie titles released in year 1945. Sort the listing by the descending order of movie popularity.', 'evidence': 'released in the year 1945 refers to movie_release_year = 1945;', 'target_sql': 'SELECT movie_title FROM movies WHERE movie_release_year = 1945 ORDER BY movie_popularity DESC LIMIT 1', 'prompt': [{'content': 'You are a helpful AI Assistant that provides well-reasoned and detailed responses. You first think about the reasoning process as an internal monologue and then provide the user with the answer. Respond in the following format: <think>\n...\n</think>\n<answer>\n...\n</answer>', 'role': 'system'}, {'content': 'Answer the following question with the SQL code. Use the piece of evidence and base your answer on the database schema. Given the question, the evidence and the database schema, return in the <answer> tags only the SQL script that addresses the question.\nQuestion:\nName movie titles released in year 1945. Sort the listing by the descending order of movie popularity.\n\nEvidence:\nreleased in the year 1945 refers to movie_release_year = 1945;\n\nDatabase Schema:\n`"movies"` (movie_id INTEGER not null primary key, movie_title TEXT, movie_release_year INTEGER, movie_url TEXT, movie_title_language TEXT, movie_popularity INTEGER, movie_image_url TEXT, director_id TEXT, director_name TEXT, director_url TEXT)\n\nReturn only the SQL script enclosed in <answer> tags.', 'role': 'user'}]}
--------------------------------------------------
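Each record stores the supervision target in target_sql and a ready-to-use chat prompt under prompt: a system message fixing the <think>/<answer> output format, plus a user message that combines the question, the evidence and the serialized schema. The sketch below shows how such a prompt could be assembled from the raw fields; the function name and argument names are illustrative, not dataset fields.

# Minimal sketch: rebuild the two-message prompt stored in each record.
# build_prompt and its arguments are illustrative names, not part of the dataset.

SYSTEM_PROMPT = (
    "You are a helpful AI Assistant that provides well-reasoned and detailed responses. "
    "You first think about the reasoning process as an internal monologue and then provide "
    "the user with the answer. Respond in the following format: "
    "<think>\n...\n</think>\n<answer>\n...\n</answer>"
)

USER_TEMPLATE = (
    "Answer the following question with the SQL code. Use the piece of evidence and base your "
    "answer on the database schema. Given the question, the evidence and the database schema, "
    "return in the <answer> tags only the SQL script that addresses the question.\n"
    "Question:\n{question}\n\n"
    "Evidence:\n{evidence}\n\n"
    "Database Schema:\n{schema}\n\n"
    "Return only the SQL script enclosed in <answer> tags."
)

def build_prompt(question: str, evidence: str, schema: str) -> list[dict]:
    """Return the chat-style (system + user) prompt used in the dataset records."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": USER_TEMPLATE.format(
            question=question, evidence=evidence, schema=schema)},
    ]

--------------------------------------------------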
Training Example 1:
{'db_id': 'movie_platform', 'question': 'State the most popular movie? When was it released and who is the director for the movie?', 'evidence': 'most popular movie refers to MAX(movie_popularity); when it was released refers to movie_release_year; director for the movie refers to director_name;', 'target_sql': 'SELECT movie_title, movie_release_year, director_name FROM movies ORDER BY movie_popularity DESC LIMIT 1 ', 'prompt': [{'content': 'You are a helpful AI Assistant that provides well-reasoned and detailed responses. You first think about the reasoning process as an internal monologue and then provide the user with the answer. Respond in the following format: <think>\n...\n</think>\n<answer>\n...\n</answer>', 'role': 'system'}, {'content': 'Answer the following question with the SQL code. Use the piece of evidence and base your answer on the database schema. Given the question, the evidence and the database schema, return in the <answer> tags only the SQL script that addresses the question.\nQuestion:\nState the most popular movie? When was it released and who is the director for the movie?\n\nEvidence:\nmost popular movie refers to MAX(movie_popularity); when it was released refers to movie_release_year; director for the movie refers to director_name;\n\nDatabase Schema:\n`"movies"` (movie_id INTEGER not null primary key, movie_title TEXT, movie_release_year INTEGER, movie_url TEXT, movie_title_language TEXT, movie_popularity INTEGER, movie_image_url TEXT, director_id TEXT, director_name TEXT, director_url TEXT)\n\nReturn only the SQL script enclosed in <answer> tags.', 'role': 'user'}]}
--------------------------------------------------
Training Example 2:
{'db_id': 'movie_platform', 'question': 'What is the name of the longest movie title? When was it released?', 'evidence': 'longest movie title refers to MAX(LENGTH(movie_title)); when it was released refers to movie_release_year;', 'target_sql': 'SELECT movie_title, movie_release_year FROM movies ORDER BY LENGTH(movie_popularity) DESC LIMIT 1', 'prompt': [{'content': 'You are a helpful AI Assistant that provides well-reasoned and detailed responses. You first think about the reasoning process as an internal monologue and then provide the user with the answer. Respond in the following format: <think>\n...\n</think>\n<answer>\n...\n</answer>', 'role': 'system'}, {'content': 'Answer the following question with the SQL code. Use the piece of evidence and base your answer on the database schema. Given the question, the evidence and the database schema, return in the <answer> tags only the SQL script that addresses the question.\nQuestion:\nWhat is the name of the longest movie title? When was it released?\n\nEvidence:\nlongest movie title refers to MAX(LENGTH(movie_title)); when it was released refers to movie_release_year;\n\nDatabase Schema:\n`"movies"` (movie_id INTEGER not null primary key, movie_title TEXT, movie_release_year INTEGER, movie_url TEXT, movie_title_language TEXT, movie_popularity INTEGER, movie_image_url TEXT, director_id TEXT, director_name TEXT, director_url TEXT)\n\nReturn only the SQL script enclosed in <answer> tags.', 'role': 'user'}]}
--------------------------------------------------
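Because the target_sql strings are plain SQLite and the user message already carries the serialized schema, a record can be sanity-checked by executing its query against an empty in-memory database built from that schema. The snippet below does this for Example 2's query; it is a quick check for illustration, not part of the dataset pipeline.

import sqlite3

# Build an empty in-memory table from the serialized schema shown in the prompt.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movies (movie_id INTEGER not null primary key, movie_title TEXT, "
    "movie_release_year INTEGER, movie_url TEXT, movie_title_language TEXT, "
    "movie_popularity INTEGER, movie_image_url TEXT, director_id TEXT, "
    "director_name TEXT, director_url TEXT)"
)

# target_sql from Training Example 2; an empty table yields [], but the query is valid SQLite.
target_sql = ("SELECT movie_title, movie_release_year FROM movies "
              "ORDER BY LENGTH(movie_popularity) DESC LIMIT 1")
print(conn.execute(target_sql).fetchall())

--------------------------------------------------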
Training Example 3:
{'db_id': 'movie_platform', 'question': 'Name the movie with the most ratings.', 'evidence': 'movie with the most rating refers to MAX(SUM(rating_score));', 'target_sql': 'SELECT movie_title FROM movies GROUP BY movie_title ORDER BY COUNT(movie_title) DESC LIMIT 1', 'prompt': [{'content': 'You are a helpful AI Assistant that provides well-reasoned and detailed responses. You first think about the reasoning process as an internal monologue and then provide the user with the answer. Respond in the following format: <think>\n...\n</think>\n<answer>\n...\n</answer>', 'role': 'system'}, {'content': 'Answer the following question with the SQL code. Use the piece of evidence and base your answer on the database schema. Given the question, the evidence and the database schema, return in the <answer> tags only the SQL script that addresses the question.\nQuestion:\nName the movie with the most ratings.\n\nEvidence:\nmovie with the most rating refers to MAX(SUM(rating_score));\n\nDatabase Schema:\n`"movies"` (movie_id INTEGER not null primary key, movie_title TEXT, movie_release_year INTEGER, movie_url TEXT, movie_title_language TEXT, movie_popularity INTEGER, movie_image_url TEXT, director_id TEXT, director_name TEXT, director_url TEXT)\n\nReturn only the SQL script enclosed in <answer> tags.', 'role': 'user'}]}
--------------------------------------------------
Training Example 4:
{'db_id': 'movie_platform', 'question': 'What is the average number of Mubi users who love movies directed by Stanley Kubrick?', 'evidence': 'average = AVG(movie_popularity); number of Mubi users who loves the movie refers to movie_popularity;', 'target_sql': "SELECT AVG(movie_popularity) FROM movies WHERE director_name = 'Stanley Kubrick'", 'prompt': [{'content': 'You are a helpful AI Assistant that provides well-reasoned and detailed responses. You first think about the reasoning process as an internal monologue and then provide the user with the answer. Respond in the following format: <think>\n...\n</think>\n<answer>\n...\n</answer>', 'role': 'system'}, {'content': 'Answer the following question with the SQL code. Use the piece of evidence and base your answer on the database schema. Given the question, the evidence and the database schema, return in the <answer> tags only the SQL script that addresses the question.\nQuestion:\nWhat is the average number of Mubi users who love movies directed by Stanley Kubrick?\n\nEvidence:\naverage = AVG(movie_popularity); number of Mubi users who loves the movie refers to movie_popularity;\n\nDatabase Schema:\n`"movies"` (movie_id INTEGER not null primary key, movie_title TEXT, movie_release_year INTEGER, movie_url TEXT, movie_title_language TEXT, movie_popularity INTEGER, movie_image_url TEXT, director_id TEXT, director_name TEXT, director_url TEXT)\n\nReturn only the SQL script enclosed in <answer> tags.', 'role': 'user'}]}
--------------------------------------------------
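At inference time the system prompt asks the model to put its reasoning inside <think> tags and only the SQL script inside <answer> tags, so the script has to be pulled out of the completion before it can be executed or scored. A small regex-based sketch follows; the helper name and its behaviour on malformed output are our assumptions, only the <think>/<answer> format comes from the system prompt above.

import re

def extract_sql(completion: str) -> str | None:
    """Return the SQL inside the <answer>...</answer> block, or None if absent."""
    match = re.search(r"<answer>\s*(.*?)\s*</answer>", completion, flags=re.DOTALL)
    return match.group(1) if match else None

# Usage with a hypothetical completion (not taken from the model):
completion = (
    "<think>\nWe only need movie_title filtered on movie_release_year = 1945.\n</think>\n"
    "<answer>\nSELECT movie_title FROM movies WHERE movie_release_year = 1945 "
    "ORDER BY movie_popularity DESC\n</answer>"
)
print(extract_sql(completion))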