我有一个很大的表,需要将其导出到文本文件,因此可以将其读取到其他程序(R)中。为了避免编写/读取大文本文件的麻烦,我想通过用连续索引替换所有不同的数值来转换其中一列(所讨论的列是一个ID,它恰好是数字。设置不同的字符)。因此,我想用一个连续的索引替换此ID的所有不同值(将n
不同的值替换1
为n
)。
我的数据如下所示:
ID SOMECHAR SOMECOUNT --------------------------------- 1534561 FISH 2 7537281 CAT 14 4738920 CAT 3 7537281 BIRD 7 7537281 FISH 6 1534561 CAT 10 ... ... ...
我想写一个像这样的表:
ID SOMECHAR SOMECOUNT --------------------------------- 1 FISH 2 2 CAT 14 3 CAT 3 2 BIRD 7 2 FISH 6 1 CAT 10 ... ... ...
1)我可以期望节省大量空间吗?我大约有1亿行和1000万个不同的ID,原始ID为10位数字。
2)如何在SQL或PL / SQL中执行此操作?该REPLACE
命令适用于字符串...我是否需要制作第二张表作为具有每个不同ID和连续索引的查找?如果是,如何获取索引?
You can just use DENSE_RANK
as following:
Here is the demo:
WITH DATAA(ID) AS ( SELECT 123 FROM DUAL UNION ALL SELECT 121 FROM DUAL UNION ALL SELECT 123 FROM DUAL UNION ALL SELECT 121 FROM DUAL UNION ALL SELECT 124 FROM DUAL ) SELECT ID, DENSE_RANK() OVER (ORDER BY ID) FROM DATAA;
Output:
ID DENSE_RANK()OVER(ORDERBYID) ---------- --------------------------- 121 1 121 1 123 2 123 2 124 3
Cheers!!
-- UPDATE --
Example of updating the values:
-- Sample data preparation CREATE TABLE MY_TABLE ( ID NUMBER, SOMECHAR VARCHAR2(100), SOMECOUNT NUMBER ); INSERT INTO MY_TABLE SELECT 1534561 ,'FISH', 2 FROM DUAL UNION ALL SELECT 7537281 ,'CAT', 14 FROM DUAL UNION ALL SELECT 4738920 ,'CAT', 3 FROM DUAL UNION ALL SELECT 7537281 ,'BIRD', 7 FROM DUAL UNION ALL SELECT 7537281 ,'FISH', 6 FROM DUAL UNION ALL SELECT 1534561 ,'CAT', 10 FROM DUAL ;
-- output before --
SELECT * FROM MY_TABLE; ID SOMECHAR SOMECOUNT ---------- ---------- ---------- 1534561 FISH 2 7537281 CAT 14 4738920 CAT 3 7537281 BIRD 7 7537281 FISH 6 1534561 CAT 10
-- Query to modify the ID --
MERGE INTO MY_TABLE M USING (SELECT ROWID, DENSE_RANK() OVER( ORDER BY ID ) MODIFIED_ID FROM MY_TABLE) MODIFIED ON (M.ROWID = MODIFIED.ROWID) WHEN MATCHED THEN UPDATE SET M.ID = MODIFIED.MODIFIED_ID;
-- Output after --
SELECT * FROM MY_TABLE; ID SOMECHAR SOMECOUNT ---------- ---------- ---------- 1 FISH 2 3 CAT 14 2 CAT 3 3 BIRD 7 3 FISH 6 1 CAT 10
Cheers!!